Pandas: How To Convert From 'Frequency Table' To Flat Dataframe Format?

How do I convert a Pandas dataframe from a 'frequency table' format to a flat dataframe format and back again using idiomatic Python?

From:

        H     E     K
0       B     B    12
1       B     G     3
2       G     B    17
3       G     G    68

to:

        H     E
0       B     B
1       B     B
2       B     B
3       B     B
4       B     B
5       B     B
6       B     B
7       B     B
8       B     B
9       B     B
10      B     B
11      B     B
12      B     G
13      B     G
14      B     G
...

and back again!

        H     E     K
0       B     B    12
1       B     G     3
2       G     B    17
3       G     G    68

Please advise.

asked Feb 25, 2022 at 20:31

matekus


Scale up: new_df = df.loc[df.index.repeat(df['K'])].reset_index(drop=True), like this answer.

– Henry Ecker ♦

Commented Feb 25, 2022 at 20:39
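
A minimal sketch of the scale-up step from the comment above, assuming the frequency table is built exactly as shown in the question (columns H, E, K). Index.repeat repeats each row label K times, .loc selects the repeated rows, and reset_index(drop=True) renumbers them 0 to n-1. Note that the K column is still present at this stage.

```python
import pandas as pd

# Frequency table from the question: each (H, E) pair with its count K
df = pd.DataFrame({
    "H": ["B", "B", "G", "G"],
    "E": ["B", "G", "B", "G"],
    "K": [12, 3, 17, 68],
})

# Repeat each row label K times, select those rows, then renumber 0..n-1
new_df = df.loc[df.index.repeat(df["K"])].reset_index(drop=True)

print(len(new_df))         # 100 rows in total (12 + 3 + 17 + 68)
print(new_df.iloc[10:14])  # the switch from the B-B rows to the B-G rows
```

A row whose count is 0 is repeated zero times, so it disappears from the flat frame entirely.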
Scale back down: df = new_df.groupby(['H', 'E']).size().reset_index(name='K'), like this answer.

– Henry Ecker ♦

Commented Feb 25, 2022 at 20:39
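
A sketch of the scale-down step, again assuming the question's data. groupby(...).size() counts the rows in each (H, E) group and returns a Series indexed by the pair; Series.reset_index(name="K") turns that back into the three-column frequency table. The flat frame here still carries the K column from the scale-up, which groupby simply ignores.

```python
import pandas as pd

df = pd.DataFrame({
    "H": ["B", "B", "G", "G"],
    "E": ["B", "G", "B", "G"],
    "K": [12, 3, 17, 68],
})
new_df = df.loc[df.index.repeat(df["K"])].reset_index(drop=True)

# Count the rows per (H, E) pair; size() yields a Series with a MultiIndex
counts = new_df.groupby(["H", "E"]).size()

# Move the group keys back into columns and name the count column K
freq = counts.reset_index(name="K")
print(freq)
print(freq["K"].tolist())  # [12, 3, 17, 68]
```

groupby sorts the keys by default, so the pairs come back as B-B, B-G, G-B, G-G, which happens to match the question's original order.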
@henry-ecker, thanks for the benefit of your expertise. Can I drop the 'K' column as part of the conversion?

– matekus

Commented Feb 25, 2022 at 20:46
Yeah. Just drop the column: new_df = df.loc[df.index.repeat(df['K'])].drop(columns='K').reset_index(drop=True)

– Henry Ecker ♦

Commented Feb 25, 2022 at 20:49
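
Putting both directions together, here is a sketch of a round trip that drops K during the expansion, as suggested above. The helper names expand_counts and collapse_counts are hypothetical, introduced only for illustration; they wrap the same pandas calls from the comments.

```python
import pandas as pd

def expand_counts(freq, count_col="K"):
    """Hypothetical helper: one row per counted observation, count column dropped."""
    return (
        freq.loc[freq.index.repeat(freq[count_col])]
        .drop(columns=count_col)
        .reset_index(drop=True)
    )

def collapse_counts(flat, keys, count_col="K"):
    """Hypothetical helper: count the rows for each combination of key columns."""
    return flat.groupby(keys).size().reset_index(name=count_col)

df = pd.DataFrame({
    "H": ["B", "B", "G", "G"],
    "E": ["B", "G", "B", "G"],
    "K": [12, 3, 17, 68],
})

flat = expand_counts(df)
print(list(flat.columns))  # ['H', 'E']

back = collapse_counts(flat, ["H", "E"])
print(back.to_dict("list") == df.to_dict("list"))  # True
```

The round trip is only exact under two conditions: no count is 0, since those rows vanish on expansion, and no key is missing, since groupby drops NaN keys unless dropna=False is passed.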
@henry-ecker, when I dump 'new_df' to a CSV file, there are 11 'B-B' rows, 2 'B-G' rows and so on, instead of 12, 3, 17 and 68 respectively?

– matekus

Commented Feb 25, 2022 at 20:59
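
The thread does not show how the rows in the CSV were counted, so this is only a sketch for checking the counts directly, under the question's data. One possible explanation, which is an assumption: in the question's own flat table the B-B rows occupy index labels 0 to 11, so reading the last label (11) as a count, or subtracting labels (14 - 12 = 2 for B-G), undercounts by one. The sketch counts the rows in memory, writes a CSV without the index column (index=False), and counts again after reading it back.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "H": ["B", "B", "G", "G"],
    "E": ["B", "G", "B", "G"],
    "K": [12, 3, 17, 68],
})
new_df = df.loc[df.index.repeat(df["K"])].drop(columns="K").reset_index(drop=True)

# Count the rows per pair in memory
print(new_df.groupby(["H", "E"]).size().tolist())  # [12, 3, 17, 68]

# Write CSV text without the index column, then read it back
buf = io.StringIO()
new_df.to_csv(buf, index=False)
buf.seek(0)
reread = pd.read_csv(buf)
print(len(reread))                                 # 100
print(reread.groupby(["H", "E"]).size().tolist())  # [12, 3, 17, 68]

# Index labels start at 0, so the last B-B label is one less than the count
is_bb = (new_df["H"] == "B") & (new_df["E"] == "B")
print(new_df.loc[is_bb].index.max())  # 11, although there are 12 B-B rows
```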
