How to convert a groupby object that has no aggregate function applied to it into a new dataframe

Question

Updated:

I have a huge dataframe; here is a small version of it.

import numpy as np
import pandas as pd

header = [np.array([' ',' ',' ','X','X','Y','Y']),
          np.array(['A','B','C','D','E','F','G'])]
df = pd.DataFrame(columns=header)
df[' ','A'] = ['n','n','m','m','m','p']
df[' ','B'] = ['q','r','s','t','u','v']
df[' ','C'] = [5,6,7,8,9,4]
df['X','D'] = ['1.5','2.9','3.6','2.5','7.1','0.4']
df['X','E'] = ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%']
df['Y','F'] = ['ab','bc','cd','de','ef','gh']
df['Y','G'] = ['5.5','2.6','8.6','4.5','0.1','3.4']

df = df.style.hide_index()

In reality, 'B' is generated dynamically from another dataframe and, depending on the value of 'B', 'A' is populated manually.

I want to group my dataframe by column 'A' and sort it by column 'A' as well.

I tried this code:

def func(x):
    # The top-level column label is a single space (' '), matching the header above
    return x.sort_values([(' ','A')], ascending=False)

dfResult = df.groupby([(' ','A')])
dfResult1 = dfResult.apply(func)
dfResult1

     |   |   |   |   |  X         |  Y
(,A) |   | A | B | C |  D  |  E   |  F |  G
-----|---|---|---|---|-----|------|----|-----
  n  | 0 | n | q | 5 | 1.5 | 0.7% | ab | 5.5
     | 1 | n | r | 6 | 2.9 | 3.9% | bc | 2.6
-----|---|---|---|---|-----|------|----|-----
  m  | 2 | m | s | 7 | 3.6 | 3.2% | cd | 8.6
     | 3 | m | t | 8 | 2.5 | 1.5% | de | 4.5
     | 4 | m | u | 9 | 7.1 | 4.1% | ef | 0.1
-----|---|---|---|---|-----|------|----|-----
  p  | 5 | p | v | 4 | 0.4 | 2.4% | gh | 3.4

Expected output:

dfExpected = pd.DataFrame({(' ','C'): [5,6,7,8,9,4],
                    ('X', 'D'): ['1.5','2.9','3.6','2.5','7.1','0.4'],
                    ('X', 'E'): ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%'],
                    ('Y', 'F'): ['ab','bc','cd','de','ef','gh'],
                   ('Y', 'G'): ['5.5','2.6','8.6','4.5','0.1','3.4']},
              index=pd.MultiIndex.from_arrays([['n','n','m','m','m','p'],
                                               ['q','r','s','t','u','v']], 
                                              names=['A', 'B']))

Printing dfResult1 does not give me the desired dataframe.

Also, when I apply styles to dfResult1, the grouping no longer exists; the result takes the form of the original dataframe. I need to apply styles to my dataframe for the dashboard.

Can anyone please help?

Please do not post images of your data. You can include code that creates a dataframe, or the output of print(df) (or of a few rows and columns that allow the example to be reproduced).
– Cimbali, Commented Jul 7, 2021 at 13:19
Assuming (a,b) is a multiindex, there is no difference between your two dataframes in Python. Unless you want to replace the values with empty strings? If this is not what you want, please provide your input/output data as text.
– mozway, Commented Jul 7, 2021 at 13:27
Actually, I have multi-indexed columns which I was not able to format as a table here, so I provided an image.
– Maleficent, Commented Jul 7, 2021 at 13:27

mozway · Accepted Answer · 2021-07-09 15:25:50Z

applicable answer

You can use set_index and rename_axis:

df.set_index([(' ', 'A'),(' ', 'B')]).rename_axis(['A', 'B'])

output:

          X         Y     
     C    D     E   F    G
A B                       
n q  5  1.5  0.7%  ab  5.5
  r  6  2.9  3.9%  bc  2.6
m s  7  3.6  3.2%  cd  8.6
  t  8  2.5  1.5%  de  4.5
  u  9  7.1  4.1%  ef  0.1
p v  4  0.4  2.4%  gh  3.4
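
For reference, here is a minimal self-contained sketch of the same call on the question's sample data. It assumes df is still a plain DataFrame (after the df.style.hide_index() call in the question, df would be a Styler, which has no set_index). The frame is built from tuple keys rather than from the question's header arrays, and the names result and result_sorted are only illustrative. The sort_index call at the end is an optional extra for the "sort on A" part of the question, not something the expected output requires.

```python
import pandas as pd

# Small version of the question's frame. Tuple keys give two-level
# (MultiIndex) columns whose top level is ' ', 'X' or 'Y'.
df = pd.DataFrame({
    (' ', 'A'): ['n', 'n', 'm', 'm', 'm', 'p'],
    (' ', 'B'): ['q', 'r', 's', 't', 'u', 'v'],
    (' ', 'C'): [5, 6, 7, 8, 9, 4],
    ('X', 'D'): ['1.5', '2.9', '3.6', '2.5', '7.1', '0.4'],
    ('X', 'E'): ['0.7%', '3.9%', '3.2%', '1.5%', '4.1%', '2.4%'],
    ('Y', 'F'): ['ab', 'bc', 'cd', 'de', 'ef', 'gh'],
    ('Y', 'G'): ['5.5', '2.6', '8.6', '4.5', '0.1', '3.4'],
})

# Move A and B out of the columns and into the row index, then give the
# index levels plain names instead of the (' ', 'A') / (' ', 'B') tuples.
result = df.set_index([(' ', 'A'), (' ', 'B')]).rename_axis(['A', 'B'])
print(result)

# Optional: order the rows by index level A, descending.
result_sorted = result.sort_index(level='A', ascending=False)
print(result_sorted)
```

No groupby is involved here: the grouped look comes from the MultiIndex row index, which pandas displays without repeating the outer label.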

previous answer

Your two dataframes are identical.

Here is how to create your dataframe:

pd.DataFrame({('x', 'c'): ['ab', 'bc', 'cd', 'ef', 'gh', 'ij', 'kl'],
              ('y', 'd'): [1.5, 2.3, 2.4, 1.2, 3.1, 5.2, 7.3]},
              index=pd.MultiIndex.from_arrays([[0,0,1,1,1,2,2],
                                               [5,6,7,8,9,3,4]], 
                                              names=['a', 'b'])
            )

output:

      x    y
      c    d
a b         
0 5  ab  1.5
  6  bc  2.3
1 7  cd  2.4
  8  ef  1.2
  9  gh  3.1
2 3  ij  5.2
  4  kl  7.3

edited Jul 9, 2021 at 15:25

answered Jul 7, 2021 at 13:34

6 Comments

Maleficent Over a year ago

Sorry for the confusion, I have corrected my question.

mozway Over a year ago

Can you provide the input and output dataframes as pd.DataFrame(…) commands? This way the dataframes will be unambiguous. See my answer on how to make the dataframes.

Maleficent Over a year ago

Yes, the dataframe is the same as the one you generated with pd.DataFrame(), except that the index 'b' is generated dynamically from another dataframe and, depending on the value of 'b', 'a' is populated.

mozway Over a year ago

As I said, can you use my way of generating the dataframe to produce your input and output dataframe and amend your question with this code? Then I (and others) can have a look to try solving your problem. See this post on how to make a good pandas question, especially when working with multiindexes. Right now, your dataframes are ambiguous and it is impossible to know what you want to do.

mozway Over a year ago

Yes, perfect, I provided an answer. If you want to keep the A/B columns in addition to the index, you can use the drop=False option in set_index.
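
A short sketch of that drop=False variant, assuming a trimmed-down version of the question's frame (three rows, four columns) and an illustrative variable name kept:

```python
import pandas as pd

# Trimmed-down frame with the same two-level column layout as the question.
df = pd.DataFrame({
    (' ', 'A'): ['n', 'n', 'm'],
    (' ', 'B'): ['q', 'r', 's'],
    (' ', 'C'): [5, 6, 7],
    ('X', 'D'): ['1.5', '2.9', '3.6'],
})

# drop=False copies A and B into the row index but also leaves them
# in place as ordinary columns.
kept = (
    df.set_index([(' ', 'A'), (' ', 'B')], drop=False)
      .rename_axis(['A', 'B'])
)
print(kept)
```

Here kept still has all four columns, with A and B appearing both as index levels and as columns.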
