0

Updated:

I have a huge dataframe, providing small version of it.

header = [np.array([' ',' ',' ','X','X','Y','Y']),
         np.array(['A','B','C','D','E','F','G'])]
df = pd.DataFrame(columns=header)
df[' ','A'] = ['n','n','m','m','m','p']
df[' ','B'] = ['q','r','s','t','u','v']
df[' ','C'] = [5,6,7,8,9,4]
df['X','D'] = ['1.5','2.9','3.6','2.5','7.1','0.4']
df['X','E'] = ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%']
df['Y','F'] = ['ab','bc','cd','de','ef','gh']
df['Y','G'] = ['5.5','2.6','8.6','4.5','0.1','3.4']

df =df.style.hide_index()

In real, 'B' is getting dynamically generated from another dataframe and depending on value of 'B', 'A' is being populated manually.

I want to group my dataframe on column 'A' and sort the dataframe on column 'A' too

I tried this code:

def func(x):
   return x.sort_values([('','A')],ascending=False)

dfResult = df.groupby([('','A')])
dfResult1 = dfResult.apply(func)
dfResult1

     |   |   |   |   |  X        |    Y 
(,A) |   | A | B | C |  D |  E   |    F | G
-----|---|---|---|---|----|------|------|----
  n  | 0 | n | q | 5 |1.5 | 0.7% |   ab | 5.5
     | 1 | n | r | 6 |2.9 | 3.9% |   bc | 2.6
--- -|---|---|---|---|----|------|------|----
  m  | 2 | m | s | 7 |3.6 | 3.2% |   cd | 8.6
     | 3 | m | t | 8 |2.5 | 1.5% |   de | 4.5
     | 4 | m | u | 9 |7.1 | 4.1% |   ef | 0.1
-----|---|---|---|---|----|------|------|---- 
  p  | 5 | p | v | 4 |0.4 | 2.4% |   gh | 3.4

Expected output:

dfExpected = pd.DataFrame({(' ','C'): [5,6,7,8,9,4],
                    ('X', 'D'): ['1.5','2.9','3.6','2.5','7.1','0.4'],
                    ('X', 'E'): ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%'],
                    ('Y', 'F'): ['ab','bc','cd','de','ef','gh'],
                   ('Y', 'G'): ['5.5','2.6','8.6','4.5','0.1','3.4']},
              index=pd.MultiIndex.from_arrays([['n','n','m','m','m','p'],
                                               ['q','r','s','t','u','v']], 
                                              names=['A', 'B']))

printing dfResult1 does not gives me the desired dataframe.

Also when I am applying styles on dfResult1, the grouping doesn't not exists anymore, it's taking the form of original dataframe after applying styles. I need to apply styles on my dataframe for the dashboard.

can anyone pls help?

4
  • Please do not post images of your data. You can include code that creates a dataframe or the output of print(df) (or of a few rows and columns that allow to reproduce the example) Commented Jul 7, 2021 at 13:19
  • Assuming (a,b) is a multiindex, there is no difference between your two dataframes in python. Unless you want to replace the values with empty strings? If this is not what you want, please provide your input/output data as text. Commented Jul 7, 2021 at 13:27
  • actually i have multi-indexed columns which i was not able to form in table here, so provided image Commented Jul 7, 2021 at 13:27
  • @mozway I have updated my question Commented Jul 7, 2021 at 14:18

1 Answer 1

0

applicable answer

You can reset_index and rename_axis:

df.set_index([(' ', 'A'),(' ', 'B')]).rename_axis(['A', 'B'])

output:

          X         Y     
     C    D     E   F    G
A B                       
n q  5  1.5  0.7%  ab  5.5
  r  6  2.9  3.9%  bc  2.6
m s  7  3.6  3.2%  cd  8.6
  t  8  2.5  1.5%  de  4.5
  u  9  7.1  4.1%  ef  0.1
p v  4  0.4  2.4%  gh  3.4

previous answer

Your two dataframes are identical.

Here is how to create your dataframe:

pd.DataFrame({('x', 'c'): ['ab', 'bc', 'cd', 'ef', 'gh', 'ij', 'kl'],
              ('y', 'd'): [1.5, 2.3, 2.4, 1.2, 3.1, 5.2, 7.3]},
              index=pd.MultiIndex.from_arrays([[0,0,1,1,1,2,2],
                                               [5,6,7,8,9,3,4]], 
                                              names=['a', 'b'])
            )

output:

      x    y
      c    d
a b         
0 5  ab  1.5
  6  bc  2.3
1 7  cd  2.4
  8  ef  1.2
  9  gh  3.1
2 3  ij  5.2
  4  kl  7.3
Sign up to request clarification or add additional context in comments.

6 Comments

sorry for the confusion, have corrected my question
Can you provide the input and output dataframes as pd.DataFrame(…) commands? This way the dataframe identities will be non ambiguous. See my answer on how to make the dataframes.
yes the dataframe is same, the way you have generated through command pd.Datframe() except the index 'b' is getting dynamically generated from another dataframe and depending on value of 'b', 'a' is being populated
As I said, can you use my way of generating the dataframe to produce your input and output dataframe and amend your question with this code? Then I (and others) can have a look to try solving your problem. See this post on how to make a good pandas question, especially when working with multiindexes. Right now, your dataframes are ambiguous and it is impossible to know what you want to do.
Yes, perfect, I provided an answer. In case you want to keep A/B columns in addition to the index you can use the drop=False option in set_index
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.