I would be grateful if someone could help me with a pandas dataframe problem I am having.

I am trying to group a pandas dataframe by its columns but am unsure how to proceed. The dataframe has duplicate column names (several A's and B's), and I would like to group these so that each row returns the maximum value across the A columns and the maximum across the B columns.

Duplicate Column Dataframe

index      | A | A | A | B | B |
--------------------------------
2015-01-01 |   | 1 | 7 | 1 |   |
--------------------------------
2015-01-02 | 3 |   |   |   | 5 |

Dataframe after processing

index      | A | B |
--------------------
2015-01-01 | 7 | 1 |
--------------------
2015-01-02 | 3 | 5 |

unique_cols = ['A', 'B']
df.groupby(by=unique_cols, axis=1).max()

This does not work: I get an error message saying the Grouper is not 1-dimensional. I have also tried transposing the dataframe and grouping by the rows, but then I get an IndexError (index 0 is out of bounds for axis 0 with size 0).

Question:

How do you group a dataframe with duplicate column names so that each group of same-named columns returns its maximum value?

1 Answer

I think you first need to select the columns of interest, then group by the column labels themselves using level=0 together with axis=1:

cols = ['A', 'B']
# Select the A and B columns, then group same-named columns and take the max of each group
df = df[cols].groupby(level=0, axis=1).max()
print(df)
              A    B
index               
2015-01-01  7.0  1.0
2015-01-02  3.0  5.0

Finally, if necessary, cast the result to int:

df = df[cols].groupby(level=0, axis=1).max().astype(int)
print(df)
            A  B
index           
2015-01-01  7  1
2015-01-02  3  5
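
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's frame by hand (the construction code and the name `result` are assumptions for illustration, not from the original post) and groups on the transposed frame instead of passing axis=1. To my knowledge, recent pandas releases (2.1 and later) deprecate axis=1 in groupby, so the transpose form may be the safer choice there; treat it as one possible variant rather than the only way.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; blank cells become NaN.
df = pd.DataFrame(
    [[np.nan, 1, 7, 1, np.nan],
     [3, np.nan, np.nan, np.nan, 5]],
    index=pd.to_datetime(["2015-01-01", "2015-01-02"]),
    columns=["A", "A", "A", "B", "B"],
)
df.index.name = "index"

# Transpose so the duplicate labels sit on the row index, group on those
# labels, take the max of each group (NaN is skipped), then transpose back.
result = df.T.groupby(level=0).max().T.astype(int)
print(result)
```

With this data every group has at least one non-missing value per row, so the cast to int succeeds and the values match the table above (7 and 1 for the first date, 3 and 5 for the second). If a whole group were NaN in some row, astype(int) would raise, and the cast would need to be skipped or replaced.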

1 Comment

Thank you, I spent a long time trying to figure that out and you made it look very easy!
