I'd like to print the unique values in each column of a grouped dataframe, but the following code snippet doesn't work as expected:

import pandas as pd

df = pd.DataFrame({'a' : [1, 2, 1, 2], 'b' : [5, 5, 5, 5], 'c' : [11, 12, 13, 14]})
print(
  df.groupby(['a']).apply(
    lambda df: df.apply(
      lambda col: col.unique(), axis=0))
)

I'd expect it to print

1 [5] [11, 13]
2 [5] [12, 14]

While there are other ways of doing so, I'd like to understand what's wrong with this approach. Any ideas?

1 Answer

This should do the trick:

print(df.groupby(['a', 'b'])['c'].unique())

a | b |
--+---+---------
1 | 5 | [11, 13]
2 | 5 | [12, 14]

As for what's wrong with your approach: when you call groupby on df and then apply a function f, f receives a DataFrame containing all of df's columns unless you select a subset first (as my code snippet does with ['c']). So your outer apply passes a DataFrame with 3 columns to the lambda, and the inner apply then works through each of those 3 columns in turn. Your function also_print therefore runs once per column, so you get 3 prints for every group.
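
To see this for yourself, here is a minimal sketch that inspects what each group looks like with and without the ['c'] selection. It iterates over the groupby object instead of calling apply, on the assumption that iteration is a fair stand-in for what apply hands to f. Recent pandas versions may leave the grouping column out of the frame passed to apply, so treat this as an illustration, not a statement about every version. The names frame_columns and series_values are mine.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [5, 5, 5, 5], 'c': [11, 12, 13, 14]})

# Without a column selection, each group is a DataFrame with every column of df.
frame_columns = {}  # hypothetical name: group key -> column labels seen
for key, group in df.groupby('a'):
    frame_columns[key] = list(group.columns)
    print(key, type(group).__name__, frame_columns[key])

# With ['c'] selected, each group is a single Series holding only column 'c'.
series_values = {}  # hypothetical name: group key -> values of 'c' in that group
for key, group in df.groupby('a')['c']:
    series_values[key] = group.tolist()
    print(key, type(group).__name__, series_values[key])
```

Selecting the column before applying anything is what keeps the function from ever seeing 'a' or 'b'.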

4 Comments

This doesn't do what I want, unfortunately. I'd like to group by 'a' only and then, in each group, get the unique values in every column, as in the expected output I gave above.
Also, forget about the also_print function; I removed it from the question since it's not related. The question is about the final dataframe.
How about this? df.groupby('a').apply(lambda df: pd.Series([df[col].unique() for col in df.columns[1:]], index=df.columns[1:]))
Thanks, but as I said, I'd like to understand what's wrong with my approach. What you suggest is a little more verbose.
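
For the record, here is one more way to get the table from the question while grouping by 'a' only. It is a sketch of an alternative, not an explanation of why the nested apply misbehaves: it calls SeriesGroupBy.unique() once per column and lets the DataFrame constructor align the results on 'a'. The names value_cols and result are mine, not from the thread.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [5, 5, 5, 5], 'c': [11, 12, 13, 14]})

# Every column except the grouping key (hypothetical helper name).
value_cols = [col for col in df.columns if col != 'a']

# One SeriesGroupBy.unique() call per column. Each call returns a Series
# indexed by 'a' whose values are arrays of that group's unique entries.
result = pd.DataFrame({col: df.groupby('a')[col].unique() for col in value_cols})
print(result)
```

Because each column is computed separately, nothing depends on how apply stitches together results of different lengths.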
