This is my dataframe:

> df
       a             b
    0  1         set([2, 3])
    1  2         set([2, 3])
    2  3      set([4, 5, 6])
    3  1  set([1, 34, 3, 2])

Now, when I group by a, I want to merge the sets in b into a single set per group. If the column held lists, summing would not be a problem. But the output of my command is:

> df.groupby('a').sum()

a         b                
1             NaN
2     set([2, 3])
3  set([4, 5, 6])  

What should I do in the groupby to take the union of the sets? The output I'm looking for is:

a         b                
1     set([2, 3, 1, 34])
2     set([2, 3])
3     set([4, 5, 6])  

1 Answer

This might be close to what you want:

df.groupby('a').apply(lambda x: set.union(*x.b))

Here the sets in each group are unpacked into set.union, which returns their union.
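For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the dataframe from the question with Python 3 set literals and, as an assumption on my part, selects column b before calling apply so that each group is a plain Series of sets. It is a variation on the one-liner above, not a replacement for it.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 1],
    "b": [{2, 3}, {2, 3}, {4, 5, 6}, {1, 34, 3, 2}],
})

# Select column b first so each group is a Series of sets,
# then unpack those sets into set.union to merge them.
unions = df.groupby("a")["b"].apply(lambda s: set.union(*s))

print(unions)
```

A group with a single row still works, because set.union called with one set simply returns a copy of it.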

If you need to keep the column names you could use:

df.groupby('a').agg({'b':lambda x: set.union(*x)}).reset_index('a')

Result:

    a   b
0   1   set([1, 2, 3, 34])
1   2   set([2, 3])
2   3   set([4, 5, 6])

2 Comments

Thanks, it solved the set problem, but the column was renamed to 0. Why did that happen?
It's because the result is a Series, so it has no column name. I've added a method for keeping the column name if you need it.
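To illustrate the naming point, here is a small sketch, again assuming the dataframe from the question. The variable names (unions, named, unnamed) are only for illustration. A Series that carries a name keeps it as the column label after reset_index, while an unnamed Series falls back to the label 0.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 1],
    "b": [{2, 3}, {2, 3}, {4, 5, 6}, {1, 34, 3, 2}],
})

# Selecting column b before apply gives a Series named "b".
unions = df.groupby("a")["b"].apply(lambda s: set.union(*s))
named = unions.reset_index()

# The same values in a Series without a name: reset_index labels the column 0.
unnamed = pd.Series(list(unions), index=unions.index).reset_index()

print(list(named.columns))
print(list(unnamed.columns))
```

If you already have an unnamed Series, passing name="b" to reset_index should give the same labelled column.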
