
I have a pandas DataFrame df:

+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------+

I want to find or write a function that returns a DataFrame equivalent to what this MySQL query returns:

SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team

which gives the following result:

+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+  

I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.
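For reference, here is a sketch that builds the DataFrame from the table above and implements the row-by-row dictionary approach described in the question. It assumes the rows appear in the order shown in the table; `groups` and `result` are illustrative names, not part of any API.

```python
import pandas as pd
from collections import defaultdict

# The DataFrame from the table above, rows in the same order
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Row-by-row loop: collect each team's users in a dict of lists
groups = defaultdict(list)
for team, user in zip(df["team"], df["user"]):
    groups[team].append(user)

# Join each team's users with commas, as GROUP_CONCAT does by default
result = pd.DataFrame({
    "team": list(groups),
    "group_concat(user)": [",".join(users) for users in groups.values()],
})
print(result)
```

It works, but every row passes through Python code, which is what the groupby-based answers avoid writing by hand.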


Do the following:

df.groupby('team').apply(lambda x: ','.join(x.user))

to get a Series of strings or

df.groupby('team').apply(lambda x: list(x.user))

to get a Series of lists of strings.
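A minimal sketch of the same two results that selects the `user` column before grouping, so no lambda is needed. To my knowledge, recent pandas releases (2.2 and later) emit a DeprecationWarning when `DataFrameGroupBy.apply` operates on the grouping columns, as the `df.groupby('team').apply(...)` form does; selecting `['user']` first sidesteps that. It is worth checking the behavior against the pandas version you have installed.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Series of comma-joined strings, indexed by team
joined = df.groupby("team")["user"].agg(",".join)

# Series of Python lists, indexed by team
listed = df.groupby("team")["user"].apply(list)

print(joined)
print(listed)
```

Within each group, groupby keeps the original row order, so the users come out in the same order as in the table.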

Here's what the results look like:

In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
A       elmer, daffy, bugs, foghorn, goofy, marvin
B                               dawg, speedy, pepe
C                                   petunia, porky
dtype: object

In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
A       [elmer, daffy, bugs, foghorn, goofy, marvin]
B                               [dawg, speedy, pepe]
C                                   [petunia, porky]
dtype: object

Note that further operations on a Series of Python lists (object dtype) are generally slow, because pandas has to fall back to Python-level loops for them, so this layout is usually discouraged. If you can aggregate without putting a list inside a Series, consider that approach instead.
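To get back a DataFrame shaped like the MySQL result, with a column name of your choosing, a sketch like the following should work. It uses `Series.reset_index(name=...)` to turn the `team` index back into a column; the output column name `users` is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Aggregate to a Series of strings, then move "team" from the index
# into a regular column and name the aggregated column "users"
out = df.groupby("team")["user"].agg(",".join).reset_index(name="users")
print(out)
```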

Comments

If I need this type of thing to be loaded back into a DataFrame (and, ideally, be able to specify the column name of the grouped column) how would I do that?
You could try df.groupby('team').apply(lambda x: list(x.user)).to_pickle('pickle.pkl').
Could you add the new column?
What if we have multiple columns like user?
No. It's 10 years later.

A more general solution, if you want to use agg:

df.groupby('team').agg({'user' : lambda x: ', '.join(x)})
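The dict form generalizes to several columns, one aggregation per column. The sketch below adds a hypothetical numeric `score` column (values 1 to 11, invented purely for illustration) to show string concatenation alongside another aggregate. It also shows pandas' named aggregation, which lets you pick the output column names; `users` and `total_score` are arbitrary names.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
    # Hypothetical extra column, only here to illustrate mixed aggregations
    "score": list(range(1, 12)),
})

# Dict form: one aggregation per column, team stays as the index
by_dict = df.groupby("team").agg({"user": ", ".join, "score": "sum"})

# Named aggregation: output_name=(input_column, func); team kept as a column
named = df.groupby("team", as_index=False).agg(
    users=("user", ", ".join),
    total_score=("score", "sum"),
)
print(by_dict)
print(named)
```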
