Replicating GROUP_CONCAT for pandas.DataFrame

Question

I have a pandas DataFrame df:

+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------+

I want to find or write a function that returns the DataFrame I would get in MySQL from the following query:

SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team

for the following result:

+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+

I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.

Phillip Cloud · Accepted Answer · 2013-08-09 01:21:26Z

42

Do the following:

df.groupby('team').apply(lambda x: ','.join(x.user))

to get a Series of strings or

df.groupby('team').apply(lambda x: list(x.user))

to get a Series of lists of strings.

Here's what the results look like (the first example joins with ', ' rather than ',' for readability):

In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
A       elmer, daffy, bugs, foghorn, goofy, marvin
B                               dawg, speedy, pepe
C                                   petunia, porky
dtype: object

In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
A       [elmer, daffy, bugs, foghorn, goofy, marvin]
B                               [dawg, speedy, pepe]
C                                   [petunia, porky]
dtype: object

Note that further operations on object-dtype Series like these are generally slow and discouraged. If you can aggregate another way, without putting a list inside a Series, consider that approach instead.
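For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's df and returns the result as a two-column DataFrame. It selects the user column before aggregating instead of applying over the whole group, which is a variation on the approach above rather than the exact code from the answer. The column name users is an assumption chosen for illustration.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Select the column first so only `user` is passed to the aggregation
joined = df.groupby("team")["user"].agg(",".join)

# reset_index(name=...) turns the Series back into a DataFrame;
# "users" is a hypothetical column name, pick whatever suits you
result = joined.reset_index(name="users")
print(result)
```

Row order within each team is preserved, so the joined strings follow the order in which the users appear in df.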

edited Aug 9, 2013 at 1:21

answered Aug 9, 2013 at 1:16

Phillip Cloud


6 Comments

Chad Over a year ago

If I need this type of thing to be loaded back into a DataFrame (and, ideally, be able to specify the column name of the grouped column), how would I do that?

Phillip Cloud Over a year ago

You could try df.groupby('team').apply(lambda x: list(x.user)).to_pickle('pickle.pkl').

user1532587 Over a year ago

Could you add the new column?

Yaser Sakkaf Over a year ago

What if we have multiple columns like user?

Phillip Cloud Over a year ago

No. It's 10 years later.

Kamil Sindi · Accepted Answer · 2015-09-20 20:21:03Z

15

A more general solution if you want to use agg:

df.groupby('team').agg({'user' : lambda x: ', '.join(x)})
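The same agg idea can be extended to several columns at once. The sketch below assumes pandas 0.25 or later (for named aggregation) and adds a hypothetical role column that is not in the question's data; the output names users and roles are likewise illustrative.

```python
import pandas as pd

# Small sample; the `role` column is hypothetical, added only for illustration
df = pd.DataFrame({
    "team": ["A", "A", "B", "C"],
    "user": ["elmer", "daffy", "dawg", "porky"],
    "role": ["hunter", "duck", "dog", "pig"],
})

# Named aggregation: output_name=(source_column, function)
result = (
    df.groupby("team")
      .agg(users=("user", ", ".join), roles=("role", ", ".join))
      .reset_index()  # move `team` from the index back into a column
)
print(result)
```

Each keyword names an output column, so this also covers the case where the concatenated column needs a specific name.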

answered Sep 20, 2015 at 20:21

Kamil Sindi
