Python Pandas: Groupby Sum AND Concatenate Strings

Question

Sample Pandas Dataframe:

ID Name COMMENT1 COMMENT2 NUM
1  dan  hi       hello    1
1  dan  you      friend   2
3  jon  yeah     nope     3
2  jon  dog      cat      .5
3  jon  yes      no       .1

I am trying to create a dataframe that groups by ID and Name, concatenates COMMENT1 and COMMENT2 within each group, and sums NUM.

This is what I'm looking for:

ID Name COMMENT1     COMMENT2        NUM
1  dan  hi you       hello friend    3
3  jon  yeah yes     nope no         3.1
2  jon  dog          cat             .5

I tried using this:

input_df = input_df.groupby(['ID', 'NAME', 'COMMENT1', 'COMMENT2']).sum().reset_index()

But it doesn't work.

If I use this:

input_df = input_df.groupby(['ID']).sum().reset_index()

It sums the NUM column but leaves out all other columns.

Possible duplicate of Pandas groupby: How to get a union of strings. The accepted answer there shows how to use a lambda to get what you want.
– Patrick Artner, Commented Dec 1, 2017 at 20:15

BENY · Accepted Answer · 2017-12-01 20:21:39Z

20

Let us do it in one line:

df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]: 
   ID Name  COMMENT1      COMMENT2  NUM
0   1  dan    hi you  hello friend  3.0
1   2  jon       dog           cat  0.5
2   3  jon  yeah yes       nope no  3.1
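A note on the dtype check: `x.dtype=='float64'` matches only float columns, so an integer NUM column would fall through to the join branch, which expects strings. Below is a minimal sketch of a more general check, assuming `pandas.api.types.is_numeric_dtype` is acceptable here. The helper name `sum_or_join` is hypothetical, and the dataframe is rebuilt from the question's sample data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

def sum_or_join(col):
    # hypothetical helper: sum any numeric column, join everything else
    return col.sum() if is_numeric_dtype(col) else " ".join(col)

result = df.groupby(["ID", "Name"], as_index=False).agg(sum_or_join)
print(result)
```

The function receives one column of one group at a time, so the check runs per column rather than once for the whole frame.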


6 Comments

Yuca Over a year ago

if there's a NaN in the group this doesn't work, correct?

BENY Over a year ago

@Yuca you mean the group key ?

Yuca Over a year ago

if instead of 'cat' there was NaN, then it looks like the code wouldn't work, no?

BENY Over a year ago

@Yuca you can replace the NaN with the string 'NaN' and adjust it later

bernando_vialli Over a year ago

@WeNYoBen, thank you. Does this preserve the order of the strings in the pandas dataframe column that is being concatenated?
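The NaN case raised in these comments can be sketched as follows. A missing value is a float, so `' '.join` raises a `TypeError` when it meets one. This is a minimal illustration, not part of the answer: the helper name `join_skipping_nan` is hypothetical and the small dataframe is made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up data: group 1 has one missing comment, group 2 has only a missing one
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Name": ["dan", "dan", "jon"],
    "COMMENT2": ["hello", np.nan, np.nan],
    "NUM": [1.0, 2.0, 0.5],
})

def join_skipping_nan(col):
    # hypothetical helper: drop missing values, then join what is left;
    # a group with no remaining values yields an empty string
    return " ".join(col.dropna().astype(str))

result = df.groupby(["ID", "Name"], as_index=False).agg(
    {"COMMENT2": join_skipping_nan, "NUM": "sum"}
)
print(result)
```

On the ordering question: the pandas documentation states that groupby preserves the order of rows within each group, so the joined strings follow the original row order.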


hamx0r · Accepted Answer · 2019-07-18 23:55:14Z

5

You can also tell .agg() which aggregation function to use for each column. For the string columns, pass ' '.join (note that there are no parentheses, since you don't want to call .join but rather pass it as the argument itself):

df.groupby(['ID','Name'],as_index=False).agg({'COMMENT1': ' '.join, 'COMMENT2': ' '.join, 'NUM': 'sum'})
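For reference, here is a self-contained version of this approach, with the dataframe rebuilt from the question's sample data. The `sep` variable is an addition for illustration only, showing that any separator string can supply the bound `join` method.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

sep = " "  # illustrative: swap for ", " or any other separator

# sep.join is passed uncalled; pandas calls it once per group and column
result = df.groupby(["ID", "Name"], as_index=False).agg(
    {"COMMENT1": sep.join, "COMMENT2": sep.join, "NUM": "sum"}
)
print(result)
```

Columns left out of the dictionary are dropped from the result, so every column you want to keep needs an entry.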


Thom Ives · Accepted Answer · 2017-12-01 21:11:18Z

Converting your data example into a csv file, we can do the following:

import pandas as pd

def grouping_Cols_by_Cols(DF, grouping_Columns, num_Columns):
    # work on a copy so the caller's dataframe is not modified
    DF = DF.copy()
    # numerical columns can mess us up, so convert every non-key
    # column's values to strings, with a trailing space as separator
    value_Columns = [c for c in DF.columns if c not in grouping_Columns]
    for column_Name in value_Columns:
        DF[column_Name] = DF[column_Name].map(str) + ' '
    # summing string columns concatenates them within each group
    DF = DF.groupby(by=grouping_Columns)[value_Columns].sum()

    # strip the trailing separator left on each concatenated string
    for column_Name in value_Columns:
        DF[column_Name] = DF[column_Name].str.rstrip(' ')

    # NOW, convert the numerical string columns back to numbers by
    # summing their space-separated parts (avoids eval on file data)
    for num_Col in num_Columns:
        DF[num_Col] = DF[num_Col].map(lambda s: sum(float(v) for v in s.split()))

    return DF

###############################################################
### Operations Section
###############################################################

df = pd.read_csv("UnCombinedData.csv")

grouping_Columns = ['ID','Name']
num_Columns = ['NUM']
df = grouping_Cols_by_Cols(df, grouping_Columns, num_Columns)

print(df)

With a little more work, the function could auto-detect which columns contain numbers and add them to the list of numerical columns.
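One way the auto-detection could look is sketched below, assuming `DataFrame.select_dtypes(include="number")` is a good enough test for "has numbers in it". The function name `sum_and_join` and its `sep` parameter are hypothetical, and it uses a per-column aggregation dictionary rather than the string-conversion approach in this answer.

```python
import pandas as pd

def sum_and_join(df, keys, sep=" "):
    """Hypothetical helper: sum numeric columns, join all others as strings."""
    value_cols = [c for c in df.columns if c not in keys]
    # columns pandas considers numeric (ints, floats, ...)
    numeric_cols = set(df[value_cols].select_dtypes(include="number").columns)
    # build one aggregation per column based on its detected type
    spec = {
        c: "sum" if c in numeric_cols else (lambda s: sep.join(s.astype(str)))
        for c in value_cols
    }
    return df.groupby(keys, as_index=False).agg(spec)

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

out = sum_and_join(df, ["ID", "Name"])
print(out)
```

Detecting by dtype means a numeric column that was read in as text would be joined rather than summed, so the input types are worth checking first.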

I think this is similar, though not identical, to the problems and challenges encountered in this post.
