Pandas groupby and agg values in new column

Question

import pandas as pd

df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222']})

I have a large dataset and don't know how to add a column that groups by 'tin' and, where the 'tin' values are equal, lists the matching companies.

Desired result:

df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'tin': ['5555', '1111', '5555', '2222'],
                   'column': ['ABC and XYZ', None, 'ABC and XYZ', None]})

"if values is equal from the large dataset" - What does the large df look like?
– jezrael, Commented Nov 12, 2020 at 9:15

jezrael · Accepted Answer · 2020-11-12 13:21:48Z

I believe you need:

df1 = pd.DataFrame({ 'tin': ['5555', '5555'], 
                   'name' : 'AAA,BBB'.split(',')})

print (df1)
    tin name
0  5555  AAA
1  5555  BBB

df2 = pd.DataFrame({'company' : 'ABC,ABC,XYZ,XYZ,ABC,ABC,XYZ,XYZ'.split(','), 
                   'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'], 
                   'name' : 'AAA,AAA,AAA,AAA,BBB,BBB,BBB,BBB'.split(',')})

print (df2)
  company   tin name
0     ABC  5555  AAA
1     ABC  1111  AAA
2     XYZ  5555  AAA
3     XYZ  2222  AAA
4     ABC  5555  BBB
5     ABC  1111  BBB
6     XYZ  5555  BBB
7     XYZ  2222  BBB

First, use DataFrame.merge with how='left' for a left join and indicator=True to test which rows of df2 match the first DataFrame, df1:

df = df2.merge(df1, on=['tin','name'], how='left', indicator=True)
print (df)
  company   tin name     _merge
0     ABC  5555  AAA       both
1     ABC  1111  AAA  left_only
2     XYZ  5555  AAA       both
3     XYZ  2222  AAA  left_only
4     ABC  5555  BBB       both
5     ABC  1111  BBB  left_only
6     XYZ  5555  BBB       both
7     XYZ  2222  BBB  left_only
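One thing to watch in a larger lookup table: if the same ('tin', 'name') pair occurs more than once in df1, a left merge repeats the matching df2 rows. The sketch below illustrates this with a hypothetical df1_dup (not part of the original answer) and shows that dropping duplicate key pairs first keeps one output row per df2 row.

```python
import pandas as pd

# Hypothetical lookup table in which the pair ('5555', 'AAA') appears twice.
df1_dup = pd.DataFrame({'tin': ['5555', '5555', '5555'],
                        'name': ['AAA', 'AAA', 'BBB']})

df2 = pd.DataFrame({
    'company': ['ABC', 'ABC', 'XYZ', 'XYZ', 'ABC', 'ABC', 'XYZ', 'XYZ'],
    'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'],
    'name': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB'],
})

keys = ['tin', 'name']

# Merging against the raw lookup repeats each df2 row that hits the duplicated pair.
naive = df2.merge(df1_dup, on=keys, how='left', indicator=True)

# Dropping duplicate key pairs first keeps exactly one output row per df2 row.
safe = df2.merge(df1_dup[keys].drop_duplicates(), on=keys,
                 how='left', indicator=True)

print(len(df2), len(naive), len(safe))  # 8 10 8
```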

Then keep only the rows marked both, using boolean indexing:

df = df[df['_merge'].eq('both')]
print (df)
  company   tin name _merge
0     ABC  5555  AAA   both
2     XYZ  5555  AAA   both
4     ABC  5555  BBB   both
6     XYZ  5555  BBB   both
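The same row selection can also be done without the helper _merge column. This is only an alternative sketch, not the answer's method: it builds a MultiIndex from the key columns of each frame and tests membership with MultiIndex.isin. The names wanted, mask, and matched are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'tin': ['5555', '5555'], 'name': ['AAA', 'BBB']})
df2 = pd.DataFrame({
    'company': ['ABC', 'ABC', 'XYZ', 'XYZ', 'ABC', 'ABC', 'XYZ', 'XYZ'],
    'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'],
    'name': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB'],
})

keys = ['tin', 'name']

# The (tin, name) pairs we want to keep, taken from df1.
wanted = pd.MultiIndex.from_frame(df1[keys])

# Boolean array: True where a df2 row's (tin, name) pair occurs in df1.
mask = pd.MultiIndex.from_frame(df2[keys]).isin(wanted)

# Boolean indexing keeps the original df2 index labels.
matched = df2[mask]
print(matched)
```

Because no merge is involved, duplicate pairs in df1 cannot multiply rows here.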

Finally, aggregate by both columns and assign the result back to df2 with DataFrame.join:

s = df.groupby(['tin','name'])['company'].agg(' and '.join).rename('new')
df = df2.join(s, on=['tin','name'])
print (df)
  company   tin name          new
0     ABC  5555  AAA  ABC and XYZ
1     ABC  1111  AAA          NaN
2     XYZ  5555  AAA  ABC and XYZ
3     XYZ  2222  AAA          NaN
4     ABC  5555  BBB  ABC and XYZ
5     ABC  1111  BBB          NaN
6     XYZ  5555  BBB  ABC and XYZ
7     XYZ  2222  BBB          NaN
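The three steps can be wrapped in one helper. The following is a minimal sketch under stated assumptions: the function name add_company_groups and its parameters are hypothetical, an inner merge stands in for the left merge plus filter, and each group's company names are de-duplicated before joining, which may help with the repeated company values raised in the comments.

```python
import pandas as pd

def add_company_groups(data, lookup, by=('tin', 'name'),
                       sep=' and ', new_col='new'):
    """Hypothetical helper: add `new_col` listing the companies per key group."""
    by = list(by)
    # Keep only rows whose key combination occurs in `lookup`.
    matched = data.merge(lookup[by].drop_duplicates(), on=by, how='inner')
    # Join the distinct company names of each group, in order of first appearance.
    joined = (matched.groupby(by)['company']
                     .agg(lambda s: sep.join(s.drop_duplicates()))
                     .rename(new_col))
    # Attach the result to every original row; unmatched rows get NaN.
    return data.join(joined, on=by)

df1 = pd.DataFrame({'tin': ['5555', '5555'], 'name': ['AAA', 'BBB']})
df2 = pd.DataFrame({
    'company': ['ABC', 'ABC', 'XYZ', 'XYZ', 'ABC', 'ABC', 'XYZ', 'XYZ'],
    'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'],
    'name': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB'],
})

result = add_company_groups(df2, df1)
print(result)
```

On the sample data this gives the same 'new' column as the step-by-step version; the two differ only when a company is repeated within a group.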

edited Nov 12, 2020 at 13:21

answered Nov 12, 2020 at 9:20

jezrael


4 Comments

Alpha2020 Over a year ago

Thanks. In dfbig, 'column' contains duplicate 'company' values. How can I add a second column for the "unique" rows? df['column'] = (df['tin'].map(df[df['tin'].isin([vals, vals_2])].groupby('tin')['company'].agg(' and '.join)))

Alpha2020 Over a year ago

If we have two columns, 'tin' and 'name', to group by: import pandas as pd; df = pd.DataFrame({'company': ['ABC', 'ABC', 'XYZ', 'XYZ', 'ABC', 'ABC', 'XYZ', 'XYZ'], 'tin': ['5555', '1111', '5555', '2222', '5555', '1111', '5555', '2222'], 'name': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB']})

Alpha2020 Over a year ago

I think 'column': [ABC and XYZ, NaN, NaN, ABC and XYZ, ABC and XYZ, NaN, NaN, ABC and XYZ]. Thus, ABC with 5555 AAA doesn't intersect XYZ with 5555 BBB.

Alpha2020 Over a year ago

Dear jezrael, yes. But, more precisely, it's not duplicated, it's just not the target. We should keep all rows and add a new column with the aggregated info from column 'company' where 'tin' and 'name' match. I tried df['column'] = (df['tin','name'].map(df[df['tin','name'].isin{'tin':vals ,'name': vals2}] .groupby('tin','name')['company'].agg(' and '.join))) but it does not work.
