Aggregating Rows Pandas

Question

I am quite new to pandas. I need to aggregate rows that share the same 'Name', then take the average of 'Rating' and 'NumsHelpful' (without counting NaN). 'Review' should be concatenated, whilst 'Weight(Pounds)' should remain untouched:

col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']

Name             'Brand'                             'Name'
1534             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1535             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1536             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1537             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1538             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1539             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1540             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   

        'NumsHelpful'     'Rating'       'Weight'
1534          NaN            2              4.5   
1535          NaN            2              4.5   
1536          NaN            NaN            4.5   
1537          NaN            NaN            4.5   
1538          2              NaN            4.5   
1539          3              5              4.5   
1540          5              NaN            4.5   

                        'Review'
1534                                     Yummy - Delish  
1535  The best Bloody Mary mix! - The best Bloody Ma...  
1536  Best Taste by far - I've tried several if not ...  
1537  Best bloody mary mix ever - This is also good ...  
1538  Outstanding - Has a small kick to it but very ...  
1539   OMG! So Good! - Spicy, terrific Bloody Mary mix!  
1540                      Good stuff - This is the best

So the output should be something like this:

 'Brand'                'Name'                   'NumsHelpful'    'Rating' 
Zing Zang    Zing Zang Bloody Mary Mix, 32 fl oz     3.33             3

 'Weight'               'Review'
   4.5      Review1 / Review2 / ... / ReviewN

How should I proceed? Thanks.

jezrael · Accepted Answer · 2018-07-08 10:50:40Z

10

Use DataFrameGroupBy.agg with a dictionary mapping column names to aggregation functions. The Weight and Brand columns are aggregated with first, which takes the first value in each group:

d = {'NumsHelpful':'mean', 
     'Review':'/'.join, 
     'Weight':'first',
     'Brand':'first', 
     'Rating':'mean'}
df = df.groupby('Name').agg(d).reset_index()
print (df)
                                  Name  NumsHelpful  \
0  Zing Zang Bloody Mary Mix, 32 fl oz     3.333333   

                                              Review  Weight      Brand  \
0  Yummy - Delish/The best Bloody Mary mix! - The...     4.5  Zing Zang   

   Rating  
0     3.0
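For readers who want to run the approach above end to end, here is a minimal sketch. The four-row frame is a hypothetical miniature of the question's data (reviews shortened, column named Weight as in the answer), not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the original dataset).
df = pd.DataFrame({
    "Brand": ["Zing Zang"] * 4,
    "Name": ["Zing Zang Bloody Mary Mix, 32 fl oz"] * 4,
    "NumsHelpful": [np.nan, 2, 3, 5],
    "Rating": [2, 2, np.nan, 5],
    "Weight": [4.5] * 4,
    "Review": ["Yummy", "The best", "Outstanding", "Good stuff"],
})

agg_map = {
    "NumsHelpful": "mean",   # 'mean' skips NaN by default
    "Rating": "mean",
    "Weight": "first",       # first value in each group
    "Brand": "first",
    "Review": " / ".join,    # concatenate the group's reviews
}

result = df.groupby("Name").agg(agg_map).reset_index()
print(result)
```

Because 'mean' ignores NaN, the averages are taken over the three non-missing values in each numeric column.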

Also, in pandas 0.23.1 you get:

FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version

The solution is to remove the index name Name:

df.index.name = None

Or:

df = df.rename_axis(None)
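A small sketch of the clash and the fix, using a hypothetical three-row frame. The index name is set by hand here only to reproduce the situation in which the index level and a column are both called Name.

```python
import pandas as pd

# Hypothetical frame; the index is deliberately given the same name as a column.
df = pd.DataFrame({"Name": ["a", "a", "b"], "Rating": [1.0, 3.0, 5.0]})
df.index.name = "Name"

# Clear the index name so that 'Name' refers only to the column.
df = df.rename_axis(None)

means = df.groupby("Name")["Rating"].mean()
print(means)
```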

Another possible solution is not to aggregate with first, but to add these columns to the groupby:

d = {'NumsHelpful':'mean',  'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()

Both solutions return the same output if the values are the same within each group.
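The equivalence can be checked on a small hypothetical frame in which Weight and Brand are constant within each Name. The column values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: Weight and Brand never vary within a Name.
df = pd.DataFrame({
    "Name": ["x", "x", "y"],
    "Brand": ["B1", "B1", "B2"],
    "Weight": [4.5, 4.5, 1.0],
    "Rating": [2.0, 4.0, 5.0],
})

# Variant 1: group by Name only, take the first Weight and Brand per group.
by_first = (df.groupby("Name")
              .agg({"Weight": "first", "Brand": "first", "Rating": "mean"})
              .reset_index())

# Variant 2: make Weight and Brand part of the grouping key.
by_keys = (df.groupby(["Name", "Weight", "Brand"])
             .agg({"Rating": "mean"})
             .reset_index())

cols = ["Name", "Weight", "Brand", "Rating"]
same = by_first[cols].values.tolist() == by_keys[cols].values.tolist()
print(same)
```

One difference to keep in mind: by default groupby drops rows whose key is NaN, so the second variant would lose rows with a missing Weight or Brand, while the first would keep them.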

EDIT:

If you need to convert a string (object) column to numeric, first try converting with astype:

df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)

And if that fails, use to_numeric with the parameter errors='coerce' to convert non-parseable strings to NaN:

df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
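As an illustration of the difference between the two conversions, here is a sketch with a hypothetical object column containing one value that cannot be parsed.

```python
import pandas as pd

# Hypothetical object column; "unknown" cannot be parsed as a number.
df = pd.DataFrame({"Weight(Pounds)": pd.Series(["4.5", "2", "unknown"], dtype=object)})

# df["Weight(Pounds)"].astype(float) would raise ValueError on "unknown".
# errors="coerce" turns unparseable entries into NaN instead.
df["Weight(Pounds)"] = pd.to_numeric(df["Weight(Pounds)"], errors="coerce")
print(df["Weight(Pounds)"])
```

pd.to_numeric returns a new Series rather than modifying the column in place, so the result has to be assigned back for the dtype of the column to change.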

edited Jul 8, 2018 at 10:50

answered Jul 8, 2018 at 10:19

jezrael


3 Comments

Stefano Pozzi Over a year ago

I tried using your suggestions but my error still persists :/

jezrael Over a year ago

@StefanoPozzi - is the data confidential?

Stefano Pozzi Over a year ago

No, but I managed to fix it :) I had to cast the columns that I wanted to contain strings to the string type. Thanks for the help!
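The "expected str instance, float found" error typically appears when str.join meets a NaN (a float) among the strings. The sketch below is a variation on the cast described in the comment, under the assumption that missing reviews were the cause: it drops missing values and casts the rest to str before joining. The frame and the helper name join_reviews are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing review.
df = pd.DataFrame({
    "Name": ["a", "a", "b"],
    "Review": ["Yummy", np.nan, "Good stuff"],
})

def join_reviews(reviews):
    """Join the non-missing reviews of one group, casting each to str."""
    return " / ".join(reviews.dropna().astype(str))

joined = df.groupby("Name")["Review"].agg(join_reviews)
print(joined)
```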

jpp · Accepted Answer · 2018-07-08 10:16:16Z

2

You can aggregate with a different function for each column using groupby + agg, together with a dictionary mapping column names to functions. For example:

d = {'Rating': 'mean',
     'NumsHelpful': 'mean',
     'Review': ' | '.join,
     'Weight(Pounds)': 'first'}

res = df.groupby('Name').agg(d)

answered Jul 8, 2018 at 10:16

jpp


8 Comments

Stefano Pozzi Over a year ago

Is 'mean' np.mean or simply the string? Because when I try to run the code it gives me a 'TypeError: sequence item 0: expected str instance, float found'

Stefano Pozzi Over a year ago

this is the output 'Brand object - Name object - NumsHelpful float64 - Rating float64 - Weight(Pounds) object - Review object' does this mean that weight has some non float arguments?

jezrael Over a year ago

@StefanoPozzi - What is your pandas version?

jezrael Over a year ago

@StefanoPozzi - object means it is obviously a string column.

Stefano Pozzi Over a year ago

version = 0.22.0 / I tried using this -> pd.to_numeric(df['Weight(Pounds)']) but doesn't seem to make any change (still object)

2 revs, 2 users 62% · Accepted Answer · 2020-08-15 01:16:24Z

0

I've seen this happen when the index was created with the column kept in the table. Usually a column that becomes the index is removed from the table, so do the following:

# dataset_A was created with the option drop=False
df_dataset_new = dataset_A.copy()
index_df = ['month', 'scop']

# df_dataset_new is re-indexed with the option drop=True
df_dataset_new.set_index(index_df, drop=True, inplace=True, verify_integrity=True)
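To make the effect of drop visible, here is a minimal sketch. The contents of dataset_A are hypothetical; only the column names month and scop come from the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for dataset_A.
dataset_A = pd.DataFrame({"month": [1, 2], "scop": ["a", "b"], "value": [10, 20]})

# drop=False keeps 'month' and 'scop' as columns AND as index levels,
# which is the situation that makes the names ambiguous.
kept = dataset_A.set_index(["month", "scop"], drop=False)

# drop=True moves them into the index only.
dropped = dataset_A.set_index(["month", "scop"], drop=True, verify_integrity=True)

print(list(kept.columns))
print(list(dropped.columns))
```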

edited Aug 15, 2020 at 1:16

community wiki

2 revs, 2 users 62%
Carmo Melo
