How do I aggregate this data and create a new column with python & pandas?

Question

I am attempting to use pandas to aggregate column data in order to calculate the CPC (cost per click) of the ads in my dataset, grouped by a variable such as ad-size, ad-category, ad-placement, etc. In the case below I am aggregating adCost and adClicks grouped by adSize (a categorical variable with values 1-5). How do I add a new column to the dataset that takes the aggregated adCost per adSize and adClicks per adSize and calculates the cost per click per adSize? I saved the aggregation into a variable, but it doesn't seem to give me a DataFrame or an object that I can manipulate further. What am I missing or doing wrong?

import pandas as pd
import numpy as np
from sklearn import preprocessing

df = pd.DataFrame(data)

## Encode the 'adSize' labels as integer codes (0 to n-1)
label_encoder = preprocessing.LabelEncoder()
df['adSize'] = label_encoder.fit_transform(df['adSize'])
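LabelEncoder replaces each adSize label with an integer code; it does not give the column a categorical dtype, and groupby does not need the encoding in the first place. A minimal sketch using pandas' own category dtype instead (the string sizes below are hypothetical sample data, not taken from the question):

```python
import pandas as pd

# Hypothetical ad sizes stored as strings
df = pd.DataFrame({
    "adSize": ["300x250", "728x90", "300x250", "160x600"],
    "adCost": [2.0, 1.0, 3.0, 4.0],
})

# Store adSize as a pandas categorical dtype; the original labels are kept
df["adSize"] = df["adSize"].astype("category")

# groupby works on the labels directly, no integer encoding required.
# observed=True restricts the result to categories that actually occur.
totals = df.groupby("adSize", observed=True)["adCost"].sum()
print(totals)

# If integer codes are still wanted, the categorical exposes them
df["adSize_code"] = df["adSize"].cat.codes
print(df)
```

Keeping the labels makes the grouped output readable without having to invert the encoding later.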

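## Aggregations to apply to each column, keyed by output name
# Note: nested "renaming" dicts like this were deprecated in pandas 0.20 and
# raise SpecificationError ("nested renamer is not supported") from pandas 1.0
agg_calc = {
    'adCost': {
        'total_cost': 'sum',
        'avg_cost': 'mean'
    },
    'adClicks': {
        'total_clicks': 'sum',
        'avg_click': 'mean',
        'count': 'count'
    }
}

## Aggregate by adSize
y = df.groupby(['adSize']).aggregate(agg_calc)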
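The object returned by aggregate is already a DataFrame with one row per adSize; in pandas versions that still accepted the nested dict, its columns carried two-level labels such as ('adCost', 'total_cost'), which is easy to miss. A minimal sketch using named aggregation (available from pandas 0.25) instead, with hypothetical sample data, produces flat column names so the cost per click can be computed by dividing one column by another:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question
df = pd.DataFrame({
    "adSize": [1, 1, 2, 2, 2, 3],
    "adCost": [2.0, 4.0, 1.0, 1.5, 2.5, 6.0],
    "adClicks": [4, 6, 2, 3, 5, 3],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("adSize").agg(
    total_cost=("adCost", "sum"),
    avg_cost=("adCost", "mean"),
    total_clicks=("adClicks", "sum"),
    avg_click=("adClicks", "mean"),
    count=("adClicks", "count"),
)

# summary is an ordinary DataFrame indexed by adSize, so its columns
# can be combined like any others
summary["cpc"] = summary["total_cost"] / summary["total_clicks"]
print(summary)
```

Bracket access (summary["count"]) is used because summary.count would refer to the DataFrame method rather than the column.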

Thanks for your assistance

DrTRD · Accepted Answer · 2016-07-14 16:58:28Z


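You should be able to use groupby with transform, which returns a result aligned with the original rows, so it can be assigned directly as a new column. I don't have your data and I'm not entirely certain I understand your question, but something like the following should work: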

grouped = df.groupby('adSize')
df['total_cost'] = grouped['adCost'].transform('sum')
df['avg_cost'] = grouped['adCost'].transform('mean')
df['total_clicks'] = grouped['adClicks'].transform('sum')
df['avg_click'] = grouped['adClicks'].transform('mean')
df['count'] = grouped['adClicks'].transform('count')

Is that what you're asking?
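To finish the calculation the question asks for, the transformed totals can be divided row by row. A minimal sketch with hypothetical sample data (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    "adSize": [1, 1, 2, 2, 2, 3],
    "adCost": [2.0, 4.0, 1.0, 1.5, 2.5, 6.0],
    "adClicks": [4, 6, 2, 3, 5, 3],
})

grouped = df.groupby("adSize")
df["total_cost"] = grouped["adCost"].transform("sum")
df["total_clicks"] = grouped["adClicks"].transform("sum")

# Cost per click of the row's adSize group, repeated on every row of that group
df["cpc"] = df["total_cost"] / df["total_clicks"]
print(df)
```

Every row in the same adSize group gets the same cpc value. Provided neither column has missing values, avg_cost / avg_click gives the same ratio, since both means divide by the same group count.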


1 Comment

OAK · Over a year ago

I believe your functions solve what I was trying to achieve. My dilemma was that when I used the aggregate function on the DataFrame, the result did not seem to be kept, even when I saved it into a variable, so I could not manipulate the aggregated data later, only print it. I wanted to work further with the results of the aggregation, for example divide one column by another. I think your solution works well and is simpler, though. I wonder what the purpose of, or difference between, the two methods is.
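As for the difference raised in the comment: aggregate reduces each group to a single row, while transform returns a result with the same index as the original DataFrame, which is why it can be assigned straight to a new column. A small sketch with hypothetical data shows the two shapes, and that the aggregated Series can be mapped back onto the rows to reproduce the transform result:

```python
import pandas as pd

# Hypothetical sample data with the question's column names
df = pd.DataFrame({
    "adSize": [1, 1, 2, 2, 2, 3],
    "adCost": [2.0, 4.0, 1.0, 1.5, 2.5, 6.0],
})

grouped = df.groupby("adSize")

# aggregate: one value per group, indexed by adSize
per_group = grouped["adCost"].agg("sum")
# transform: one value per original row, aligned with df's index
per_row = grouped["adCost"].transform("sum")

print(per_group.shape)  # (3,)
print(per_row.shape)    # (6,)

# Looking up each row's adSize in the aggregated Series reproduces transform
mapped = df["adSize"].map(per_group)
print(mapped.tolist() == per_row.tolist())  # True
```

The aggregated form is the natural choice for a summary table (one row per adSize); the transformed form suits adding per-row columns such as a group's cost per click.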
