I am using pandas to aggregate column data so that I can calculate the cost per click (CPC) of the ads in my dataset, grouped by a variable such as ad size, ad category, or ad placement. In the example below I aggregate adCost and adClicks, grouped by adSize (a categorical variable with values 1-5). How do I add a new column that takes the aggregated adCost per adSize and the aggregated adClicks per adSize and calculates the cost per click per adSize? I saved the aggregation in a variable, but it does not seem to be stored as a DataFrame or any other object that I can manipulate further. What am I missing or doing wrong?
import pandas as pd
import numpy as np
from sklearn import preprocessing

df = pd.DataFrame(data)  # `data` is defined elsewhere (not shown here)
label_encoder = preprocessing.LabelEncoder()
## Convert 'adSize' to categorical integer codes
df['adSize'] = label_encoder.fit_transform(df['adSize'])
agg_calc = {
    'adCost': {
        # aggregations for the adCost column
        'total_cost': 'sum',
        'avg_cost': 'mean'
    },
    'adClicks': {
        # aggregations for the adClicks column
        'total_clicks': 'sum',
        'avg_click': 'mean',
        'count': 'count'
    }
}
## Aggregate by adSize
y = df.groupby('adSize').aggregate(agg_calc)
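For reference, here is a minimal sketch of the result being asked for, assuming pandas 0.25 or later, where named aggregation is available and the nested-dictionary renaming used above is no longer accepted. The sample rows and the names `summary`, `cpc`, and `df_with_cpc` are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real `data`.
df = pd.DataFrame({
    "adSize": [1, 1, 2, 2, 3],
    "adCost": [10.0, 20.0, 5.0, 15.0, 8.0],
    "adClicks": [5, 10, 4, 6, 2],
})

# Named aggregation: each keyword becomes a flat output column,
# so the result is an ordinary DataFrame without MultiIndex columns.
summary = (
    df.groupby("adSize")
      .agg(
          total_cost=("adCost", "sum"),
          avg_cost=("adCost", "mean"),
          total_clicks=("adClicks", "sum"),
          avg_click=("adClicks", "mean"),
          count=("adClicks", "count"),
      )
      .reset_index()
)

# Cost per click per adSize. Groups with zero clicks would yield inf or NaN.
summary["cpc"] = summary["total_cost"] / summary["total_clicks"]

# Optional: attach the per-group CPC to every row of the original data.
df_with_cpc = df.merge(summary[["adSize", "cpc"]], on="adSize", how="left")

print(summary)
```

Because `summary` is a regular DataFrame, it can be filtered, sorted, or merged like any other; the same pattern should apply when grouping by ad category or ad placement instead of adSize.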
Thanks for your assistance.