I am using pandas to aggregate column data in order to calculate the cost per click (CPC) of ads in my dataset, grouped by a variable such as ad-size, ad-category, or ad-placement. In the case below I aggregate adCost and adClicks, grouping by adSize (a categorical variable with values 1-5). How do I add a new column to the dataset that takes the aggregated adCost per adSize and the aggregated adClicks per adSize and calculates the cost per click per adSize? I saved the aggregation into a variable, but it doesn't seem to be saved as a DataFrame or as any object that I can manipulate further. What am I missing or doing wrong?

import pandas as pd
import numpy as np

df = pd.DataFrame(data)

from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()

## Convert 'adSize' to categorical values
df['adSize'] = label_encoder.fit_transform(df['adSize'])

agg_calc = {
    'adCost':{
        # work on the "adCost" column
        'total_cost': 'sum', 
        'avg_cost': 'mean'  
    },
    'adClicks':{
        'total_clicks': 'sum',
        'avg_click': 'mean',
        'count': 'count'
    }
}

## Aggregate by adSize
y = df.groupby(['adSize']).aggregate(agg_calc)

Thanks for your assistance

1 Answer

You should be able to simply use groupby with transform. I don't have your data and I'm not entirely certain I understand your question, but something like the following should work:

df['total_cost'] = df.groupby('adSize')['adCost'].transform('sum')
df['avg_cost'] = df.groupby('adSize')['adCost'].transform('mean')
df['total_clicks'] = df.groupby('adSize')['adClicks'].transform('sum')
df['avg_click'] = df.groupby('adSize')['adClicks'].transform('mean')
df['count'] = df.groupby('adSize')['adClicks'].transform('count')

Is that what you're asking?
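
If the goal is the cost per click per adSize as a column on the original rows, the same transform idea can probably be carried one step further by dividing the two group totals. This is only a minimal sketch: the sample data is made up, the column name `cpc_by_size` is my own, and it assumes every adSize group has at least one click (a group with zero clicks would give `inf` or `NaN`).

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "adSize":   [1, 1, 2, 2, 2],
    "adCost":   [10.0, 20.0, 5.0, 5.0, 20.0],
    "adClicks": [5, 10, 10, 10, 10],
})

# transform returns one value per original row, aligned with df's index
total_cost = df.groupby("adSize")["adCost"].transform("sum")
total_clicks = df.groupby("adSize")["adClicks"].transform("sum")

# Cost per click for the row's adSize group (hypothetical column name)
df["cpc_by_size"] = total_cost / total_clicks

print(df)
```

Every row in the same adSize group ends up with the same `cpc_by_size` value.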

1 Comment

I believe your functions solve what I was trying to achieve. My dilemma was that, when using the aggregate function on the DataFrame, the result did not seem to be saved or kept, even when assigning it to a variable, so I could only print the output rather than manipulate the data later. I wanted to manipulate the results of the aggregate function further, for example by dividing one by the other. Your solution works well, though, and is simpler. I wonder, then, what the purpose of each method is and how the two differ.
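
On the difference between the two methods: as far as I know, aggregate does return a regular DataFrame, just with one row per group rather than one row per original record, whereas transform returns a result the same length as the original data. The sketch below uses made-up sample data and named aggregation (available in pandas 0.25 and later) instead of the nested-dict form from the question, which newer pandas versions reject. The names `summary` and `cpc` are my own.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "adSize":   [1, 1, 2, 2, 2],
    "adCost":   [10.0, 20.0, 5.0, 5.0, 20.0],
    "adClicks": [5, 10, 10, 10, 10],
})

# Named aggregation: output_column=(input_column, function).
# The result is an ordinary DataFrame indexed by adSize, one row per group.
summary = df.groupby("adSize").agg(
    total_cost=("adCost", "sum"),
    avg_cost=("adCost", "mean"),
    total_clicks=("adClicks", "sum"),
    avg_click=("adClicks", "mean"),
    count=("adClicks", "count"),
)

# The aggregated frame can be manipulated like any other DataFrame
summary["cpc"] = summary["total_cost"] / summary["total_clicks"]

# Optionally map the per-group value back onto the original rows
df["cpc"] = df["adSize"].map(summary["cpc"])

print(summary)
print(df)
```

So aggregate seems the better fit when a compact per-group table is wanted, and transform when each original row should carry its group's value.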
