0

I need to do a lot of aggregation on data and I was hoping to write a function that would allow me to pass

1) The string to use for grouping 2) The fields that would constitute the numerator/denominator/ and formula

As I will be doing a lot of cuts on the data using different groupings and different numerators and denominators, it would be easier for me to create a generic group by and pass it what I need

So lets take the following example:

import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))

Now, I would want to create a function that would allow me to pass a group by value and a numerator denominator field

So, for example

groupbyvalue=['sex', 'smoker']
fieldstoaggregate=['tip','total_bill']

And plug them into something like

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))

That works fine, but when I tried to replace the formula with something like:

dfformula="r.tip/r.total_bill"

And then placed it in the formula as follows

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: dfformula, axis = 1)*10000)

My output looks as follows:

sex     smoker
Female  No        r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
        Yes       r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
Male    No        r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
       Yes       r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
dtype: object

Is there any way to create the calculation dynamically then use it in the formula rather than having it interpreted as a string?

Thanks

1 Answer 1

2

You can achieve this using eval() function

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')

groupbyvalue = ['sex', 'smoker']
fieldstoaggregate = ['tip','total_bill']
dfformula = "r.tip/r.total_bill"

(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: eval(dfformula), axis = 1))

The output would be as follows

sex     smoker
Female  No        0.153189
        Yes       0.163062
Male    No        0.157312
        Yes       0.136919
dtype: float64
Sign up to request clarification or add additional context in comments.

1 Comment

Awesome!! Thank you

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.