Applying a dataframe function to a pandas groupby object

Question

I am trying to apply a function to each group in a pandas dataframe where the function requires access to the entire group (as opposed to just one row). For this I am iterating over each group in the groupby object. Is this the best way to achieve this?

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,2,2,2],
                   'value': [70,10,20,100,50,5,33],
                   'other_value': [2.3, 3.3, 7.4, 1.1, 5, 10.3, 12]})

def clean_df(df, v_col, other_col):
    '''This function is just a made up example and might 
       get more complex in real life. ;)
    '''
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]

grouped = df.groupby('id')
pd.concat([clean_df(group, 'value', 'other_value') for _, group in grouped])

The original dataframe is

    id  other_value value
0   1   2.3         70
1   1   3.3         10
2   1   7.4         20
3   1   1.1         100
4   2   5.0         50
5   2   10.3        5
6   2   12.0        33

The code will reduce it to

    id  other_value value
0   1   2.3         70
1   1   3.3         10
4   2   5.0         50

Graipher · Accepted Answer · 2019-04-09 11:54:36Z

You can call apply directly on the groupby object; your function is then passed each whole group as a DataFrame:

def clean_df(df, v_col='value', other_col='other_value'):
    '''This function is just a made up example and might 
       get more complex in real life. ;)
    '''
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]  

df.groupby('id').apply(clean_df).reset_index(level=0, drop=True)
#    id  other_value  value
# 0   1          2.3     70
# 1   1          3.3     10
# 4   2          5.0     50

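Note that I gave the other arguments default values so that the function can be called with the group as its only argument. This is not strictly required, since apply forwards any extra positional and keyword arguments to the function, but it keeps the call short. Another way around this is to make a function that returns the function: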

def clean_df(v_col, other_col):
    '''This function is just a made up example and might 
       get more complex in real life. ;)
    '''
    def wrapper(df):
        prev_points = df[v_col].shift(1)
        next_points = df[v_col].shift(-1)
        return df[(prev_points > 50) | (next_points < 20)]  
    return wrapper

Which you can use like this:

df.groupby('id').apply(clean_df('value', 'other_value')).reset_index(level=0, drop=True)

Or you can use functools.partial with your original three-argument clean_df:

from functools import partial

(df.groupby('id')
   .apply(partial(clean_df, v_col='value', other_col='other_value'))
   .reset_index(level=0, drop=True))
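
Two further variations, as a rough sketch rather than a definitive recommendation. The first passes the extra arguments straight through apply, using the three-argument clean_df from the question, and uses group_keys=False in place of the reset_index call. The second is an assumption on my part that only fits this particular made-up filter: it builds the boolean mask with a grouped shift and skips apply entirely. The names via_apply, values_by_id, mask and via_mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2],
                   'value': [70, 10, 20, 100, 50, 5, 33],
                   'other_value': [2.3, 3.3, 7.4, 1.1, 5, 10.3, 12]})


def clean_df(df, v_col, other_col):
    # Same made-up filter as in the question; other_col is unused there too.
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]


# Variation 1: keyword arguments given to apply are forwarded to clean_df.
# group_keys=False keeps the original row labels, so no reset_index is needed.
# Assumption: depending on the pandas version, the 'id' column may trigger a
# warning or be left out of each group, so do not rely on it in this result.
via_apply = df.groupby('id', group_keys=False).apply(
    clean_df, v_col='value', other_col='other_value')

# Variation 2: shift within each group, build one mask, filter once.
# Comparisons against the NaN values produced by shift evaluate to False.
values_by_id = df.groupby('id')['value']
mask = (values_by_id.shift(1) > 50) | (values_by_id.shift(-1) < 20)
via_mask = df[mask]

print(via_mask)
```

The mask version keeps every original column, including id, and avoids calling a Python function once per group. It only works when the per-group logic can be expressed with grouped operations such as shift, so a more complex real-life clean_df may still need apply.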
