6

I am trying to define an aggregation function that returns more than one output column, which I would like to use as follows:

df.groupby(by=...).agg(my_aggregation_function_with_multiple_columns)

Any idea how to do it?

I tried things like:

def my_aggregation_function_with_multiple_columns(slice_values):
    return {'col_1': -1,'col_2': 1}

but this, logically, puts the dictionary {'col_1': -1,'col_2': 1} in a single column...

2 Answers

3

This is not possible with agg, because agg works on each column separately: it processes the first column, then the second, and so on to the end.
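The per-column behaviour can be observed with a small probe. This is only a sketch: the function name probe and the list seen are hypothetical, and the DataFrame is the same sample used further down in this answer. Each call receives a single column of a single group as a Series, so the function never sees two columns at once.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 1, 2, 2, 3], B=[4, 5, 6, 7, 2], C=[1, 2, 4, 6, 9]))

seen = []  # hypothetical helper: records which column each call received

def probe(values):
    # values is a Series holding one column of one group
    seen.append(values.name)
    return values.sum()

result = df.groupby('A').agg(probe)
print(sorted(set(seen)))
print(result)
```

Because the function only ever gets one column, whatever it returns is stored in that column's cell of the result.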

The solution is the more flexible apply: to return multiple outputs, return a Series when the result is more than one scalar.

def my_aggregation_function_with_multiple_columns(slice_values):
    return pd.Series([-1, 1], index=['col_1','col_2'])

df.groupby(by=...).apply(my_aggregation_function_with_multiple_columns)
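As a runnable sketch of the same idea with values actually computed per group: the function name summarize and the choice of statistics (sum of B, mean of C) are assumptions made for illustration, not part of the question. The columns are selected explicitly before apply so the function only receives B and C.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 1, 2, 2, 3], B=[4, 5, 6, 7, 2], C=[1, 2, 4, 6, 9]))

def summarize(group):
    # group is a DataFrame with the rows of one group;
    # each key of the returned Series becomes an output column
    return pd.Series({'col_1': group['B'].sum(), 'col_2': group['C'].mean()})

result = df.groupby('A')[['B', 'C']].apply(summarize)
print(result)
```

Since every group returns a Series with the same index, pandas stacks them into a DataFrame with one row per group and one column per key.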

Sample:

df = pd.DataFrame(dict(A=[1,1,2,2,3], B=[4,5,6,7,2], C=[1,2,4,6,9]))
print (df)

def my_aggregation_function_with_multiple_columns(slice_values):
    #print each group
    #print (slice_values)
    a = slice_values['B'] + slice_values['C'].shift()
    print (type(a))
    return a

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

df = df.groupby('A').apply(my_aggregation_function_with_multiple_columns)
print (df)
A   
1  0     NaN
   1     6.0
2  2     NaN
   3    11.0
3  4     NaN
dtype: float64

1

The question can be interpreted in multiple ways. The following offers a solution for computing more than one output column, giving the possibility to use a different function for each column.

The example uses the same Pandas DataFrame df as the answer above:

import pandas as pd
df = pd.DataFrame(dict(A=[1,1,2,2,3], B=[4,5,6,7,2], C=[1,2,4,6,9]))

For each group in A, the sum of the values in B is computed and put in one column, and the number of values (count) in B is computed and put in another column.

df.groupby(['A'], as_index=False).agg({'B': {'B1':sum, 'B2': "count"}})

Because dictionaries with renaming were deprecated in pandas 0.20 and removed in pandas 1.0, the following code is better. It passes the functions as a list of names, which keeps the column order fixed:

df.groupby(['A'], as_index=False).agg({'B': ['sum', 'count']})

The next example shows how to do this if you want to have different computations on different columns, for computing the sum of B and mean of C:

df.groupby(['A'], as_index=False).agg({'B': 'sum', 'C': 'mean'})
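If renamed output columns are still wanted, as in the B1/B2 example above, named aggregation is one way to get them. This is a sketch that assumes pandas 0.25 or later; the output name C_mean is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 1, 2, 2, 3], B=[4, 5, 6, 7, 2], C=[1, 2, 4, 6, 9]))

# Each keyword is an output column name; each value is (input column, function)
out = df.groupby('A', as_index=False).agg(
    B1=('B', 'sum'),
    B2=('B', 'count'),
    C_mean=('C', 'mean'),
)
print(out)
```

The result has flat column names (A, B1, B2, C_mean) rather than the two-level columns produced by passing a list of functions.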
