pandas apply function to each group (output is not really an aggregation)

Question

I have a collection of time series (in a pandas DataFrame) and want to calculate the matrix profile for each device's time series. One option is to iterate over all the devices, which seems slow. A second option is to group by device and apply a UDF. The problem is that the UDF returns rows 1:1: instead of a single scalar value per group, it outputs the same number of rows as the input.

Is it still possible to somehow vectorize this calculation for each group when 1:1 (or at least non-scalar) values are returned?

import pandas as pd
df = pd.DataFrame({
    'foo':[1,2,3], 'baz':[1.1, 0.5, 4], 'bar':[1,2,1]
})
display(df)

print('***************************')
# slow version retaining all the rows
for g in df.bar.unique():
    print(g)
    
    # copy the slice so that adding a column does not trigger SettingWithCopyWarning
    this_group = df[df.bar == g].copy()
    # perform a UDF which needs to have all the values per group
    # i.e. for real I want to calculate the matrixprofile for each time-series of a device
    this_group['result'] = this_group.baz.apply(lambda x: 1)
    display(this_group)

print('***************************')

def my_non_scalar1_1_agg_function(x):
    display(pd.DataFrame(x))
    return x

# neatly vectorized application of a non_scalar function
# but this fails as:  Must produce aggregated value
df = df.groupby(['bar']).baz.agg(my_non_scalar1_1_agg_function)
display(df)

Sure: see gist.github.com/geoHeil/7344932b27f05bfaab551b3b948ac2c5 for code which generates an example dataset and uses the stumpy.stump UDF.
– Georg Heiler, Commented Nov 9, 2020 at 13:31
I guess that the second (non-accepted) answer at stackoverflow.com/questions/42171132/… should work here as well; I will give it a try.
– Georg Heiler, Commented Nov 9, 2020 at 14:24
Does stumpy.stump return a single scalar value? The docs indicate it returns an ndarray of 4 columns. Please post example output of one call and what single scalar value you need to extract.
– Parfait, Commented Nov 9, 2020 at 15:07

Parfait · Accepted Answer · 2020-11-09 15:22:20Z

4

For non-aggregating functions applied to each distinct group, i.e. functions that return a non-scalar value, you need to iterate the method across the groups and then compile the results together.

Therefore, consider a list or dict comprehension over groupby(), followed by concat. Be sure the method takes and returns a full data frame, series, or ndarray.

# LIST COMPREHENSION
df_list = [ myfunction(sub) for index, sub in df.groupby(['group_column']) ]
final_df = pd.concat(df_list)

# DICT COMPREHENSION
df_dict = { index: myfunction(sub) for index, sub in df.groupby(['group_column']) }
final_df = pd.concat(df_dict, ignore_index=True)
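As a minimal runnable sketch of the list-comprehension pattern above, using the toy frame from the question: `add_group_deviation` is a hypothetical stand-in for the real UDF (it is not the matrix profile). It only illustrates a function that needs all values of a group and returns one row per input row.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [1, 2, 3], 'baz': [1.1, 0.5, 4.0], 'bar': [1, 2, 1]
})

def add_group_deviation(sub):
    # Hypothetical UDF: needs the whole group (its mean) and
    # returns a frame with the same number of rows as its input.
    out = sub.copy()
    out['result'] = out['baz'] - out['baz'].mean()
    return out

# Apply the UDF to each sub-frame, then stitch the pieces back together.
pieces = [add_group_deviation(sub) for _, sub in df.groupby('bar')]

# concat keeps each row's original index label, so sorting the index
# restores the original row order.
final_df = pd.concat(pieces).sort_index()
print(final_df)
```

Leaving out `ignore_index=True` here is a deliberate choice: the original index labels survive, which makes it possible to line the result up with the input frame again.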

Georg Heiler · Accepted Answer · 2020-11-09 14:30:48Z

0

Indeed, this (see also the link in the comment above) is a way to get it to work in a faster, more desirable way. Perhaps there is an even better alternative.

import pandas as pd
df = pd.DataFrame({
    'foo':[1,2,3], 'baz':[1.1, 0.5, 4], 'bar':[1,2,1]
})
display(df)

grouped_df = df.groupby(['bar'])

altered = []
for index, subframe in grouped_df:
    display(subframe)
    # placeholder no-op: the real UDF would be applied to subframe here
    subframe = subframe
    altered.append(subframe)
    print(index)
    #print (subframe)

pd.concat(altered, ignore_index=True)
#pd.DataFrame(altered)
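One possible alternative, sketched here under the assumption that the UDF returns exactly one value per input row, is `groupby(...).transform`. Unlike `agg`, it expects an output of the same length as the group. `center_within_group` is a hypothetical placeholder, not the real matrix profile computation. Note that pandas still calls the function once per group in Python, so this tidies the code rather than truly vectorizing it.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [1, 2, 3], 'baz': [1.1, 0.5, 4.0], 'bar': [1, 2, 1]
})

def center_within_group(values):
    # Hypothetical UDF: receives one group's 'baz' values as a Series
    # and returns a Series of the same length.
    return values - values.mean()

# transform aligns the per-group results back onto the original index,
# so the new column can be assigned directly.
df['result'] = df.groupby('bar')['baz'].transform(center_within_group)
print(df)
```

If the function returns fewer rows than it receives, or several columns, `transform` is not a fit, and the loop-and-concat approach above remains the safer choice.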
