
So far my approach to applying different aggregation functions to different columns after a groupby has been quite straightforward, yet it seems somewhat inefficient and unpythonic. An example of what I usually do is as follows:

The original pandas DataFrame df has 6 columns: 'open', 'high', 'low', 'close', 'volume', 'new dt'

import pandas as pd

df_gb = df.groupby('new dt')

arr_high = df_gb['high'].max()
arr_low = df_gb['low'].min()
arr_open = df_gb['open'].first()
arr_close = df_gb['close'].last()
arr_volume = df_gb['volume'].sum()

df2 = pd.concat([arr_open,
                 arr_high,
                 arr_low,
                 arr_close,
                 arr_volume], axis='columns')

This looks fine at first glance, but when I have 20 functions to apply to 20 different columns, it quickly becomes verbose and hard to maintain.

Is there a more efficient or pythonic way to do this? Thank you in advance.

2 Answers

If you have 20 different functions, you will have to match columns with functions one way or another. "Pythonic" is subjective, so this is not the one correct answer, but it may be useful. In my opinion your approach is already pythonic, and it spells out clearly what is happening. An alternative is to build a dictionary that maps columns to functions and pass it to agg:

# as long as the columns are ordered with the proper functions
# you may have to change the ordering here
columns_to_agg = (column for column in df.columns if column != 'new dt')

# if the functions are all methods of pandas.Series just use strings
agg_methods = ['first', 'max', 'min', 'last', 'sum']

# construct a {column: method} dictionary and use it as the aggregator
agg_dict = dict(zip(columns_to_agg, agg_methods))
df_gb = df.groupby('new dt', as_index=False).agg(agg_dict)
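
Here is the same idea sketched end to end, with the dictionary written out by hand so that nothing depends on the column order. The sample values are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'open':   [10.0, 11.0, 12.0, 13.0],
    'high':   [10.5, 11.5, 12.5, 13.5],
    'low':    [9.5, 10.5, 11.5, 12.5],
    'close':  [10.2, 11.2, 12.2, 13.2],
    'volume': [100, 200, 300, 400],
    'new dt': ['d1', 'd1', 'd2', 'd2'],
})

# Map each column to its aggregation explicitly.
agg_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
}

# One groupby, one agg call, one resulting DataFrame.
df2 = df.groupby('new dt').agg(agg_dict)
print(df2)
```

Writing the mapping out is a bit longer than zipping two sequences, but with 20 columns it keeps each column visibly next to its function.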

If you have a custom function you want to apply to, say, volume, you could do:

def custom_f(series):
    # count the non-null values in the group
    return pd.notnull(series).sum()

agg_methods = ['first', 'max', 'min', 'last', custom_f]

Everything else stays the same. You can even apply both sum and custom_f to the volume column by passing a list:

agg_methods = ['first', 'max', 'min', 'last', ['sum', custom_f]]
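
A small sketch of what passing a list for one column does to the result. The data is made up, and the frame is cut down to two value columns to keep it short. As far as I know, once any column gets a list of functions the result has two-level columns (column name, function name).

```python
import pandas as pd

# Made-up data with one missing volume value.
df = pd.DataFrame({
    'close':  [10.2, 11.2, 12.2, 13.2],
    'volume': [100.0, None, 300.0, 400.0],
    'new dt': ['d1', 'd1', 'd2', 'd2'],
})

def custom_f(series):
    # Count the non-null entries in the group.
    return pd.notnull(series).sum()

# 'close' gets a single function, 'volume' gets a list of two.
result = df.groupby('new dt').agg({
    'close': 'last',
    'volume': ['sum', custom_f],
})

# The columns are (column, function) pairs.
print(result.columns.tolist())
print(result)
```

If you would rather have flat column names, you can join the two levels afterwards, for example with `result.columns = ['_'.join(c) for c in result.columns]`.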

6 Comments

Thanks for your reply. I will try to see if it works.
No problem. Let me know if it does.
What happens if some functions are user-defined (that is, not available in pandas as a string like 'first' or 'max')?
You can have that function as part of the dictionary. Let me update my answer
Really appreciate it
In [3]: import pandas as pd
In [4]: import numpy as np
In [5]: df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
   ...:                    [np.nan, np.nan, np.nan]], columns=['A', 'B', 'C'])

In [6]: df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})                    
Out[6]: 
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN

To get the functions as columns instead, transpose the result:

In [11]: df.agg({'A' : ['sum'], 'B' : ['min', 'max']}).T                        
Out[11]: 
   max  min   sum
A  NaN  NaN  12.0
B  8.0  2.0   NaN

To use custom functions, you can do this:

In [12]: df.agg({'A' : ['sum',lambda x:x.mean()], 'B' : ['min', 'max']}).T      
Out[12]: 
   <lambda>  max  min   sum
A       4.0  NaN  NaN  12.0
B       NaN  8.0  2.0   NaN
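
The `<lambda>` label above is not very descriptive. One way around it, sketched here on a grouped frame, is named aggregation, which I believe is available from pandas 0.25 onward. The `key` column, the data, and the output names (`A_sum`, `A_mean`, and so on) are made up for illustration.

```python
import pandas as pd

# Made-up data; 'key' is a hypothetical grouping column.
df = pd.DataFrame({
    'A': [1.0, 4.0, 7.0, 2.0],
    'B': [2.0, 5.0, 8.0, 3.0],
    'key': ['x', 'x', 'y', 'y'],
})

# Each keyword is an output column name; each value is (input column, function).
out = df.groupby('key').agg(
    A_sum=('A', 'sum'),
    A_mean=('A', lambda x: x.mean()),
    B_min=('B', 'min'),
    B_max=('B', 'max'),
)
print(out)
```

This gives flat, explicitly named columns, so a lambda no longer shows up as `<lambda>` in the result.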

1 Comment

Thanks, very nice to know custom functions can be used in this way
