Pandas Dataframe Groupby Apply Lambda Function With Multiple Column Returns

Question

I couldn't find anything on SO about this. I want to add 4 new columns to my existing DataFrame by applying a separate function that takes 4 specific columns as inputs and returns 4 output columns distinct from those inputs. However, the function requires the DataFrame to be sliced by conditions before it is used. I have been using for loops and appending, but that is extremely slow. I was hoping for a MapReduce-style operation that would take my DataFrame, do a groupby, and apply a function I wrote separately.

The function has multiple outputs, so just imagine a function like this:

    def func(a,b,c,d):
        return f(a),g(b),h(c),i(d)

where f, g, h, i are different functions performed on the inputs. I am trying to do something like the following (the lambda is pseudocode for what I want, not valid Python):

    import pandas as pd

    df = pd.DataFrame({'a': range(10),
                       'b': range(10),
                       'c': range(10),
                       'd': range(10),
                       'e': [0,0,0,0,0,1,1,1,1,1]})

    df.groupby('e').apply(lambda df['x1'],df['x2'],df['x3'],df['x4'] =
                          func(df['a'],df['b'],df['c'],df['d']))

Is this possible? If there are other functions in the library or more efficient ways to go about this, please advise. Thanks.

EDIT: Here's a sample output

    a  b  c  d  e  f   g   h   i
    ------------------------------
    0  0  0  0  0  f1  g1  h1  i1
    1  1  1  1  1  f2  g2  h2  i2
    ... and so on

The reason I'd like this ordering of operations is that the function depends on structure within the data (hence the groupby), so the grouping has to happen before the function is applied. Previously, I obtained the unique values and iterated over them, slicing the DataFrame for each one and appending the result to a new DataFrame. That approach runs in quadratic time.

Victor Chubukov · Accepted Answer · 2017-03-31 04:27:49Z

You could do something like this:

    def f(data):
        data['a2'] = data['a'] * 2  # or whatever function/calculation you want
        data['b2'] = data['b'] * 3  # etc.
        # e.g. data['g'] = g(data['b'])
        return data

    df.groupby('e').apply(f)
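
To connect this back to the four-output `func` from the question, here is a minimal sketch of the same idea. The bodies of `f`, `g`, `h`, `i` and the helper name `add_outputs` are hypothetical placeholders, not the asker's real functions. It assumes a pandas version where `group_keys=False` is honoured for `apply`. It only reads columns `a` to `d` inside the function, since newer pandas releases change whether the grouping column is passed along.

```python
import pandas as pd

# Hypothetical stand-ins for f, g, h, i from the question.
def f(a): return a - a.mean()   # deviation from the group mean
def g(b): return b.cumsum()     # running total within the group
def h(c): return c.max() - c    # distance from the group maximum
def i(d): return d * 2          # plain element-wise operation

def func(a, b, c, d):
    return f(a), g(b), h(c), i(d)

def add_outputs(group):
    # Hypothetical wrapper: unpack the four results into four new columns.
    group = group.copy()  # avoid mutating the object pandas passes in
    group['x1'], group['x2'], group['x3'], group['x4'] = func(
        group['a'], group['b'], group['c'], group['d'])
    return group

df = pd.DataFrame({'a': range(10), 'b': range(10), 'c': range(10),
                   'd': range(10), 'e': [0] * 5 + [1] * 5})

# group_keys=False keeps the original row index instead of prepending 'e'.
result = df.groupby('e', group_keys=False).apply(add_outputs)
print(result[['x1', 'x2', 'x3', 'x4']])
```

Because the result keeps the original index, the new columns can be joined back onto `df` by index if the other columns need to stay untouched.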

edited Mar 31, 2017 at 4:27

answered Mar 31, 2017 at 4:16

2 Comments

tmpcoder Over a year ago

This works! My final solution nested my function within the new function, but it works all the same. Thanks for this.

Victor Chubukov Over a year ago

By the way, another good alternative might be the groupby.transform function. In that case, you'd call it on one column at a time and then assign each result as a new column on your DataFrame. It may be significantly more efficient.
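
A rough sketch of that `transform` approach, using made-up per-column calculations (the lambdas and the `x1` to `x4` column names are placeholders, not anything from the original question). Each call works on one column and returns a Series aligned with the original index, so it can be assigned directly:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10), 'b': range(10), 'c': range(10),
                   'd': range(10), 'e': [0] * 5 + [1] * 5})

# One transform call per output column; results line up with df's index.
df['x1'] = df.groupby('e')['a'].transform(lambda s: s - s.mean())
df['x2'] = df.groupby('e')['b'].transform('cumsum')
df['x3'] = df.groupby('e')['c'].transform(lambda s: s.max() - s)

# A calculation that does not depend on the group needs no groupby at all.
df['x4'] = df['d'] * 2

print(df)
```

Whether this is faster than `apply` depends on the functions involved. Built-in names such as `'cumsum'` tend to be quicker than Python lambdas, so it is worth timing on the real data.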
