I couldn't find anything on SO about this. I'm trying to add 4 new columns to an existing DataFrame by applying a function that takes 4 specific columns as inputs and returns 4 output columns (distinct from the 4 inputs). However, the function needs the DataFrame to be sliced by a condition before it is applied. I have been using for loops and appending, which is extremely slow. I was hoping there was a way to do a MapReduce-style operation: take my DataFrame, do a groupby, and apply a function I wrote separately.

The function has multiple outputs, so just imagine a function like this:

    def func(a,b,c,d):
        return f(a),g(b),h(c),i(d)

where f,g,h,i are different functions performed on the inputs. I am trying to do something like:

    import pandas as pd

    df = pd.DataFrame({'a': range(10),
                       'b': range(10),
                       'c': range(10),
                       'd': range(10),
                       'e': [0,0,0,0,0,1,1,1,1,1]})

    # Pseudocode for the intent (not valid Python): per group, assign func's outputs to new columns
    df.groupby('e').apply(lambda df['x1'],df['x2'],df['x3'],df['x4'] =
                          func(df['a'],df['b'],df['c'],df['d']))

Is this possible? If there are other functions in the library or more efficient ways to go about this, please advise. Thanks.

EDIT: Here's a sample output:

    a  b  c  d  e  f  g  h  i
    --------------------------
    0  0  0  0  0  f1 g1 h1 i1
    1  1  1  1  1  f2 g2 h2 i2
    ... and so on

The reason I'd like the operations ordered this way is that the function depends on structure within the data (hence the groupby), so the grouping has to happen before the function is applied. Previously, I obtained the unique values, iterated over them while slicing the DataFrame, and appended each slice's result to a new DataFrame. That runs in quadratic time.

1 Answer

You could do something like this:

    def f(data):
        data = data.copy()          # work on a copy rather than mutating the group pandas passes in
        data['a2'] = data['a'] * 2  # or whatever function/calculation you want
        data['b2'] = data['b'] * 3  # etc.
        # e.g. data['g'] = g(data['b'])
        return data

    df.groupby('e').apply(f)
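
To tie this back to the four-output `func` from the question, here is a minimal sketch of wrapping it in a per-group function. The calculations inside `func` are placeholders (the real `f`, `g`, `h`, `i` are not given), and `add_outputs` is a hypothetical helper name. Selecting the input columns explicitly and passing `group_keys=False` are choices made here so the result keeps the original row index and can be joined back onto `df`.

```python
import pandas as pd

# Placeholder calculations; the question does not say what f, g, h, i actually do.
def func(a, b, c, d):
    return a - a.mean(), b.cumsum(), c * 2, d.rank()

# Hypothetical wrapper: receives one group's rows and returns only the new columns.
def add_outputs(group):
    x1, x2, x3, x4 = func(group['a'], group['b'], group['c'], group['d'])
    return pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4})

df = pd.DataFrame({'a': range(10), 'b': range(10), 'c': range(10), 'd': range(10),
                   'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})

# Apply per group on the input columns only, then join the new columns back by index.
new_cols = df.groupby('e', group_keys=False)[['a', 'b', 'c', 'd']].apply(add_outputs)
result = df.join(new_cols)
```

Because the new columns are joined on the index, the original rows and their order in `df` are left as they were.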

2 Comments

This works! My final solution nested the function within the new function, but it works all the same. Thanks for this.
Btw, another good alternative might be to use the groupby.transform function. In that case, you'd call it on one column at a time and then add the resulting columns to your dataframe. It may be significantly more efficient.
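
A rough sketch of the `transform` route mentioned in the last comment, assuming each output depends on a single input column. The calculations (centering `a` on its group mean, a running sum of `b`) are placeholders chosen for illustration, not the asker's actual functions.

```python
import pandas as pd

df = pd.DataFrame({'a': range(10), 'b': range(10), 'c': range(10), 'd': range(10),
                   'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})

grouped = df.groupby('e')

# Each transform call works on one column and returns a Series aligned with df's index,
# so the results can be assigned directly as new columns.
result = df.assign(
    a2=df['a'] - grouped['a'].transform('mean'),  # placeholder: center within each group
    b2=grouped['b'].transform('cumsum'),          # placeholder: running sum within each group
)
```

Passing a built-in name such as `'mean'` or `'cumsum'` is typically faster than passing a Python lambda, which may be where most of the speedup comes from.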
