I couldn't find anything on SO about this. I'm trying to add 4 new columns to an existing DataFrame by applying a function that takes 4 specific columns as inputs and returns 4 output columns distinct from those inputs. However, the function needs the DataFrame to be sliced by a condition before it is applied. I have been using for loops and appending, but that is extremely slow. I was hoping there is a way to do a MapReduce-style operation that takes my DataFrame, does a groupby, and applies a function I wrote separately.
The function has multiple outputs, so just imagine a function like this:
def func(a, b, c, d):
    return f(a), g(b), h(c), i(d)
where f,g,h,i are different functions performed on the inputs. I am trying to do something like:
import pandas as pd
df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0,0,0,0,0,1,1,1,1,1]})
df.groupby('e').apply(lambda df['x1'],df['x2'],df['x3'],df['x4'] =
func(df['a'],df['b'],df['c'],df['d']))
Is this possible? If there are other functions in the library or more efficient ways to go about this, please advise. Thanks.
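To make the intent concrete, here is a minimal sketch of one way the groupby idea could be written. It is not a confirmed solution: the bodies of f, g, h, i are hypothetical stand-ins (a per-group demean, cumulative sum, rank, and scaling), and the helper name compute is made up for illustration. The function is applied to each group of the four input columns and returns a small DataFrame with the group's own index, so the outputs can be joined back onto df.

```python
import pandas as pd

# Hypothetical stand-ins for f, g, h, i. Each one depends on the whole
# group, which is why the groupby has to happen first.
def func(a, b, c, d):
    return a - a.mean(), b.cumsum(), c.rank(), d / d.max()

def compute(group):
    # Hypothetical helper: run func on one group and label the four outputs.
    x1, x2, x3, x4 = func(group['a'], group['b'], group['c'], group['d'])
    return pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4},
                        index=group.index)

df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0,0,0,0,0,1,1,1,1,1]})

# Select only the input columns, apply per group, keep the original index.
out = df.groupby('e', group_keys=False)[['a', 'b', 'c', 'd']].apply(compute)

# The outputs share df's index, so a plain join lines the rows up.
result = df.join(out)
```

Selecting the four input columns before apply keeps the grouping column out of the function, and joining on the index avoids any manual appending.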
EDIT: Here's a sample output
a  b  c  d  e  f   g   h   i
-----------------------------
0  0  0  0  0  f1  g1  h1  i1
1  1  1  1  1  f2  g2  h2  i2
... and so on
The reason I'd like this ordering of operations is that the function relies on structure within the data (hence the groupby), so the grouping has to happen before the function is applied. Previously, I obtained the unique values and iterated over them, slicing the DataFrame for each one and appending the result to a new DataFrame. That runs in quadratic time.
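For reference, the loop-based approach described above might look roughly like the sketch below, with one change: the per-value pieces are collected in a list and concatenated once at the end. Appending to a DataFrame inside the loop copies the accumulated frame on every iteration, which is a likely source of the quadratic growth. The bodies of f, g, h, i are again hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for f, g, h, i.
def func(a, b, c, d):
    return a - a.mean(), b.cumsum(), c.rank(), d / d.max()

df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0,0,0,0,0,1,1,1,1,1]})

pieces = []
for key in df['e'].unique():
    # Slice out the rows for this value; copy so the assignment below
    # does not write into a view of df.
    part = df[df['e'] == key].copy()
    part['x1'], part['x2'], part['x3'], part['x4'] = func(
        part['a'], part['b'], part['c'], part['d'])
    pieces.append(part)

# One concatenation at the end instead of one append per iteration.
result = pd.concat(pieces).sort_index()
```

Each boolean slice still scans the whole frame, so this remains slower than a groupby when there are many unique values.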