
I am trying to preprocess a dataset with pandas. I want to use a function that takes multiple arguments (one taken from a column of the dataframe, the others plain variables) and returns several values, like this:

def preprocess(Series, var1, var2, var3, var4):
    return 1, 2, 3, 4

I want to use pandas' native DataFrame.apply to run this function on one column of my dataframe, like this:

import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

df['C'], df['D'], df['E'], df['F'] = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)

But the last line gives me the following error:

ValueError: not enough values to unpack (expected 4, got 3)

I understand that the last line returns one tuple of 4 values (1, 2, 3, 4) per row, whereas I want each of these values in its own column (C, D, etc.).
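A minimal reproduction of the problem, assuming the same toy data as above (the first parameter is renamed value here purely for readability). With axis=1, apply calls the function once per row and, because the function returns a tuple rather than a Series, collects the results into a Series holding one tuple per row. Unpacking that Series into four targets therefore sees three items.

```python
import pandas as pd

def preprocess(value, var1, var2, var3, var4):
    # Toy stand-in for the real preprocessing: always returns a 4-tuple
    return 1, 2, 3, 4

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# With axis=1 the lambda runs once per row; tuple results are NOT expanded
result = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)

print(type(result).__name__)  # Series
print(len(result))            # 3, one tuple per row
print(result.iloc[0])         # (1, 2, 3, 4)

message = None
try:
    # Iterating a Series yields its 3 values, so 4 targets cannot be filled
    c, d, e, f = result
except ValueError as err:
    message = str(err)
    print(message)
```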

How can I do this?

Answer

You need to rewrite your function to return a pd.Series; that way, apply with axis=1 returns a DataFrame with one column per returned value instead of a Series of tuples:

def preprocess(Series,var1,var2,var3,var4):
    return pd.Series([1,2,3,4])

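Then assign the result with a list of column names. Tuple-unpacking a DataFrame iterates over its column labels (0, 1, 2, 3 here), so the original unpacking line would fill C to F with those labels instead of the returned values:

df[['C', 'D', 'E', 'F']] = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)

which returns

   A  B  C  D  E  F
0  4  9  1  2  3  4
1  4  9  1  2  3  4
2  4  9  1  2  3  4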
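A variation on the same idea, assuming you are free to choose the output names inside the function (the first parameter is renamed value for readability): give the returned Series an index, so the DataFrame produced by apply already carries the column names C to F and can be attached with DataFrame.join, which aligns on the row index that apply preserves.

```python
import pandas as pd

def preprocess(value, var1, var2, var3, var4):
    # Hypothetical variant: the Series index becomes the new column names
    return pd.Series([1, 2, 3, 4], index=['C', 'D', 'E', 'F'])

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Each returned Series becomes one row, so new_cols has columns C..F
new_cols = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)

# join matches rows by index label, so no positional bookkeeping is needed
df = df.join(new_cols)
print(df)
```

Naming the outputs inside the function avoids keeping a separate list of column names in sync with the order of the returned values.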

Update: without rewriting the function:

import numpy as np

processed = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4), axis=1)  # Series of 4-tuples, one per row
df['C'], df['D'], df['E'], df['F'] = np.array(processed.to_list()).T  # transpose: 4 rows of 3 values each
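Two other ways to keep preprocess returning a plain tuple, sketched under the same toy data (first parameter renamed value for readability). Option 1 uses DataFrame.apply's result_type='expand', which turns list-like results such as tuples into columns 0..3; the list assignment then copies them into C to F by position, which is how recent pandas versions handle assigning a DataFrame to a list of new column names. Option 2 applies the function to column A alone with Series.apply, whose args parameter passes the extra positional arguments after each value, and builds the new columns from the resulting tuples.

```python
import pandas as pd

def preprocess(value, var1, var2, var3, var4):
    # Unchanged: still returns a plain tuple
    return 1, 2, 3, 4

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# Option 1: expand each returned tuple into a row (columns 0..3),
# then copy those columns into C..F by position
expanded = df.apply(lambda x: preprocess(x['A'], 1, 2, 3, 4),
                    axis=1, result_type='expand')
df[['C', 'D', 'E', 'F']] = expanded

# Option 2: work on column A only; args supplies var1..var4 after the value
tuples = df['A'].apply(preprocess, args=(1, 2, 3, 4))  # Series of tuples
new_cols = pd.DataFrame(tuples.tolist(), index=df.index,
                        columns=['C', 'D', 'E', 'F'])
df2 = df[['A', 'B']].join(new_cols)

print(df)
print(df2)
```

Option 2 also avoids building a full row object for every call, since only column A is ever read.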

Comment

Thanks. Is there a way to avoid rewriting my function? I have other data from a JSON file where preprocess already works, and I would prefer to use a single preprocessing function for both.
