I am trying to use apply on my DataFrame. The function I apply is a custom function that returns two values, and those values should populate two columns of each row in my DataFrame.

Here is a simple example:

import pandas as pd
df = pd.DataFrame({'a': [10]})

I want to create two columns, b and c. Both b and c equal 1 if a is above 0, and 0 otherwise.

def compute_b_c(a):
   if a > 0:
      return 1, 1
   else:
      return 0,0

I tried this, but it raises a KeyError:

df[['b', 'c']] = df.a.apply(compute_b_c)

2 Answers


This is possible with the DataFrame constructor. Note that returning 1, 1 or 0, 0 is the same as returning the tuples (1, 1) and (0, 0):

df = pd.DataFrame ({'a' : [10, -1, 9]})

def compute_b_c(a):
   if a > 0:
      return (1,1)
   else:
      return (0,0)

df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
print (df)
    a  b  c
0  10  1  1
1  -1  0  0
2   9  1  1
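
One caveat with the constructor approach: the new DataFrame gets a default 0..n-1 index, and pandas aligns on index labels when assigning columns. If df has any other index, the assigned values would most likely come out as NaN. A minimal sketch, assuming a string index chosen only for illustration, that passes index=df.index to keep the rows aligned:

```python
import pandas as pd


def compute_b_c(a):
    # Same logic as above: a pair of flags based on the sign of a
    return (1, 1) if a > 0 else (0, 0)


# Hypothetical non-default index, used only to illustrate alignment
df = pd.DataFrame({'a': [10, -1, 9]}, index=['x', 'y', 'z'])

# Build the list of tuples, then reuse df's index for the new frame
pairs = df['a'].apply(compute_b_c).tolist()
df[['b', 'c']] = pd.DataFrame(pairs, index=df.index)

print(df)
```

With the default RangeIndex used in the example above, the extra argument makes no difference.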

Performance:

# 30k rows (3 values repeated 10,000 times)
df = pd.DataFrame ({'a' : [10, -1, 9] * 10000})

In [79]: %timeit df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
22.6 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [80]: %timeit df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
5.25 s ± 84.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
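
For this particular function, apply is not strictly needed. A minimal sketch of a vectorized alternative, assuming the real function is as simple as the example (a more complex custom function may not reduce to a single comparison like this):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9] * 10000})

# Compare the whole column at once, then convert True/False to 1/0
flag = (df['a'] > 0).astype(int)
df['b'] = flag
df['c'] = flag

print(df.head(3))
```

No timing is given here because it was not measured; the comparison runs on the whole column instead of calling a Python function once per value.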

4 Comments

Thanks! I just tested the first solution and it works great. I am now checking the second one, and I wonder whether creating a new DataFrame each time is efficient or a costly operation.
@Solal - I think it is more complicated than it seems. Generally it is best to use pandas' native functions, since these are the fastest. If you need a custom function, it depends on each function individually.
@Solal - I tested both with a sample function and my second solution was slow, so I removed it.
Actually, the second function was not working because it kept only the data from the first computed row.

Use the result_type parameter of pandas.DataFrame.apply. This works only if you call apply on df (a DataFrame) and not on df.a (a Series).

df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
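
A self-contained sketch of the line above, reusing the sample data from the other answer. It also shows what probably lies behind an "unexpected keyword argument" error: Series.apply forwards extra keyword arguments to the function itself, so result_type ends up being passed to compute_b_c.

```python
import pandas as pd


def compute_b_c(a):
    return (1, 1) if a > 0 else (0, 0)


df = pd.DataFrame({'a': [10, -1, 9]})

# axis=1 calls the function once per row; result_type='expand'
# spreads each returned tuple across columns labelled 0 and 1
expanded = df.apply(lambda row: compute_b_c(row['a']),
                    axis=1, result_type='expand')
df[['b', 'c']] = expanded

# On a Series, result_type is handed to compute_b_c as a keyword
# argument, which the function does not accept
try:
    df['a'].apply(compute_b_c, result_type='expand')
except TypeError as exc:
    print('Series.apply failed:', exc)

print(df)
```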

4 Comments

result_type is an unexpected keyword argument
Notice the difference: it's df.apply and not df.a.apply.
@nishant - Unfortunately this method is slow, maybe because it is rarely used and not well optimized, or because DataFrame.apply is used with axis=1.
@jezrael Yes, for large DataFrames I would prefer your solution. However, just out of curiosity, why is Series.apply(...).tolist() faster than df.apply?
