Pandas DataFrame apply to multiple columns

Question

I am trying to use the apply function on my DataFrame. The applied custom function returns 2 values, and these need to populate 2 columns of each row in my DataFrame.

I put a simple example below:

import pandas as pd
df = pd.DataFrame({'a': [10]})

I wish to create two columns, b and c. Both b and c equal 1 if a is above 0, and 0 otherwise.

def compute_b_c(a):
    if a > 0:
        return 1, 1
    else:
        return 0, 0

I tried this, but it raises a KeyError:

df[['b', 'c']] = df.a.apply(compute_b_c)

jezrael · Accepted Answer · 2020-03-24 07:07:02Z

1

This is possible with the DataFrame constructor. Note that returning 1, 1 and 0, 0 is the same as returning the tuples (1, 1) and (0, 0):

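import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9]})

def compute_b_c(a):
    if a > 0:
        return (1, 1)
    else:
        return (0, 0)

# Series.apply gives a Series of tuples; tolist() plus the constructor splits them into two columns
df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
print(df)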
    a  b  c
0  10  1  1
1  -1  0  0
2   9  1  1
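
One caveat worth noting: assigning a DataFrame to df[['b', 'c']] aligns on the index, so the line above relies on df having the default 0..n-1 index. A minimal sketch of a variant that should also work with other indexes, assuming the same compute_b_c as above (the index values here are made up purely for illustration), passes index=df.index to the constructor:

```python
import pandas as pd

def compute_b_c(a):
    # Same logic as the answer's function: returns a 2-tuple.
    return (1, 1) if a > 0 else (0, 0)

# Hypothetical non-default index, chosen only for illustration.
df = pd.DataFrame({'a': [10, -1, 9]}, index=[10, 20, 30])

# Reuse df's index so the new columns line up with the existing rows.
pairs = df['a'].apply(compute_b_c).tolist()
df[['b', 'c']] = pd.DataFrame(pairs, index=df.index)
print(df)
```

Without index=df.index, the temporary DataFrame would be labelled 0, 1, 2 and would not match the labels 10, 20, 30.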

Performance:

# 30k rows (3 values repeated 10,000 times)
df = pd.DataFrame({'a': [10, -1, 9] * 10000})

In [79]: %timeit df[['b', 'c']] = pd.DataFrame(df.a.apply(compute_b_c).tolist())
22.6 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [80]: %timeit df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
5.25 s ± 84.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
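
For this specific compute_b_c, where both outputs depend only on whether a > 0, the custom function can probably be avoided altogether. A minimal sketch using plain column operations on the same sample data (not timed here, so no speed figures are claimed):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9]})

# A boolean mask cast to int gives 1 where a > 0 and 0 elsewhere.
flag = (df['a'] > 0).astype(int)
df['b'] = flag
df['c'] = flag
print(df)
```

This only helps when the logic can be expressed as column operations; for a genuinely custom function, the apply-based approaches are still needed.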

edited Mar 24, 2020 at 7:07

answered Mar 24, 2020 at 6:52


4 Comments

Solal Over a year ago

Thanks! I just tested the first solution and it works great. I am now checking the second one, and I am wondering whether creating a new DataFrame each time is efficient or a costly operation.

jezrael Over a year ago

@Solal - I think it is more complicated than it seems. Generally it is best to use pandas native functions, since these are the fastest (obviously). If you need a custom function, it depends on each one individually.

jezrael Over a year ago

@Solal - I tested the solutions with the sample function and my second solution was slow, so I removed it.

Solal Over a year ago

Actually, the second function was not working because it kept only the data of the first row computed.

nishant · Accepted Answer · 2020-03-24 07:50:23Z

0

Use the result_type parameter of pandas.DataFrame.apply. It is applicable only if you call apply on df (a DataFrame) and not on df.a (a Series).

df[['b', 'c']] = df.apply(lambda row: compute_b_c(row['a']), result_type='expand', axis=1)
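
A self-contained sketch of this approach, reusing the logic of compute_b_c from the question. With result_type='expand', the tuple returned for each row is spread across separate columns (labelled 0 and 1 by default), which are then assigned to b and c:

```python
import pandas as pd

def compute_b_c(a):
    # Same logic as the question's function: returns a 2-tuple.
    return (1, 1) if a > 0 else (0, 0)

df = pd.DataFrame({'a': [10, -1, 9]})

# axis=1 passes each row to the lambda; 'expand' turns the returned
# tuple into separate columns instead of one column of tuples.
expanded = df.apply(lambda row: compute_b_c(row['a']), axis=1, result_type='expand')
df[['b', 'c']] = expanded
print(df)
```

Because expanded keeps the index of df, this assignment lines up with the existing rows even when the index is not the default one.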

edited Mar 24, 2020 at 7:50

answered Mar 24, 2020 at 6:59


4 Comments

Solal Over a year ago

result_type is an unexpected keyword argument

nishant Over a year ago

Notice the difference: it's df.apply and not df.a.apply.

jezrael Over a year ago

@nishant - Unfortunately this method is slow, maybe because it is not very optimized since it is rarely used, or because of DataFrame.apply with axis=1.

nishant Over a year ago

@jezrael Yes, in the case of large dfs, I would prefer your solution. However, just out of curiosity, why is series.apply(...).tolist() faster than df.apply?
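
One likely contributor, offered as a sketch rather than a definitive answer: with axis=1, DataFrame.apply hands the function a Series object for every row, whereas Series.apply hands it a single element. The snippet below only shows what each call passes to the function; it does not measure time:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, -1, 9]})

# Record the type name of the argument each apply variant passes in.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
value_types = df['a'].apply(lambda a: type(a).__name__)

print(row_types.unique())    # type name seen for each row
print(value_types.unique())  # type name seen for each element
```

Building a Series per row is extra work that the element-wise Series.apply does not need to do, which would be consistent with the timings reported above.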
