
I have a pandas dataframe which I'd like to filter based on whether certain conditions are met. I ran a for loop and an .apply(), and used %%timeit to test for speed. The dataset has around 45000 rows. The code snippet for the loop is:

%%timeit
qualified_actions = []
for row in all_actions.index:
    if all_actions.ix[row,'Lower'] <= all_actions.ix[row, 'Mid'] <= all_actions.ix[row,'Upper']:
        qualified_actions.append(True)
    else:
        qualified_actions.append(False)

1.44 s ± 3.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And for .apply() is:

%%timeit
qualified_actions = all_actions.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1)

6.71 s ± 54.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I thought .apply() is supposed to be much faster than looping through rows in pandas. Can someone explain why it's slower in this case?

  • I don't know for sure, but I guess it's due to the fact that apply must construct a dict for every row, while your for method efficiently accesses the data using ix without constructing any new objects. I believe this only happens when you apply a Python function; when applying numpy functions you stay in C-land and things go fast. Commented Dec 11, 2017 at 8:43
  • .apply is not supposed to be faster when iterating over rows. An .apply is essentially a for loop under the hood when you use axis=1. See here Commented Dec 11, 2017 at 8:55
  • @Amadan it constructs a Series out of each row, actually, but yeah, same effect. Commented Dec 11, 2017 at 9:01
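The per-row Series construction the comments mention is easy to see directly; a tiny sketch (with a made-up toy frame) that asks apply what object each row arrives as:

```python
import pandas as pd

# toy frame, just for illustration
df = pd.DataFrame({'Lower': [1, 2], 'Mid': [2, 1], 'Upper': [3, 3]})

# apply(axis=1) hands each row to the function as a pandas Series,
# so a new Series object is built for every single row -- that is the overhead
types = df.apply(lambda row: type(row).__name__, axis=1)
print(types.tolist())  # ['Series', 'Series']
```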

1 Answer


apply uses loops under the hood, so if you need better performance, the best and fastest methods are vectorized alternatives.

No loops, just two chained conditions in a vectorized solution:

m1 = all_actions['Lower'] <= all_actions['Mid']
m2 = all_actions['Mid'] <= all_actions['Upper']
qualified_actions = m1 & m2
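Since the question is about filtering the frame, the resulting boolean mask can be used directly for boolean indexing; a small self-contained sketch (the toy data here is made up for illustration):

```python
import pandas as pd

# toy data standing in for the real all_actions frame
all_actions = pd.DataFrame({'Lower': [1, 5, 0],
                            'Mid':   [3, 4, 2],
                            'Upper': [4, 6, 1]})

m1 = all_actions['Lower'] <= all_actions['Mid']
m2 = all_actions['Mid'] <= all_actions['Upper']
qualified_actions = m1 & m2

# boolean indexing keeps only the rows where the mask is True
filtered = all_actions[qualified_actions]
print(qualified_actions.tolist())  # [True, False, False]
```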

Thanks to Jon Clements for another solution:

all_actions.Mid.between(all_actions.Lower, all_actions.Upper)
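Note that Series.between is inclusive on both ends by default, which matches the <= comparisons in the question; a minimal sketch with made-up data:

```python
import pandas as pd

all_actions = pd.DataFrame({'Lower': [1, 5], 'Mid': [3, 5], 'Upper': [4, 6]})

# inclusive bounds by default: Mid == Lower still qualifies
mask = all_actions.Mid.between(all_actions.Lower, all_actions.Upper)
print(mask.tolist())  # [True, True]
```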

Timings:

import numpy as np
import pandas as pd

np.random.seed(2017)
N = 45000
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)),
                           columns=['Lower', 'Mid', 'Upper'])

#print (all_actions)

In [85]: %%timeit
    ...: qualified_actions = []
    ...: for row in all_actions.index:
    ...:     if all_actions.ix[row,'Lower'] <= all_actions.ix[row, 'Mid'] <= all_actions.ix[row,'Upper']:
    ...:         qualified_actions.append(True)
    ...:     else:
    ...:         qualified_actions.append(False)
    ...: 
    ...: 
__main__:259: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
1 loop, best of 3: 579 ms per loop

In [86]: %%timeit
    ...: (all_actions.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1))
    ...: 
1 loop, best of 3: 1.17 s per loop

In [87]: %%timeit
    ...: ((all_actions['Lower'] <= all_actions['Mid']) & (all_actions['Mid'] <= all_actions['Upper']))
    ...: 
1000 loops, best of 3: 509 µs per loop


In [90]: %%timeit
    ...: (all_actions.Mid.between(all_actions.Lower, all_actions.Upper))
    ...: 
1000 loops, best of 3: 520 µs per loop
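If you want to squeeze out a bit more, dropping to the underlying NumPy arrays skips the Series index-alignment machinery; a sketch under the same setup as the timings above (not benchmarked here, so no speed claim beyond "it avoids Series overhead"):

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
N = 45000
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)),
                           columns=['Lower', 'Mid', 'Upper'])

# compare raw ndarray columns instead of Series
vals = all_actions.values
mask = (vals[:, 0] <= vals[:, 1]) & (vals[:, 1] <= vals[:, 2])
```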

2 Comments

Or just all_actions.Mid.between(all_actions.Lower, all_actions.Upper)
@JonClements - Thank you, I added it to the answer. On the sample data it is a bit slower, but the difference is only about 11 µs, so it is approximately the same.
