
So I have a fairly large DataFrame, and I create a new column from an equation based on other columns:

df['F'] = params.a * params.b * df.A/1000 - params.C * (df.B + df.C - df.D) + params.D * df.E

It works perfectly fine, except that I want to repeat this calculation throughout the code, so instead of error-prone copying and pasting I want to turn it into a reusable function.

So I moved it into a function and applied it row by row:

def fun(r):
    return params.a * params.b * r.A/1000 - params.C * (r.B + r.C - r.D) + params.D * r.E

df['F'] = df.apply(fun, axis=1)

Yet this is now 5x slower (1.2 s vs 6 s for 10k rows).

What should I do if I want a neat function and still keep the speed?

  • The second one is not recommended; the first one is the vectorized approach. Commented Aug 29, 2019 at 7:18
  • Try a lambda inside the apply function. Commented Aug 29, 2019 at 7:19
  • @AniketDixit But this means repeating the code every time I use it, which I want to avoid. Commented Aug 29, 2019 at 7:26
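The first commenter's point can be checked directly. The sketch below uses made-up parameter values and random data (both assumptions; the question does not give them). Because the question's row function only uses attribute access and arithmetic, it also works when handed the whole DataFrame, producing the same values without one Python call per row.

```python
import types

import numpy as np
import pandas as pd

# Hypothetical parameter values; the real ones are not given in the question.
params = types.SimpleNamespace(a=2.0, b=3.0, C=0.5, D=1.5)

# Synthetic data with the column names used in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 5)), columns=list("ABCDE"))


def fun(r):
    # Works on anything exposing .A ... .E: a single row (Series) or a whole DataFrame.
    return params.a * params.b * r.A / 1000 - params.C * (r.B + r.C - r.D) + params.D * r.E


# Row-wise: pandas builds a Series for every row and calls fun 10,000 times in Python.
row_wise = df.apply(fun, axis=1)

# Column-wise: fun is called once and each operator runs on whole columns at once.
vectorized = fun(df)

# Same numbers either way; only the amount of Python-level looping differs.
pd.testing.assert_series_equal(row_wise, vectorized, check_names=False)
```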

1 Answer


What's wrong with:

def fun():
    return params.a * params.b * df.A/1000 - params.C * (df.B + df.C - df.D) + params.D * df.E

df['F'] = fun()

So you get a reusable vectorized function.
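A variant that does not rely on globals passes the DataFrame and the parameters in explicitly, so it can be reused on any DataFrame with the right columns. This is a sketch: `compute_f` is a hypothetical name, and `params` is assumed to be any object with attributes `a`, `b`, `C`, `D` (here a `SimpleNamespace` with made-up values).

```python
import types

import pandas as pd


def compute_f(frame: pd.DataFrame, params) -> pd.Series:
    """Vectorized version of the question's formula.

    `frame` needs columns A..E; `params` is assumed to expose a, b, C, D.
    """
    return (
        params.a * params.b * frame["A"] / 1000
        - params.C * (frame["B"] + frame["C"] - frame["D"])
        + params.D * frame["E"]
    )


# Usage with made-up values:
params = types.SimpleNamespace(a=2.0, b=3.0, C=0.5, D=1.5)
df = pd.DataFrame({
    "A": [1000.0, 2000.0],
    "B": [1.0, 0.0],
    "C": [2.0, 1.0],
    "D": [1.0, 1.0],
    "E": [2.0, 4.0],
})
df["F"] = compute_f(df, params)  # df["F"] -> [8.0, 18.0]
```

Because the DataFrame is the first argument, it also composes with `DataFrame.pipe`, e.g. `df.pipe(compute_f, params)`.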


2 Comments

Nice shortcut, let me try it.
Works like a charm; %timeit gives exactly the same computation times. The only limitation (not in my case) is that it has to live in the same namespace and use variables that already exist there. So the best way is to define it inside the function from which you call it.
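A sketch of that last suggestion, assuming a hypothetical outer function `process` that stands in for "the function from which you call it". The nested `fun` is a closure over `process`'s local `df` and `params`, so it needs no arguments and is still vectorized. Note that `process` adds the column to the DataFrame it is given, in place.

```python
import types

import pandas as pd


def process(df: pd.DataFrame, params) -> pd.DataFrame:
    # Hypothetical outer function; fun() closes over its local df and params.
    def fun():
        return params.a * params.b * df.A / 1000 - params.C * (df.B + df.C - df.D) + params.D * df.E

    df["F"] = fun()
    # fun() can be called again later in process() and will see df's current columns.
    return df


# Usage with made-up values:
params = types.SimpleNamespace(a=2.0, b=3.0, C=0.5, D=1.5)
df = pd.DataFrame({
    "A": [1000.0, 2000.0],
    "B": [1.0, 0.0],
    "C": [2.0, 1.0],
    "D": [1.0, 1.0],
    "E": [2.0, 4.0],
})
out = process(df, params)
```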
