I have a data set, df, with two variables, x and y. I want to write a function that does the following:

x if x>100 and y<50 else y

I am used to doing data analysis in Stata, so I'm relatively new to pandas. If it helps, in Stata it would look like:

replace x = cond(x>100 & y<50, x, y)

In other words, the function is conditional on two columns in df and will return a value from one variable or the other in each row depending on whether the condition is met.

So far I have been creating new variables through new functions like:

df['dummyVar'] = df.x.apply(lambda x: 1 if x>100 else 0)

Using StackOverflow and the documentation I have only been able to find how to apply a function dependent on a single variable to more than one column (using the axis option). Please help.

3 Answers

Use where:

df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])

This will be much faster than performing an apply operation as it is vectorised.
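To make the behaviour of where concrete, here is a minimal sketch on made-up data (the sample values are an assumption for illustration; only the column names x and y come from the question). Note that each comparison needs its own parentheses, because & binds more tightly than > and <.

```python
import pandas as pd

# Hypothetical sample data; column names x and y follow the question.
df = pd.DataFrame({"x": [150, 150, 80, 120], "y": [10, 60, 20, 49]})

# Row-wise boolean mask: True only where both conditions hold.
cond = (df["x"] > 100) & (df["y"] < 50)

# Keep x where the mask is True; otherwise take the value from y.
df["dummyVar"] = df["x"].where(cond, df["y"])

print(df["dummyVar"].tolist())  # [150, 60, 20, 120]
```

Building the mask as a separate variable is optional, but it keeps the where call readable once the condition grows.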

1 Comment

This is exactly what I needed. And this is great because I can already see how I can expand it to conditionals on 3 or more variables. Thank you!
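As a rough sketch of the extension to three variables mentioned in this comment: further conditions can be chained onto the mask with &. The third column z and the condition z > 0 are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data; z is an invented third column for illustration.
df = pd.DataFrame({"x": [150, 150, 120], "y": [10, 10, 60], "z": [1, -1, 1]})

# Each comparison is parenthesised, then combined element-wise with &.
cond = (df["x"] > 100) & (df["y"] < 50) & (df["z"] > 0)

# Same pattern as before: x where the mask holds, y everywhere else.
df["result"] = df["x"].where(cond, df["y"])

print(df["result"].tolist())  # [150, 10, 60]
```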

Like this:

f = lambda x, y: x if x>100 and y<50 else y

A lambda in Python is equivalent to a normal function definition whose body is a single return statement:

def f(x, y):
    return x if x>100 and y<50 else y

NB: The body of a lambda must be a single valid expression. This means you cannot use statements such as return; the lambda returns the value of that expression.

For some good reading see:

2 Comments

I had actually written a function like this but was unable to implement it so that it would run through each row without a loop. The answer provided by EdChum does exactly that. If you know of a way to accomplish that using this defined function, I'm sure I could make use of it in the future. Thank you for your input :)
@seeiespi You originally asked for "How to create a lambda function that takes two arguments?" -- This is how :) -- EdChum provided you with an answer that is more aligned with what your intentions are/were with your dataset(s) and pandas.
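On the follow-up about reusing the two-argument function f without writing an explicit loop, here is one possible sketch. It assumes the same f as in the answer and made-up sample data. Both variants still call the Python function once per row, so they are not expected to match the speed of the where approach.

```python
import numpy as np
import pandas as pd

def f(x, y):
    return x if x > 100 and y < 50 else y

# Hypothetical sample data; column names x and y follow the question.
df = pd.DataFrame({"x": [150, 150, 80], "y": [10, 60, 20]})

# Variant 1: pair up the two columns and call f on each (x, y) pair.
df["via_zip"] = [f(x, y) for x, y in zip(df["x"], df["y"])]

# Variant 2: np.vectorize wraps f so it accepts whole columns.
# This is a convenience wrapper, not a true vectorised computation.
df["via_vectorize"] = np.vectorize(f)(df["x"], df["y"])

print(df["via_zip"].tolist())  # [150, 60, 20]
```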

There's now a pretty easy way to do this. Just use apply on the dataset:

df['dummy'] = df.apply(lambda row: row['x'] if row['x'] > 100 and row['y'] < 50 else row['y'])

1 Comment

I needed to add axis=1 to make it work.
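For reference, a sketch of the call with axis=1 added, on made-up sample data. With the default axis=0, apply passes each column to the lambda rather than each row, so lookups like row['x'] typically fail with a KeyError.

```python
import pandas as pd

# Hypothetical sample data; column names x and y follow the question.
df = pd.DataFrame({"x": [150, 150, 80], "y": [10, 60, 20]})

# axis=1 hands each row to the lambda as a Series indexed by column name.
df["dummy"] = df.apply(
    lambda row: row["x"] if row["x"] > 100 and row["y"] < 50 else row["y"],
    axis=1,
)

print(df["dummy"].tolist())  # [150, 60, 20]
```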