
Given an input dataframe and string:

import pandas as pd

df = pd.DataFrame({"A": [4, 2, 10], "B": [1, 4, 3]})
colour = "green"  # or "red", "blue", etc.

I want to add a new column df["C"] whose values depend on df["A"], df["B"] and colour, so that the result looks like this (for colour = "green"):

df = pd.DataFrame({"A" : [4, 2, 10], "B" : [1, 4, 3], "C" : [True, True, False]})

So far, I have a function that works on single (scalar) values:

def check_passing(colour, A, B):
    if colour == "red":
        if B < 5:
            return True
        else:
            return False
    if colour == "blue":
        if B < 10:
            return True
        else:
            return False
    if colour == "green":
        if B < 5:
            if A < 5:
                return True
            else:
                return False
        else:
            return False
    # Any other colour does not pass
    return False

How would you go about using this function in df.assign() so that it calculates this for each row? Specifically, how do you pass each column to check_passing()?

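df.assign() lets you refer to columns directly or through a lambda, but calling the function this way fails: it receives the entire columns as Series, and a test such as `if B < 5:` cannot reduce a whole Series to a single True/False, so pandas raises a ValueError ("The truth value of a Series is ambiguous"):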

df = df.assign(C = check_passing(colour, df["A"], df["B"]))
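For reference, a minimal reproduction of the error, using the question's function (with a final `return False` for unrecognised colours added here as an assumption) and the example dataframe:

```python
import pandas as pd

def check_passing(colour, A, B):
    # Scalar version from the question: every branch relies on `if`
    if colour == "red":
        if B < 5:
            return True
        else:
            return False
    if colour == "blue":
        if B < 10:
            return True
        else:
            return False
    if colour == "green":
        if B < 5:
            if A < 5:
                return True
            else:
                return False
        else:
            return False
    # Assumption: unrecognised colours do not pass
    return False

colour = "green"
df = pd.DataFrame({"A": [4, 2, 10], "B": [1, 4, 3]})

error_message = None
try:
    # df["B"] < 5 is a boolean Series; `if` needs one bool, so pandas raises
    df = df.assign(C=check_passing(colour, df["A"], df["B"]))
except ValueError as err:
    error_message = str(err)

print(error_message)
```

The same happens for "red" and "blue", since each branch tests a comparison on a whole Series with `if`.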

Is there a way to avoid a long and incomprehensible lambda? Open to any other approaches or suggestions!

Answer

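Applying a Python function row by row (for example with df.apply(..., axis=1)) can be slow on dataframes with many rows, because the function is called once per row. Combining vectorized boolean conditions avoids that. Here is a one-liner: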

colour = "green"  # or "red", "blue", etc.

# "blue" passes when B < 10, matching check_passing()
df['C'] = ((colour == 'red') & df['B'].lt(5)) | ((colour == 'blue') & df['B'].lt(10)) | ((colour == 'green') & df['B'].lt(5) & df['A'].lt(5))
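If you would rather keep the rules in a named function than in a long expression or lambda, one option is to make the function work on whole columns: branch on `colour` (a plain string, so an ordinary `if` is fine) and combine the column conditions with the element-wise operator `&` instead of `and`. The result is a boolean Series that `df.assign()` accepts directly. A minimal sketch, where `check_passing_vec` is a hypothetical name; the scalar `check_passing` is repeated (condensed) only to show the row-by-row alternative and to check that both give the same column:

```python
import pandas as pd

def check_passing(colour, A, B):
    # Scalar rules from the question, evaluated for one row at a time
    if colour == "red":
        return B < 5
    if colour == "blue":
        return B < 10
    if colour == "green":
        return B < 5 and A < 5
    return False

def check_passing_vec(colour, A, B):
    # Hypothetical element-wise version: A and B are Series, colour is a str
    if colour == "red":
        return B < 5
    if colour == "blue":
        return B < 10
    if colour == "green":
        # & combines the two boolean Series row by row
        return (B < 5) & (A < 5)
    # Assumption: unrecognised colours mark every row as not passing
    return pd.Series(False, index=B.index)

df = pd.DataFrame({"A": [4, 2, 10], "B": [1, 4, 3]})
colour = "green"

# No lambda needed: the function receives whole columns and returns a Series
vectorized = df.assign(C=check_passing_vec(colour, df["A"], df["B"]))

# Row-by-row alternative: pass each row's A and B values to the scalar function
rowwise = df.assign(C=[check_passing(colour, a, b) for a, b in zip(df["A"], df["B"])])

# Both approaches agree for every colour, including an unrecognised one
for c in ["red", "blue", "green", "purple"]:
    vec = check_passing_vec(c, df["A"], df["B"]).tolist()
    row = [check_passing(c, a, b) for a, b in zip(df["A"], df["B"])]
    assert vec == row, c

print(vectorized)
```

Inside a method chain, the same function can be called as `df.assign(C=lambda d: check_passing_vec(colour, d["A"], d["B"]))`, which keeps the lambda to a single short call.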