
I have a DataFrame with a column, name, that holds string data. I want to check whether each entry in this column exists in a reference list. I tried Series.apply, but it raises an error.

Sample data:

import pandas as pd

data = [('A', '10'),
        ('B', '10'),
        ('C', '10'),
        ('D', '10'),
        ('E', '20'),
        ('F', '20'),
        ('G', '25') ]

data_df = pd.DataFrame(data, columns = ['name', 'value'])

Sample code:

reference = ['A', 'B', 'Z']


def is_in_reference(x, reference):
    if x in reference:
        return 'Yes'
    else:
        return 'No'
    

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference))

But I get this error:

TypeError: is_in_reference() takes 2 positional arguments but 4 were given

I would appreciate any help with this.

1 Answer


You can use the built-in Series.isin method, which returns a boolean Series:

data_df['is_in_reference'] = data_df['name'].isin(reference)
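Note that isin produces True/False values rather than the 'Yes'/'No' labels from the question. If the labels matter, one possible way to get them is to map the boolean result. This is a minimal sketch on a shortened copy of the sample data; the intermediate name in_ref is only illustrative:

```python
import pandas as pd

# Shortened version of the sample data from the question
data_df = pd.DataFrame({'name': ['A', 'B', 'C'], 'value': ['10', '10', '10']})
reference = ['A', 'B', 'Z']

# isin yields a boolean Series: True where the name appears in reference
in_ref = data_df['name'].isin(reference)

# Translate the booleans into the 'Yes'/'No' labels used in the question
data_df['is_in_reference'] = in_ref.map({True: 'Yes', False: 'No'})

print(data_df)
```

This avoids calling a Python function once per row, which usually matters on larger frames.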

But since you asked about apply: the fix is a small but easy-to-miss Python syntax detail. You must add a trailing comma in the args tuple:

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference,))

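Note the comma in (reference,). Without it, (reference) is just the list itself wrapped in parentheses, not a tuple, so apply unpacks its three elements as separate arguments. That is why the error reports 4 arguments: x plus 'A', 'B' and 'Z'.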
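The difference can be checked in plain Python without pandas. The sketch below assumes apply forwards args roughly as func(x, *args), which is consistent with the error message in the question; the names not_a_tuple, one_tuple and error_message are only illustrative:

```python
reference = ['A', 'B', 'Z']


def is_in_reference(x, reference):
    return 'Yes' if x in reference else 'No'


not_a_tuple = (reference)   # parentheses only group; this is still the list
one_tuple = (reference,)    # the trailing comma makes a 1-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

# Unpacking the bare list passes 'A', 'B', 'Z' as three extra arguments
try:
    is_in_reference('A', *not_a_tuple)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Unpacking the 1-element tuple passes the whole list as one argument
result = is_in_reference('A', *one_tuple)
print(result)
```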
