How to check if entries in Pandas DataFrame are in a List using pandas.apply

Question

I have a DataFrame with a column, name, that holds strings. I want to check whether each entry in this column exists in a reference list. I tried apply, but it raises an error.

Sample data:

import pandas as pd

data = [('A', '10'),
        ('B', '10'),
        ('C', '10'),
        ('D', '10'),
        ('E', '20'),
        ('F', '20'),
        ('G', '25') ]

data_df = pd.DataFrame(data, columns = ['name', 'value'])

Sample code:

reference = ['A', 'B', 'Z']


def is_in_reference(x, reference):
    if x in reference:
        return 'Yes'
    else:
        return 'No'
    

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference))

But I get this error:

TypeError: is_in_reference() takes 2 positional arguments but 4 were given

I would appreciate any help with this.

tankthinks · Accepted Answer · 2022-02-03 03:10:12Z


You can use the built-in Series.isin method, which returns a boolean Series:

data_df['is_in_reference'] = data_df['name'].isin(reference)
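Note that isin yields True/False rather than the 'Yes'/'No' strings from the question. If the string labels are wanted, one possible approach (a minimal sketch, using a shortened version of the sample data and a helper variable mask that is not in the original) is to map the booleans afterwards:

```python
import pandas as pd

# Shortened version of the question's sample data (only the 'name' column).
data_df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})
reference = ['A', 'B', 'Z']

# isin returns a boolean Series: True where the name is in reference.
mask = data_df['name'].isin(reference)

# Translate the booleans into the labels used in the question.
data_df['is_in_reference'] = mask.map({True: 'Yes', False: 'No'})
```

This avoids calling a Python function once per row, which tends to matter on larger frames.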

But since you asked about apply, the fix is a small but easy-to-miss Python syntax detail: you must add a trailing comma so that args is a one-element tuple:

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference,))

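Note the comma in (reference,). Without it, the parentheses only group the expression, so args is the list itself rather than a tuple. pandas then unpacks the list's three elements as extra positional arguments, which is why the error reports 4 arguments.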
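The difference can be checked directly. The sketch below uses a small stand-in Series (names, not part of the original data) and a condensed version of the question's function. It also shows passing reference as a keyword argument, which Series.apply forwards to the function and which sidesteps the tuple issue entirely:

```python
import pandas as pd

reference = ['A', 'B', 'Z']

# Parentheses alone only group an expression; the comma makes the tuple.
print(type((reference)))    # <class 'list'>
print(type((reference,)))   # <class 'tuple'>


def is_in_reference(x, reference):
    return 'Yes' if x in reference else 'No'


# Stand-in for data_df['name'].
names = pd.Series(['A', 'B', 'C'], name='name')

# Option 1: one-element tuple for args.
via_args = names.apply(is_in_reference, args=(reference,))

# Option 2: keyword argument, forwarded by apply to the function.
via_kwargs = names.apply(is_in_reference, reference=reference)
```

Both calls should produce the same 'Yes'/'No' labels.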

