Check if a column's integer is in another column's string of integers

Question

A dataframe has two columns. One has a single integer per row. The other has a string of multiple integers, separated by ',', per row:

import pandas as pd
duck_ids = ["1, 4, 5, 7", "3, 11, 14, 27"]
ducks_of_interest = [4,15]
duck_df = pd.DataFrame(
    {
        "DucksOfInterests": ducks_of_interest,
        "DuckIDs": duck_ids
    }
)
print(f"The starting dataframe:\n{duck_df}")


   DucksOfInterests        DuckIDs
0                 4     1, 4, 5, 7
1                15  3, 11, 14, 27

A new column is required that contains True if the Duck of Interest is among the Duck IDs. This is attempted using a simple lambda function with the .apply method:

duck_df['DoIinDIDs'] = duck_df.apply(lambda x: str(x['DuckIDs']) in [x['DucksOfInterests']], axis=1)

This was expected to return a True for the first row, as 4 is a number in "1, 4, 5, 7", and False for the second row. However, the result is False for both rows:

print(f"The dataframe with the additional column:\n{duck_df}")

   DucksOfInterests        DuckIDs  DoIinDIDs
0                 4     1, 4, 5, 7      False
1                15  3, 11, 14, 27      False

What is the error in the code or the approach?

mozway · Accepted Answer · 2023-10-05 07:53:00Z


You were almost there, but you wrapped the value in an unnecessary list and swapped the two column names:

duck_df['DoIinDIDs'] = duck_df.apply(lambda x: str(x['DucksOfInterests'])
                                     in x['DuckIDs'], axis=1)

Output:

   DucksOfInterests        DuckIDs  DoIinDIDs
0                 4     1, 4, 5, 7       True
1                15  3, 11, 14, 27      False
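To see why the question's version returned False for every row, the two expressions can be evaluated on plain values from the first row. This is a minimal sketch; the variable names `doi` and `dids` are illustrative only. Membership in a list compares with `==`, and a string never equals an integer.

```python
# Values taken from row 0 of the question's dataframe.
doi = 4                 # DucksOfInterests
dids = "1, 4, 5, 7"     # DuckIDs

# Question's version: the whole DuckIDs string is looked up in a one-element list.
# List membership uses ==, and "1, 4, 5, 7" == 4 is False.
original = str(dids) in [doi]

# Corrected version: the integer, converted to text, is searched for inside the string.
corrected = str(doi) in dids

print(original, corrected)  # False True
```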

Note, however, that this approach can give false positives because it does a substring search on the whole string: "4" would be found in "1, 14, 20".

You can instead split the string:

duck_df['DoIinDIDs'] = duck_df.apply(lambda x: str(x['DucksOfInterests'])
                                     in x['DuckIDs'].split(', '), axis=1)
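The difference between the substring test and the split-based test can be checked on the "1, 14, 20" example above. This sketch also shows one assumption the split relies on: the separator must be exactly ", ", otherwise the string is not split at all.

```python
doi = 4
dids = "1, 14, 20"   # contains "14" but not the number 4

substring_hit = str(doi) in dids              # True: "4" occurs inside "14"
split_hit = str(doi) in dids.split(", ")      # False: tokens are "1", "14", "20"

print(dids.split(", "))          # ['1', '14', '20']
print(substring_hit, split_hit)  # True False

# The split only helps if the separator is exactly ", ".
print("4" in "1,4,5".split(", "))  # False, because the result is ['1,4,5']
```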

Finally, as apply on axis=1 is slow, you can replace the whole thing with a list comprehension:

duck_df['DoIinDIDs'] = [str(a) in b.split(', ')
                        for a, b in zip(duck_df['DucksOfInterests'],
                                        duck_df['DuckIDs'])]
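A self-contained check that the list comprehension gives the same result as the split-based apply version, rebuilt on the question's data. It is only a correctness comparison; no timings are claimed here.

```python
import pandas as pd

duck_df = pd.DataFrame(
    {
        "DucksOfInterests": [4, 15],
        "DuckIDs": ["1, 4, 5, 7", "3, 11, 14, 27"],
    }
)

# Row-wise apply: pandas builds a Series for each row before calling the lambda.
via_apply = duck_df.apply(
    lambda x: str(x["DucksOfInterests"]) in x["DuckIDs"].split(", "), axis=1
)

# List comprehension: iterates over the two columns directly with zip.
via_listcomp = [
    str(a) in b.split(", ")
    for a, b in zip(duck_df["DucksOfInterests"], duck_df["DuckIDs"])
]

duck_df["DoIinDIDs"] = via_listcomp
print(via_apply.tolist() == via_listcomp)  # True
print(duck_df)
```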


2 Comments

acolls_badger Over a year ago

I hadn't realised the error you noted, thank you for pointing this out and explaining the use of split to avoid this issue.

mozway Over a year ago

You're welcome. Also, if you don't have a reliable separator, you could use a regex instead (str(a) in re.findall(r'\d+', b)), or even bool(re.search(fr'\b{a}\b', b)) in place of str(a) in b.split(', ').
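Both regex variants from the comment can be tried on a string with inconsistent separators. The string "1,4 ,5;7" is a hypothetical example, not data from the question. `\b` is a word boundary, which is what stops "4" from matching inside "14".

```python
import re

a = 4
b = "1,4 ,5;7"   # hypothetical string with mixed separators

# Variant 1: extract every run of digits, then test membership of the text form.
print(re.findall(r"\d+", b))            # ['1', '4', '5', '7']
print(str(a) in re.findall(r"\d+", b))  # True

# Variant 2: search for the number as a whole word.
print(bool(re.search(fr"\b{a}\b", b)))            # True
print(bool(re.search(fr"\b{a}\b", "1, 14, 20")))  # False
```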

Guy · Accepted Answer · 2023-10-05 07:59:43Z


You have two issues: you need to swap the order of DucksOfInterests and DuckIDs, and you need to convert the string to a list of ints rather than the int to a string, because "4" in "3, 11, 14, 27" returns True.

duck_df['DoIinDIDs'] = duck_df.apply(lambda x: x['DucksOfInterests'] in map(int, x['DuckIDs'].split(',')), axis=1)
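A runnable version of this integer-based approach. The third row is hypothetical, added to show two properties: `int()` ignores surrounding whitespace, so splitting on "," works with or without a space after each comma, and 14 is not mistaken for 4. An empty entry (for example a trailing comma) would make `int()` raise a ValueError, so this assumes every token is a valid integer.

```python
import pandas as pd

duck_df = pd.DataFrame(
    {
        "DucksOfInterests": [4, 15, 4],
        # Row 2 is hypothetical: no spaces after the commas, and 14 but no 4.
        "DuckIDs": ["1, 4, 5, 7", "3, 11, 14, 27", "1,14,20"],
    }
)

# Compare integers with integers; int(" 4") == 4, so stray spaces are harmless.
duck_df["DoIinDIDs"] = duck_df.apply(
    lambda x: x["DucksOfInterests"] in map(int, x["DuckIDs"].split(",")), axis=1
)
print(duck_df["DoIinDIDs"].tolist())  # [True, False, False]
```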

