A dataframe has two columns: one holds a single integer per row, and the other holds a string of several integers separated by ', ':
import pandas as pd
duck_ids = ["1, 4, 5, 7", "3, 11, 14, 27"]
ducks_of_interest = [4,15]
duck_df = pd.DataFrame(
    {
        "DucksOfInterests": ducks_of_interest,
        "DuckIDs": duck_ids,
    }
)
print(f"The starting dataframe:\n{duck_df}")
   DucksOfInterests        DuckIDs
0                 4     1, 4, 5, 7
1                15  3, 11, 14, 27
A new column is required that is True when the row's Duck of Interest appears in the row's set of Duck IDs. This is attempted with a simple lambda function and the .apply method:
duck_df['DoIinDIDs'] = duck_df.apply(lambda x: str(x['DuckIDs']) in [x['DucksOfInterests']], axis=1)
This was expected to return True for the first row, since 4 appears in "1, 4, 5, 7", and False for the second row, since 15 does not appear in "3, 11, 14, 27". However, the result is False for both rows:
print(f"The dataframe with the additional column:\n{duck_df}")
   DucksOfInterests        DuckIDs  DoIinDIDs
0                 4     1, 4, 5, 7      False
1                15  3, 11, 14, 27      False
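As a sanity check, here is a minimal sketch (outside pandas, with the row-0 values hard-coded) of what the lambda's expression appears to reduce to:

```python
# Row 0: x['DuckIDs'] is the string "1, 4, 5, 7",
#        x['DucksOfInterests'] is the int 4.
duck_ids_str = "1, 4, 5, 7"
duck_of_interest = 4

# str(duck_ids_str) is still "1, 4, 5, 7"; the `in` test then asks
# whether that entire string is an element of the one-item list [4].
result = str(duck_ids_str) in [duck_of_interest]
print(result)  # False: a string never compares equal to an int
```

If this reduction is right, the test is comparing the whole DuckIDs string against the integer, rather than looking for the integer inside the string, but it is unclear how to express the intended check correctly.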
What is the error in the code or the approach?