I have a dataframe like this,
col1  col2   col3
1     apple  a,b
2     car    c
3     dog    a,c
4     dog    NaN
I tried to create three new columns, a, b and c, each holding '1' if col3 contains that string and '0' otherwise.
df['a'] = np.where(df['col3'].str.contains('a'), 1, 0)
df['b'] = np.where(df['col3'].str.contains('b'), 1, 0)
df['c'] = np.where(df['col3'].str.contains('c'), 1, 0)
But it seems the NaN values are not handled correctly. It gives me a result like this:
col1  col2   col3  a  b  c
1     apple  a,b   1  1  0
2     car    c     0  0  1
3     dog    a,c   1  0  1
4     dog    NaN   1  1  1
The 4th row should be all '0's. How can I change my code to get the right answer?
dropna was done first.
df.join(df['col3'].str.get_dummies(','))
Use get_dummies, but for the sake of your question: NaNs are True values, so don't trust numpy's judgement on that. Explicitly fill at the end to avoid ambiguity:
df.col3.str.contains('a').fillna(False)
That NaNs are True can be found in the docs: only a very limited number of objects are cast to False, and all the rest are True.
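A minimal sketch of both suggestions, run against the sample frame from the question (rebuilt by hand here, so the constructor call is an assumption about how the data was loaded). It uses the na=False argument of str.contains, which does the same job as the trailing fillna(False) in one step, and compares the result with str.get_dummies applied to col3:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt by hand (an assumption about the real data).
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["apple", "car", "dog", "dog"],
    "col3": ["a,b", "c", "a,c", np.nan],
})

# Option 1: str.contains returns NaN for missing rows, and NaN is truthy,
# so tell it to treat missing values as False before converting to 0/1.
flags = df.copy()
for letter in ["a", "b", "c"]:
    flags[letter] = flags["col3"].str.contains(letter, na=False).astype(int)

# Option 2: split col3 on commas and build one indicator column per token.
# Rows where col3 is NaN get 0 in every indicator column.
dummies = df.join(df["col3"].str.get_dummies(","))

print(flags)
print(dummies)
```

Note that str.contains matches substrings, so option 1 would also flag a value such as 'ab' for both 'a' and 'b', while get_dummies only matches whole comma-separated tokens. For single-letter tokens like the ones here the two give the same result.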