I am trying to populate a new column in a pandas dataframe based on whether another column in that row contains a string.
For example, I have a list of possible colors:
possible_colors = ['red', 'blue', 'green', 'orange', 'purple']
A dataframe contains sales data for a hypothetical product. Each product code contains a color, and I would like to create a column labeling each product with its color.
import numpy as np
import pandas as pd
df = pd.DataFrame({'product': ['123red309', '20424green098', '2purple09183'],
                   'sales_qty': [20, 5, 10]})
If the product column contains the string 'green' I want to populate a new column Color with the string 'green'.
I tried doing so with the code:
for color in possible_colors:
    df['Color'] = np.where(df['product'].str.contains(color), color)
This raises the error ValueError: either both or neither of x and y should be given.
My actual dataframe has thousands of rows, not just 3, and my list of possible colors has dozens of items.
How can I properly complete this task? Thank you!
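
One possible approach, offered as a sketch rather than a definitive answer: the ValueError arises because np.where expects either the condition alone or the condition plus both an x and a y value, and the attempt passes only one of the two. Even with a third argument, each pass of the loop would overwrite the Color column written by the previous pass. A loop-free alternative is to join the colors into a single regular expression and use Series.str.extract. The sketch below assumes each product code contains at most one color from the list and that the column is accessed as df['product'] (df.product would resolve to the DataFrame.product method rather than the column). The name pattern is introduced here for illustration.

```python
import re

import pandas as pd

possible_colors = ['red', 'blue', 'green', 'orange', 'purple']

df = pd.DataFrame({
    'product': ['123red309', '20424green098', '2purple09183'],
    'sales_qty': [20, 5, 10],
})

# Build one alternation pattern with a single capture group:
# "(red|blue|green|orange|purple)".
# re.escape protects against entries that contain regex metacharacters.
pattern = '(' + '|'.join(re.escape(c) for c in possible_colors) + ')'

# expand=False returns a Series with the first match in each row,
# or NaN where no color from the list appears.
df['Color'] = df['product'].str.extract(pattern, expand=False)

print(df)
```

Because the match is vectorized over the whole column, this should scale to thousands of rows and dozens of colors without an explicit Python loop. If a code could contain more than one color, the leftmost match in the string is the one that is kept.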