Use string matching to populate new dataframe column [duplicate]

Question

I am trying to populate a new column in a pandas dataframe based on whether another column in that row contains a string.

For example, I have a list of possible colors:

possible_colors = ['red', 'blue', 'green', 'orange', 'purple']

A dataframe contains sales data for a hypothetical product. The product name contains a color in its product code, and I would like to create a column labeling that product with its proper color.

import numpy as np
import pandas as pd
df = pd.DataFrame({'product': ['123red309','20424green098','2purple09183'],
                   'sales_qty': [20, 5, 10]})

If the product column contains the string 'green' I want to populate a new column Color with the string 'green'.

I tried doing so with the code:

for color in possible_colors:
    df['Color'] = np.where(df['product'].str.contains(color),color)

This raises the error ValueError: either both or neither of x and y should be given.

My actual dataframe has thousands of rows, not just 3, and my list of possible colors has dozens of items.

How can I properly complete this task? Thank you!

anky · Accepted Answer · 2019-03-13 14:39:05Z

1

You can use Series.str.extract():

df['color']=df['product'].str.extract(r'({})'.format('|'.join(possible_colors)))
print(df)

         product  sales_qty   color
0      123red309         20     red
1  20424green098          5   green
2   2purple09183         10  purple

Here, r'({})'.format('|'.join(possible_colors)) yields '(red|blue|green|orange|purple)'.
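For reference, here is a self-contained sketch of the same idea. It makes two assumptions that are not in the answer above: the color names are passed through re.escape in case any of them contains a regex metacharacter, and expand=False is used so that str.extract returns a Series. The fourth row ('555none000') is a made-up product code added only to show what happens when nothing matches.

```python
import re

import pandas as pd

possible_colors = ['red', 'blue', 'green', 'orange', 'purple']

# The last row is a hypothetical product code with no color in it.
df = pd.DataFrame({'product': ['123red309', '20424green098',
                               '2purple09183', '555none000'],
                   'sales_qty': [20, 5, 10, 7]})

# Build one alternation pattern; re.escape protects against names
# that contain characters such as '.' or '+'.
pattern = '({})'.format('|'.join(re.escape(c) for c in possible_colors))

# expand=False returns a Series; rows without a match become NaN.
df['color'] = df['product'].str.extract(pattern, expand=False)
print(df)
```

Because the whole list is compiled into a single pattern, the column is filled in one pass rather than once per color, which should matter with dozens of colors.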


yatu · Accepted Answer · 2019-03-13 14:39:24Z

1

Here is one way:

df['color'] = df['product'].apply(lambda x: ''.join(i for i in possible_colors 
                                                    if i in x) or None)

         product  sales_qty   color
0      123red309         20     red
1  20424green098          5   green
2   2purple09183         10  purple
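One thing to watch with the join-based lambda: if a product code happens to contain more than one color, the matches are concatenated (for example 'redblue'). A possible variation, sketched below, takes only the first hit. The helper name first_color and the sample code '1redblue2' are illustrative and not from the answer, and "first" here means first in the order of possible_colors, not first by position in the string.

```python
import pandas as pd

possible_colors = ['red', 'blue', 'green', 'orange', 'purple']

df = pd.DataFrame({'product': ['123red309', '20424green098', '2purple09183'],
                   'sales_qty': [20, 5, 10]})


def first_color(product, colors=possible_colors):
    """Return the first color (in list order) found in product, else None."""
    return next((c for c in colors if c in product), None)


df['color'] = df['product'].apply(first_color)
print(df)
```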
