1

Given a DataFrame df:

   id                                        items
0   1  CRB Spot Index: Grease: First 2 Months: YoY
1   2                     CRB Spot Index: Textiles
2   3                    CRB Spot Index: Livestock
3   4      Korea: Export Value: First 10 Days: YoY
4   5      Korea: Export Value: First 20 Days: YoY
5   6      Korea: Import Value: First 20 Days: YoY
6   7      Korea: Import Value: First 10 Days: YoY
7   8    Vietnam: Import Value: First 2 Weeks: MoM
8   9    Vietnam: Import Value: First 3 Weeks: MoM

I want to filter items which match patterns: First + integer + Days or First + integer + Weeks, then create a new column to indicate them:

   id                                        items indicator
0   1  CRB Spot Index: Grease: First 2 Months: YoY       NaN
1   2                     CRB Spot Index: Textiles       NaN
2   3                    CRB Spot Index: Livestock       NaN
3   4      Korea: Export Value: First 10 Days: YoY         Y
4   5      Korea: Export Value: First 20 Days: YoY         Y
5   6      Korea: Import Value: First 20 Days: YoY         Y
6   7      Korea: Import Value: First 10 Days: YoY         Y
7   8    Vietnam: Import Value: First 2 Weeks: MoM         Y
8   9    Vietnam: Import Value: First 3 Weeks: MoM         Y

How could I do that using Pandas?

I use df.loc[(df['items'].str.contains('First|Days', na=False)) & (df['items'].str.contains('First|Weeks', na=False)), 'indicators']='Y', it doesn't generate an expected result as shown above, also maybe there are other more concise answers.

1 Answer 1

1

Check regex for match First, then \s+ for space(s), then \d+ for integers, \s+ for again space(s) and last Days|Weeks for Days or Weeks:

df.loc[df['items'].str.contains('First\s+\d+\s+Days|Weeks', na=False), 'indicators']='Y'
print (df)
   id                                        items indicators
0   1  CRB Spot Index: Grease: First 2 Months: YoY        NaN
1   2                     CRB Spot Index: Textiles        NaN
2   3                    CRB Spot Index: Livestock        NaN
3   4      Korea: Export Value: First 10 Days: YoY          Y
4   5      Korea: Export Value: First 20 Days: YoY          Y
5   6      Korea: Import Value: First 20 Days: YoY          Y
6   7      Korea: Import Value: First 10 Days: YoY          Y
7   8    Vietnam: Import Value: First 2 Weeks: MoM          Y
8   9    Vietnam: Import Value: First 3 Weeks: MoM          Y
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, is it necessary to add regex=True or it's optional for this case?
@ahbon - regex=True is default value in Series.str.contains, so should be removed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.