Subset pandas dataframe using regex

Question

I have a pandas DataFrame that looks like this:

>>> df
      product   desc
0        ABCD  desc1
1   ABCD1,XYZ  desc2
2      ABCD1H  desc3
3       ABCD1  desc4
4  ABCD1H,LMN  desc5

I want to keep only the rows whose product is ABCD1, or ABCD1 followed by any other product ID, but not ABCD1H. How can I select such rows? In the above example, I want the output to be:

>>> df
     product   desc
1  ABCD1,XYZ  desc2
3      ABCD1  desc4

This is what I have tried so far, but it does not work:

df2 = df.loc[df['product'].str.contains('ABCD1')]

It also includes ABCD1H in its results, which I don't want.

Scott Boston · Accepted Answer · 2019-08-07 18:26:46Z

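Use the regex anchor "\b", which matches a word boundary. It requires that ABCD1 not be followed by another word character (a letter, digit, or underscore), so ABCD1H is excluded: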

df[df['product'].str.contains(r'ABCD1\b')]

Output:

     product   desc
1  ABCD1,XYZ  desc2
3      ABCD1  desc4
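
For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and applies the word-boundary filter. The names mask, result, strict and strict_result are only illustrative. The second pattern is not part of the answer above: it is an assumption that product IDs are always comma-separated with no spaces, and it additionally rejects IDs that merely end in ABCD1 (for example XABCD1), which the plain \b pattern would let through.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["ABCD", "ABCD1,XYZ", "ABCD1H", "ABCD1", "ABCD1H,LMN"],
    "desc": ["desc1", "desc2", "desc3", "desc4", "desc5"],
})

# \b after the "1" requires the next character to be a non-word character
# (such as the comma) or the end of the string, so "ABCD1H" is rejected.
mask = df["product"].str.contains(r"ABCD1\b", regex=True)
result = df[mask]
print(result)

# Hypothetical stricter variant (assumes IDs are comma-separated, no spaces):
# ABCD1 must be a whole item, bounded by the string start/end or a comma.
# Non-capturing groups (?:...) avoid pandas' warning about match groups.
strict = df["product"].str.contains(r"(?:^|,)ABCD1(?:,|$)", regex=True)
strict_result = df[strict]
print(strict_result)
```

On the sample data both patterns select the same two rows (index 1 and 3); they only differ on inputs where ABCD1 appears as the tail of a longer ID.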
