Delete rows from pandas dataframe by using regex

Question

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

I have two lists of IDs that determine which rows to delete, based on the position of "abc" relative to the colon, that is, whether abc is on the right or the left side of the colon. These are my lists:

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']

For example, I want to omit rows that contain LA where abc is on the right side of the colon. The same applies to CN. I want to delete rows that contain YM where abc is on the left side of the colon, and likewise for BC. This is just a sample; I have hundreds of IDs. This is the output that I want after deleting rows:

                 a
1    #x{LA abc:0.31}
6    #x{YM 1222:abc}

I have tried the solutions from these two answers: answer1 and answer2. I know that I probably need to use df.a.str.contains with a regex, but it still doesn't work.

Hanwei Tang · Accepted Answer · 2023-01-19 03:03:43Z


Express your criteria as regular expressions first, then filter the data:

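# Non-capturing groups (?:...) avoid the "match groups" UserWarning from str.contains
regex_right = r'\b(?:LA|CN)\b.+\b:abc\b'
regex_left = r'\b(?:YM|BC)\b.+\babc:\b'
# Keep only the rows that match neither pattern
df[~(df['a'].str.contains(regex_right, regex=True) | df['a'].str.contains(regex_left, regex=True))]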
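Since the question mentions hundreds of IDs, writing the alternation by hand will not scale. Below is a minimal sketch that builds the two patterns from the question's lists. The helper name `label_alternation` and the variables `drop_mask` and `result` are made up for illustration, and `re.escape` is added on the assumption that some IDs may contain regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']


def label_alternation(labels):
    # Hypothetical helper: join labels into "A|B|C", escaping any
    # regex metacharacters a label might contain.
    return '|'.join(re.escape(label) for label in labels)


# Drop when one of these labels is followed by ":abc" (abc right of the colon).
regex_right = rf'\b(?:{label_alternation(labels_that_abc_is_right)})\b.+:abc\b'
# Drop when one of these labels is followed by "abc:" (abc left of the colon).
regex_left = rf'\b(?:{label_alternation(labels_that_abc_is_left)})\b.+\babc:'

# True for every row that should be deleted.
drop_mask = df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left)
result = df[~drop_mask]
print(result)
```

On the sample data this keeps rows 1 and 6, matching the desired output in the question. Both lists are assumed to be non-empty; an empty list would produce an empty group that matches any label position.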
