This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

I have two lists of IDs that determine which rows to delete, based on the position of "abc" relative to the colon, that is, whether abc is on the right side or the left side of the colon. These are my lists:

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']

For example, I want to omit rows that contain LA where abc is on the right side of the colon. The same applies to CN. Likewise, I want to delete rows that contain YM where abc is on the left side of the colon, and the same applies to BC. This is just a sample; I have hundreds of IDs. This is the output that I want after deleting the rows:

                 a
1  #x{LA abc:0.31}
6  #x{YM 1222:abc}

I have tried the solutions from these two answers: answer1 and answer2. I know that I probably need to use df.a.str.contains with a regex, but I still can't get it to work.

1 Answer

Express each criterion as a regex first, then use the two patterns to filter the DataFrame:

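# Non-capturing groups (?:...) avoid the pandas UserWarning about match groups.
regex_right = r'\b(?:LA|CN)\b.+\b:abc\b'  # label followed by ":abc" (abc on the right)
regex_left = r'\b(?:YM|BC)\b.+\babc:\b'   # label followed by "abc:" (abc on the left)
# Keep only the rows that match neither pattern (regex=True is the default).
df[~(df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left))]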
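
Since the question mentions hundreds of IDs, the alternations could be built from the two lists instead of being typed by hand. The following is only a sketch under some assumptions: every value has the shape `#x{LABEL left:right}` seen in the sample, and `label_group` is a hypothetical helper introduced here for illustration. It uses `re.escape` in case a label contains regex metacharacters, and `[^:]*` so that the match cannot run past the colon.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': [
    '#x{LA 0.098:abc}', '#x{LA abc:0.31}', '#x{BC abc:0.1231}',
    '#x{LA 0.333:abc}', '#x{CN 0.031:abc}', '#x{YM abc:12345}',
    '#x{YM 1222:abc}',
]})

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']


def label_group(labels):
    # Hypothetical helper: escape each label and join them into a
    # non-capturing alternation such as (?:LA|CN).
    return '(?:' + '|'.join(re.escape(label) for label in labels) + ')'


# Label, whitespace, any non-colon characters, then ":abc" (abc on the right).
regex_right = r'\b' + label_group(labels_that_abc_is_right) + r'\s+[^:]*:abc\b'
# Label, whitespace, then "abc:" (abc on the left).
regex_left = r'\b' + label_group(labels_that_abc_is_left) + r'\s+abc:'

# Rows matching either pattern are the ones to drop.
drop = df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left)
result = df[~drop]
print(result)
```

On the sample data this keeps only the rows with index 1 and 6, which is the output asked for in the question. If the real strings deviate from the assumed shape (for example, several colons or no space after the label), the patterns would need adjusting.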