I have been trying to figure out the simplest if statement to check whether the string "A" exists in rootID and "B" exists in parentID in the same row, for any row. I then want to remove that row. In the following dataframe, that would mean removing row 0.

  rootID parentID jobID                        time
0      A        B     D  2019-01-30 14:33:21.339469
1      E        F     G  2019-01-30 14:33:21.812381
2      A        C     D  2019-01-30 15:33:21.812381
3      E        E     F  2019-01-30 15:33:21.812381
4      E        F     G  2019-01-30 16:33:21.812381

I know how to check whether one element exists, for example:

   if df['rootID'].str.contains("A").any():
       ...  # handle the match

but how do I do it when I need to check for two different strings in two columns?

1 Answer

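Use boolean indexing: build one mask per column, invert each with ~, and combine them with | (bitwise OR). Removing the rows where rootID matches "A" and parentID matches "B" is the same as keeping the rows where rootID does not match "A" or parentID does not match "B".

If you need to check for substrings: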

m1 = ~df['rootID'].str.contains("A")
m2 = ~df['parentID'].str.contains("B")

If you need to check for exact string matches, use Series.ne:

m1 = df['rootID'].ne("A")
m2 = df['parentID'].ne("B")

#alternatives
#m1 = df['rootID'] != "A"
#m2 = df['parentID'] != "B"

df = df[m1 | m2]

print(df)
  rootID parentID jobID                        time
1      E        F     G  2019-01-30 14:33:21.812381
2      A        C     D  2019-01-30 15:33:21.812381
3      E        E     F  2019-01-30 15:33:21.812381
4      E        F     G  2019-01-30 16:33:21.812381

Another solution, using DataFrame.query:

df = df.query('rootID != "A" | parentID != "B"')
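
For reference, here is a minimal self-contained sketch of the exact-match approach, assuming the sample data from the question. The names to_drop, keep and result are illustrative and do not appear in the original post. It also checks that the "invert, then OR" masks above select the same rows as "AND, then invert".

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "rootID": ["A", "E", "A", "E", "E"],
    "parentID": ["B", "F", "C", "E", "F"],
    "jobID": ["D", "G", "D", "F", "G"],
    "time": pd.to_datetime([
        "2019-01-30 14:33:21.339469",
        "2019-01-30 14:33:21.812381",
        "2019-01-30 15:33:21.812381",
        "2019-01-30 15:33:21.812381",
        "2019-01-30 16:33:21.812381",
    ]),
})

# Rows to remove: rootID is exactly "A" AND parentID is exactly "B".
to_drop = df["rootID"].eq("A") & df["parentID"].eq("B")

# Rows to keep, written as in the answer: NOT "A" OR NOT "B".
keep = df["rootID"].ne("A") | df["parentID"].ne("B")

# Both formulations describe the same selection (De Morgan's law).
same_selection = bool((keep == ~to_drop).all())

result = df[keep]
print(result)
```

Which form to use is mostly a matter of taste: building the "rows to drop" mask first and inverting it once can be easier to read when the removal condition is how you think about the problem.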

10 Comments

Thank you once again for helping me. How do I afterwards select that row and delete it from the original dataframe?
So in order to implement this correctly. Do I first define m1 and m2 and then afterwards test it by "if m1 and m2:"?
@Kspr - No, you can use df = df[~df['rootID'].str.contains("A") | ~df['parentID'].str.contains("B")] directly; separate masks are only more readable when there are more conditions.
The thing is, I need to detect every occurrence and increment a counter each time. So I want an if statement, so that I can increment the counter and then remove the row.
Why don't you just count how many such rows there are? (Hint: the boolean masks can answer that easily.) @Kspr
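
A rough sketch of what the hint above could look like, assuming exact matches and the sample data from the question. The names counter and match are hypothetical. Note that "if m1 and m2:" does not work on pandas Series, because the truth value of a Series is ambiguous and pandas raises a ValueError; reduce the mask with .any() or .sum() instead.

```python
import pandas as pd

# Sample data rebuilt from the question (time column omitted for brevity).
df = pd.DataFrame({
    "rootID": ["A", "E", "A", "E", "E"],
    "parentID": ["B", "F", "C", "E", "F"],
    "jobID": ["D", "G", "D", "F", "G"],
})

counter = 0  # hypothetical running counter kept by the caller

# True for every row where rootID is "A" and parentID is "B".
match = df["rootID"].eq("A") & df["parentID"].eq("B")

# .any() reduces the mask to a single bool, so it is safe in an if statement.
if match.any():
    # True counts as 1, so the sum is the number of matching rows.
    counter += int(match.sum())
    # Keep only the rows that did not match.
    df = df[~match]

print(counter)
print(df)
```

This counts all matching rows in one step instead of looping over them, which is the usual pandas idiom.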