
Problem Explanation

I have a dataframe with two columns 'A' and 'B'. I also have a list of tuples where the first element of each tuple is a value from column 'A' and the second is a value from column 'B'. I would like to remove every row of the dataframe whose ('A', 'B') pair matches one of the tuples.

Of course, I could just use a loop, but I want a smarter solution that would be faster and cleaner.

Minimal Working Example

import pandas as pd
df = pd.DataFrame(
    {
        'A': ['a', 'b', 'c', 'd', 'a', 'd', 'a', 'c'],
        'B': [4, 2, 2, 1, 3, 4, 3, 2],
    }
)
rows_to_remove = [('a', 4), ('c', 2), ('d', 4), ('a', 3)]

5 Answers


I would use boolean indexing with pandas.Series.isin:

m = df.agg(tuple, axis=1).isin(rows_to_remove)

df = df.loc[~m]

Output:

print(df)

   A  B
1  b  2
3  d  1
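
A closely related variant (my addition, not part of the original answer) builds a MultiIndex from the two columns; MultiIndex.isin accepts a list of tuples directly, so the row-wise tuple construction can be skipped:

import pandas as pd

# boolean mask: True where the (A, B) pair appears in rows_to_remove
m = pd.MultiIndex.from_frame(df[['A', 'B']]).isin(rows_to_remove)
out = df.loc[~m]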

You can use a merge with indicator:

out = (df.merge(pd.DataFrame(rows_to_remove, columns=['A', 'B']), indicator=True, how='left')
         .query('_merge == "left_only"')
         #.drop(columns='_merge') # left commented so the indicator column stays visible
       )

Output:

   A  B     _merge
1  b  2  left_only
3  d  1  left_only

Or, combined with drop:

idx = (df.merge(pd.DataFrame(rows_to_remove, columns=['A', 'B']), how='left', indicator=True)
         .query('_merge == "both"').index
      )

out = df.drop(idx)

Output:

   A  B
1  b  2
3  d  1
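
If this is something you do repeatedly, a hypothetical helper (my sketch, not part of the original answer) can package the merge/indicator anti-join; it assumes the removal list covers exactly the columns passed in on:

import pandas as pd

def anti_join(df, remove, on):
    """Return the rows of df whose combination of the `on` columns is not in `remove`."""
    right = pd.DataFrame(remove, columns=on).drop_duplicates()
    # how='left' preserves the order and length of df, so the indicator column
    # lines up row-for-row with the original frame
    merged = df.merge(right, on=on, how='left', indicator=True)
    return df.loc[merged['_merge'].eq('left_only').to_numpy()]

out = anti_join(df, rows_to_remove, on=['A', 'B'])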


If you import numpy as np, you can do it in one line with broadcasting:

df[~(df.values == np.array(rows_to_remove, dtype=object)[:, None]).all(-1).any(0)]

It is a bit of a curiosity, but it is also useful: I tested it on Google Colab and it runs in 274 µs ± 11.6 µs versus 1.44 ms ± 37.7 µs for @Timeless's solution, which I think is the best one and is the same approach that came to my mind as soon as I read the question.

It would be interesting to see the difference for bigger DataFrames.
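
For readability, here is the same broadcasting idea written out step by step (a sketch using the question's df and rows_to_remove; the intermediate names are mine):

import numpy as np

# (n_remove, 1, 2): one (A, B) pair per row, with an extra axis for broadcasting
targets = np.array(rows_to_remove, dtype=object)[:, None]

# (n_remove, n_rows, 2): compare every pair against every row elementwise
eq = df.values == targets

# a row is dropped only when both of its columns match the same pair
mask = eq.all(-1).any(0)

out = df[~mask]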


You can create a dataframe from rows_to_remove, concatenate it with your original dataframe, and then drop the duplicates with keep=False:

>>> (pd.concat([df, pd.DataFrame(rows_to_remove, columns=['A', 'B'])])
       .drop_duplicates(['A', 'B'], keep=False))

   A  B
1  b  2
3  d  1

Comments

This might, however, incorrectly delete rows that are already duplicated in df ;)
Yes, I suppose so, if the original dataframe already contains duplicate rows.
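
To illustrate that caveat (a constructed example, not from the original thread): with keep=False, a pair that is already duplicated in the original frame is discarded even if it is not in rows_to_remove.

import pandas as pd

# hypothetical frame with a pre-existing duplicate ('b', 2)
df_dup = pd.DataFrame({'A': ['a', 'b', 'b', 'd'], 'B': [4, 2, 2, 1]})
to_remove = [('a', 4)]

out = (pd.concat([df_dup, pd.DataFrame(to_remove, columns=['A', 'B'])])
         .drop_duplicates(['A', 'B'], keep=False))
print(out)  # only ('d', 1) survives; both ('b', 2) rows are dropped as well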

Hope this helps. You can use df.drop to drop rows or columns by index; please have a look at the documentation:

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
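
A minimal sketch of how df.drop could be applied to this question (the mask construction mirrors the isin answer above; it is my addition, not part of this answer):

# find the index labels of rows whose (A, B) pair is in rows_to_remove,
# then drop those rows by label
mask = df.agg(tuple, axis=1).isin(rows_to_remove)
out = df.drop(index=df.index[mask])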

