
Problem Explanation

I have a dataframe with two columns 'A' and 'B'. I also have a list of tuples where the first element of each tuple is a value from column 'A' and the second is a value from column 'B'. I would like to remove every row of the dataframe whose ('A', 'B') pair matches one of the tuples.

Of course, I could just use a loop, but I want a smarter solution that would be faster and cleaner.

Minimal Working Example

import pandas as pd
df = pd.DataFrame(
    {
        'A': ['a', 'b', 'c', 'd', 'a', 'd', 'a', 'c'],
        'B': [4, 2, 2, 1, 3, 4, 3, 2],
    }
)
rows_to_remove = [('a', 4), ('c', 2), ('d', 4), ('a', 3)]

5 Answers


I would use boolean indexing with pandas.Series.isin:

m = df.agg(tuple, axis=1).isin(rows_to_remove)

df = df.loc[~m]

Output:

print(df)

   A  B
1  b  2
3  d  1
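
A closely related variant (my addition, not part of the original answer) builds a MultiIndex from the two columns; MultiIndex.isin accepts a list of tuples directly, so the row-wise tuple construction can be skipped:

import pandas as pd

# boolean mask: True where the (A, B) pair appears in rows_to_remove
m = pd.MultiIndex.from_frame(df[['A', 'B']]).isin(rows_to_remove)
out = df.loc[~m]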

You can use a merge with indicator:

out = (df.merge(pd.DataFrame(rows_to_remove, columns=['A', 'B']), indicator=True, how='left')
         .query('_merge == "left_only"')
         #.drop(columns='_merge') # left commented so the indicator column stays visible
       )

Output:

   A  B     _merge
1  b  2  left_only
3  d  1  left_only

Or, combined with drop:

idx = (df.merge(pd.DataFrame(rows_to_remove, columns=['A', 'B']), how='left', indicator=True)
         .query('_merge == "both"').index
      )

out = df.drop(idx)

Output:

   A  B
1  b  2
3  d  1
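
If this is something you do repeatedly, a hypothetical helper (my sketch, not part of the original answer) can package the merge/indicator anti-join; it assumes the removal list covers exactly the columns passed in on:

import pandas as pd

def anti_join(df, remove, on):
    """Return the rows of df whose combination of the `on` columns is not in `remove`."""
    right = pd.DataFrame(remove, columns=on).drop_duplicates()
    # how='left' preserves the order and length of df, so the indicator column
    # lines up row-for-row with the original frame
    merged = df.merge(right, on=on, how='left', indicator=True)
    return df.loc[merged['_merge'].eq('left_only').to_numpy()]

out = anti_join(df, rows_to_remove, on=['A', 'B'])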


If you import numpy as np, you can do it in one line with broadcasting:

df[~(df.values == np.array(rows_to_remove, dtype=object)[:, None]).all(-1).any(0)]

It is a bit of a curiosity, but it is also useful: I tested it on Google Colab and it runs in 274 µs ± 11.6 µs versus 1.44 ms ± 37.7 µs for @Timeless's solution, which I think is the best one and is the same approach that came to my mind as soon as I read the question.

It would be interesting to see the difference for bigger DataFrames.
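
For readability, here is the same broadcasting idea written out step by step (a sketch using the question's df and rows_to_remove; the intermediate names are mine):

import numpy as np

# (n_remove, 1, 2): one (A, B) pair per row, with an extra axis for broadcasting
targets = np.array(rows_to_remove, dtype=object)[:, None]

# (n_remove, n_rows, 2): compare every pair against every row elementwise
eq = df.values == targets

# a row is dropped only when both of its columns match the same pair
mask = eq.all(-1).any(0)

out = df[~mask]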


You can create a dataframe from rows_to_remove, concatenate it with your original dataframe, and then drop the duplicates with keep=False:

>>> (pd.concat([df, pd.DataFrame(rows_to_remove, columns=['A', 'B'])])
       .drop_duplicates(['A', 'B'], keep=False))

   A  B
1  b  2
3  d  1

Comments

This might, however, incorrectly delete rows that are already duplicated in df ;)
Yes, I suppose so, if the original dataframe already contains duplicate rows.
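
To illustrate that caveat (a constructed example, not from the original thread): with keep=False, a pair that is already duplicated in the original frame is discarded even if it is not in rows_to_remove.

import pandas as pd

# hypothetical frame with a pre-existing duplicate ('b', 2)
df_dup = pd.DataFrame({'A': ['a', 'b', 'b', 'd'], 'B': [4, 2, 2, 1]})
to_remove = [('a', 4)]

out = (pd.concat([df_dup, pd.DataFrame(to_remove, columns=['A', 'B'])])
         .drop_duplicates(['A', 'B'], keep=False))
print(out)  # only ('d', 1) survives; both ('b', 2) rows are dropped as well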

Hope this helps. You can use df.drop to drop rows or columns by index; please have a look at the documentation:

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
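
A minimal sketch of how df.drop could be applied to this question (the mask construction mirrors the isin answer above; it is my addition, not part of this answer):

# find the index labels of rows whose (A, B) pair is in rows_to_remove,
# then drop those rows by label
mask = df.agg(tuple, axis=1).isin(rows_to_remove)
out = df.drop(index=df.index[mask])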

