Deleting data in pandas given a string condition

Question

I am having trouble understanding the mechanics here given the following.

I have a DataFrame read from a .csv file:

   a1  b1  c1
1  aa  bb  cc
2  ab  ba  ca

df.drop(df['a1'].str.contains('aa',case = False))

I want to drop all rows whose value in column a1 contains 'aa'.

I believe I have tried everything suggested on here, but I still get:

ValueError: labels [False False False ... False False False] not contained in axis

Yes, I have also tried

skipinitialspace=True
axis=1

Any help would be appreciated, thank you.

df[~df.a1.str.contains('aa')]

– BENY, Commented May 14, 2018 at 18:12

satoshi · Accepted Answer · 2018-05-14 20:39:44Z


str.contains returns a boolean mask:

df['a1'].str.contains('aa',case = False)

1     True
2    False
Name: a1, dtype: bool
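A minimal, self-contained sketch of the mask, with the question's sample data hard-coded instead of read from the .csv. Row 3 (upper-case 'AA') is a hypothetical addition, there only to show what case=False changes.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical row 3 with upper-case 'AA'
df = pd.DataFrame(
    {"a1": ["aa", "ab", "AA"], "b1": ["bb", "ba", "xx"], "c1": ["cc", "ca", "yy"]},
    index=[1, 2, 3],
)

# str.contains returns a boolean Series aligned with df.index
mask_insensitive = df["a1"].str.contains("aa", case=False)
mask_sensitive = df["a1"].str.contains("aa")  # case=True is the default

print(mask_insensitive.tolist())  # [True, False, True]
print(mask_sensitive.tolist())    # [True, False, False]
```

The mask holds True/False values, not row labels, which is why it cannot be passed straight to drop.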

However, drop accepts index labels, not boolean masks. If you open the help for drop, you can see this first-hand:

?df.drop

Signature: df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.

Parameters
----------
labels : single label or list-like
    Index or column labels to drop.

You could derive the index labels from the mask and pass those to drop:

idx = df.index[df['a1'].str.contains('aa')]
df.drop(idx)

   a1  b1  c1
2  ab  ba  ca
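A small sketch of the label-based route, using the question's sample data. It also shows that, per the signature above, the labels can be passed either positionally (axis=0 is the default) or through the index= keyword; both give the same result.

```python
import pandas as pd

df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)

mask = df["a1"].str.contains("aa", case=False)

# Convert the boolean mask into the index labels that drop() expects
idx = df.index[mask]
print(list(idx))  # [1]

# Two equivalent spellings: positional labels (axis=0 by default) or index=
result_a = df.drop(idx)
result_b = df.drop(index=idx)
print(result_a)
```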

However, this is long-winded, so I'd recommend sticking to the idiomatic pandas way of dropping rows based on a condition: boolean indexing.

df[~df['a1'].str.contains('aa')]

   a1  b1  c1
2  ab  ba  ca
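One caveat with boolean indexing, sketched under the assumption that column a1 may contain missing values (the question's sample has none). str.contains can return NaN for missing entries, which makes inverting the mask with ~ unreliable. Passing na=False treats missing values as "does not contain", so the mask is purely boolean and those rows are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in a1 (not part of the question's sample)
df = pd.DataFrame(
    {"a1": ["aa", "ab", np.nan], "b1": ["bb", "ba", "zz"]},
    index=[1, 2, 3],
)

# na=False: missing values count as non-matches, so ~mask inverts cleanly
mask = df["a1"].str.contains("aa", case=False, na=False)
kept = df[~mask]

print(kept.index.tolist())  # [2, 3]
```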

If you want to remove rows that contain any of the strings in a list:

df = df[~df['a1'].str.contains('|'.join(my_list))]

Make sure to strip whitespace. Credit to https://stackoverflow.com/a/45681254/9500464
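A sketch of the list-based filter that also strips whitespace from each entry. It goes one step further than the snippet above: because str.contains treats the joined pattern as a regular expression by default, it escapes each entry with re.escape so characters like '.' match literally. The data and my_list contents are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({"a1": ["aa", "ab", "a.c", "abc", "xyz"]}, index=[1, 2, 3, 4, 5])

# Hypothetical list with stray whitespace and a regex metacharacter ('.')
my_list = [" aa", "a.c "]

# Strip whitespace, then escape each entry so '.' is matched literally.
# Without re.escape, the pattern 'a.c' would also match 'abc'.
pattern = "|".join(re.escape(s.strip()) for s in my_list)

df = df[~df["a1"].str.contains(pattern)]
print(df["a1"].tolist())  # ['ab', 'abc', 'xyz']
```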

edited May 14, 2018 at 20:39 by satoshi

answered May 14, 2018 at 18:12 by cs95


5 Comments

jpp Over a year ago

A trivial speed improvement, if applicable, is to set regex=False.
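A sketch of what regex=False changes in behavior, using hypothetical data (it does not measure speed). With regex=False the pattern is matched as a plain substring, which suits a single literal string. It does not suit the '|'.join(my_list) pattern, because '|' would then be searched for literally instead of meaning "or".

```python
import pandas as pd

df = pd.DataFrame({"a1": ["aa", "ab", "a|a"]}, index=[1, 2, 3])

# regex=False: '|' is an ordinary character, so only the literal text "a|a" matches
literal = df["a1"].str.contains("a|a", regex=False)

# Default regex=True: "a|a" means "a or a", so any value containing 'a' matches
as_regex = df["a1"].str.contains("a|a")

print(literal.tolist())   # [False, False, True]
print(as_regex.tolist())  # [True, True, True]
```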

Ami Tavory Over a year ago

So you're recommending to drop the mask, basically :-)

satoshi Over a year ago

Thank you for this! I am still so confused; I really need to review this in depth.

cs95 Over a year ago

@g_altobelli it's pretty straightforward: drop needs to know the index labels, because those are what it removes. It doesn't accept a boolean mask because it doesn't need to (there are enough indexers that already do that; __getitem__ and loc are two of them).
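A small sketch of the point in this comment, using the question's sample data: plain df[...] (__getitem__) and .loc both accept the boolean mask directly, while drop needs the mask converted to labels first. All three produce the same frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)
mask = df["a1"].str.contains("aa", case=False)

via_getitem = df[~mask]             # DataFrame.__getitem__ with a boolean Series
via_loc = df.loc[~mask]             # .loc also accepts a boolean mask for rows
via_drop = df.drop(df.index[mask])  # drop needs labels, so convert the mask first

print(via_getitem)
```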

satoshi Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ I just got it, thank you!!! This makes so much more sense now. I am not sure why I made this so complicated for myself.
