I am having trouble understanding the mechanics of drop, given the following.

I have a dataframe read from a .csv file:

  a1 b1 c1
1 aa bb cc
2 ab ba ca 

df.drop(df['a1'].str.contains('aa',case = False))

I want to drop all rows whose a1 value contains 'aa'.

I believe I have tried everything suggested on here, but I still get:

ValueError: labels [False False False ... False False False] not contained in axis

Yes, I have also tried

skipinitialspace=True
axis=1

Any help would be appreciated, thank you.

    df[~df.a1.str.contains('aa')] Commented May 14, 2018 at 18:12

1 Answer

str.contains returns a mask:

df['a1'].str.contains('aa',case = False)

1     True
2    False
Name: a1, dtype: bool

However, drop accepts index labels, not boolean masks. If you open up the help on drop, you may observe this first-hand:

?df.drop

Signature: df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.

Parameters
----------
labels : single label or list-like
    Index or column labels to drop.

You could derive the index labels from the mask and pass those to drop:

idx = df.index[df['a1'].str.contains('aa')]
df.drop(idx)

   a1  b1  c1
2  ab  ba  ca

However, this is long-winded, so I'd recommend sticking to the pandaic way of dropping rows based on a condition, boolean indexing:

df[~df['a1'].str.contains('aa')]

   a1  b1  c1
2  ab  ba  ca
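
For anyone who wants to run both approaches side by side, here is a minimal sketch. The frame is rebuilt by hand to mirror the question (the index labels 1 and 2 are an assumption based on the printed output), and na=False is an addition not in the original, so that missing values count as non-matches.

```python
import pandas as pd

# Sample frame mirroring the question; the index labels 1 and 2 are assumed.
df = pd.DataFrame(
    {"a1": ["aa", "ab"], "b1": ["bb", "ba"], "c1": ["cc", "ca"]},
    index=[1, 2],
)

# Boolean mask: True where a1 contains 'aa' (case-insensitive, as in the question).
# na=False treats missing values as non-matches, so the mask stays strictly boolean.
mask = df["a1"].str.contains("aa", case=False, na=False)

# Option 1: boolean indexing keeps the rows where the mask is False.
kept_by_mask = df[~mask]

# Option 2: convert the mask to index labels, then drop those labels.
kept_by_drop = df.drop(df.index[mask])

print(kept_by_mask)
```

Both options should leave only the row labelled 2. Neither modifies df in place, so assign the result back if you want to keep it.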

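If anyone is interested in removing rows that contain any of the strings in a list, join the list into a single pattern with '|':

df = df[~df['a1'].str.contains('|'.join(my_list))]

Make sure to strip whitespace from the strings first. Note that each entry is interpreted as a regular expression, so entries containing special characters such as . or | need escaping (for example with re.escape). Credit to https://stackoverflow.com/a/45681254/9500464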
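
A small sketch of the list variant, in case it helps. The data and the contents of my_list are made up for illustration, and escaping each entry with re.escape is my own addition so that entries are matched literally rather than as regex patterns.

```python
import re

import pandas as pd

# Hypothetical data and search list, for illustration only.
df = pd.DataFrame({"a1": ["aa", "ab", "a.c", "axc"]})
my_list = ["aa ", "a.c"]  # note the stray trailing space in the first entry

# Strip whitespace and escape regex metacharacters before joining with '|'.
pattern = "|".join(re.escape(s.strip()) for s in my_list)

# Keep only the rows whose a1 value matches none of the entries.
result = df[~df["a1"].str.contains(pattern, na=False)]

print(result)
```

Without re.escape, the entry 'a.c' would also match 'axc', because '.' matches any character in a regex.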

5 Comments

A trivial speed improvement, if applicable, is to set regex=False.
So you're recommending to drop the mask, basically :-)
Thank you for this! I am still so confused I really need to review this in depth.
@g_altobelli it's pretty straightforward, it needs to know the index labels, because it will remove those. It doesn't accept a boolean mask because it doesn't need to accept one (there's enough indexers that do that already, __getitem__ and loc are two of them).
@cᴏʟᴅsᴘᴇᴇᴅ I just got it thank you!!! This makes so much more sense now. I am not sure why I made this so complicated for myself.
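
To illustrate the regex=False suggestion from the first comment, here is a rough sketch with made-up data. With regex=False the pattern is treated as a plain substring, which also changes what matches when the pattern contains regex metacharacters.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a1": ["aa", "ab", "a.a"]})

# As a regex, 'a.' means "an 'a' followed by any character", so every row matches.
regex_mask = df["a1"].str.contains("a.")

# With regex=False, 'a.' is a literal substring, so only 'a.a' matches.
literal_mask = df["a1"].str.contains("a.", regex=False)

# Drop the rows that contain the literal substring.
result = df[~literal_mask]

print(result)
```

Whether this gives a noticeable speed-up depends on the data, so it is worth timing on your own frame.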
