
I want to delete rows that contain certain values. The values I want to delete contain a "+" and are as follows:

cooperative+parallel
passive+prosocial

My dataset consists of 900,000 rows, and about 2,000 values have the problem I mentioned.

I would like code along these lines:

df = df[df.columnname != '+']

The above is for one column (it's not working well), but I would also like an example for the whole dataset.

I would prefer a solution in pandas.

Many thanks

    df = df[~df.columnname.str.contains('+', regex=False)] Commented Nov 19, 2020 at 8:29

2 Answers


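Use Series.str.contains and invert the resulting mask with ~. Escape the +, because it is a special regex character. To cover the whole dataset, select all object (string) columns with DataFrame.select_dtypes, apply the check to each of them with DataFrame.apply, and use DataFrame.any(axis=1) to flag rows with at least one match: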

df1 = df[~df.select_dtypes(object).apply(lambda x: x.str.contains(r'\+')).any(axis=1)]

Or use regex=False, in which case the + is matched literally and must not be escaped:

df1 = df[~df.select_dtypes(object).apply(lambda x: x.str.contains('+', regex=False)).any(axis=1)]
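
To make the whole-dataset approach concrete, here is a minimal sketch on made-up data. The column names (style, mood, score) and the sample rows are assumptions for illustration only, and na=False is an extra choice, not part of the answer above, so that missing values count as "no match" instead of producing NaN in the mask.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
df = pd.DataFrame({
    "style": pd.Series(
        ["cooperative", "cooperative+parallel", "passive", "prosocial"],
        dtype=object,
    ),
    "mood": pd.Series(
        ["calm", "tense", "passive+prosocial", "calm"],
        dtype=object,
    ),
    "score": [1, 2, 3, 4],
})

# Keep only the object (string) columns; .str does not work on numeric ones.
text_cols = df.select_dtypes(include=object)

# One boolean column per text column: True where the cell contains a literal "+".
plus_per_column = text_cols.apply(
    lambda col: col.str.contains("+", regex=False, na=False)
)

# A row is dropped if any of its text columns contains "+".
has_plus = plus_per_column.any(axis=1)
df1 = df[~has_plus]

print(df1)
```

Rows 1 and 2 each have a "+" in one text column, so only rows 0 and 3 remain in df1.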

1 Comment

Thanks, this works for one column. Could you please provide me an example for the full dataset also?
0
df = df[~df['columnname'].str.contains('+', regex=False)]

The documentation is here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
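
As a rough illustration of why the single-column filter needs regex=False (or an escaped pattern), the sketch below uses a hypothetical column named columnname with invented values. It shows that != '+' compares whole cell values and so removes nothing, that an unescaped '+' is rejected as a regular expression, and that the literal and escaped forms select the same rows.

```python
import re

import pandas as pd

# Hypothetical single-column data for illustration.
df = pd.DataFrame({
    "columnname": pd.Series(
        ["cooperative", "cooperative+parallel", "passive+prosocial"],
        dtype=object,
    )
})

# != compares the whole cell to "+", so no row is removed here.
unchanged = df[df["columnname"] != "+"]

# An unescaped "+" is an invalid regular expression ("nothing to repeat").
try:
    df["columnname"].str.contains("+")
    unescaped_failed = False
except re.error:
    unescaped_failed = True

# Literal match: no escaping needed.
literal = df[~df["columnname"].str.contains("+", regex=False)]

# Regex match: escape the "+" (re.escape does this for arbitrary text).
escaped = df[~df["columnname"].str.contains(re.escape("+"))]

print(literal)
```

Both literal and escaped keep only the "cooperative" row.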
