I am importing a .csv file that someone else made, and they filled in some rows with a '-' character wherever there was missing data. The data frame looks something like this:

        Data1    Data2
0       99       1
1       99       2
2       -        3
3       98       4
4       97       5
5        -       -
6        -       -

Except the real file has thousands of rows, so I don't want to manually search for each row containing a dash and delete it. I have tried the following code, but it keeps returning an unaltered data frame with the '-' rows still present:

import pandas as pd
    
df = pd.read_csv("data.csv")
    
df[df['Data1'] != '-']

print(df)

My logic here is that any row not containing a '-' should remain, but clearly that's not working. The output I want is:

        Data1    Data2
0       99       1
1       99       2
2       98       4
3       97       5

1 Answer

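Your filter df[df['Data1'] != '-'] returns a new data frame rather than modifying df, and the result is never assigned, so print(df) still shows the original. It also only checks Data1. The easiest fix is to use boolean indexing to keep the rows in which every column is not equal (ne) to '-', and to assign the result: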

out = df[df.ne('-').all(axis=1)]
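
As a self-contained check, here is a minimal sketch that rebuilds the sample data from the question. It assumes the values arrive as strings, which is what read_csv typically produces for a column that contains '-'. The reset_index and pd.to_numeric steps are optional extras, only needed if you want the rows renumbered from 0 and the values converted back to numbers.

```python
import pandas as pd

# Sample data from the question; values are strings because the
# columns contain '-' (assumed to match what read_csv would give).
df = pd.DataFrame({
    "Data1": ["99", "99", "-", "98", "97", "-", "-"],
    "Data2": ["1", "2", "3", "4", "5", "-", "-"],
})

# True for rows where no column equals '-'.
mask = df.ne("-").all(axis=1)

# Boolean indexing returns a new DataFrame, so assign the result.
out = df[mask]

# Optional: renumber rows from 0 and convert the strings to numbers.
out = out.reset_index(drop=True).apply(pd.to_numeric)
print(out)
```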

If for some reason you want to drop the rows instead (e.g. to update df in place), you can use:

m = df.eq('-').any(axis=1)
df.drop(df.index[m], inplace=True)

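Output (the original index labels are kept; add reset_index(drop=True) if you want them renumbered from 0):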

  Data1 Data2
0    99     1
1    99     2
3    98     4
4    97     5
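
Another option, not part of the approach above, is to treat '-' as missing data at load time and then drop the incomplete rows. This is only a sketch: it assumes '-' never appears as a legitimate value, and it uses an in-memory string (csv_text, a hypothetical stand-in for data.csv) so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, matching the question's sample.
csv_text = "Data1,Data2\n99,1\n99,2\n-,3\n98,4\n97,5\n-,-\n-,-\n"

# na_values makes read_csv parse '-' as NaN instead of a string.
df = pd.read_csv(io.StringIO(csv_text), na_values=["-"])

# Drop rows with any NaN, restore integer dtype (NaN forces floats),
# and renumber the rows from 0.
clean = df.dropna().astype(int).reset_index(drop=True)
print(clean)
```

With a real file, replace io.StringIO(csv_text) with the path, e.g. "data.csv".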