
I want to convert columns in a DataFrame from object to int. Rows that contain text (letters) need to be deleted completely.

The following expression "saves" the data I care about and converts the column from the OBJECT to INT type:

df["column name"] = df["column name"].astype(str).str.replace(r'/\d+$', '', regex=True).astype(int)

However, before doing this, I want to completely delete the rows that contain letters (A-Z).

I tried:

df[~df["column name"].str.lower().str.startswith('A-Z')]

I also tried a few other expressions, but none of them cleaned the data.

DataFrame looks something like this:

          A         B         C
0       8161       0454   9600
1 -     3780       1773   1450
2       2564       0548   5060
3       1332       9179   2040
4       6010       3263   1050
5   I Forgot       7849   1400/10000

Col C, 1400/10000: the first expression I wrote simply removes "/10000", leaving "1400".

Now I need to remove rows that contain words, like the value in column A of row 5.

  • Would you care to share a sample from your data? Commented Jul 24, 2019 at 8:10

1 Answer

Using a regular expression, you can create a mask for all rows that contain a character in [a-z], then drop those rows. Like this:

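# True for every row whose value in column 'A' contains at least one letter
mask = df['A'].str.lower().str.contains("[a-z]")
# Index labels of the matching rows
idx = df.index[mask]
# Drop those rows
df = df.drop(idx, axis=0)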
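
For a complete picture, here is a minimal sketch that combines the mask above with the conversion from the question. The sample frame is an assumption: a simplified copy of the data shown in the question, with every value stored as a string. The names `cleaned` and `col` are only illustrative.

```python
import pandas as pd

# Assumed sample data, modelled on the frame in the question (all values are strings).
df = pd.DataFrame({
    "A": ["8161", "3780", "2564", "1332", "6010", "I Forgot"],
    "B": ["0454", "1773", "0548", "9179", "3263", "7849"],
    "C": ["9600", "1450", "5060", "2040", "1050", "1400/10000"],
})

# Rows where column A contains at least one letter.
# astype(str) guards against non-string values; na=False counts missing values as "no match".
mask = df["A"].astype(str).str.contains(r"[A-Za-z]", na=False)

# Keep only the rows without letters; copy so later assignments do not touch df.
cleaned = df[~mask].copy()

# Strip a trailing "/digits" suffix (as in the question), then convert each column to int.
for col in ["A", "B", "C"]:
    cleaned[col] = (
        cleaned[col].astype(str).str.replace(r"/\d+$", "", regex=True).astype(int)
    )

print(cleaned.dtypes)
```

Filtering with `df[~mask]` is equivalent to dropping by index here. Note that in this sample the only value with a "/10000" suffix sits in the row that gets dropped, so the replace step only matters for rows that keep such a suffix.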