I have a dataframe with a bunch of floats and other numeric values, but some rows have characters mixed in between that I'm trying to remove. I've converted my entire dataframe to strings with data = data.astype(str) and I've tried using X = X[X.var1.isalpha()], but it gives me the error 'Series' object has no attribute 'isalpha'. Thanks.
Please share source data and expected output. It will help address the problem. – Joe Ferndz, Mar 3, 2021 at 2:15
3 Answers
IIUC this is what you want:
import pandas as pd
df = pd.DataFrame({'a':['1','x',2,7], 'b':[2,3,'y',8]})
# a b
#0 1 2
#1 x 3
#2 2 y
#3 7 8
df.apply(pd.to_numeric, errors = "coerce").dropna()
# a b
#0 1.0 2.0
#3 7.0 8.0
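If only one column needs checking (the question mentions var1), the same idea can be used to build a boolean mask instead of converting the whole frame. This is just a sketch: the sample data and the var2 column are made up for illustration, and rows that already hold NaN in var1 would be dropped too.

```python
import pandas as pd

# Hypothetical sample data; var1 mixes numeric strings with text.
X = pd.DataFrame({"var1": ["1.5", "abc", "3", "4x"],
                  "var2": ["a", "b", "c", "d"]})

# Values that cannot be parsed as numbers become NaN, so notna() marks the good rows.
mask = pd.to_numeric(X["var1"], errors="coerce").notna()

# Filter the original frame; the other columns are left untouched.
X_clean = X[mask]
```

Here X_clean keeps the rows whose var1 parses as a number, and var2 is not converted.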
1 Comment
opticwiznub
Yes something like that is good, I should've put an input and output but I'm very new to stackoverflow and not sure how to write it.
Based on this, you need to call .str on the Series you want to check, i.e.:
X.var1.str.isalpha()
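As a rough sketch of how that fits into the filter from the question (the sample values are made up): X[X.var1.str.isalpha()] would keep the alphabetic rows, so the mask has to be negated with ~ to remove them. Note that isalpha() is only True when every character is a letter, so a mixed value such as '4x' is not caught this way.

```python
import pandas as pd

# Hypothetical sample data, already converted to strings as in the question.
X = pd.DataFrame({"var1": ["1.5", "abc", "3", "4x"]})

# True only where the whole string consists of letters.
is_alpha = X["var1"].str.isalpha()

# Negate the mask to drop the purely alphabetic rows.
X_no_alpha = X[~is_alpha]
```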
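You can use pd.Series.str.replace, which accepts a regex when regex=True is passed (since pandas 2.0 the pattern is treated as a literal string by default), i.e.:
new = df.astype(str).apply(lambda x: x.str.replace('[a-zA-Z]+', '', regex=True)).astype(float)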
3 Comments
Pablo C
I'm pretty sure this doesn't work (the astype part). Also, OP is asking to remove rows, not chars.
fsl
What's wrong with astype? Good point on the rows though.
Pablo C
You'll have some empty strings ('') which will give an error while casting to float. It would be something like float('').
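A small sketch of the problem described in these comments, reusing the sample frame from the first answer. Swapping astype(float) for pd.to_numeric with errors="coerce" is one possible workaround, not something proposed in the thread itself.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', 'x', 2, 7], 'b': [2, 3, 'y', 8]})

# Stripping letters leaves empty strings where a cell held only letters.
stripped = df.astype(str).apply(
    lambda col: col.str.replace('[a-zA-Z]+', '', regex=True))

# Casting an empty string to float fails, much like float('').
try:
    stripped.astype(float)
    cast_failed = False
except ValueError:
    cast_failed = True

# Possible workaround: coerce unparseable cells to NaN, then drop those rows.
as_numbers = stripped.apply(pd.to_numeric, errors="coerce")
rows_kept = as_numbers.dropna()
```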