wu=pd.DataFrame({'a':['hhh',2,3,4,5],'b':[1,2,np.nan,np.nan,5]}
I want to delete the row containing 'hhh', because every other value in column 'a' is a number. The real dataset is huge. Thank you very much.
Option 1
Convert a using pd.to_numeric
df.a = pd.to_numeric(df.a, errors='coerce')
df
a b
0 NaN 1.0
1 2.0 2.0
2 3.0 NaN
3 4.0 NaN
4 5.0 5.0
Non-numeric values are coerced to NaN. You can then drop this row -
df.dropna(subset=['a'])
a b
1 2.0 2.0
2 3.0 NaN
3 4.0 NaN
4 5.0 5.0
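Option 1 can also be written without overwriting the original column first. This is only a minimal sketch on the sample frame from the question; the names numeric_a and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['hhh', 2, 3, 4, 5], 'b': [1, 2, np.nan, np.nan, 5]})

# Coerce a separate copy of the column; anything unparseable becomes NaN.
numeric_a = pd.to_numeric(df['a'], errors='coerce')

# Keep only the rows whose 'a' parsed as a number, then store the numeric values.
cleaned = df[numeric_a.notna()].copy()
cleaned['a'] = numeric_a[numeric_a.notna()]

print(cleaned)
```

Note that this also drops rows where 'a' was already NaN before the conversion, which may or may not be what you want on the full dataset.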
Option 2
Another alternative is using str.isdigit -
df.a.str.isdigit()
0 False
1 NaN
2 NaN
3 NaN
4 NaN
Name: a, dtype: object
Filter as such -
df[df.a.str.isdigit().isnull()]
a b
1 2 2.0
2 3 NaN
3 4 NaN
4 5 5.0
Notes -
If the numbers are also stored as strings, then drop the isnull bit -
df[df.a.str.isdigit()]
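If the column mixes real numbers and digit strings, one possible way to cover both cases is to cast to str before calling isdigit. The data below is hypothetical, chosen to show where isdigit stops working.

```python
import pandas as pd

# Hypothetical mixed column: junk text, an int, a digit string, a negative int, a float.
df = pd.DataFrame({'a': ['hhh', 2, '3', -4, 5.5]})

# Cast to str first so .str.isdigit() yields True/False for every row (no NaN).
mask = df['a'].astype(str).str.isdigit()
filtered = df[mask]

print(filtered)
```

Here only the rows holding 2 and '3' survive: isdigit rejects the minus sign and the decimal point, so negative numbers and floats are filtered out along with 'hhh'.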
pd.to_numeric(df.a, errors='coerce').notnull() would work for floats too, right?
import pandas as pd
import numpy as np
wu=pd.DataFrame({'a':['hhh',2,3,4,5],'b':[1,2,np.nan,np.nan,5]})
#wu = wu[wu.a.str.contains(r'\d+',na=False)]
#wu = wu[wu.a.apply(lambda x: x.isnumeric())]
wu = wu[wu.a.apply(lambda x: isinstance(x, (int, np.int64)))]
print(wu)
Note that you missed out a closing parenthesis when creating your DataFrame.
I tried three approaches, but only the third one worked on this data. The two commented-out ones assume the values in 'a' are strings, so you can try them if that is the case for your data. Do let me know if it works on the larger dataset.
apply is a convenience function that hides a loop. If there are N ways to solve the problem, apply is quite consistently the slowest of them. It provides no vectorisation and assumes nothing about your function. It also has a lot of overhead (just look at the source code), so quite often a simple Python loop doing the same thing is faster.
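To make the comparison concrete, here is a rough sketch that puts the apply-based filter next to a vectorised mask on the sample frame. The names by_apply and by_mask are illustrative, and nothing is timed here.

```python
import numpy as np
import pandas as pd

wu = pd.DataFrame({'a': ['hhh', 2, 3, 4, 5], 'b': [1, 2, np.nan, np.nan, 5]})

# Row-by-row type check via apply (a hidden Python-level loop).
by_apply = wu[wu['a'].apply(lambda x: isinstance(x, (int, np.integer)))]

# Vectorised check: a single conversion over the whole column.
by_mask = wu[pd.to_numeric(wu['a'], errors='coerce').notna()]

# Both filters select the same rows on this sample.
print(list(by_apply.index) == list(by_mask.index))
```

The two are not equivalent in general: the to_numeric mask also keeps floats and numeric strings such as '7', while the isinstance check keeps only integers. Which one is right depends on what the real column contains.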