error using astype when NaN exists in a dataframe

Question

df
     A     B  
0   a=10   b=20.10
1   a=20   NaN
2   NaN    b=30.10
3   a=40   b=40.10

I tried:

df['A'] = df['A'].str.extract('(\d+)').astype(int)
df['B'] = df['B'].str.extract('(\d+)').astype(float)

But I get the following error:

ValueError: cannot convert float NaN to integer

And:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

How do I fix this?

First, NaN can only be represented as a float, so you can't cast to int in that case. Second, if the column has mixed dtypes (for instance strings and something else), using `str.extract` will fail; mixed dtypes are supported, but they are not a good idea because they lead to errors. You should decide what the final dtype should be and replace the missing values with something that makes sense to you.
– EdChum, Commented Jan 9, 2017 at 15:02
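
To illustrate the second error, here is a minimal sketch (the series names `text` and `numbers` are made up for illustration). It assumes the AttributeError appears when the column no longer holds strings at all, for example after it has already been converted to a numeric dtype; the question does not say exactly how the error was triggered.

```python
import numpy as np
import pandas as pd

# String data with a missing value: the .str accessor is available.
text = pd.Series(['a=10', np.nan, 'a=40'])
# Purely numeric data (float64): there are no strings for .str to work on.
numbers = pd.Series([10.0, np.nan, 40.0])

# Works; the missing value simply stays missing.
extracted = text.str.extract(r'(\d+)', expand=False)

try:
    numbers.str.extract(r'(\d+)', expand=False)
    str_failed = False
except AttributeError as exc:
    # "Can only use .str accessor with string values ..."
    str_failed = True
    print(exc)
```

If this is the cause, checking `df.dtypes` before calling `.str` shows whether the column has already been converted.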

jezrael · Accepted Answer · 2017-01-09 15:09:47Z

82

If some values in a column are missing (NaN) and the column is then converted to numeric, the dtype is always float. You cannot convert the values to int, only to float, because the type of NaN is float.

print (type(np.nan))
<class 'float'>

See the docs on how values are converted when at least one NaN is present:

integer > cast to float64
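
A small sketch of that promotion, using made-up series names (`ints`, `with_nan`) rather than the question's data:

```python
import numpy as np
import pandas as pd

ints = pd.Series([10, 20, 40])              # no missing values
with_nan = pd.Series([10, 20, np.nan, 40])  # one missing value

# Without NaN the series keeps an integer dtype.
print(pd.api.types.is_integer_dtype(ints))
# With NaN the whole series is promoted to float64.
print(with_nan.dtype)

try:
    with_nan.astype(int)
    cast_failed = False
except ValueError as exc:
    # NaN has no NumPy integer representation, so the cast is refused.
    cast_failed = True
    print(exc)
```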

If you need int values, replace NaN with some int, e.g. 0, using fillna, and then it works perfectly:

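# raw strings avoid the invalid-escape warning for \d in newer Python versions
# note: (\d+) keeps only the digits before the decimal point in column B
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
df['B'] = df['B'].str.extract(r'(\d+)', expand=False)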
print (df)
     A    B
0   10   20
1   20  NaN
2  NaN   30
3   40   40

df1 = df.fillna(0).astype(int)
print (df1)
    A   B
0  10  20
1  20   0
2   0  30
3  40  40

print (df1.dtypes)
A    int32
B    int32
dtype: object
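
The `(\d+)` pattern above drops the decimals in column B (`b=20.10` becomes `20`). If B should stay a float with its fractional part, as the question's `astype(float)` suggests, one possible sketch is below. The pattern `(\d+(?:\.\d+)?)` is an assumption about the data format (digits, optionally followed by a dot and more digits) and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's column B.
df = pd.DataFrame({
    'A': ['a=10', 'a=20', np.nan, 'a=40'],
    'B': ['b=20.10', np.nan, 'b=30.10', 'b=40.10'],
})

# Capture the integer part plus an optional fractional part,
# then cast to float; NaN is a valid float, so no fillna is needed.
b = df['B'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)
print(b)
```

Because the target dtype is float, the missing value in row 1 can remain NaN.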

edited Jan 9, 2017 at 15:09

answered Jan 9, 2017 at 14:59

jezrael


Sander van den Oord · Accepted Answer · 2021-01-05 14:25:29Z

39

Since pandas 0.24 there is a built-in nullable integer dtype.
It allows missing values in an integer column, so you don't need to fill the NaNs.
Notice the capital 'I' in 'Int64' in the code below.
This is the pandas integer dtype, not the NumPy one.

You need to use: .astype('Int64')

So, do this:

df['A'] = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')

More info on pandas integer na values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
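
A self-contained sketch of this approach on the question's column A (the frame is rebuilt here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a=10', 'a=20', np.nan, 'a=40'],
    'B': ['b=20.10', np.nan, 'b=30.10', 'b=40.10'],
})

# Extract the digits, go through float, then to the nullable Int64 dtype.
a = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')

print(a.dtype)   # Int64
print(a.isna())  # row 2 is still missing, shown as <NA>
print(a.sum())   # missing values are skipped by default
```

The missing entry is stored as `pd.NA` rather than being replaced by a placeholder such as 0, so it can still be told apart from a real zero.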

answered Jan 5, 2021 at 14:25

Sander van den Oord

