3

With this sample dataframe:

>>> d = pd.DataFrame({'si': ['1', '2', 'NA'], 's': ['a', 'b', 'c']})

>>> d.dtypes
si    object
s     object
dtype: object

My first attempt was to use astype with the NA-aware 'Int64' integer type, but I got a traceback:

>>> d.si.astype('Int64')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-144-ed289e0c95aa> in <module>
----> 1 d.si.astype('Int64')
...

Then I tried the to_numeric method:


In [112]: d.loc[:, 'ii'] = pd.to_numeric(d.si, errors='coerce', downcast='integer')

In [113]: d.dtypes
Out[113]: 
si     object
s      object
ii    float64
dtype: object

In [114]: d
Out[114]: 
    si  s   ii
0    1  a  1.0
1    2  b  2.0
2   NA  c  NaN

In the output above, I expected the ii column to hold integers, with an integer NA for the missing value.

The documentation says:

downcast : {'integer', 'signed', 'unsigned', 'float'}, default None
    If not None, and if the data has been successfully cast to a
    numerical dtype (or if the data was numeric to begin with),
    downcast that resulting data to the smallest numerical dtype
    possible according to the following rules:

    - 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
    - 'unsigned': smallest unsigned int dtype (min.: np.uint8)
    - 'float': smallest float dtype (min.: np.float32)
2
  • The key word is "possible". With errors='coerce', you expect things to get set to NaN, which is a float, so float64 is the smallest type possible. If you want, you can call .astype('Int64') on the result to get the nullable integer type. Commented Jun 6, 2022 at 16:55
  • Thank you @Alollz, I've updated my question to show that .astype raises a traceback on this simple dataframe. Commented Jun 6, 2022 at 17:32

2 Answers

4

Unfortunately, pandas is still adapting/transitioning to fully supporting integer NaN. For that, you have to explicitly convert it to Int64 after your pd.to_numeric operation.

No need to downcast.

# Can also use the string 'Int64' as the dtype (see the second form below).
>>> pd.to_numeric(df['col'], errors='coerce').astype(pd.Int64Dtype())

# or

>>> pd.to_numeric(df['col'], errors='coerce').astype('Int64')

0       1
1       2
2       3
3    <NA>
Name: col, dtype: Int64
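
As a minimal sketch, the same conversion applied to the sample dataframe from the question might look like the following. It reuses the question's d, si, and ii names and assumes a pandas version that ships the nullable Int64 dtype (0.24 or later).

```python
import pandas as pd

# Sample dataframe from the question: 'si' holds numeric strings plus 'NA'.
d = pd.DataFrame({'si': ['1', '2', 'NA'], 's': ['a', 'b', 'c']})

# Step 1: parse the strings; 'NA' cannot be parsed, so it becomes NaN (float64).
# Step 2: convert the float64 result to the nullable integer dtype.
d['ii'] = pd.to_numeric(d['si'], errors='coerce').astype('Int64')

print(d['ii'].dtype)  # Int64
print(d)
```

The missing entry in ii is then shown as <NA> rather than NaN, and the other values stay integers.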

2

You have errors='coerce' set, and the documentation for that option says (emphasis mine):

errors : {'ignore', 'raise', 'coerce'}, default 'raise'

  • If 'raise', then invalid parsing will raise an exception.
  • If 'coerce', then invalid parsing will be set as NaN.
  • If 'ignore', then invalid parsing will return the input.

Since coercion turns the 'NA' entry in your si column into NaN, the result can't be a plain integer column: NaN is a float, so all other values in the column are upcast to the float64 dtype, and downcast='integer' has nothing smaller to fall back to.
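
The upcast can be seen in a small sketch like this one. The variable names (si, coerced, clean) are illustrative only and not part of the question.

```python
import numpy as np
import pandas as pd

# NaN itself is a Python float, which is why it forces a float dtype.
print(type(np.nan))  # <class 'float'>

# With an unparseable entry, coercion introduces NaN and the result stays float64,
# even though downcast='integer' was requested.
si = pd.Series(['1', '2', 'NA'])
coerced = pd.to_numeric(si, errors='coerce', downcast='integer')
print(coerced.dtype)  # float64

# Without any missing value, the same downcast does reach a small integer dtype.
clean = pd.to_numeric(pd.Series(['1', '2']), downcast='integer')
print(clean.dtype)  # int8
```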
