Convert whole Pandas dataframe containing NaN values from string to float

Question

I would like to convert all the values in a pandas dataframe from strings to floats. My dataframe contains various NaN values (e.g. NaN, NA, None). For example,

import pandas as pd
import numpy as np

my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

I have found here and here (among other places) that convert_objects might be the way to go. However, I get a message that it is deprecated (I am using Pandas 0.17.1) and that I should use to_numeric instead.

df2 = df.convert_objects(convert_numeric=True)

Output:

FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

But to_numeric doesn't seem to actually convert the strings.

df3 = pd.to_numeric(df, errors='force')

Output:

df2:
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN

df2 dtypes:
0    float64
1    float64
2    float64
dtype: object

df3:
     0     1    2
0  0.5   0.2  0.1
1   NA  0.45  0.2
2  0.9  0.02  N/A

df3 dtypes:
0    object
1    object
2    object
dtype: object

Should I use convert_objects and deal with the warning message, or is there a proper way to do what I want with to_numeric?

EdChum · Accepted Answer · 2017-10-17 08:56:46Z

2

Strangely this works:

In [11]:
df.apply(lambda x: pd.to_numeric(x, errors='coerce'))

Out[11]:
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN

It seems that to_numeric is not able to coerce the entire df in one call, which is a little surprising.

If you hate typing (thanks to @Zero) then you can just use:

df.apply(pd.to_numeric, errors='coerce')
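For reference, here is a minimal self-contained sketch of the column-wise approach, reusing the sample data from the question. It assumes the `errors='coerce'` option of `pd.to_numeric`, which turns anything that cannot be parsed into NaN; the variable name `converted` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question: numeric strings mixed with "NA" and "N/A".
my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

# to_numeric handles one column (a Series) at a time, so apply it column-wise.
# errors="coerce" replaces every value that cannot be parsed with NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)              # every column should now be a float dtype
print(converted.isna().sum().sum())  # number of cells that became NaN
```

Extra keyword arguments given to `df.apply` are passed through to the applied function, which is why the lambda is optional here.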

edited Oct 17, 2017 at 8:56

answered Mar 11, 2016 at 20:44

EdChum


5 Comments

Stop harming Monica Over a year ago

Look at the docstring: to_numeric() expects 1D input. You wouldn't expect that from the message associated with the FutureWarning.

EdChum Over a year ago

@Goyo yes, that's true: convert_objects was a df method, whereas the top-level to_numeric only works on Series, tuples, lists and arrays.

Lev Over a year ago

Yes, it would be nice if to_numeric worked on DataFrames in addition to 1D input. Thanks for your thoughts.

Zero Over a year ago

Shorthand as df.apply(pd.to_numeric, errors='coerce')

EdChum Over a year ago

@Zero sure, I can add this. Personally I've got into the habit of declaring lambda x, as it's clearer when reading.
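To illustrate the 1D point raised in these comments, here is a small sketch with made-up data (the names `col`, `frame` and `frame_rejected` are hypothetical). It assumes a reasonably recent pandas, where passing a DataFrame to `pd.to_numeric` raises a `TypeError`; the 0.17.1 release used in the question instead handed the frame back unconverted, as the question's output shows.

```python
import pandas as pd

# Hypothetical inputs: one 1-D Series and one 2-D DataFrame of strings.
col = pd.Series(["0.5", "NA", "0.9"])
frame = pd.DataFrame({"a": ["0.5", "NA"], "b": ["0.2", "N/A"]})

# A Series is accepted; the unparseable "NA" becomes NaN.
numeric_col = pd.to_numeric(col, errors="coerce")

# A DataFrame is not 1-D, so recent pandas versions reject it.
try:
    pd.to_numeric(frame, errors="coerce")
    frame_rejected = False
except TypeError:
    frame_rejected = True

print(numeric_col.tolist())
print(frame_rejected)
```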

jezrael · Answer · 2016-03-11 20:54:59Z

2

You can try replace and astype:

import pandas as pd
import numpy as np

my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

print(df.replace({r'N': np.nan}, regex=True).astype(float))
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN
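Note that the regex r'N' matches any cell containing a capital N, not just the missing-value markers. A stricter variant, sketched below, lists the markers explicitly; the `missing_tokens` list is an assumption based on the sample data and would need extending for other spellings such as "None".

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

# Assumption: these are the only missing-value markers present in the data.
missing_tokens = ["NA", "N/A"]

# Replace the markers with NaN first, then cast every column to float.
cleaned = df.replace(missing_tokens, np.nan).astype(float)

print(cleaned.dtypes)
```

Unlike to_numeric with errors='coerce', astype(float) raises a ValueError if any other non-numeric string is left in the frame, which can be useful when unexpected values should not pass silently.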

edited Mar 11, 2016 at 20:54

answered Mar 11, 2016 at 20:45

jezrael
