I would like to convert all the values in a pandas DataFrame from strings to floats. My DataFrame contains various missing-value markers (e.g. NaN, NA, None). For example,

import pandas as pd
import numpy as np

my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

I have found here and here (among other places) that convert_objects might be the way to go. However, I get a message that it is deprecated (I am using pandas 0.17.1) and that I should use to_numeric instead.

df2 = df.convert_objects(convert_numeric=True)

Output:

FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

But to_numeric doesn't seem to actually convert the strings.

df3 = pd.to_numeric(df, errors='coerce')

Output:

df2:
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN

df2 dtypes:
0    float64
1    float64
2    float64
dtype: object

df3:
     0     1    2
0  0.5   0.2  0.1
1   NA  0.45  0.2
2  0.9  0.02  N/A

df3 dtypes:
0    object
1    object
2    object
dtype: object

Should I use convert_objects and deal with the warning message, or is there a proper way to do what I want with to_numeric?

2 Answers

Strangely this works:

In [11]:
df.apply(lambda x: pd.to_numeric(x, errors='coerce'))

Out[11]:
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN

It seems that to_numeric cannot coerce the entire df in a single call, which is a little surprising.

If you hate typing (thanks to @Zero) then you can just use:

df.apply(pd.to_numeric, errors='coerce')
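
For reference, here is a minimal, self-contained sketch of the column-wise approach. It assumes a reasonably recent pandas, where 'coerce' is the documented errors value that turns unparseable entries into NaN. The frame is rebuilt from plain string lists purely for illustration, and the name converted is hypothetical.

```python
import numpy as np
import pandas as pd

# Same values as the question's frame, stored as strings in object columns.
df = pd.DataFrame(
    [["0.5", "0.2", "0.1"], ["NA", "0.45", "0.2"], ["0.9", "0.02", "N/A"]],
    dtype=object,
)

# pd.to_numeric handles one column (a Series) at a time, so apply it column-wise.
# errors="coerce" replaces anything that cannot be parsed as a number with NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)  # inspect the resulting column dtypes
print(converted)
```

Because apply passes each column to to_numeric separately, each call receives the 1D input that the function expects.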

5 Comments

Look at the docstring: to_numeric() expects 1D input. You wouldn't expect that from the message associated with the FutureWarning.
@Goyo Yes, that's true. convert_objects was a df method, whereas the top-level to_numeric only works on Series, tuples, lists and arrays.
Yes, it would be nice if to_numeric worked on DataFrames in addition to 1D input. Thanks for your thoughts.
Shorthand: df.apply(pd.to_numeric, errors='coerce')
@Zero Sure, I can add this. Personally, I'm in the habit of writing lambda x explicitly, as it's clearer when reading.

You can try replace and astype:

import pandas as pd
import numpy as np

my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)

print(df.replace({r'N': np.nan}, regex=True).astype(float))
     0     1    2
0  0.5  0.20  0.1
1  NaN  0.45  0.2
2  0.9  0.02  NaN
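
Note that the pattern r'N' matches any cell containing a capital N, which may be broader than intended. A hedged variant of the same replace-then-astype idea is sketched below: it swaps only exact matches from an explicit list. The list missing_markers and the name cleaned are hypothetical, and the frame is rebuilt from plain string lists for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the question's frame, stored as strings in object columns.
df = pd.DataFrame(
    [["0.5", "0.2", "0.1"], ["NA", "0.45", "0.2"], ["0.9", "0.02", "N/A"]],
    dtype=object,
)

# Hypothetical list of missing-value markers; extend it to match your data.
missing_markers = ["NA", "N/A"]

# Replace only cells that exactly equal one of the markers with NaN,
# then cast every column to float.
cleaned = df.replace(missing_markers, np.nan).astype(float)

print(cleaned)
```

Unlike errors='coerce', astype(float) raises on any leftover string that is not a number, so an unexpected marker shows up as an error rather than silently becoming NaN.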
