160

I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.

a = [['1,200', '4,200'], ['7,000', '-0.03'], [ '5', '0']]
df=pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

TypeError: ("cannot convert the series to ", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

1
  • 2
    Old question, but the OP is getting that error because apply on a DataFrame passes a whole column to the function as a series (in this case locale.atof, which expects a string). If you use the applymap method that @AndyHayden does in the answer below, you should be able to do this just fine. Commented Mar 2, 2018 at 0:36

4 Answers 4

244

If you're reading in from csv then you can use the thousands arg:

df.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.


You need to set the locale first:

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
Sign up to request clarification or add additional context in comments.

9 Comments

I should have said that I did set the locale. I still get the error.
But I am using df.read_fwf, and that has the " thousands=',' " option too, which works. Thanks.
I voted this up for the 'thousands' argument tip for the read_csv function. That worked great for me.
I wanted to add that you can also use "decimal=',' " if you're dealing with floats.
Should be pd.read_csv not df.read_csv.
|
69

You can convert one column at a time like this :

df['colname'] = df['colname'].str.replace(',', '').astype(float)

3 Comments

With this, I get a Warning: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions willnot be treated as literal strings when regex=True.. No idea why it assumes that regex=True
That's a horrible idea. It will convert 0,2 to 2 instead of 0.2. There's simply no way one can use replacements to parse localized number literals. What about 10,000.0? What about 10.000,00 ?
Thank you, @PanagiotisKanavos. Your comment prevented me from tumbling in this major pitfall and continuing to work with heavily messed up data. pd.Series('0,5').str.replace(',', '').astype(float) returns 5!
44

You may use the pandas.Series.str.replace method:

df.iloc[:,:].str.replace(',', '').astype(float)

This method can remove or replace the comma in the string.

3 Comments

I'm getting "AttributeError: 'DataFrame' object has no attribute 'str'", no idea why...
But this works: df.apply(lambda x: x.str.replace(',', '').astype(float), axis=1)
What if my number have more than one comma? like: "1,099,99", how can I convert it to "'1099.99'"?
0

This will work for strings such as '-55,00' or '5.500,00' and convert them to floats -55.00 and 5500.00, respectively.

df['colname'] = df['colname'].str.replace('.','', regex=True).str.replace(',', '.', regex=True).astype(float)

1 Comment

str.replace('.','', regex=True) results in empty strings. I think you meant regex=False?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.