Convert number strings with commas in pandas DataFrame to float

Question

I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.

import pandas

a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

Old question, but the OP gets that error because apply on a DataFrame passes each whole column to the function as a Series, whereas locale.atof expects a single string. If you use the applymap method, as @AndyHayden does in the answer below, this should work fine.
– T.C. Proctor, Commented Mar 2, 2018 at 0:36
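
The point in the comment above can be checked with a small sketch that records what each call hands to the callback. The list names are made up for illustration, and the sample data is the one from the question.

```python
import pandas as pd

df = pd.DataFrame([['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']])

# Record the type of every argument the callback receives.
seen_by_frame_apply = []
df.apply(lambda arg: seen_by_frame_apply.append(type(arg).__name__))

seen_by_series_apply = []
df[0].apply(lambda arg: seen_by_series_apply.append(type(arg).__name__))

print(set(seen_by_frame_apply))   # DataFrame.apply hands over whole columns
print(set(seen_by_series_apply))  # Series.apply hands over individual strings
```

A function that expects one string, such as locale.atof, therefore only fits the second call.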

jezrael · Accepted Answer · 2018-08-09 16:04:11Z

244

If you're reading in from csv then you can use the thousands arg:

pd.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.
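
As a rough illustration of the thousands argument, the sketch below reads the question's values from an in-memory tab-separated string instead of a file. The column names a and b are invented for the example.

```python
import io
import pandas as pd

# Tab-separated text standing in for a file on disk (assumed sample data).
raw = "a\tb\n1,200\t4,200\n7,000\t-0.03\n5\t0\n"

df = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",")

print(df.dtypes)  # column a holds integers, column b holds floats
print(df)
```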

To convert a DataFrame that is already loaded, locale.atof works element-wise with applymap, but you need to set the locale first:

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
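
In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map. The sketch below picks whichever is available. It also swaps locale.atof for a hypothetical locale-free helper, parse_number, so that the result does not depend on the system locale; with a suitable locale set, atof could be passed in its place.

```python
import pandas as pd

def parse_number(text):
    # Hypothetical helper: drop thousands commas, then defer to float().
    return float(text.replace(",", ""))

df = pd.DataFrame([['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']])

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
floats = elementwise(parse_number)

print(floats.dtypes)
print(floats)
```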

edited Aug 9, 2018 at 16:04

jezrael

answered Mar 3, 2014 at 2:54

Andy Hayden

9 Comments

pheon Over a year ago

I should have said that I did set the locale. I still get the error.

pheon Over a year ago

But I am using df.read_fwf, and that has the " thousands=',' " option too, which works. Thanks.

rockfakie Over a year ago

I voted this up for the 'thousands' argument tip for the read_csv function. That worked great for me.

VessoVit Over a year ago

I wanted to add that you can also use "decimal=',' " if you're dealing with floats.
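
A minimal sketch of the decimal argument mentioned above, combined with thousands for input that uses a dot for grouping and a comma for the decimal mark. The semicolon separator, the column names, and the sample values are assumptions for the example.

```python
import io
import pandas as pd

# Semicolon-separated text with '.' as thousands marker and ',' as decimal mark.
raw = "price;ratio\n1.200,50;0,2\n-55,00;10,0\n"

df = pd.read_csv(io.StringIO(raw), sep=";", thousands=".", decimal=",")

print(df.dtypes)  # both columns are parsed as floats
print(df)
```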

Bill Over a year ago

Should be pd.read_csv not df.read_csv.

Arkistarvh Kltzuonstev · Accepted Answer · 2019-07-30 09:56:12Z

69

You can convert one column at a time like this:

df['colname'] = df['colname'].str.replace(',', '').astype(float)

edited Jul 30, 2019 at 9:56

Arkistarvh Kltzuonstev

answered Jul 19, 2019 at 6:31

ghollah kioko

3 Comments

Cristian Avendaño Over a year ago

With this, I get a warning: "FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will not be treated as literal strings when regex=True." No idea why it assumes regex=True.

Panagiotis Kanavos Over a year ago

That's a horrible idea. It will convert 0,2 to 2 instead of 0.2. There's simply no way one can use replacements to parse localized number literals. What about 10,000.0? What about 10.000,00?

jlplenio Over a year ago

Thank you, @PanagiotisKanavos. Your comment prevented me from tumbling in this major pitfall and continuing to work with heavily messed up data. pd.Series('0,5').str.replace(',', '').astype(float) returns 5!
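
One way to guard against the pitfall described in these comments is to validate the format before stripping commas. The following is only a sketch: the pattern US_NUMBER and the helper to_float_strict are hypothetical names, and it assumes the data is meant to use commas strictly as thousands separators in groups of three. Passing regex=False also sidesteps the FutureWarning mentioned above.

```python
import pandas as pd

# Assumed format: optional sign, digits either plain or grouped in threes
# by commas, optional decimal part after a dot.
US_NUMBER = r"-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"

def to_float_strict(col):
    # Refuse to convert anything that does not match the assumed format.
    ok = col.str.fullmatch(US_NUMBER)
    if not ok.all():
        raise ValueError(f"unexpected number format: {col[~ok].tolist()}")
    return col.str.replace(",", "", regex=False).astype(float)

print(to_float_strict(pd.Series(["1,200", "-0.03", "5"])).tolist())

try:
    to_float_strict(pd.Series(["0,5"]))
except ValueError as err:
    print("rejected:", err)
```

Values such as '0,5' or '1,099,99' are rejected instead of being silently turned into the wrong number.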

shen ke · Accepted Answer · 2018-04-18 09:49:27Z

44

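You may use the pandas.Series.str.replace method. Because .str is a Series accessor and is not available on a DataFrame, apply it column by column:

df.apply(lambda col: col.str.replace(',', '', regex=False).astype(float))

This removes the thousands commas from each string before the cast to float.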

answered Apr 18, 2018 at 9:49

shen ke

3 Comments

krassowski Over a year ago

I'm getting "AttributeError: 'DataFrame' object has no attribute 'str'", no idea why...

krassowski Over a year ago

But this works: df.apply(lambda x: x.str.replace(',', '').astype(float), axis=1)

Abimael Domínguez Over a year ago

What if my number has more than one comma, like "1,099,99"? How can I convert it to '1099.99'?
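
For input like this, one possible sketch treats the last comma as the decimal mark and drops any earlier ones. The helper name last_comma_is_decimal is made up, and the rule is an assumption about the data: it would misread a plain thousands value such as '1,200' as 1.2, so it only suits columns where the final comma is always the decimal mark.

```python
import pandas as pd

def last_comma_is_decimal(text):
    # Split at the final comma; everything before it is the integer part.
    head, sep, tail = text.rpartition(",")
    if not sep:
        return float(text)  # no comma at all
    return float(head.replace(",", "") + "." + tail)

s = pd.Series(["1,099,99", "0,5", "12"])
print(s.map(last_comma_is_decimal).tolist())  # [1099.99, 0.5, 12.0]
```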

Robert Van Ysendyck · Accepted Answer · 2023-01-10 21:34:59Z

0

This will work for strings such as '-55,00' or '5.500,00' and convert them to floats -55.00 and 5500.00, respectively.

df['colname'] = df['colname'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype(float)

edited Jan 10, 2023 at 21:34

answered Jan 10, 2023 at 21:34

Robert Van Ysendyck

1 Comment

hfs Dec 13, 2024 at 11:18

str.replace('.','', regex=True) results in empty strings. I think you meant regex=False?
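
The difference hfs points out can be seen in a short sketch using the two sample strings from the answer. The literal replacement uses regex=False, and the regular-expression behaviour of '.' is shown with Python's re module directly so that it does not depend on the pandas version.

```python
import re
import pandas as pd

s = pd.Series(["-55,00", "5.500,00"])

# Literal replacement: drop the thousands dots, then turn the decimal comma into a dot.
converted = (
    s.str.replace(".", "", regex=False)
     .str.replace(",", ".", regex=False)
     .astype(float)
)
print(converted.tolist())  # [-55.0, 5500.0]

# As a regular expression, "." matches any character, so everything is removed.
print(repr(re.sub(".", "", "5.500,00")))  # ''
```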
