I have a column of strings that originally used commas as both the thousands separator and the decimal separator, and I need to convert these strings to floats. How can I do it?

I first tried replacing all the commas with dots:

df['min'] = df['min'].str.replace(',', '.')

and then tried to convert the column to float:

df['min']= df['min'].astype(float) 

but it raised the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-29-5716d326493c> in <module>
----> 1 df['min']= df['min'].astype(float)
      2 #df['mcom']= df['mcom'].astype(float)
      3 #df['max']= df['max'].astype(float)

~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5544         else:
   5545             # else, only a single dtype is given
-> 5546             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5547             return self._constructor(new_data).__finalize__(self, method="astype")
   5548 

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
    593         self, dtype, copy: bool = False, errors: str = "raise"
    594     ) -> "BlockManager":
--> 595         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    596 
    597     def convert(

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, **kwargs)
    404                 applied = b.apply(f, **kwargs)
    405             else:
--> 406                 applied = getattr(b, f)(**kwargs)
    407             result_blocks = _extend_blocks(applied, result_blocks)
    408 

~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    993     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    994         # Explicit copy, or required since NumPy can't view from / to object.
--> 995         return arr.astype(dtype, copy=True)
    996 
    997     return arr.view(dtype)

ValueError: could not convert string to float: '1.199.75'
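
The error can be reproduced without pandas. The sketch below assumes the original value was '1,199,75' (inferred from the error message, not shown in the question): replacing every comma with a dot leaves two dots in the string, which float() cannot parse.

```python
# Minimal reproduction using a plain Python string (no pandas needed).
# Assumption: the original value looked like "1,199,75".
raw = "1,199,75"
replaced = raw.replace(",", ".")  # every comma becomes a dot

try:
    float(replaced)
    error_message = None
except ValueError as exc:
    # float() rejects strings containing more than one decimal point
    error_message = str(exc)

print(replaced)
print(error_message)
```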

If possible, I would like to remove all dots and commas, and then insert a dot before the last two characters of each value before converting to float.

Input:

df['min'].head()
9.50
10.00
3.45
1.095.50
13.25

Expected output:

9.50
10.00
3.45
1095.50
13.25
6
  • so you want to remove all dots and add dot before two characters? Commented Jun 15, 2022 at 13:21
  • df['min'].str.replace('.', '').str.replace(',', '.')? Commented Jun 15, 2022 at 13:21
  • @DemetreDzmanashvili Yes Commented Jun 15, 2022 at 13:24
  • Can you please add an example input and expected output to assist in answering? Commented Jun 15, 2022 at 13:24
  • @mozway The dataframe originally has commas as both thousands and decimal separators; this command didn't work. Commented Jun 15, 2022 at 13:25

3 Answers

1

If you always have 2 decimal digits:

df['min'] = pd.to_numeric(df['min'].str.replace('.', '', regex=False)).div(100)

output (as new column min2 for clarity):

        min     min2
0      9.50     9.50
1     10.00    10.00
2      3.45     3.45
3  1.095.50  1095.50
4     13.25    13.25
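
A possible variation on the same idea, in case the column mixes dots and commas or contains unparseable entries. This is only a sketch: the sample values are hypothetical, and it still assumes every valid entry has exactly two decimal digits. It strips both separator characters with a regex and passes errors="coerce" to pd.to_numeric so that bad entries become NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample data: both separator styles plus one invalid entry
s = pd.Series(["9.50", "1.095.50", "1,199,75", "n/a"])

# Remove every dot and comma so only the digits remain
digits = s.str.replace(r"[.,]", "", regex=True)

# Parse the digits (invalid entries become NaN), then shift two decimal places
result = pd.to_numeric(digits, errors="coerce").div(100)
print(result)
```

Whether to coerce or to raise on bad input depends on how much you trust the data; coercing makes it easy to inspect the failures afterwards with result.isna().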

1

Try this:

df['min'] = df['min'].str.replace(r'[.,]', '', regex=True)  # remove both dots and commas
df['min'] = df['min'].str[:-2] + '.' + df['min'].str[-2:]

df['min']= df['min'].astype(float) 
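
For reference, here is a self-contained run of the slicing approach. It is a sketch under two assumptions: the sample values are hypothetical and written in the comma-only form described in the question, and every value ends in exactly two decimal digits. The result goes into a new column, min_float (a name chosen here for illustration), so the original strings stay available for comparison.

```python
import pandas as pd

# Hypothetical data in the original comma-only form
df = pd.DataFrame({"min": ["9,50", "10,00", "3,45", "1,095,50", "13,25"]})

# Strip all separators, leaving only digits
digits = df["min"].str.replace(r"[.,]", "", regex=True)

# Re-insert a dot before the last two digits and convert to float
df["min_float"] = (digits.str[:-2] + "." + digits.str[-2:]).astype(float)
print(df)
```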

0

I have some strings in a column which originally uses commas as separators from thousands and from decimals and I need to convert this string into a float

So let's create a reproducible data source that matches your description:

df = {'min': '0123,456,78'}

Then split the string on "," into a list:

split_str = df['min'].split(',')

Collect the integer and decimal parts separately:

int_str = ''.join(split_str[:-1])
dec_str = split_str[-1]

Finally, reconstruct a valid float string and convert it to an actual float:

float_number = float(f"{int_str}.{dec_str}")
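
The steps above could be wrapped in a small helper. This is a sketch only: parse_number is a hypothetical name, and the function assumes the last '.' or ',' in the string is always the decimal mark, so it also accepts strings where the commas have already been replaced by dots.

```python
import re

def parse_number(text: str) -> float:
    """Treat the last '.' or ',' as the decimal mark and drop earlier ones."""
    parts = re.split(r"[.,]", text)
    if len(parts) == 1:
        # No separator at all: plain integer string
        return float(parts[0])
    int_str = "".join(parts[:-1])  # everything before the last separator
    dec_str = parts[-1]            # digits after the last separator
    return float(f"{int_str}.{dec_str}")

values = [parse_number(v) for v in ["0123,456,78", "1.095.50", "13,25", "42"]]
print(values)
```

On a pandas column this could be applied with df['min'].map(parse_number). Note that a value such as '1,000' with no decimal part would be read as 1.0, so the helper is only appropriate when every separated value carries decimals.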
