ValueError: could not convert string to float: id

Question

I'm running the following Python script:

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f = open('data2.txt', 'r').readlines()
for i in range(0, len(f)-1):
    l1 = f[i].split()
    list1 = [float(x) for x in l1]

But I got the error below:

ValueError: could not convert string to float: id

I'm confused by this because when I run the same code on a single line in an interactive session, instead of in the script's for loop, it works fine:

from scipy import stats
import numpy as np

f = open('data2.txt','r').readlines()
l1 = f[1].split()
list1 = [float(x) for x in l1]
list1
# [5.3209183842, 4.6422726719, 4.3788135547]

Can anyone explain what is going on?

This kind of error (ValueError: could not convert string to float:) can also occur when reading a dataframe from a CSV file and casting types, as in df = df[['p']].astype({'p': float}). If the CSV contains cells holding only whitespace, pandas will not recognize them as NaN. You will need to overwrite those cells with NaN first: df = df.replace(r'^\s*$', np.nan, regex=True)
– Alfred Wallace, Commented Apr 21, 2021 at 14:55

Anurag Uniyal · Accepted Answer · 2015-01-04 21:07:27Z

73

Some of your lines evidently don't contain valid float data; specifically, some line contains the text id, which can't be converted to a float.

When you try it at the interactive prompt, you are only converting a single line (f[1]), so the problem doesn't show up. The best approach is to print the line on which the error occurs, which tells you where the bad data is, e.g.:

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        # report the offending line, then skip it so that the t-test
        # below never runs on missing or stale data
        print("error",e,"on line",i)
        continue
    result=stats.ttest_ind(list1,list2)
    print(result[1])
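
The same diagnosis can be tried without the data file. The following is only a sketch: the helper name find_bad_lines and the sample rows are made up for illustration, and it assumes the file starts with a header row containing id, which the question does not confirm. It reports every token that float() rejects, together with its line index, which would also explain why f[1] converts cleanly while the loop fails at index 0.

```python
def find_bad_lines(lines):
    """Return (line_index, token) pairs for every token that float() rejects."""
    bad = []
    for i, line in enumerate(lines):
        for token in line.split():
            try:
                float(token)
            except ValueError:
                bad.append((i, token))
    return bad


# Hypothetical file contents: a header row followed by numeric rows.
sample = [
    "id a b\n",
    "5.3209183842 4.6422726719 4.3788135547\n",
    "1.0 2.0 3.0\n",
]

print(find_bad_lines(sample))      # [(0, 'id'), (0, 'a'), (0, 'b')]
print(find_bad_lines(sample[1:]))  # [] once the header row is skipped
```

If the culprit does turn out to be a header row, iterating over f[1:] instead of f is one simple way to skip it.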

edited Jan 4, 2015 at 21:07

answered Dec 7, 2011 at 18:00

Anurag Uniyal

Zoe - Save the data dump · Accepted Answer · 2018-06-19 07:07:13Z

37

My error was very simple: the text file containing the data had a space character (so not visible) at the end of the last line.

In the grep output, I had "45 " (with a trailing space) instead of just "45".
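
One way to make such characters visible is to print the line with repr(). The sketch below is illustrative only: the helper name bad_tokens and the sample line are assumptions, not part of the original data. Note that float() itself ignores leading and trailing whitespace, so a stray space usually only hurts when the line is split on a literal ' ' rather than with a bare split().

```python
def bad_tokens(line, sep=None):
    """Return the tokens of `line` that float() rejects."""
    bad = []
    for token in line.split(sep):
        try:
            float(token)
        except ValueError:
            bad.append(token)
    return bad


line = "45 \n"                # hypothetical last line with a trailing space
print(repr(line))             # repr() shows the space and the newline explicitly
print(bad_tokens(line))       # bare split() discards all whitespace: nothing fails
print(bad_tokens(line, " "))  # split(" ") leaves a '\n' token that float() rejects
```

With a bare split() the trailing space is harmless; with split(" ") the leftover whitespace-only token is what triggers the ValueError.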

edited Jun 19, 2018 at 7:07

Zoe - Save the data dump

answered Nov 13, 2015 at 21:01

Sopalajo de Arrierez

2 Comments

Oleg Melnikov Over a year ago

Spaces and tabs are visible ;) End-of-line characters and the like, such as \n and \r, are not.

Edgard Knive Over a year ago

I guess this is the point in time when most people figure out that Lib/re.py and .replace(' ', '') exist.

dimakin · Accepted Answer · 2025-10-11 12:34:17Z

23

This error is pretty verbose:

ValueError: could not convert string to float: id

Somewhere in your text file, a line has the word id in it, which can't really be converted to a number.

Your test code works because the word id isn't present in line 2.

If you want to catch that line, try this code. I cleaned your code up a tad:

#!/usr/bin/python

import os, sys
from scipy import stats
import numpy as np

for index, line in enumerate(open('data2.txt', 'r').readlines()):
    w = line.split()  # split on any whitespace, so no empty or '\n' tokens
    l1 = w[1:8]
    l2 = w[8:15]

    try:
        # list() forces the conversion here; in Python 3, map() alone is lazy
        list1 = list(map(float, l1))
        list2 = list(map(float, l2))
    except ValueError:
        print('Line {i} is corrupt!'.format(i=index))
        break

    result = stats.ttest_ind(list1, list2)
    print(result[1])

edited Oct 11 at 12:34

dimakin

answered Dec 7, 2011 at 17:59

Blender

Contango · Accepted Answer · 2021-03-12 12:08:46Z

19

For a Pandas dataframe with a column of numbers with commas, use this:

df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]

So values like 4,200.42 would be converted to 4200.42 as a float.

Bonus 1: This is fast.

Bonus 2: More space efficient if saving that dataframe in something like Apache Parquet format.

edited Mar 12, 2021 at 12:08

answered Mar 12, 2021 at 11:49

Contango

Tom Roth · Accepted Answer · 2018-03-02 06:53:32Z

9

Perhaps your numbers aren't actually numbers, but letters masquerading as numbers?

In my case, the font I was using meant that "l" and "1" looked very similar. I had a string like 'l1919' which I thought was '11919' and that messed things up.
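
A quick way to spot such look-alike characters is to list everything in the token that cannot appear in a plain decimal number. This is a rough heuristic sketch, and the helper name suspicious_chars is made up for illustration; it deliberately ignores other spellings that float() accepts, such as 'nan', 'inf', or digit-group underscores.

```python
def suspicious_chars(token):
    """Return (position, character) pairs that cannot occur in a plain decimal float."""
    allowed = set("0123456789+-.eE")
    return [(pos, ch) for pos, ch in enumerate(token) if ch not in allowed]


print(suspicious_chars("l1919"))  # [(0, 'l')]: a lowercase L, not the digit 1
print(suspicious_chars("11919"))  # []
```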

answered Mar 2, 2018 at 6:53

Tom Roth

Matt Fenwick · Accepted Answer · 2011-12-07 18:02:49Z

7

Your data may not be what you expect -- it seems you're expecting, but not getting, floats.

A simple solution to figuring out where this occurs would be to add a try/except to the for-loop:

for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        # report the error in some way that is helpful, e.g. print out i
        print("error", e, "on line", i)
        continue
    result=stats.ttest_ind(list1,list2)
    print(result[1])

answered Dec 7, 2011 at 18:02

Matt Fenwick

João Vitor Gomes · Accepted Answer · 2021-04-26 13:46:46Z

5

Shortest way:

df["id"] = df['id'].str.replace(',', '').astype(float) - if ',' is the problem

df["id"] = df['id'].str.replace(' ', '').astype(float) - if blank space is the problem

answered Apr 26, 2021 at 13:46

João Vitor Gomes

cottontail · Accepted Answer · 2023-11-10 05:48:54Z

In pandas

This error (or a very similar one) commonly appears when changing the dtype of a pandas column from object to float using astype() or apply(). The cause is that the column contains non-numeric strings that cannot be converted into floats. One solution is to use pd.to_numeric() instead, with errors='coerce' passed. This replaces non-numeric values, such as the literal string 'id', with NaN.

import pandas as pd

df = pd.DataFrame({'col': ['id', '1.5', '2.4']})

df['col'] = df['col'].astype(float)                     # <---- ValueError: could not convert string to float: 'id'
df['col'] = df['col'].apply(lambda x: float(x))         # <---- ValueError

df['col'] = pd.to_numeric(df['col'], errors='coerce')   # <---- OK
#                                    ^^^^^^^^^^^^^^^ <--- converts non-numbers to NaN


0    NaN
1    1.5
2    2.4
Name: col, dtype: float64

pd.to_numeric() works only on individual columns, so if you need to change the dtype of multiple columns in one go (similar to how .astype(float) may be used), then passing it to apply() should do the job.

df = pd.DataFrame({'col1': ['id', '1.5', '2.4'], 'col2': ['10.2', '21.3', '20.6']})
df[['col1', 'col2']] = df.apply(pd.to_numeric, errors='coerce')


   col1  col2
0   NaN  10.2
1   1.5  21.3
2   2.4  20.6

Sometimes the values contain thousands-separator commas, which raise a similar error:

ValueError: could not convert string to float: '2,000.4'

in which case removing the commas before the pd.to_numeric() call solves the issue.

df = pd.DataFrame({'col': ['id', '1.5', '2,000.4']})
df['col'] = df['col'].replace(regex=',', value='')
#                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^  <--- remove commas
df['col'] = pd.to_numeric(df['col'], errors='coerce')


0       NaN
1       1.5
2    2000.4
Name: col, dtype: float64

In scikit-learn

This error is also raised when you fit data containing strings to models that expect numeric data. One example is the various scalers, e.g. StandardScaler(). In that case, the solution is to preprocess the data by one-hot or label encoding the text input into a numeric input. Below is an example where a string input is one-hot encoded first and then fed into a scaler.

from sklearn.preprocessing import StandardScaler, OneHotEncoder
data = [['a'], ['b'], ['c']]
sc = StandardScaler().fit(data)  # <--- ValueError: could not convert string to float: 'a'


data = OneHotEncoder().fit_transform(data).toarray()
sc = StandardScaler().fit(data)  # <--- OK

Ramesh Ponnusamy · Accepted Answer · 2021-11-24 07:42:42Z

2

If you know which non-float values can occur, replace them before converting. For example, replace empty strings with 0.0:

df.loc[df['score'] == '', 'score'] = 0.0


df['score']=df['score'].astype(float)

answered Nov 24, 2021 at 7:42

Ramesh Ponnusamy

mkrieger1 · Accepted Answer · 2024-08-25 13:36:08Z

1

I solved a similar problem with a basic pandas technique. First, load the file using pandas (read_excel is shown here; use read_csv for a CSV or text file). It's pretty simple:

data = pd.read_excel('link to the file')

Then set the index of data to the column that needs to be changed. For example, if your data has ID as one attribute or column, set the index to ID.

data = data.set_index("ID")

Then delete all the rows that have "id" as the value instead of a number, using the following command.

data = data.drop("id", axis=0)

edited Aug 25, 2024 at 13:36

mkrieger1

answered Oct 3, 2019 at 14:44

Kapilfreeman

mkrieger1 · Accepted Answer · 2024-08-25 13:37:28Z

0

For a pandas DataFrame or Series, when you get this error, do this:

import pandas as pd

df["column1"] = pd.to_numeric(df["column1"], errors='coerce')

edited Aug 25, 2024 at 13:37

mkrieger1

answered Nov 7, 2023 at 13:01

Harsh Chitaliya

Sherry · Accepted Answer · 2023-07-19 12:29:11Z

A good way to handle these kinds of erroneous values is to remove them at the read_csv step by specifying na_values, which lists additional strings to recognize as NA/NaN.

By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'None', 'n/a', 'nan', 'null'. So in your case, since it's complaining about the string 'id' in the data, you could do the following:

df = pd.read_csv('file.csv', na_values = ['id'])

This treats every 'id' value in the columns as null, which resolves the ValueError when you run an analysis on the column of interest.
