Float conversion failing in for loop (python)

Question

I'm trying to extract anomalous data points from a large csv file (~1e6 lines) in which most of the data points are at constant value. I've written the code below to detect values lower than the constant.

constant = 1
try:
    fp = open('disk2.csv')
    for line in fp: 
        ch4 = float(line.split(",")[4]) #data from channel four is in the fifth column
        if ch4 < constant:
            print line.split(",")[0] #print first column

except:
    ch4 = 'Not found'
finally:
    fp.close()
    print(ch4,type(ch4))

The print returns the following, without additional errors:

('Not found', <type 'str'>)

If I change the code to:

constant = 1
try:
    fp = open('disk2.csv')
    for line in fp: 
        ch4 = line.split(",")[4] #data from channel four is in the fifth column
        if ch4 < constant:
            print line.split(",")[0] #print first column

except:
    ch4 = 'Not found'
finally:
    fp.close()
    print(ch4,type(ch4))

It returns

(' 2.41650E+01', <type 'str'>)

So, the csv file is read as a string, and the string can be divided into a list using the split command, but I cannot turn the items in the list into floating numbers?

The error was not in the code but in my CSV file, which did not contain enough items on the first row

You can change the string into a float using float_value = float(ch4)
– T Burgis, Commented Oct 26, 2018 at 12:54
This doesn't directly answer your question so I'm not including it as an answer, but you might take a look at using the pandas library if you'll be working much with csv data. This could be done in 2 lines with the first being reading the file into a DataFrame and the second showing all rows with value less than constant.
– jayemar, Commented Oct 26, 2018 at 18:11
pandas would load the whole thing into memory, so as long as you have enough memory you should be fine. There's another library called dask that uses the pandas API but allows for using data sets that don't fit into memory, but I've never used it myself.
– jayemar, Commented Oct 26, 2018 at 19:51

antimirov · Accepted Answer · 2018-10-26 13:01:49Z

1

It's generally bad practice to compare floats directly. It's better to use something like this:

abs(float(ch4) - constant) <= allowed_error

Where allowed_error is some small value such as 0.000001. Floating-point numbers are stored in binary with finite precision, so a result that should be exactly 1.0 can end up stored as a value slightly below or above it, such as 0.9999999999999999.
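A minimal sketch of the tolerance comparison described above. The helper name is_close and the default tolerance are illustrative assumptions, not part of the original answer.

```python
def is_close(a, b, allowed_error=1e-6):
    """Return True when a and b differ by at most allowed_error."""
    return abs(a - b) <= allowed_error


# Adding 0.1 ten times should give 1.0, but rounding errors accumulate.
total = sum([0.1] * 10)

print(total == 1.0)          # False: exact comparison fails
print(is_close(total, 1.0))  # True: the difference is far below 1e-6
```

Note that this matters mostly for equality tests. For an ordering test such as ch4 < constant, with values near 24 compared against 1, a tolerance of this size would not change the outcome.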


8 Comments

Akshayanti Over a year ago

In case where they need to be ranked, is there some way to generate the value of allowed_error for 16 decimal places, for example?

antimirov Over a year ago

You can use sys.float_info.epsilon for that, I believe.
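As a rough illustration of what sys.float_info.epsilon represents: it is the gap between 1.0 and the next larger representable float. The helper relative_close and its factor of 4 are assumptions made for this sketch, not a recommended standard.

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next larger float

print(1.0 + eps > 1.0)       # True: adding a full epsilon is visible
print(1.0 + eps / 2 == 1.0)  # True: half an epsilon is rounded away


def relative_close(a, b, rel_tol=4 * eps):
    """Compare with a tolerance scaled to the magnitude of the inputs.

    The factor of 4 is an arbitrary choice for this example.
    """
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


print(relative_close(sum([0.1] * 10), 1.0))  # True
```

Because epsilon is defined relative to 1.0, scaling it by the magnitude of the operands is usually more meaningful than using it as a fixed absolute tolerance.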

skleijn Over a year ago

This looks like a way to find small differences between numbers. I'm actually looking for a bigger difference, but the allowed_error value could be used to tune that. Anyway, I changed line 6 to abs(float(ch4),constant) <= allowed_error with allowed_error set to 0.1 and it generated the same result (i.e. the for loop is failing)

antimirov Over a year ago

Can you show a sample of your file? A couple of lines? Can you temporarily change constant to something that's definitely bigger than some values in that sample and try again?

skleijn Over a year ago

Most lines look like this

1.320460000E+04, 2.41900E+01, 2.41900E+01, 2.41900E+01, 2.41900E+01, 2.41900E+01, 2.50000E-02, 2.00000E-02, 2.40000E-01, 1.00000E-02,-2.36750E+01,0, \n


Akshayanti · Accepted Answer · 2018-10-26 12:54:26Z

0

In the first case, you are doing the comparison with the values, and changing the format from str to float for the comparison, as in if float(ch4) < constant. Note that you are not storing the value as a float type, but just converting it right there for this particular evaluation.

In the second case, you are comparing a str and an int. Notice that when you use constant = 1, the type of constant is int by default, not float. Python does not convert the string to a number for this comparison. In Python 2 (which your print statement implies), objects of unrelated types are ordered by an arbitrary but consistent rule, and in CPython numbers sort before strings, so ch4 < constant is always False no matter what the string contains. In Python 3 the same comparison raises a TypeError.
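A small check of how Python handles a str compared against an int can make this concrete. The helper compare is a hypothetical name introduced for this sketch; it reports a TypeError instead of hiding it the way a bare except would.

```python
def compare(raw, constant):
    """Evaluate raw < constant and report a TypeError instead of hiding it."""
    try:
        return raw < constant
    except TypeError as exc:
        return "TypeError: {}".format(exc)


raw = ' 2.41650E+01'  # the string value shown in the question

# str vs int: Python 3 raises TypeError, Python 2 returns False.
print(compare(raw, 1))

# After conversion the comparison is numeric; float() ignores the leading space.
print(compare(float(raw), 1))   # False: 24.165 is not below 1
print(compare(float(raw), 25))  # True
```

Either way, the unconverted string never produces a meaningful numeric comparison, which is why the conversion to float has to happen before the test.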

To solve your problem, you must store the value in ch4 as a float. This can be done by ch4 = float(line.split(",")[4]) which will store the value in a float variable, as opposed to the str variable.


6 Comments

skleijn Over a year ago

If I change line 5 to ch4 = float(line.split(",")[4] then I get the same result as in the first example. Do you have an example of how it would work?

Akshayanti Over a year ago

You need to close the parentheses properly. Are you sure it's not a typo? Inside the block itself, you can do a print(type(ch4)) to verify that it works.

skleijn Over a year ago

When I request the type inside the for loop, it works when the code is ch4 = line.split(",")[4] but not when the code is ch4 = float(line.split(",")[4]) (indeed above there was a typo, but that was not in the real code.)

Akshayanti Over a year ago

I evaluated the numbers you provided under the other answer with ch4 = float(blah), and type(ch4) is displayed as float. The condition requires the number to be less than 1, so nothing is printed in the loop and only the output of the finally block is displayed. The number being evaluated is 2.41900E+01. If I change the value of constant to 25, it displays the first column as a string (we didn't typecast it), followed by the final value of ch4 and its type. Isn't that how it's supposed to work?

skleijn Over a year ago

You are right and it turned out I had missed that the top row of my file was unsuited for this way of reading out...
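For readers who hit the same problem, here is one possible way to make the loop tolerate a short or non-numeric row. It is a sketch under assumptions: the sample data, including its three-item header row, is invented for illustration, and the function name find_low_values is hypothetical. Catching only IndexError and ValueError, rather than using a bare except, keeps the cause of a skipped row visible.

```python
import io

# Hypothetical sample: a short header row followed by two data rows.
SAMPLE = u"""time, ch1, ch2
1.320460000E+04, 2.41900E+01, 2.41900E+01, 2.41900E+01, 2.41900E+01, 2.41900E+01
1.320470000E+04, 2.41900E+01, 2.41900E+01, 2.41900E+01, 5.00000E-01, 2.41900E+01
"""


def find_low_values(lines, constant, column=4):
    """Return (hits, skipped) for rows whose value in `column` is below constant."""
    hits, skipped = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split(",")
        try:
            value = float(fields[column])
        except (IndexError, ValueError) as exc:
            # Too few columns or a non-numeric entry: record it and move on.
            skipped.append((number, type(exc).__name__))
            continue
        if value < constant:
            hits.append(fields[0].strip())
    return hits, skipped


hits, skipped = find_low_values(io.StringIO(SAMPLE), constant=1)
print(hits)     # ['1.320470000E+04']
print(skipped)  # [(1, 'IndexError')]
```

With a real file, the io.StringIO object would be replaced by a file opened in a with block, so the file is closed even when an error occurs.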
