Python-Cannot convert string to float...manipulating a text file

Question

To all:

I have a question about converting from string to float in Python, and I would welcome any advice you can give about my code.

I think the best way to show you my problem is to explain what I am doing.

I have a text file that is generated by a Fortran program. The file has this form:

 0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000
 0.000
 0.500     0.156     0.154     0.152     0.151     0.148     0.144     0.141     0.138     0.135     0.132     0.130     0.127     0.124     0.121     0.118     0.115     0.112     0.110     0.107     0.104     0.102     0.100     0.097     0.093     0.089     0.087     0.084     0.082     0.079     0.076     0.074     0.072     0.069     0.067     0.064     0.063     0.060     0.058     0.056     0.054     0.052     0.051     0.049     0.044     0.041     0.038     0.036     0.034     0.031     0.029     0.027     0.026     0.024     0.022     0.020     0.018     0.016     0.015     0.013     0.012     0.010     0.009     0.007     0.006     0.004     0.003     0.002     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000
 0.000

The first value (0.000) is a time, the second value is the water height at cell 1, and so on. The output currently wraps onto a new line after every 100 inputs, and each new time also starts on a new line. I would like to write Python code that makes it look like:

time1     cell1     cell2     .....
time2     cell1     cell2     .....

Things to keep in mind are that the number of cells will vary and after every 100 a newline is created. (My example above only gives time and 100 cells as a demo.)

My code so far is below..

    from pylab import *
    from numpy import *
    import math

    ########################

    a=open('wh.txt','r')
    b=open('new.txt', 'w')

    for line in a:
      b.write(line.lstrip())

    c=open('new.txt','r')
    d=open('newer.txt','w')

    for line in c:
      d.write(line.replace('\n','     '))

    e=loadtxt('newer.txt')
    o=open('newest.txt','w')



    ### v = value to split, l = size of each chunk
    h = lambda v, l: [v[i*l:(i+1)*l] for i in range(int(math.ceil(len(v)/float(l))))]

    g=list(h(tuple(e),102))


    with open("newest.txt","w") as o:
        o.write('\n'.join(map(str,g)))

This gives an output as tuples:

(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
(    0.5, 0.156, 0.154, 0.152, 0.151, 0.14799999999999999, 0.14399999999999999, 0.14099999999999999, 0.13800000000000001, 0.13500000000000001, 0.13200000000000001, 0.13, 0.127, 0.124, 0.121, 0.11799999999999999, 0.115, 0.112, 0.11, 0.107, 0.104, 0.10199999999999999, 0.10000000000000001, 0.097000000000000003, 0.092999999999999999, 0.088999999999999996, 0.086999999999999994, 0.084000000000000005, 0.082000000000000003, 0.079000000000000001, 0.075999999999999998, 0.073999999999999996, 0.071999999999999995, 0.069000000000000006, 0.067000000000000004, 0.064000000000000001, 0.063, 0.059999999999999998, 0.058000000000000003, 0.056000000000000001, 0.053999999999999999, 0.051999999999999998, 0.050999999999999997, 0.049000000000000002, 0.043999999999999997, 0.041000000000000002, 0.037999999999999999, 0.035999999999999997, 0.034000000000000002, 0.031, 0.029000000000000001, 0.027, 0.025999999999999999, 0.024, 0.021999999999999999, 0.02, 0.017999999999999999, 0.016, 0.014999999999999999, 0.012999999999999999, 0.012, 0.01, 0.0089999999999999993, 0.0070000000000000001, 0.0060000000000000001, 0.0040000000000000001, 0.0030000000000000001, 0.002, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

I am not sure what I am doing incorrectly as I am fairly new to python. Any advice on this code or on another approach would be appreciated.

I don't understand what you want to achieve. From what you write it sounds like you just want to remove the newlines from the file.
– Cito, Commented Nov 1, 2011 at 16:52
@Cito I want to create an array of values where time is in the first column and my cell values for that time are in the columns next to it. The issue is that if I only remove the newlines, I don't know how to go back through and add newlines where I want them.
– geop, Commented Nov 1, 2011 at 17:32
@johnthexiii John, will this work if I only want to delimit at certain spaces? Can you give me an example of how to do this?
– geop, Commented Nov 1, 2011 at 18:00
What if there were 199 cells at a certain time? That would give you 2 full rows of 100 numbers each... how would you distinguish between that and two separate times, each with 99 cells?
– David Z, Commented Nov 1, 2011 at 18:11

Shawn Chin · Accepted Answer · 2011-11-02 15:16:26Z

As the comments have pointed out, the specification for your data is ambiguous and can lead to wrongly parsed data, i.e. if a timing row has exactly 100 cells, the next timing row may be mistaken for part of the current row.

Nevertheless, here's my attempt at an implementation to help get you on your way. It's commented liberally to aid understanding, but feel free to ask if you need clarification.

def unwrap_data(filename, wrap_len=101, map_func=None):
    """
    Generator which reads a file and yields one list per data row.
    Values are strings unless map_func (e.g. float) is given.

    Rows in the file are assumed to be wrapped after every 
    wrap_len columns, so we unwrap it before returning each
    data row.

    wrap_len defaults to 101 (1 time column + 100 cell values).

    Caveat: If a timing data has exactly 100 cell values (101 
    columns), the output of this function will be wrong unless
    an additional newline exists before the next timing row, e.g.

         time1      cell1_1    cell1_2  ... cell1_100
         cell1_101  cell1_102  ...
         time2      cell2_1    cell2_2  ... cell2_100

         time3      cell3_1    cell3_2  ...
    """
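    next_data = []
    with open(filename, 'r') as infile:  # close the file when done
        for line in infile:  # for each line in file
            L = line.split()  # split() also drops the trailing newline
            if map_func:
                # run map_func() on each element; build a list (not a lazy
                # map object) so that len(L) below also works on Python 3
                L = [map_func(x) for x in L]
            next_data.extend(L)  # add to prev row
            if len(L) != wrap_len and next_data:
                # the line was not wrapped, assume new timing data
                # "and next_data" will avoid returning empty lists for blank lines
                yield next_data
                next_data = []
    if next_data:
        # the file ended on a full-width line; emit the pending row
        yield next_data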

I've defined it as a generator function in a bid to improve clarity and performance.

Example usage:

To print the parsed output into a new file as tab separated entries:

with open("outfile.dat", "w") as out:  # file is closed automatically
    for line in unwrap_data("input_file.dat"):
        out.write("\t".join(line) + "\n")

Note that by default the function yields lists of string values. To use the values as floats, make use of the map_func argument.

In the next example, we pass in the float() function so each entry is converted to a float. We then print out the time (first column) and the minimum/maximum cell value (remaining columns).

for line in unwrap_data("input_file.dat", map_func=float):
    # time, then smallest and largest cell value for that time
    print("%s %s %s" % (line[0], min(line[1:]), max(line[1:])))

I've also parametrised the wrap length just so you can change it by including the wrap_len=<new_value> argument when calling the function.

Hope this helps.

edited Nov 2, 2011 at 15:16

answered Nov 2, 2011 at 9:57

Shawn Chin


2 Comments

geop Over a year ago

Thank you so much for the help...this was way more than I expected. Thank you also for being willing to answer any questions I have. I will let you know if I have trouble following something.

geop Over a year ago

stackoverflow.com/questions/7986459/… I was having trouble getting the map function to work on my text file, any help is appreciated.

Jeffery Smith · Accepted Answer · 2011-11-01 20:45:28Z

The biggest problem you'll run into is making sure you can tell the difference between states. As someone else pointed out, how do you know you don't have a time and 99 cells or an additional 100 cells that are carrying over from the previous line?

I would start by trying to find out something unique about the data to be able to differentiate it. Is there a range of values that make sense per cell? Certainly not the safest way to handle it but if what you've shown of the data is all there is to it, I'm not sure what other options are out there.
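
If the number of cells for a given file happens to be known in advance (an assumption; the question only says the count varies), the ambiguity goes away, because the line breaks can be ignored altogether and the values regrouped by count. A minimal sketch of that idea, where the function name `reshape_known_width` and the tiny sample data are made up for illustration:

```python
def reshape_known_width(text, n_cells):
    """Regroup whitespace-separated values into rows of 1 time + n_cells.

    Line breaks in `text` are ignored; only the order of values matters.
    """
    values = [float(tok) for tok in text.split()]
    row_len = n_cells + 1  # one time column plus the cell columns
    if len(values) % row_len != 0:
        raise ValueError("value count is not a multiple of n_cells + 1")
    return [values[i:i + row_len] for i in range(0, len(values), row_len)]


# Made-up sample: 2 time steps with 3 cells each, wrapped after 2 values.
sample = "0.0 0.1\n0.2 0.3\n0.5 0.4\n0.5 0.6\n"
table = reshape_known_width(sample, n_cells=3)
for row in table:
    print("\t".join("%.3f" % v for v in row))
```

This only helps when the cell count really is fixed within one file; if it is not, some other marker in the data is still needed.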

As far as code goes, I would split the line using whitespace as the delimiter. From the length of the resulting list, you can tell whether you have a complete record or have hit the 100-column limit. (Don't forget to strip the newline character from the last element.) You'll also need a way to tell whether that first element is a time or just another cell.
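
A rough sketch of that length check, assuming a full physical line holds 101 values as in the sample in the question (the names `classify_line` and `WRAP_LEN` are hypothetical). Calling `str.split()` with no argument splits on any run of whitespace and discards the trailing newline, so no separate stripping step is needed in that case:

```python
WRAP_LEN = 101  # assumed: 1 time value + 100 cell values per full line


def classify_line(line, wrap_len=WRAP_LEN):
    """Return (values, is_wrapped) for one physical line of the file.

    is_wrapped is True when the line is exactly wrap_len values long,
    i.e. the record probably continues on the next line.
    """
    values = line.split()  # no-arg split handles repeated spaces and '\n'
    return values, len(values) == wrap_len


full = " ".join(["0.000"] * 101) + "\n"  # a full-width line
tail = " 0.000\n"                        # the short continuation line

print(classify_line(full)[1])  # True
print(classify_line(tail))     # (['0.000'], False)
```

The check cannot tell a record that is exactly 101 values long from one that continues, which is the ambiguity described above.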

Hope this at least nudges you in the right direction.
