A text file generated by a Fortran program contains "blocks" of data that need to be reformatted with a Python script.

Each "block" of data in this file corresponds to the "Time:" value given at the beginning of the block. All "blocks" have a fixed size and structure.

I need to extract the data from "Head" and "Moisture" columns corresponding to different "Depths" (0, -1, and -2) for each "Time:".

Note: The header at the beginning is not part of the repeating "blocks" of data.

Sample input file:

 ******* Program Simulation
 ******* 
 This is initial header information for Simulation                              
 Date:   1. 6.    Time:  15: 3:39
 Units: L = cm   , T = min  , M = mmol 

 Time:        0.0000

 Node      Depth      Head Moisture       K         
           [L]        [L]    [-]        [L/T]      

   1     0.0000     -37.743 0.0630   0.5090E-05  
   2    -1.0000     -36.123 0.0750   0.5090E-05  
   3    -2.0000     -33.002 0.0830   0.5090E-05  
end

 Time:      360.0000

 Node      Depth      Head Moisture       K         
           [L]        [L]    [-]        [L/T]     

   1     0.0000 -0.1000E+07 0.0450   0.1941E-32  
   2    -1.0000    -253.971 0.0457   0.4376E-10  
   3    -2.0000     -64.510 0.0525   0.2264E-06  
end

 Time:      720.0000

 Node      Depth      Head Moisture       K         
           [L]        [L]    [-]        [L/T]     

   1     0.0000 -0.1000E+07 0.0550   0.1941E-32  
   2    -1.0000    -282.591 0.0456   0.2613E-10 
   3    -2.0000     -71.829 0.0513   0.1229E-06  
end

Desired output:

Time        Head(Depth=0)   Head(Depth=-1)  Head(Depth=-2)  Moisture(Depth=0)   Moisture(Depth=-1)  Moisture(Depth=-2)
0.0000      -37.743         -36.123         -33.002         0.0630              0.0750              0.0830
360.0000    -0.1000E+07     -253.971        -64.510         0.0450              0.0457              0.0525
720.0000    -0.1000E+07     -282.591        -71.829         0.0550              0.0456              0.0513

How can I read the input file block by block, from each "Time:" keyword to the matching "end" keyword, and reformat it into the desired output?

  • Is this output (reformatted plain text) what you actually want/need, or would a more structured text (xml, JSON) or data object (numpy.array, list of lists, dictionary) be preferred? Commented Jun 5, 2012 at 16:40
  • Reformatted plain text or a CSV should be fine. I am going to load it into Excel for further analysis. Commented Jun 5, 2012 at 16:44
  • You should pay attention to how Excel interprets and reformats numbers in scientific notation, since the import does not always work correctly. Commented Jun 5, 2012 at 16:58
  • @heltonbiker thanks for the tip. In case excel can't import it properly, I'll reformat scientific notation using Python before writing the output file. Commented Jun 5, 2012 at 20:31
  • @akashwani I don't think we need complex structure when doing ETL style work, just keep the structure as simple as possible and then load them( maybe use C ) Commented Jun 6, 2012 at 2:05
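
If Excel does mishandle tokens such as -0.1000E+07, one option is to convert each token to fixed-point text before writing it out. The following is a minimal sketch, assuming every token is in a form that Python's float() accepts; the helper name to_plain and the default of four decimal places are illustrative choices, not something from the thread.

```python
def to_plain(token, digits=4):
    """Convert a numeric token such as '-0.1000E+07' to fixed-point text.

    Hypothetical helper: 'digits' is the number of decimals to keep.
    """
    return f"{float(token):.{digits}f}"


print(to_plain("-0.1000E+07"))  # scientific notation becomes fixed-point
print(to_plain("0.0450"))       # plain decimals pass through unchanged
```

Note that very small values, such as those in the K column, would round to zero with only four decimals, so a conversion like this is best applied per column.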

3 Answers


Edit: I have made a couple of changes so it actually runs.

from itertools import chain

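def get_lines(f, n=1):
    # read (and return) the next n lines of the file
    return [next(f) for _ in range(n)]

class BlockReader(object):
    # iterator that returns the file's lines in n-line lists
    def __init__(self, f, n=1):
        self.f = f
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        # StopIteration raised by next(self.f) at end of file also ends this iterator
        return [next(self.f) for _ in range(self.n)]
    next = __next__  # Python 2 spelling of the same method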

fmt = "{:<12}" + "{:<16}"*6 + "\n"
cols = [
    "Time",
    "Head(Depth=0)",
    "Head(Depth=-1)",
    "Head(Depth=-2)",
    "Moisture(Depth=0)",
    "Moisture(Depth=-1)",
    "Moisture(Depth=-2)"
]

def main():
    with open("simulation.txt") as inf, open("result.txt","w") as outf:
        # throw away input header
        get_lines(inf, 5)
        # write output header
        outf.write(fmt.format(*cols))

        # read input file in ten-line chunks
        for block in BlockReader(inf, 10):
            # grab time value
            time = float(block[1].split()[1])

            # grab head and moisture columns
            data = (line.split()[2:4] for line in block[6:9])
            values = (map(float,dat) for dat in data)
            h,m = zip(*values)

            # write data to output file
            outf.write(fmt.format(*chain([time],h,m)))

if __name__=="__main__":
    main()

Output is

Time        Head(Depth=0)   Head(Depth=-1)  Head(Depth=-2)  Moisture(Depth=0)Moisture(Depth=-1)Moisture(Depth=-2)
0.0         -37.743         -36.123         -33.002         0.063           0.075           0.083           
360.0       -1000000.0      -253.971        -64.51          0.045           0.0457          0.0525          
720.0       -1000000.0      -282.591        -71.829         0.055           0.0456          0.0513          
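
The Moisture headings in this output run together because "Moisture(Depth=0)" is 17 characters long, one more than the 16-character field width in fmt. One possible adjustment, sketched here as an assumption rather than as part of the original answer, is to derive each field width from its column name. The two-space padding and the minimum width of 12 are arbitrary choices.

```python
cols = [
    "Time",
    "Head(Depth=0)",
    "Head(Depth=-1)",
    "Head(Depth=-2)",
    "Moisture(Depth=0)",
    "Moisture(Depth=-1)",
    "Moisture(Depth=-2)",
]

# Each field is as wide as its column name plus two spaces, and never
# narrower than 12 characters.
widths = [max(len(name) + 2, 12) for name in cols]
fmt = "".join("{:<%d}" % w for w in widths) + "\n"

header = fmt.format(*cols)
print(header, end="")
```

The same fmt string can then be used for the data rows, exactly as in main() above.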

4 Comments

I am trying to debug your code. The line time = float(block[1].split()[1]) gives this error IndexError: string index out of range. Any idea? @hugh-bothwell
@akashwani: yes, my error: I was thinking in terms of a file iterator returning ten-line lists, when what I actually wrote returned a ten-line list and then operated on each line. I have made a quick fix, and am now looking at a better (more Pythonic) version.
@akashwani: I have created a BlockReader class which operates as I had originally intended, returning ten-line chunks of the input file. Hope this helps!
Thank you, I am using your code. It's definitely more Pythonic and well structured. Thanks again!

Here's the parsing part:

import re

data = []

with open(xxxx) as f:
    for line in f:
        m = re.match(r'^\s+Time:\s+([\d.]+)', line)
        if m:
            data.append([float(m.group(1))])
        elif re.match(r'^\s+\d+', line):
            # list() is needed on Python 3, where map() returns a lazy iterator
            data[-1].append(list(map(float, line.split())))

produces:

[[0.0,
  [1.0, 0.0, -37.743, 0.063, 5.09e-06],
  [2.0, -1.0, -36.123, 0.075, 5.09e-06],
  [3.0, -2.0, -33.002, 0.083, 5.09e-06]],
 [360.0,
  [1.0, 0.0, -1000000.0, 0.045, 1.941e-33],
  [2.0, -1.0, -253.971, 0.0457, 4.376e-11],
  [3.0, -2.0, -64.51, 0.0525, 2.264e-07]],
 [720.0,
  [1.0, 0.0, -1000000.0, 0.055, 1.941e-33],
  [2.0, -1.0, -282.591, 0.0456, 2.613e-11],
  [3.0, -2.0, -71.829, 0.0513, 1.229e-07]]]

It should be easy to print the desired table from this.
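
As a rough sketch of that last step, the nested list can be flattened into one row per time step. The function name table_rows and the tab-separated printing are assumptions made for illustration; the sample data is the first two time steps from the structure shown above.

```python
def table_rows(data, depths=(0.0, -1.0, -2.0)):
    """Yield [time, heads by depth..., moistures by depth...] per time step."""
    for time, *nodes in data:
        # Each node is [node_number, depth, head, moisture, K].
        by_depth = {node[1]: node for node in nodes}
        heads = [by_depth[d][2] for d in depths]
        moistures = [by_depth[d][3] for d in depths]
        yield [time] + heads + moistures


data = [
    [0.0,
     [1.0, 0.0, -37.743, 0.063, 5.09e-06],
     [2.0, -1.0, -36.123, 0.075, 5.09e-06],
     [3.0, -2.0, -33.002, 0.083, 5.09e-06]],
    [360.0,
     [1.0, 0.0, -1000000.0, 0.045, 1.941e-33],
     [2.0, -1.0, -253.971, 0.0457, 4.376e-11],
     [3.0, -2.0, -64.51, 0.0525, 2.264e-07]],
]

for row in table_rows(data):
    print("\t".join(str(value) for value in row))
```

Because the values have already been converted to floats, the original formatting (for example 0.0000 or -0.1000E+07) is not preserved by this approach.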


If the file isn't too large, you can do:

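with open('somefile') as f:
    text = f.read()
# Split on a line-leading " Time:" so that the "Time:" inside the
# header's "Date: ... Time: ..." line is not treated as a block boundary.
blocks = text.split('\n Time:')[1:]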

1 Comment

The input file is large. How do I restructure each block into the desired output format?
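
For a large file, one possible approach is to stream it line by line and emit one output row per block, so that only a single block is held in memory at a time. The sketch below assumes that the columns are always separated by whitespace and that each block ends with an "end" line, as in the sample; the names iter_blocks and write_table are hypothetical. Values are kept as text, so they appear in the output exactly as they were written in the input.

```python
import csv


def iter_blocks(lines):
    """Yield (time, rows) for each block from a 'Time:' line to 'end'."""
    time, rows = None, []
    for line in lines:
        tokens = line.split()
        if len(tokens) == 2 and tokens[0] == "Time:":
            # Start of a block. The header's "Date: ... Time: ..." line
            # does not match because its first token is "Date:".
            time, rows = tokens[1], []
        elif time is not None and tokens and tokens[0] == "end":
            yield time, rows
            time = None
        elif time is not None and len(tokens) == 5 and tokens[0].isdigit():
            rows.append(tokens)  # [node, depth, head, moisture, K]


def write_table(lines, out):
    """Write one CSV row per block: time, heads by depth, moistures by depth."""
    writer = csv.writer(out, lineterminator="\n")
    wrote_header = False
    for time, rows in iter_blocks(lines):
        if not wrote_header:
            # Column labels are taken from the depths in the first block.
            depths = [f"{float(r[1]):g}" for r in rows]
            writer.writerow(["Time"]
                            + [f"Head(Depth={d})" for d in depths]
                            + [f"Moisture(Depth={d})" for d in depths])
            wrote_header = True
        writer.writerow([time] + [r[2] for r in rows] + [r[3] for r in rows])
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example write_table(inf, outf) with inf opened for reading and outf opened for writing.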
