I need to postprocess an output file from a model using Python. The output file contains a mix of data and strings. First, I want to separate the strings from the data, and then save columns 0, 1 and 2 from each output time (data only, no strings) in a separate text file. So, for the example below, I will have 3 text files (for Time=0, Time=0.01, Time=0.04), each containing the data from one output time without any header or other strings. A shortened version of the model's output file looks like this:

 ******* Program ******  
 ******* Program ******  
 ******* Program ******                                                   
 Date:  26. 4.    Time:  15:40:32
 Units: L = cm   , T = days , M = mmol 


 Time:     0.000000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux   
      [L]      [L]     [-]       [L]       [-]       [L/T]   

   1     0.00   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
   2    -0.04   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
   3    -0.08   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
end


 Time:     0.010000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux   
          [L]      [L]     [-]       [L]       [-]       [L/T]   

   1     0.00    -666.06 0.1304      -14.95 0.139033  -0.451E-02 
   2    -0.04    -666.11 0.1304      -15.01 0.138715  -0.887E-02 
   3    -0.08    -666.35 0.1304      -15.06 0.138394  -0.174E-01 
end


 Time:     0.040000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux    
          [L]      [L]     [-]       [L]       [-]       [L/T]    

   1     0.00    -324.87 0.1720      -12.30 0.157799  -0.315E-02  
   2    -0.04    -324.84 0.1720      -12.31 0.157724  -0.628E-02  
   3    -0.08    -324.83 0.1720      -12.32 0.157649  -0.125E-01  
end

I found the following code in another question that was posted on Stack Overflow earlier.

That problem is very similar to mine; however, I am having trouble adapting the code to my case. How should I modify it for my problem? Or should I use another strategy to approach this problem?

def parse_DPT(lines):
    DPT = []
    while lines:
        line = lines.pop(0).lstrip()
        if line == ' ' or line.startswith('*'):
            continue
        if line.startswith('*'):
            lines.insert(0, line)
            break
        data = line.split(' ')
        # pick only columns 0, 1, 2 and
        # convert to appropiate numeric format
        # and append to list for current DPT and step
        DPT.append([int(data[0]), float(data[1]), float(data[2])])
    return DPT

raw = []
with open('NOD_INFTEST.txt') as nit:
    lines = nit.readlines()
while lines:
    line = lines.pop(0)

    if line.startswith(''):
        if line.find('Time:') > -1:
            raw.append(parse_DPT(lines))

from pprint import pprint
for raw_step in zip(raw):
    print 'raw:'
    pprint(raw_step)

Here is the error message that I get from Python:

'import sitecustomize' failed; use -v for traceback
Traceback (most recent call last):
  File "C:\Users\Desktop\python test\p-test3.py", line 58, in <module>
    raw.append(parse_DPT(lines))
  File "C:\Users\Desktop\python test\p-test3.py", line 35, in parse_DPT
    DPT.append([int(data[0]), float(data[1]), float(data[2])])
ValueError: invalid literal for int() with base 10: 'Units:'
  • What exactly are you trying to get as an end result? You say in the example below you have 3 files but I only see one. Where is the limit between them? Commented May 2, 2013 at 23:53
  • @IonutHulub Sorry if my explanation was confusing. By that sentence I meant that at the end, I want to have 3 separate text files that are created from the file that you can see above. Each section starts with the "Time: " and finishes with "end". I hope this helps with explanation. Commented May 3, 2013 at 0:01
  • And I'm guessing you want file one to contain this: `1 0.00 -1000.00 2 -0.04 -1000.00 3 -0.08 -1000.00`. If not, please provide the expected output for the example you gave. Commented May 3, 2013 at 0:03
  • @IonutHulub that is exactly what I expect as file one Commented May 3, 2013 at 0:18

1 Answer

If I understood your question correctly, this code should do the trick:

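import re

with open('in.txt', 'r') as in_file:
    file_content = in_file.read()

# Each block runs from a "Time: <number>" line to the next "end".
# The header line "Time:  15:40:32" is skipped: it has no decimal point.
blocks = re.findall(r'Time:\s*\d+\.\d*(.*?)end', file_content, re.DOTALL)

# A data row starts with three numbers: node, depth, head.
row_pattern = re.compile(
    r'^\s*(-?\d+)\s+(-?\d+\.?\d*)\s+(-?\d+\.?\d*)', re.MULTILINE)

for file_number, block in enumerate(blocks, start=1):
    with open('out%d.txt' % file_number, 'w') as out_file:
        for node, depth, head in row_pattern.findall(block):
            out_file.write(node + ' ' + depth + ' ' + head + '\n')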

The code assumes that the file containing the text is called in.txt.
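If you would rather stay with a line-by-line approach like the one in the question, here is a minimal sketch. Judging from the traceback, the original loop seems to trigger on the header line `Date: ... Time: 15:40:32`, because that line also contains `Time:`, so parsing starts at the `Units:` line and `int('Units:')` fails. The sketch below avoids this by only reacting to lines that start with `Time:`. The function names `split_by_time` and `write_sections` and the `out` file prefix are my own choices for illustration, and the code assumes every section has the layout shown in the question.

```python
def split_by_time(text):
    """Return a list of (time_label, rows); each row holds the first three columns."""
    sections = []
    rows = None  # None means we are outside a "Time: ... end" section
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith('Time:'):
            # The run header "Date: ... Time: 15:40:32" does not start with
            # "Time:", so it never opens a section.
            rows = []
            sections.append((stripped.split()[1], rows))
        elif stripped == 'end':
            rows = None
        elif rows is not None:
            fields = stripped.split()
            # Data rows start with an integer node number; the column
            # header and the units line do not.
            if len(fields) >= 3 and fields[0].isdigit():
                rows.append(fields[:3])
    return sections


def write_sections(sections, prefix='out'):
    """Write each section to <prefix>1.txt, <prefix>2.txt, ... (assumed naming)."""
    for number, (_, rows) in enumerate(sections, start=1):
        with open('%s%d.txt' % (prefix, number), 'w') as out_file:
            for row in rows:
                out_file.write(' '.join(row) + '\n')
```

To use it, read the whole model output into a string and call `write_sections(split_by_time(text))`. Keeping the parsing separate from the file writing makes it easy to inspect the rows before anything is written to disk.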


6 Comments

It exactly did what I needed. Thanks very much for your help. It just gives this message. Is this an error message? 'import sitecustomize' failed; use -v for traceback
what message does it give?
it creates the files that I need and gives this message: 'import sitecustomize' failed; use -v for traceback
That error doesn't come from my script. I do not import sitecustomize. If you get the desired output files you shouldn't worry about it though. If you post your full code I can tell you why it gives that error.
I don't have any additional code and it still gives that error. I get the desired output files.