Extract data from a txt file that contains data blocks and strings using Python

Question

I need to post-process an output file from a model using Python. The output file contains a mix of numeric data and text. First, I want to separate the text from the data, and then save columns 0, 1 and 2 from each output time (data only, no strings) in a separate text file. For the example below, I would end up with 3 text files (for Time=0, Time=0.01 and Time=0.04), each containing the data for one output time without any header or other strings. A shortened version of the model's output file looks like this:

 ******* Program ******  
 ******* Program ******  
 ******* Program ******                                                   
 Date:  26. 4.    Time:  15:40:32
 Units: L = cm   , T = days , M = mmol 


 Time:     0.000000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux   
      [L]      [L]     [-]       [L]       [-]       [L/T]   

   1     0.00   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
   2    -0.04   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
   3    -0.08   -1000.00 0.1088    -1000.00 0.002508  -0.562E-03 
end


 Time:     0.010000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux   
          [L]      [L]     [-]       [L]       [-]       [L/T]   

   1     0.00    -666.06 0.1304      -14.95 0.139033  -0.451E-02 
   2    -0.04    -666.11 0.1304      -15.01 0.138715  -0.887E-02 
   3    -0.08    -666.35 0.1304      -15.06 0.138394  -0.174E-01 
end


 Time:     0.040000

 Node    Depth    Head   Moisture   HeadF  MoistureF      Flux    
          [L]      [L]     [-]       [L]       [-]       [L/T]    

   1     0.00    -324.87 0.1720      -12.30 0.157799  -0.315E-02  
   2    -0.04    -324.84 0.1720      -12.31 0.157724  -0.628E-02  
   3    -0.08    -324.83 0.1720      -12.32 0.157649  -0.125E-01  
end

I found the following code in another question that was posted on Stack Overflow earlier.

That problem is very similar to mine, but I am having trouble adapting the code to it. How should I modify it for my problem? Or should I use a different strategy altogether?

def parse_DPT(lines):
    DPT = []
    while lines:
        line = lines.pop(0).lstrip()
        if line == ' ' or line.startswith('*'):
            continue
        if line.startswith('*'):
            lines.insert(0, line)
            break
        data = line.split(' ')
        # pick only columns 0, 1, 2 and
        # convert to appropriate numeric format
        # and append to list for current DPT and step
        DPT.append([int(data[0]), float(data[1]), float(data[2])])
    return DPT

raw = []
with open('NOD_INFTEST.txt') as nit:
    lines = nit.readlines()
while lines:
    line = lines.pop(0)

    if line.startswith(''):
        if line.find('Time:') > -1:
            raw.append(parse_DPT(lines))

from pprint import pprint
for raw_step in zip(raw):
    print 'raw:'
    pprint(raw_step)

Here is the error message that I get from Python:

'import sitecustomize' failed; use -v for traceback
Traceback (most recent call last):
  File "C:\Users\Desktop\python test\p-test3.py", line 58, in <module>
    raw.append(parse_DPT(lines))
  File "C:\Users\Desktop\python test\p-test3.py", line 35, in parse_DPT
    DPT.append([int(data[0]), float(data[1]), float(data[2])])
ValueError: invalid literal for int() with base 10: 'Units:'

What exactly are you trying to get as an end result? You say that for the example you will have 3 files, but I only see one. Where is the boundary between them?
– Ionut Hulub, Commented May 2, 2013 at 23:53
@IonutHulub Sorry if my explanation was confusing. I meant that, in the end, I want to have 3 separate text files created from the file shown above. Each section starts with "Time: " and finishes with "end". I hope this helps.
– Mary Jane, Commented May 3, 2013 at 0:01
And I'm guessing you want file one to contain this: `1 0.00 -1000.00 2 -0.04 -1000.00 3 -0.08 -1000.00` If not, please provide the expected output for the example you gave.
– Ionut Hulub, Commented May 3, 2013 at 0:03

Ionut Hulub · Accepted Answer · 2013-05-03 00:40:30Z

If I understood your question correctly, this code should do the trick:

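import re

# Read the whole model output into memory.
with open('in.txt', 'r') as in_file:
    file_content = in_file.read()

# Each block starts at "Time: <decimal number>" and runs up to the next "end".
# The clock time in the header ("Time:  15:40:32") is not matched, because
# "15" is followed by ":" rather than ".".
blocks = re.findall(r'Time:\s*\d+\.\d*(.*?)end', file_content, re.DOTALL)

# A data row starts with three numbers (Node, Depth, Head). The dot is
# escaped so that it only matches a decimal point.
row_pattern = re.compile(
    r'^\s*(-?\d+\.?\d*)\s+(-?\d+\.?\d*)\s+(-?\d+\.?\d*)', re.MULTILINE)

# Write one output file per block: out1.txt, out2.txt, ...
for file_number, block in enumerate(blocks, start=1):
    with open('out%d.txt' % file_number, 'w') as out_file:
        for row in row_pattern.findall(block):
            out_file.write(' '.join(row) + '\n')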

The code assumes that the file containing the text is called in.txt.
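The same splitting can also be done line by line, without regular expressions. The following is only a minimal sketch under a few assumptions: every block header is a line whose first field is `Time:` followed by exactly one value, every block is closed by a line starting with `end`, and a data row is any line inside a block whose first three fields parse as numbers. The names `split_time_blocks` and `write_blocks` are made up for this illustration.

```python
def split_time_blocks(text):
    """Return a list of (time, rows) pairs, one per 'Time: ... end' block."""
    blocks = []
    current = None  # rows of the block being read, or None outside a block
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == 'Time:' and len(fields) == 2:
            # Block header such as " Time:     0.010000". The file header
            # " Date: ... Time: 15:40:32" starts with "Date:" and is skipped.
            current = []
            blocks.append((fields[1], current))
        elif fields[0] == 'end':
            current = None
        elif current is not None:
            try:
                # Only used to check that the row is numeric.
                int(fields[0])
                float(fields[1])
                float(fields[2])
            except (ValueError, IndexError):
                continue  # column titles and unit lines
            # Keep the original text of columns 0, 1 and 2.
            current.append(fields[:3])
    return blocks


def write_blocks(blocks, prefix='out'):
    """Write each block to <prefix>1.txt, <prefix>2.txt, ..."""
    for number, (_time, rows) in enumerate(blocks, start=1):
        with open('%s%d.txt' % (prefix, number), 'w') as out_file:
            for row in rows:
                out_file.write(' '.join(row) + '\n')
```

Reading in.txt into a string and passing it through `split_time_blocks` and then `write_blocks` should give the same kind of output files as the regex version. The time value is returned alongside each block in case it is wanted in the file name instead of a running number.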


6 Comments

Mary Jane Over a year ago

It did exactly what I needed. Thanks very much for your help. It just prints this message. Is it an error message? 'import sitecustomize' failed; use -v for traceback

Ionut Hulub Over a year ago

What message does it give?

Mary Jane Over a year ago

It creates the files that I need and prints this message: 'import sitecustomize' failed; use -v for traceback

Ionut Hulub Over a year ago

That error doesn't come from my script; I do not import sitecustomize. If you get the desired output files, you shouldn't worry about it, though. If you post your full code, I can tell you why it gives that error.

Mary Jane Over a year ago

I don't have any additional code and it still gives that error. I get the desired output files.
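
Some context on the message discussed in these comments: at startup, Python's `site` module tries to import an optional module named `sitecustomize`. Python 2 prints "'import sitecustomize' failed; use -v for traceback" when such a module is found but raises an error while being imported, so the message comes from the Python installation rather than from the script. As the message itself suggests, running the interpreter with `-v` shows the traceback. The sketch below is one possible way to see which file is being picked up. It assumes Python 3's `importlib`, and the helper name `locate_sitecustomize` is made up for this illustration.

```python
import importlib.util


def locate_sitecustomize():
    """Return the path of the sitecustomize module Python would import, or None."""
    try:
        spec = importlib.util.find_spec('sitecustomize')
    except ValueError:
        # Raised when the module is already loaded but carries no spec.
        return None
    return spec.origin if spec is not None else None


print(locate_sitecustomize() or 'no sitecustomize module found')
```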
