I have a few hundred of these job metric definitions in a single file, which I'm trying to parse into a formatted .csv document.

Job Name                                                         Last Start           Last End             ST Run     Pri/Xit
________________________________________________________________ ____________________ ____________________ __ _______ ___
B9043CC_APP_DMLD_025_FR_xpabbdu1_D                               03/12/2014 18:21:32  03/12/2014 18:22:07  SU 49744331/3

  Status/[Event]  Time                 Ntry ES  ProcessTime           Machine
  --------------  --------------------- --  --  --------------------- ----------------------------------------
  [FORCE_STARTJOB]  03/12/2014 17:30:52    0  PD  03/12/2014 17:30:53
    < >
  STARTING        03/12/2014 17:30:53    1  PD  03/12/2014 17:30:53   ab-shared-batch
  RUNNING         03/12/2014 17:31:06    1  PD  03/12/2014 17:31:07   ab-shared-batch
  SUCCESS         03/12/2014 17:31:46    1  PD  03/12/2014 17:31:47
  [FORCE_STARTJOB]  03/12/2014 18:16:06    0  PD  03/12/2014 18:16:07
    < >
  STARTING        03/12/2014 18:16:07    2  PD  03/12/2014 18:16:07   ab-shared-batch-
  RUNNING         03/12/2014 18:16:19    2  PD  03/12/2014 18:16:20   ab-shared-batch-
  FAILURE         03/12/2014 18:17:02    2  PD  03/12/2014 18:17:03
  [*** ALARM ***]
    JOBFAILURE    03/12/2014 18:17:03    2  PD  03/12/2014 18:17:04
  [FORCE_STARTJOB]  03/12/2014 18:21:18    0  PD  03/12/2014 18:21:19
    < >
  STARTING        03/12/2014 18:21:19    3  PD  03/12/2014 18:21:19   ab-shared-batch-
  RUNNING         03/12/2014 18:21:32    3  PD  03/12/2014 18:21:32   ab-shared-batch-
  SUCCESS         03/12/2014 18:22:07    3  PD  03/12/2014 18:22:08

I would like my output to look like this:

System Number  Job Name                           Target Machine     Status     Actual Start Date     Actual Start Time      Actual End Date    Actual End Time
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D ab-shared-batch    SUCCESS       03/12/2014               17:30:53            03/12/2014         17:31:47
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D ab-shared-batch    FAILURE       03/12/2014               18:16:07            03/12/2014         18:17:03
9043           B9043CC_APP_DMLD_025_FR_xpabbdu1_D ab-shared-batch    SUCCESS       03/12/2014               18:21:19            03/12/2014         18:22:08

The actual start/end dates and times come from the "ProcessTime" column. I only want the data shown above; none of the other text, including the "----" rulers, should appear in the .csv file. As mentioned above, I have a few hundred of these definitions in a single file.

I know Python has a built-in csv module, which I am using to write the column labels:

import csv
import sys

outfile = '/home/n5acc7/test/output/testtest.csv'
f = open(outfile, 'wt')
try:
    writer = csv.writer(f)
    # one header cell per output column
    writer.writerow(('System Number', 'Job Name', 'Target Machine', 'Status', 'Actual Start Date', 'Actual Start Time', 'Actual End Date', 'Actual End Time'))
finally:
    f.close()

But from the parsing perspective, I'm not sure where to start. I'm running Python 2.4.3.

  • The csv module can read as well as write. Have you tried using the other portion of it? Commented Mar 20, 2014 at 16:39

2 Answers

Parsing this looks pretty straightforward.

General logic:

read six lines (header)
get system number and batch name

until end of file:
    read five lines
    get machine name, status, start and end dates and times
    if status is FAILURE
        read two lines (clear error message)

And some actual code (targeted at Python 2.7; you'll have to do some back-porting for Python 2.4, or switch to a more up-to-date Python):

INPUT = "/home/n5acc7/test/input/batch1.log"
OUTPUT = "/home/n5acc7/test/output/testtest.csv"

LINE = "{:<6} {:34} {:18} {:10} {:10} {:10} {:10} {:10}\n"

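def get_lines(n, inf):
    # read the next n lines; next() raises StopIteration at end of file
    return [next(inf) for _ in xrange(n)]

def read_header(inf):
    # the job line is the third of the six header lines
    head = get_lines(6, inf)
    job_name = head[2].split(None, 1)[0]
    system_num = job_name[1:5]
    return system_num, job_name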

def read_record(inf):
    record    = get_lines(5, inf)
    startline = record[2].split()
    sd, st, name = startline[5:8]
    endline   = record[4].split()
    status    = endline[0]
    ed, et    = endline[5:7]
    # skip failure message
    if status == "FAILURE":
        get_lines(2, inf)
    return name, status, sd, st, ed, et

def parse_jobfile(fname):
    with open(fname) as inf:
        try:
            batch = read_header(inf)
            while True:
                job = read_record(inf)
                yield batch + job
        except StopIteration:
            # end of file
            pass

def main():
    with open(OUTPUT, "w") as outf:
        outf.write(LINE.format("SysNum", "Job Name", "Target Machine", "Status", "Start Date", "Start Time", "End Date", "End Time"))
        for result in parse_jobfile(INPUT):
            outf.write(LINE.format(*result))

if __name__ == "__main__":
    main()
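
For what it's worth, here is a rough sketch of how the same idea could be written so that it copes with many job blocks in one file and emits real CSV through the csv module. It is a variant, not the code above, and it rests on assumptions: every job block has an underscore ruler line directly followed by the job line, only SUCCESS and FAILURE end a run, and the names HEADER, parse_jobs and write_csv are made up for illustration. It sticks to constructs I believe are available in Python 2.4 (no with statement, no str.format), but I have not run it on 2.4.

```python
import csv

HEADER = ("System Number", "Job Name", "Target Machine", "Status",
          "Actual Start Date", "Actual Start Time",
          "Actual End Date", "Actual End Time")

def parse_jobs(lines):
    """Yield one tuple per STARTING ... SUCCESS/FAILURE run (assumed layout)."""
    job_name = None      # job currently being read
    expect_job = False   # True right after the underscore ruler line
    start = None         # (machine, date, time) from the last STARTING line
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("____"):
            expect_job = True
            continue
        if expect_job:
            job_name = fields[0]
            expect_job = False
            start = None
            continue
        if fields[0] == "STARTING" and len(fields) >= 7:
            # ProcessTime date and time are fields 5 and 6; machine is optional
            machine = ""
            if len(fields) > 7:
                machine = fields[7]
            start = (machine, fields[5], fields[6])
        elif (fields[0] in ("SUCCESS", "FAILURE") and len(fields) >= 7
              and start is not None and job_name is not None):
            machine, sdate, stime = start
            yield (job_name[1:5], job_name, machine, fields[0],
                   sdate, stime, fields[5], fields[6])
            start = None

def write_csv(rows, outf):
    """Write the header and all rows to an already opened file object."""
    writer = csv.writer(outf)
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow(row)
```

To use it, open the log and the output file yourself and pass them in, for example write_csv(parse_jobs(open(INPUT)), out) where out is the opened OUTPUT file, and close both files afterwards. Lines such as [FORCE_STARTJOB], < >, RUNNING and the alarm rows are simply ignored because they match none of the branches.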

7 Comments

Thanks! What exactly is get_header, though? @Hugh Bothwell
@Matt: it's a mistake I made when cleaning up the function names :-/ should have been read_header, fixed now.
Okay, that's what I thought. Thanks! @Hugh Bothwell
Also, I believe there is supposed to be another value in "sd, st, name = startline[5:8]". @Hugh Bothwell
@Matt: um, no? startline[5:8] gives you items 5, 6, and 7, which are the startdate, starttime, and machine name. Python slice syntax is a bit like range(), the last item (8) is not included.
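
A quick way to check the slice behaviour described in that comment is to run it on the first STARTING line from the question. This is only an illustration; the variable names mirror the answer's read_record.

```python
# First STARTING line from the question's sample, split on whitespace
startline = ("  STARTING        03/12/2014 17:30:53    1  PD  "
             "03/12/2014 17:30:53   ab-shared-batch").split()

# [5:8] takes items 5, 6 and 7; index 8 itself is excluded
sd, st, name = startline[5:8]
```

After this, sd holds the ProcessTime date, st the ProcessTime time and name the machine, so exactly three values are unpacked.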

How are you with regular expressions? Python supports them, and Perl is excellent for file processing. CSV files can be tab- or comma-delimited (the format has some variance), so once you have a file handle it's a very easy format to write to. You don't have to pick a language for its CSV capabilities, as long as you are comfortable with it and it is efficient for parsing. As far as regular expressions go, here are some links for intros (if you run into more specific parsing scenarios once you've settled on an approach, I can update this to address them):

Python re

perlreref

There are more Perl ones, such as:

perlre

Understand basic Regex
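
As a starting point for the regular expression route, here is a minimal sketch in Python that matches one event line from the sample in the question. The pattern is my guess at the layout (status word, event date and time, try count, ES flag, ProcessTime date and time, optional machine), and EVENT_RE and parse_event are hypothetical names, so adjust it to whatever your real file contains.

```python
import re

# One event row, e.g. "STARTING  03/12/2014 17:30:53  1  PD  03/12/2014 17:30:53  ab-shared-batch"
EVENT_RE = re.compile(
    r"^\s*(?P<status>[A-Z_]+)\s+"
    r"\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}\s+"      # event time, not captured
    r"(?P<ntry>\d+)\s+(?P<es>\w+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})"  # ProcessTime
    r"(?:\s+(?P<machine>\S+))?\s*$"                  # machine column may be empty
)

def parse_event(line):
    """Return a dict of the named groups, or None if the line is not an event row."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    return m.groupdict()
```

Lines such as [FORCE_STARTJOB], < > and the dashed rulers do not match and come back as None, so they can be skipped without special cases. Note that JOBFAILURE rows do match, so filter on the status value if you only want STARTING, SUCCESS and FAILURE.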
