
I have a log file that is updated every few milliseconds; however, the information is currently saved with four (4) different delimiters. The log files contain millions of lines, so the chances of performing the action in Excel are null.

What I have left to work on resembles:

Sequence=3433;Status=true;Report=223313;Profile=xxxx;
Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;
Sequence=5323;Status=true;Report=6541998;Profile=xxxx;

I would like these set to:

Sequence,Status,Report,Header,Profile
3433,true,Report=223313,,xxxx
0323,true,Report=43838,The,xxxx
5323,true,Report=6541998,,xxxx

This means I need a header built from every name that is followed by an equals sign ("="). All of the other operations on the file are taken care of, and the result will be used to perform a comparative check between files and to replace or append fields. As I am new to Python, I only need assistance with this portion of the program I am writing.

Thank you all in advance!

  • I have not really gone that far in it due to my limited knowledge of python. I am just starting and I can reorder, open files, request data, code loops and such but this portion is a little out of my league. Commented Dec 25, 2015 at 4:02

1 Answer


You can try this.

First of all, I imported the csv module, which takes care of writing the commas and quotes.

import csv

Then I wrote a function that takes a single line from your log file and returns a dictionary keyed by the fields passed in the header. If the current line lacks a particular header field, that field stays an empty string.

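def convert_to_dict(line, header):
    # Start with every header field set to an empty string.
    d = {}
    for cell in header:
        d[cell] = ''

    # Each non-empty cell looks like "key=value".
    row = line.strip().split(';')
    for cell in row:
        if cell:
            # Split on the first '=' only, so values may contain '='.
            key, value = cell.split('=', 1)
            d[key] = value

    return d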
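As a quick check of what such a function returns, here is a minimal self-contained sketch. The name `parse_line` is hypothetical (a compact equivalent written for illustration, not part of the answer's code), and the header list is assumed to be the sorted one shown further down.

```python
def parse_line(line, header):
    # Start with every header field set to an empty string.
    d = dict.fromkeys(header, '')
    for cell in line.strip().split(';'):
        if cell:
            # partition() splits on the first '=' only.
            key, _, value = cell.partition('=')
            d[key] = value
    return d

header = ['Header', 'Profile', 'Report', 'Sequence', 'Status']
row = parse_line('Sequence=3433;Status=true;Report=223313;Profile=xxxx;\n', header)

# The missing 'Header' field comes out as an empty string.
print([row[name] for name in header])
```

The trailing semicolon on each log line produces one empty cell after the split, which is why the `if cell:` guard is needed.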

Since the header and the number of fields can vary between your files, I wrote a function that extracts them. It collects the names in a set, which keeps each element only once but has no order, so the result is passed through the sorted function to get a predictable column order. Don't forget the seek(0) call to rewind the file!

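def extract_fields(logfile):
    # Collect every distinct key that appears anywhere in the file.
    fields = set()
    for line in logfile:
        row = line.strip().split(';')
        for cell in row:
            if cell:
                # Split on the first '=' only, so values may contain '='.
                key, value = cell.split('=', 1)
                fields.add(key)

    # Rewind so the caller can read the file again from the start.
    logfile.seek(0)
    return sorted(fields)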
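If you would rather keep the columns in the order in which the fields first appear instead of alphabetical order, one possible variation is sketched below. It assumes Python 3.7 or later, where dictionaries preserve insertion order; `extract_fields_in_order` is a hypothetical name, and `io.StringIO` stands in for the real log file.

```python
import io

def extract_fields_in_order(logfile):
    # Dict keys are unique and keep insertion order (Python 3.7+).
    fields = {}
    for line in logfile:
        for cell in line.strip().split(';'):
            if cell:
                key = cell.partition('=')[0]
                fields.setdefault(key, None)
    # Rewind so the caller can read the file again from the start.
    logfile.seek(0)
    return list(fields)

sample = io.StringIO(
    'Sequence=3433;Status=true;Report=223313;Profile=xxxx;\n'
    'Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;\n'
)
print(extract_fields_in_order(sample))
# -> ['Sequence', 'Status', 'Report', 'Profile', 'Header']
```

Fields that only show up on later lines, such as Header here, end up at the end of the list.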

Lastly, I wrote the main piece of code, which opens both the log file for reading and the CSV file for writing. It then extracts and writes the header, and writes each converted line.

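if __name__ == '__main__':
    with open('report.log', 'r') as logfile:
        # Python 3: text mode with newline='', as the csv docs recommend.
        # (On Python 2, open the output with 'wb' instead.)
        with open('report.csv', 'w', newline='') as csvfile:
            csvwriter = csv.writer(csvfile)

            # First pass: collect the column names and write the header row.
            header = extract_fields(logfile)
            csvwriter.writerow(header)

            # Second pass: write one CSV row per log line.
            for line in logfile:
                d = convert_to_dict(line, header)
                csvwriter.writerow([d[cell] for cell in header])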
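As an alternative to filling in the empty strings by hand, the standard library's `csv.DictWriter` can do it through its `restval` parameter. The sketch below is only an illustration under a few assumptions: `parse_pairs` is a hypothetical helper, the header is hard-coded, and `io.StringIO` objects stand in for the real files.

```python
import csv
import io

def parse_pairs(line):
    # Turn "a=1;b=2;" into {'a': '1', 'b': '2'}, splitting on the first '=' only.
    return dict(cell.split('=', 1) for cell in line.strip().split(';') if cell)

log = io.StringIO(
    'Sequence=3433;Status=true;Report=223313;Profile=xxxx;\n'
    'Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;\n'
)
out = io.StringIO()

header = ['Header', 'Profile', 'Report', 'Sequence', 'Status']
# restval is the value written for any field missing from a row.
writer = csv.DictWriter(out, fieldnames=header, restval='')
writer.writeheader()
for line in log:
    writer.writerow(parse_pairs(line))

print(out.getvalue())
```

`DictWriter` also accepts `extrasaction='ignore'`, which silently drops keys that are not listed in `fieldnames`; that may be handy if you only want a subset of the columns.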

These are the files I used as an example:

report.log

Sequence=3433;Status=true;Report=223313;Profile=xxxx;
Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;
Sequence=5323;Status=true;Report=6541998;Profile=xxxx;

report.csv

Header,Profile,Report,Sequence,Status
,xxxx,223313,3433,true
The,xxxx,43838,0323,true
,xxxx,6541998,5323,true

I hope it helps! :D

EDIT: I added support for different headers.

6 Comments

I think I can definitely make this work. There are a vast number of columns and they vary from file to file but I can pre-define and filter as needed with this. I will give it a go after this Holiday Gathering. If it helps, you will definitely be a godsend.
I edited my answer to support different headers. You may change the line header = extract_fields(logfile) to a pre-defined array if you want, ignoring particular columns.
I am back from vacation and it works like a dream. My program as written formats all the different data types to be used by the portion provided. Once run, it creates a report based on the arguments and the additional functions called. I learned so much in picking apart the code. Thank you again!
Is there any way to pipe the content in? I get runtime errors with, for instance, "cat file.txt | script.py". I set the logfile variable to be loaded from stdin, but that was a no-go.
The correct syntax to redirect the stdin is script.py < file.txt
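A likely cause of the runtime error with a pipe is the seek(0) call: a pipe cannot be rewound, whereas stdin redirected from a regular file can. If piping is needed, one option is to read the input once and keep it in memory, as in the sketch below. `convert_stream` is a hypothetical name, and `io.StringIO` objects stand in for stdin and stdout so the example can run on its own.

```python
import csv
import io

def convert_stream(source, target):
    # Read everything in a single pass, because a pipe cannot be rewound.
    rows = []
    for line in source:
        line = line.strip()
        if line:
            rows.append(dict(cell.split('=', 1)
                             for cell in line.split(';') if cell))

    # Build the header from every key seen, in alphabetical order.
    header = sorted({key for row in rows for key in row})

    writer = csv.writer(target)
    writer.writerow(header)
    for row in rows:
        writer.writerow([row.get(name, '') for name in header])

demo_in = io.StringIO(
    'Sequence=3433;Status=true;Report=223313;Profile=xxxx;\n'
    'Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;\n'
)
demo_out = io.StringIO()
convert_stream(demo_in, demo_out)
print(demo_out.getvalue())

# In a script, call convert_stream(sys.stdin, sys.stdout) instead.
```

This holds every parsed line in memory, which may be too much for logs with millions of lines. With a pre-defined header, each row could be written as soon as it is read, and no buffering would be needed.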