
I have a large text file of event data that I am trying to parse into a csv file. The structure looks like this:

START
USER: a
TIME: 1000
CLICKS: 1
COMMAND A: 2
COMMAND B: 1
END
START
USER: b
TIME: 00
CLICKS: 1
COMMAND A: 2
COMMAND B: 1
COMMAND C: 1
END

The events are delimited by the START and END tags. I am trying to create a csv file with one row per event and one column per attribute. In the example above, the columns would be USER, TIME, CLICKS, COMMAND A, COMMAND B, and COMMAND C, and the value for each would be the text after the colon.

I know that this code will read an individual event:

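with open('sampleIVTtxt.txt', 'r') as input_data:
    # Skip ahead to the first START tag (the sample data uses START)
    for line in input_data:
        if line.strip() == 'START':
            break
    # Read the lines of that event until its END tag
    for line in input_data:
        if line.strip() == 'END':
            break
        print(line.strip())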

Where I am stuck is how to parse the lines within each event block and store them as columns and values in a csv. My idea is to extract the column name from each line in the block with a regex, collect those names in an array, and call writerow(namesarray) to create the header. But I'm not sure how to loop through the whole txt file and store the values of subsequent events in those columns.

I am new to Python, so any help would be appreciated.
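For the line-parsing step described above, a regex may not be necessary: splitting each line on its first colon is probably enough. This is only a minimal sketch; the helper name parse_line is hypothetical, and it assumes every attribute line has the form "KEY: value".

```python
def parse_line(line):
    """Split 'KEY: value' into (key, value); return None if there is no colon."""
    key, sep, value = line.strip().partition(':')
    if not sep:
        return None  # START, END, or blank lines carry no key/value pair
    return key.strip(), value.strip()


print(parse_line('COMMAND A: 2'))  # ('COMMAND A', '2')
print(parse_line('START'))         # None
```

Because partition splits on the first colon only, a value that itself contains a colon would be kept whole.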

  • Have you tried anything at all? Commented May 4, 2015 at 23:51
  • I think it would help if you (1) format your post correctly, and (2) add a python tag. Oh, and (3) post what you got and point out where you are stuck. Commented May 4, 2015 at 23:55
  • Thank you for your response. I've edited the question with tags and provided more detail on where I'm stuck. Commented May 5, 2015 at 0:51
  • Will you know ahead of time the columns that you will need? Commented May 5, 2015 at 1:00
  • Yes, I will know all the columns that could exist for an event. However, not all events will have input for each column. Basically, if a COMMAND A was not used, there will be no line for it in that event block, so I would want the row to just have a 0 or null cell for that column. Commented May 5, 2015 at 1:04
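The "0 or null cell" behaviour mentioned in the last comment is something csv.DictWriter can likely handle by itself: keys missing from a row are filled with its restval argument, which defaults to an empty string. A small sketch, assuming a reduced set of columns and an in-memory buffer in place of the real output file:

```python
import csv
import io

fieldnames = ['USER', 'COMMAND A', 'COMMAND C']  # reduced column list for illustration
buffer = io.StringIO()  # stands in for the real csv file

# restval is written for any fieldname that is missing from a row dict
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval=0)
writer.writeheader()
writer.writerow({'USER': 'a', 'COMMAND A': '2'})  # this event has no COMMAND C

output = buffer.getvalue()
print(output)
```

Leaving restval at its default would produce an empty cell instead of 0.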

1 Answer

Something like:

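import csv

# newline='' (Python 3) avoids blank rows in the csv on Windows
with open('sampleIVTtxt.csv', 'w', newline='') as csvfile:
    fieldnames = ['USER', 'TIME', 'CLICKS', 'COMMAND_A', 'COMMAND_B', 'COMMAND_C']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    # Read the input while the csv file is still open, so the writer stays usable
    with open('sampleIVTtxt.txt', 'r') as input_data:
        for line in input_data:
            thisLine = line.strip()
            if thisLine == 'START':
                myDict = {}  # start collecting a new row
            elif "USER" in thisLine:
                myDict['USER'] = thisLine[6:]  # text after "USER: "
            # ....and so on....
            elif thisLine == 'END':
                writer.writerow(myDict)  # one csv row per event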

3 Comments

Thanks kaz, I am getting an "invalid syntax" error on the line myDict{'USER': thisLine[6:]}. Does this part: elif "USER" in thisLine: myDict{'USER': thisLine[6:]} check if there is a row with "USER" and if so, store the value in the column called user?
Sorry, it has been a while since I used Python, so the syntax was wrong. I'll edit it. And yes, that is the approach, except that I first store all the data for a row in a dictionary, then use a csv writer that takes that dictionary and writes the values to the appropriate columns.
Thanks kaz, I am still tweaking my code but I think this answer will get me what I am looking for. I appreciate the help!
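
The dictionary-per-event approach discussed in these comments could be generalized so that no per-field elif branch is needed. The sketch below is one possible way to do it, not the answerer's exact code: the function names read_events and write_events are hypothetical, the column names follow the sample data (with spaces), and in-memory buffers stand in for the real files.

```python
import csv
import io

# Column names as they appear in the sample data
FIELDNAMES = ['USER', 'TIME', 'CLICKS', 'COMMAND A', 'COMMAND B', 'COMMAND C']


def read_events(lines):
    """Yield one dict per START ... END block."""
    event = None
    for raw in lines:
        line = raw.strip()
        if line == 'START':
            event = {}  # begin a new event
        elif line == 'END':
            if event is not None:
                yield event
            event = None
        elif event is not None and ':' in line:
            key, _, value = line.partition(':')  # split on the first colon
            event[key.strip()] = value.strip()


def write_events(lines, out):
    """Write every event as one csv row; missing columns become 0."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, restval=0)
    writer.writeheader()
    for event in read_events(lines):
        writer.writerow(event)


sample = io.StringIO(
    "START\nUSER: a\nTIME: 1000\nCLICKS: 1\nCOMMAND A: 2\nCOMMAND B: 1\nEND\n"
    "START\nUSER: b\nTIME: 00\nCLICKS: 1\nCOMMAND A: 2\nCOMMAND B: 1\n"
    "COMMAND C: 1\nEND\n"
)
result = io.StringIO()
write_events(sample, result)
print(result.getvalue())
```

With real files, the two buffers would be replaced by open('sampleIVTtxt.txt') and open('sampleIVTtxt.csv', 'w', newline=''). Note that DictWriter raises a ValueError by default if an event contains a key that is not in FIELDNAMES; passing extrasaction='ignore' would skip such keys instead.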
