Sample first row of the event log file; here I have successfully extracted everything apart from the last key-value pair, which is attributes:

{"event_type":"ActionClicked","event_timestamp":1451583172592,"arrival_timestamp":1451608731845,"event_version":"3.0",
  "application":{"app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:
    4d9cf803-0487-44ec-be27-1e160d15df74","package_name":"com.think.vito","sdk":{"name":"aws-sdk-android","version":"2.2.2"}
    ,"title":"Vito","version_name":"1.0.2.1","version_code":"3"},"client":{"client_id":"438b152e-5b7c-4e99-9216-831fc15b0c07",
      "cognito_id":"us-east-1:448efb89-f382-4975-a1a1-dd8a79e1dd0c"},"device":{"locale":{"code":"en_GB","country":"GB",
        "language":"en"},"make":"samsung","model":"GT-S5312","platform":{"name":"ANDROID","version":"4.1.2"}},
  "session":{"session_id":"c15b0c07-20151231-173052586","start_timestamp":1451583052586},"attributes":{"OfferID":"20186",
    "Category":"40000","CustomerID":"304"},"metrics":{}}

Hello everyone, I am trying to extract the content from an event log file, a sample row of which is shown above. Per the requirement I have to fetch CustomerID, OfferID and Category; these are the important variables I need to extract from this event log file, which is in CSV format. I tried regular expressions but they aren't working, because the format of every column is different: as you can see, the first row has Category, CustomerID and OfferID, while the second row is totally blank, and in that case a regular expression won't work. Apart from this we have to consider all possible conditions, since we have 14000 samples in the event log file. #json #parsing #python #pandas

  • Is this a plain text file? Does every line start and end with {}? If so, seems like you can read the file line by line and use literal_eval to turn each line into a Python dict object. Commented Jul 10, 2016 at 8:12
  • Can you provide the actual piece of your data log instead of the image format? You don't expect us to type your data one by one, right? Commented Jul 10, 2016 at 8:22
  • Yes, it was in .txt format earlier. It was a huge file; I extracted the following variables from the event log file: event_type, event_timestamp, arrival_timestamp, event_version, application { app_id, cognito_identity_pool_id }, client{}, device{}, session{}, attributes{}. Commented Jul 10, 2016 at 8:23
  • Why do you have single quotes in the image but double quotes in the text? (The latter could be in JSON format.) Commented Jul 10, 2016 at 8:35
  • @ayhan The image file is in CSV format, whereas the text form is a .txt file ... after extracting from the .txt file I separated every key into an individual CSV file. Commented Jul 10, 2016 at 9:12

2 Answers


Edit

The data, after your edit, now appears to be JSON data. You can still use literal_eval as below, or you could use the json module:

import json

with open('event.log') as events:
    for line in events:
        event = json.loads(line)
        # process event dictionary

To access the CustomerID, OfferID, Category etc. you need to access the nested dictionary associated with the key 'attributes' in the event dictionary:

print(event['attributes']['CustomerID'])
print(event['attributes']['OfferID'])
print(event['attributes']['Category'])

If it is the case that some keys could be missing, use dict.get() instead:

print(event['attributes'].get('CustomerID'))
print(event['attributes'].get('OfferID'))
print(event['attributes'].get('Category'))

Now you will get None if the key is missing.

You can extend this principle to access other items within the dictionary.
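
For example, other nested values can be pulled out the same way (the key names here come from the sample record above, so adjust them if your files differ):

print(event['application']['app_id'])
print(event['device']['platform']['name'])
print(event['session']['start_timestamp'])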

If I understand your question, you also want to create a CSV file containing the extracted fields. You can use the extracted values with csv.DictWriter like this:

import csv
import json

with open('event.log') as events, open('output.csv', 'w') as csv_file:
    fields = ['CustomerID', 'OfferID', 'Category']
    writer = csv.DictWriter(csv_file, fields)
    writer.writeheader()
    for line in events:
        event = json.loads(line)
        writer.writerow(event['attributes'])

DictWriter will simply leave fields empty when the dictionary is missing keys.
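
The reverse case, a dictionary that contains keys not listed in fields (extra attributes in some rows), raises a ValueError by default. If you would rather have DictWriter silently drop such keys, it accepts extrasaction='ignore'; a minimal variation on the line above (not part of the original code):

# drop any keys not declared in fields instead of raising ValueError
writer = csv.DictWriter(csv_file, fields, extrasaction='ignore')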


Original answer

The data is not in CSV format; it appears to contain Python dictionary strings. These can be parsed into Python dictionaries using ast.literal_eval():

from ast import literal_eval

with open('event.log') as events:
    for line in events:
        event = literal_eval(line)
        # process event dictionary

4 Comments

We need to extract the values of CustomerID, OfferID and Category, and in some rows there is also "{ }" with no key:value pair in it, Sir. The result was >>> event {u'MenuItem': u'Category', u'CustomerID': u'364'} @mhawke
@NabiShaikh: Once you have the dictionary you can access the attributes in it. Looking at your updated sample of data (which now looks to be JSON data!) you actually have nested dictionaries, so you would access the customer id with event['attributes']['CustomerID'] for example.
The event log file is in .txt format, it's not JSON format. I am facing this error: Traceback (most recent call last): File "<stdin>", line 7, in <module> File "C:\Anaconda2\lib\csv.py", line 152, in writerow return self.writer.writerow(self._dict_to_list(rowdict)) File "C:\Anaconda2\lib\csv.py", line 148, in _dict_to_list + ", ".join([repr(x) for x in wrong_fields])) ValueError: dict contains fields not in fieldnames: u'Lat', u'Long'
@NabiShaikh: it is a text file, but the contents are JSON. The json parser successfully parses it, doesn't it? Don't pass dictionaries to DictWriter.writerow() that contain keys that you have not defined in the fieldnames argument to DictWriter. In this case Lat and Long are being passed to writerow(). Don't do that.

This might not be the most efficient way to convert nested JSON records in a text file (one record per line) into a DataFrame object, but it kind of does the job.

import pandas as pd
import json
from pandas.io.json import json_normalize

# Python 2 style: read the raw lines of the log file
with open('path_to_your_text_file.txt', 'rb') as f:
    data = f.readlines()

# parse each line, flatten the nested keys with json_normalize,
# then round-trip through to_json/eval to get one flat dict per record
data = map(lambda x: eval(json_normalize(json.loads(x.rstrip())).to_json(orient="records")[1:-1]), data)
e = pd.DataFrame(data)
print e.head()
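
Under Python 3 and a recent pandas, roughly the same result can be had without the eval/to_json round-trip, since pd.json_normalize accepts a list of parsed records directly; a sketch under those assumptions, reusing the same placeholder file name:

import json
import pandas as pd

with open('path_to_your_text_file.txt') as f:
    # parse each non-empty line into a dict
    records = [json.loads(line) for line in f if line.strip()]

# flatten nested keys into dotted column names, e.g. attributes.CustomerID
e = pd.json_normalize(records)
print(e.head())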

