Sample first row of the event log file; here I have successfully extracted everything apart from the last key-value pair, which is attributes:

{"event_type":"ActionClicked","event_timestamp":1451583172592,"arrival_timestamp":1451608731845,"event_version":"3.0",
  "application":{"app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:
    4d9cf803-0487-44ec-be27-1e160d15df74","package_name":"com.think.vito","sdk":{"name":"aws-sdk-android","version":"2.2.2"}
    ,"title":"Vito","version_name":"1.0.2.1","version_code":"3"},"client":{"client_id":"438b152e-5b7c-4e99-9216-831fc15b0c07",
      "cognito_id":"us-east-1:448efb89-f382-4975-a1a1-dd8a79e1dd0c"},"device":{"locale":{"code":"en_GB","country":"GB",
        "language":"en"},"make":"samsung","model":"GT-S5312","platform":{"name":"ANDROID","version":"4.1.2"}},
  "session":{"session_id":"c15b0c07-20151231-173052586","start_timestamp":1451583052586},"attributes":{"OfferID":"20186",
    "Category":"40000","CustomerID":"304"},"metrics":{}}

Hello everyone, I am trying to extract the content from an event log file, a sample row of which is shown above. Per the requirement I have to fetch CustomerID, OfferID and Category; these are the important variables I need to extract from this event log file, which is in CSV format. I tried regular expressions but they aren't working, because the format of every column is different: as you can see, the first row has Category, CustomerID and OfferID, while the second row is totally blank, and in that case a regular expression won't work. Apart from this we have to consider all possible conditions, since we have 14000 samples in the event log file. #json #parsing #python #pandas

  • Is this a plain text file? Does every line start and end with {}? If so, seems like you can read the file line by line and use literal_eval to turn each line into a Python dict object. Commented Jul 10, 2016 at 8:12
  • Can you provide the actual piece of your data log instead of the image format? You don't expect us to type your data one by one, right? Commented Jul 10, 2016 at 8:22
  • Yes, it was in .txt format earlier. It was a huge file; I extracted the following variables from the event log file: event_type, event_timestamp, arrival_timestamp, event_version, application { app_id, cognito_identity_pool_id }, client{}, device{}, session{}, attributes{}. Commented Jul 10, 2016 at 8:23
  • Why do you have single quotes in the image but double quotes in the text? (The latter could be in JSON format.) Commented Jul 10, 2016 at 8:35
  • @ayhan The image file is in CSV format, whereas the text form is a .txt file ... after extracting from the .txt file I separated every key into an individual CSV file. Commented Jul 10, 2016 at 9:12

2 Answers


Edit

The data, after your edit, now appears to be JSON data. You can still use literal_eval as below, or you could use the json module:

import json

with open('event.log') as events:
    for line in events:
        event = json.loads(line)
        # process event dictionary

To access the CustomerID, OfferID, Category etc. you need to access the nested dictionary associated with the key 'attributes' in the event dictionary:

print(event['attributes']['CustomerID'])
print(event['attributes']['OfferID'])
print(event['attributes']['Category'])

If it is the case that some keys could be missing, use dict.get() instead:

print(event['attributes'].get('CustomerID'))
print(event['attributes'].get('OfferID'))
print(event['attributes'].get('Category'))

Now you will get None if the key is missing.

You can extend this principle to access other items within the dictionary.
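
For example, other nested values can be pulled out the same way (the key names here come from the sample record above, so adjust them if your files differ):

print(event['application']['app_id'])
print(event['device']['platform']['name'])
print(event['session']['start_timestamp'])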

If I understand your question, you also want to create a CSV file containing the extracted fields. You can use the extracted values with csv.DictWriter like this:

import csv
import json

with open('event.log') as events, open('output.csv', 'w') as csv_file:
    fields = ['CustomerID', 'OfferID', 'Category']
    writer = csv.DictWriter(csv_file, fields)
    writer.writeheader()
    for line in events:
        event = json.loads(line)
        writer.writerow(event['attributes'])

DictWriter will simply leave fields empty when the dictionary is missing keys.
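
The reverse case, a dictionary that contains keys not listed in fields (extra attributes in some rows), raises a ValueError by default. If you would rather have DictWriter silently drop such keys, it accepts extrasaction='ignore'; a minimal variation on the line above (not part of the original code):

# drop any keys not declared in fields instead of raising ValueError
writer = csv.DictWriter(csv_file, fields, extrasaction='ignore')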


Original answer

The data is not in CSV format; it appears to contain Python dictionary strings. These can be parsed into Python dictionaries using ast.literal_eval():

from ast import literal_eval

with open('event.log') as events:
    for line in events:
        event = literal_eval(line)
        # process event dictionary

4 Comments

We need to extract the values of CustomerID, OfferID and Category, and in some rows there is also "{ }" with no key:value pair in it, Sir. The result was >>> event {u'MenuItem': u'Category', u'CustomerID': u'364'} @mhawke
@NabiShaikh: Once you have the dictionary you can access the attributes in it. Looking at your updated sample of data (which now looks to be JSON data!) you actually have nested dictionaries, so you would access the customer id with event['attributes']['CustomerID'] for example.
The event log file is in .txt format, it's not JSON format. I am facing this error: Traceback (most recent call last): File "<stdin>", line 7, in <module> File "C:\Anaconda2\lib\csv.py", line 152, in writerow return self.writer.writerow(self._dict_to_list(rowdict)) File "C:\Anaconda2\lib\csv.py", line 148, in _dict_to_list + ", ".join([repr(x) for x in wrong_fields])) ValueError: dict contains fields not in fieldnames: u'Lat', u'Long'
@NabiShaikh: it is a text file, but the contents are JSON. The json parser successfully parses it, doesn't it? Don't pass dictionaries to DictWriter.writerow() that contain keys that you have not defined in the fieldnames argument to DictWriter. In this case Lat and Long are being passed to writerow(). Don't do that.

This might not be the most efficient way to convert nested JSON records in a text file (one record per line) into a DataFrame object, but it kind of does the job.

import pandas as pd
import json
from pandas.io.json import json_normalize

# Python 2 style: read the raw lines of the log file
with open('path_to_your_text_file.txt', 'rb') as f:
    data = f.readlines()

# parse each line, flatten the nested keys with json_normalize,
# then round-trip through to_json/eval to get one flat dict per record
data = map(lambda x: eval(json_normalize(json.loads(x.rstrip())).to_json(orient="records")[1:-1]), data)
e = pd.DataFrame(data)
print e.head()
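
Under Python 3 and a recent pandas, roughly the same result can be had without the eval/to_json round-trip, since pd.json_normalize accepts a list of parsed records directly; a sketch under those assumptions, reusing the same placeholder file name:

import json
import pandas as pd

with open('path_to_your_text_file.txt') as f:
    # parse each non-empty line into a dict
    records = [json.loads(line) for line in f if line.strip()]

# flatten nested keys into dotted column names, e.g. attributes.CustomerID
e = pd.json_normalize(records)
print(e.head())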

