I'm processing the output of a forensic Perl program in Python 2.7, and I want to parse the data below into a nested object for ingestion into another program (e.g. Splunk).

I'm struggling to work out how the data should be structured programmatically without getting lost in layers of nested objects. I looked online for good resources on how to approach designing a nested object, but didn't find anything useful.

Any additional resources regarding this subject outside those found on SO/Python manual would be greatly appreciated.

Raw data

appcompatflags v.20130930
(NTUSER.DAT, Software) Extracts AppCompatFlags for Windows.


Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CompatibilityAssistant\Store
Fri Jul 10 11:00:24 2015 - E:\VMware-player-4.0.6-1035888.exe
Fri Jul 10 11:00:24 2015 - C:\Users\aUser\AppData\Local\Microsoft\OneDrive\17.3.5892.0626\FileSyncConfig.exe
Fri Jul 10 11:00:24 2015 - C:\Users\aUser\AppData\Local\Microsoft\OneDrive\Update\OneDriveSetup.exe
Fri Jul 10 11:00:24 2015 - C:\Users\aUser\AppData\Local\Microsoft\OneDrive\17.3.6201.1019\FileSyncConfig.exe
Fri Jul 10 11:00:24 2015 - E:\AdbeRdr11000_mui_Std\Setup.exe

The field/values I've considered using

Top Level Name - NTUser 
'''
I'm not positive this is correct for JSON. The program runs off a declaration of which
registry I want to parse (e.g. NTUSER, SYSTEM, etc.), generating a multitude
of results from the various plugins found within the program.
'''

Fields : Values

Plugin_Name:     appcompatflags v.20130930, someOtherPlugin
Description:     (NTUSER.DAT, Software) Extracts AppCompatFlags for Windows.
Location:        Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags
Date:            Fri Jul 10 11:00:24 2015, Fri Jul 10 11:00:24 2015, etc...  
Result:          E:\VMware-player-4.0.6-1035888.exe, E:\AdbeRdr11000_mui_Std\Setup.exe

Edit: I recognize this is a general question. I'm still really new to programming, and I'm hopeful someone on SO can point me in the right direction.

  • What part of the data here is going to be nested? What do you mean by "parsing directly into JSON"? Commented Feb 24, 2016 at 23:07
  • Seems ok to me. You should try to keep the dates and results together, so instead of separate date and result fields I'd have a list of results, each containing a date and a result. I don't really understand your data, so I might be way off. You will be building a dict first and then creating JSON from it if you wish to send the data somewhere (i.e. to a file or to another machine). Commented Feb 24, 2016 at 23:09
  • @ImNotLeet so would the JSON output you're looking for be something like {"NTUser": {"Plugin_Name": "appcompat"...}}? Also, if I look at the output from the data, you may be looking for something like: {registry_key: { plugin_one: {description, location, date, result}, plugin_two: {...}}, registry_key_two: {...}}. Does that seem right? Commented Feb 24, 2016 at 23:22
  • @ImNotLeet Gotcha, let me try to write an answer with the format you may be looking for. Commented Feb 24, 2016 at 23:28
  • @GaneshDatta yes that is essentially what I need, can you suggest any good reading on the topic beyond this SO post? Commented Feb 24, 2016 at 23:28

1 Answer

Not sure what you mean by "parsing directly into JSON". To get the data you have into JSON format, you need to run it through some sort of script, which I assume you have chosen to write in Python. Within this script, you'll parse the values into a dict and then output that dict as a JSON file.

Here's an example of how your JSON could look:

{
    "NTUser": {
        "appcompat": {
            "description": "(NTUSER.DAT, Software) Extracts AppCompatFlags for Windows.",
            "location": "Software\\Microsoft\\Windows NT\\CurrentVersion\\AppCompatFlags",
            "data": [
                        {"date": "Fri Jul 10 11:00:24 2015", "result": "E:\\VMware-player-4.0.6-1035888.exe"},
                        {"date": "Fri Jul 10 11:00:24 2015", "result": "E:\\AdbeRdr11000_mui_Std\\Setup.exe"}
            ]
        },
        "some_other_plugin": {}
    },
    "SomeOtherRegistry": {}
}

With regard to further reading, the Wikipedia page on JSON has some good explanations and examples of how to represent various kinds of data in JSON format. On the Python side, I would learn how dictionaries work: a dict maps directly to a JSON object, so working with dicts is crucial.
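To make the dict-then-JSON step concrete, here is a minimal sketch that parses one plugin's output into the nested layout shown above. It assumes each block is laid out as in the question (a name/version line, a description line, a registry key path, then "date - path" lines). The names `parse_plugin_output`, `ENTRY`, and `RAW` are made up for illustration, and the real program's output may need more handling than this. It should run under both Python 2.7 and Python 3.

```python
import json
import re

# Sample input copied from the question (two of the five entries).
RAW = r"""appcompatflags v.20130930
(NTUSER.DAT, Software) Extracts AppCompatFlags for Windows.


Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CompatibilityAssistant\Store
Fri Jul 10 11:00:24 2015 - E:\VMware-player-4.0.6-1035888.exe
Fri Jul 10 11:00:24 2015 - E:\AdbeRdr11000_mui_Std\Setup.exe
"""

# Matches lines such as "Fri Jul 10 11:00:24 2015 - <path>".
ENTRY = re.compile(r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - (.+)$")


def parse_plugin_output(text):
    """Return (plugin_name, plugin_dict) for one block of plugin output.

    Assumption: line 1 is "name version", line 2 is the description,
    the first non-entry line after that is the registry key path.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0].split()[0]
    plugin = {"description": lines[1], "location": None, "data": []}
    for line in lines[2:]:
        match = ENTRY.match(line)
        if match:
            # Keep each date together with its result, as one small dict.
            plugin["data"].append(
                {"date": match.group(1), "result": match.group(2)}
            )
        elif plugin["location"] is None:
            plugin["location"] = line
    return name, plugin


name, plugin = parse_plugin_output(RAW)
report = {"NTUser": {name: plugin}}

# json.dumps escapes the backslashes in the Windows paths for you.
print(json.dumps(report, indent=4, sort_keys=True))
```

Building the dict first and serializing at the very end means you never have to hand-write JSON escaping, and adding a second plugin is just another key under "NTUser".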

2 Comments

"Parsing directly into JSON" was me overthinking it; being new, it's hard not to feel overwhelmed. I'm pretty familiar with dictionaries. If I wanted to write the above as a nested dict, would it look similar to {NTUser: {appcompat: {description: value, location: value}}}, and should I use OrderedDict to preserve the input order?
Totally understand! The nested dict would be exactly as it looks in JSON. I'm not sure why an OrderedDict is needed in this case, so use it only if you need the data (specifically the dict keys) to be in the order you insert them.
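If key order does turn out to matter (a plain dict in Python 2.7 does not remember insertion order), a small sketch of the OrderedDict variant might look like the following. The values are taken from the example above. Whether the consuming program cares about key order at all is an assumption you would need to check.

```python
import json
from collections import OrderedDict

# Keys are emitted in the order they are inserted.
plugin = OrderedDict()
plugin["description"] = "(NTUSER.DAT, Software) Extracts AppCompatFlags for Windows."
plugin["location"] = r"Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags"
plugin["data"] = [
    OrderedDict([
        ("date", "Fri Jul 10 11:00:24 2015"),
        ("result", r"E:\VMware-player-4.0.6-1035888.exe"),
    ]),
]

report = OrderedDict([("NTUser", OrderedDict([("appcompat", plugin)]))])

# json.dumps walks an OrderedDict in insertion order.
text = json.dumps(report, indent=4)
print(text)

# object_pairs_hook keeps that order when reading the JSON back in.
loaded = json.loads(text, object_pairs_hook=OrderedDict)
```

JSON itself treats object keys as unordered, so this only helps readability or consumers that happen to be order-sensitive.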
