
I have some rather large JSON files. Each contains thousands of objects within one (1) array. The JSONs are structured in the following format:

{
    "alert": [
        {
            "field1": "abc",
            "field2": "def",
            "field3": "xyz"
        },
        {
            "field1": null,
            "field2": null,
            "field3": "xyz"
        },
        ...
    ]
}

What's the most efficient way to use Python and the json library to search through a JSON file, find the unique values in each object within the array, and count how many times they appear? E.g., search the "field3" key of each object in the array for the value "xyz" and count how many times it appears. I tried a few variations based on existing solutions on StackOverflow, but they are not producing the results I'm looking for.
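
For illustration, here is a naive non-streaming sketch of the single-field count I'm after (it loads the whole file at once, which I'd like to avoid for files this large):

import json

# Naive approach: load the entire file into memory, then count one
# value under one key. "data.json" is a placeholder file name.
with open("data.json") as f:
    data = json.load(f)

count = sum(1 for obj in data["alert"] if obj.get("field3") == "xyz")
print(count)  # 2 for the sample above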

1 Answer

A quick search on PyPI turned up ijson, a streaming JSON parser.

Here's an example which should work with your data:

import json
import ijson

# Stream the objects in the "alert" array one at a time, so the whole
# file never has to be loaded into memory.
counts = {}
with open("data.json", "rb") as f:
    for o in ijson.items(f, "alert.item"):
        for k, v in o.items():
            field = counts.setdefault(k, {})
            field[v] = field.get(v, 0) + 1

print(json.dumps(counts, indent=2))

Running this with your sample data in data.json produces:

{
  "field1": {
    "abc": 1,
    "null": 1
  },
  "field2": {
    "def": 1,
    "null": 1
  },
  "field3": {
    "xyz": 2
  }
}

Note, however, that the null values in your input were transformed into the string "null": JSON object keys must be strings, so json.dumps serializes a None key that way.
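
Not part of the original code, but the nested-dict bookkeeping can also be written with collections.defaultdict and collections.Counter; a minimal sketch:

from collections import Counter, defaultdict

import ijson

# Same streaming count, keeping one Counter per field name.
counts = defaultdict(Counter)
with open("data.json", "rb") as f:
    for o in ijson.items(f, "alert.item"):
        for k, v in o.items():
            counts[k][v] += 1

# Individual counts can then be read directly, e.g. for the sample data:
# counts["field3"]["xyz"] == 2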

As a point of comparison, here is a jq command which produces an equivalent result using tostream:

jq -M '
  reduce (tostream|select(length==2)) as [$p,$v] (
    {}
  ; ($p[2:]+[$v|tostring]) as $k
  | setpath($k; getpath($k)+1)
  )
' data.json
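
For reference, tostream turns the document into [path, value] pairs, plus shorter closing events that mark the end of each array and object; select(length==2) keeps only the leaf pairs. On the sample data those leaf events look like:

[["alert",0,"field1"],"abc"]
[["alert",0,"field2"],"def"]
[["alert",0,"field3"],"xyz"]
[["alert",1,"field1"],null]
[["alert",1,"field2"],null]
[["alert",1,"field3"],"xyz"]

$p[2:] then drops the leading "alert" and the array index, leaving just the field name, and tostring converts the value (including null) into a string key, matching the Python output above.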