
I am trying to parse JSON to find the value of a given key, and I am doing so recursively. If there is a faster or more efficient way to do this, I am open to it.

Example JSON:

{  
   "data_version":"5",
   "application":{  
      "platform":"iPhone",
      "os":"iPhone OS",
      "locale":"en_US",
      "app_version":"unknown",
      "mobile":{  
         "device":"iPhone",
         "carrier":"Verizon",
      }
   },
   "event_header":{  
      "accept_language":"en-us",
      "topic_name":"mobile-clickstream",
      "server_timestamp":1416958459572,
      "version":"1.0"
   },
   "session":{  
      "properties":{  

      }
   },
   "event":{  
      "timestamp":1416958459185,
      "properties":{  
         "event_sequence_number":97
      }
   }
}
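As posted, the sample is not strictly valid JSON: the comma after "carrier":"Verizon" is a trailing comma, which Python's json module rejects. A minimal check, using a cut-down fragment of the sample rather than the whole document:

```python
import json

# Cut-down fragment of the sample; note the trailing comma after "Verizon"
with_trailing_comma = '{"mobile": {"device": "iPhone", "carrier": "Verizon",}}'
without_trailing_comma = '{"mobile": {"device": "iPhone", "carrier": "Verizon"}}'

try:
    json.loads(with_trailing_comma)
    parsed_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError in Python 3
    parsed_ok = False

print(parsed_ok)                                                # False
print(json.loads(without_trailing_comma)["mobile"]["carrier"])  # Verizon
```

That is the only trailing comma in the sample, so deleting it is enough for the whole document to load.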

Here is what I have so far:

def json_scan(json_obj, key):
    result = None
    for element in json_obj:
        if str(element) == key:
            result = json_obj[element]
        else:
            if type(json_obj[element]) == DictType:
                json_scan(json_obj[element], key)
            elif type(json_obj[element]) == ListType:
                json_scan(element, key)
    return result

Expected output:

>>> json_scan(json_obj, "timestamp")
1416958459185

As I step through the debugger, I can see the desired value being found, but the line result = None resets result to None, and at the end of the method the value I get is None. I'm not sure how to fix this. I tried removing the line, but then I get an error because result is not set to a value beforehand.
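The behaviour described above comes from Python's scoping rather than from result = None itself: every recursive call gets its own local result, so a value found deep in the recursion is lost unless each caller uses the value its recursive call returns. A toy sketch (find_at_zero and find_at_zero_fixed are hypothetical, not from the question):

```python
def find_at_zero(n):
    result = None            # local to THIS call; every call has its own copy
    if n == 0:
        result = "found"     # set only in the innermost call
    else:
        find_at_zero(n - 1)  # return value discarded, as in the question's code
    return result

def find_at_zero_fixed(n):
    if n == 0:
        return "found"
    return find_at_zero_fixed(n - 1)  # pass the inner call's value back up

print(find_at_zero(0))        # found
print(find_at_zero(3))        # None
print(find_at_zero_fixed(3))  # found
```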

  • You need to assign the recursive call, e.g. result = json_scan(element, key). Commented Dec 1, 2014 at 17:54
  • @jonrsharpe I tried this, but the line still resets result to None. Commented Dec 1, 2014 at 18:08
  • What do you mean by "the line"? You shouldn't need result = None at all. Commented Dec 1, 2014 at 18:11
  • @jonrsharpe I removed the line result = None and made the adjustments you suggested for the recursive calls (result = json_scan(json_obj[element], key) and result = json_scan(element, key)), but I get a "local variable 'result' referenced before assignment" error. Commented Dec 1, 2014 at 18:17
  • In that case, for element in json_obj: isn't iterating in one of the calls (whatever json_obj is there is empty), so you reach return result before result is assigned. Commented Dec 1, 2014 at 18:18
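To illustrate the last comment: if result is assigned only inside the loop, an empty dict such as "properties":{} gives the loop nothing to iterate over, so return result runs before result exists. A minimal reproduction, with a hypothetical scan_without_default that mirrors the question's code after the suggested changes (list branch omitted, Python 3 isinstance in place of DictType):

```python
def scan_without_default(json_obj, key):
    # Question's logic with `result = None` removed and recursive calls assigned
    for element in json_obj:
        if element == key:
            result = json_obj[element]
        elif isinstance(json_obj[element], dict):
            result = scan_without_default(json_obj[element], key)
    return result  # raises if nothing in the loop assigned `result`

try:
    scan_without_default({"session": {"properties": {}}}, "timestamp")
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnboundLocalError
```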

3 Answers


Using the json library to parse the JSON string (the trailing comma after "Verizon" must be deleted first, since json.loads rejects it) and working with native dict types:

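import json

def json_scan(json_obj, key):
    # json_obj is a JSON string; parse it into native dicts/lists
    d = json.loads(json_obj)

    def _(dictobj, lookup):
        # key present at this level
        if lookup in dictobj:
            return dictobj[lookup]
        # otherwise recurse into nested dicts
        for sub_dictobj in [v for v in dictobj.values() if isinstance(v, dict)]:
            result = _(sub_dictobj, lookup)
            if result is not None:  # keeps falsy hits such as 0 or ""
                return result
        return None

    return _(d, key)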
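One subtlety in this approach: a check like if result: treats falsy values such as 0, "", False, or {} the same as "not found", so a nested key whose value is, say, 0 is skipped. Comparing against None avoids that. A small sketch contrasting the two checks (both helper names are hypothetical):

```python
def find_truthy(dictobj, lookup):
    # Nested-dict search using a truthiness test on the recursive result
    if lookup in dictobj:
        return dictobj[lookup]
    for sub in dictobj.values():
        if isinstance(sub, dict):
            result = find_truthy(sub, lookup)
            if result:  # 0, "", False, {} all look like "not found"
                return result
    return None

def find_not_none(dictobj, lookup):
    # Same search, but only None means "not found"
    if lookup in dictobj:
        return dictobj[lookup]
    for sub in dictobj.values():
        if isinstance(sub, dict):
            result = find_not_none(sub, lookup)
            if result is not None:
                return result
    return None

data = {"event": {"properties": {"event_sequence_number": 0}}}
print(find_truthy(data, "event_sequence_number"))    # None
print(find_not_none(data, "event_sequence_number"))  # 0
```

A key whose value is JSON null still cannot be told apart from a missing key with this convention; a dedicated sentinel object would be needed for that.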

A more complete version, which also searches dicts nested inside lists:

import json

def json_scan(json_obj, key):
    # json_obj is a JSON string; parse it into native dicts/lists
    d = json.loads(json_obj)

    def _(dictobj, lookup):
        if lookup in dictobj:
            return dictobj[lookup]

        # recurse into nested dicts
        for sub_dictobj in [v for v in dictobj.values() if isinstance(v, dict)]:
            result = _(sub_dictobj, lookup)
            if result is not None:
                return result

        # if objects in dictobj.values() are lists, go through the dicts they contain
        for listobject in [v for v in dictobj.values() if isinstance(v, list)]:
            for sub_dictobj in [item for item in listobject if isinstance(item, dict)]:
                result = _(sub_dictobj, lookup)
                if result is not None:
                    return result
        return None

    return _(d, key)

EDIT (2015/04/25):

After watching some PyCon 2015 videos, I came across dict_digger:

http://jtushman.github.io/blog/2013/11/06/dict-digger/
https://github.com/jtushman/dict_digger

It comes with tests.


2 Comments

This was an excellent solution. However, it does not take into account the case where the key is in a dictionary within a list: {"events": [ { "network_type":"unknown", "properties": { "DeviceOsVersion":"iopi-iOS-8.1", "DeviceModel":"iopi-iPhone Simulator", "AppVersion":"iopi-iPhone-7.4b1 (lib 7.4b1)"}, "timestamp":1416848861703 } ], "session": { "properties":{} } }
You're right! Adding a check for list values to support deeper search resolves the limitation.
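A more general variant, sketched here as an alternative rather than the answerer's code, recurses on both dicts and lists at any depth, so it also covers lists nested directly in lists or a top-level list. It takes already-parsed data (the output of json.loads); the name find_key is hypothetical. Run against the JSON from the comment above:

```python
import json

def find_key(obj, key):
    """Depth-first search for `key` in parsed JSON (dicts and lists).

    Returns the first value found, or None if the key is absent.
    """
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None  # scalars (str, int, float, bool, None) hold no keys
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

raw = '''{"events": [ {"network_type": "unknown",
  "properties": {"DeviceOsVersion": "iopi-iOS-8.1",
                 "DeviceModel": "iopi-iPhone Simulator",
                 "AppVersion": "iopi-iPhone-7.4b1 (lib 7.4b1)"},
  "timestamp": 1416848861703}],
  "session": {"properties": {}}}'''

data = json.loads(raw)
print(find_key(data, "timestamp"))    # 1416848861703
print(find_key(data, "DeviceModel"))  # iopi-iPhone Simulator
print(find_key(data, "missing"))      # None
```

Because the search is depth-first, the first match in iteration order wins (document order on Python 3.7+, where dicts keep insertion order), which is not necessarily the shallowest match.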

You should return result from inside your if statement. So, your code would be:

def json_scan(json_obj, key):
    for element in json_obj:
        if str(element) == key:
            result = json_obj[element]
            return result
        else:
            if type(json_obj[element]) == DictType:
                json_scan(json_obj[element], key)
            elif type(json_obj[element]) == ListType:
                json_scan(element, key)
    return None

That way if you find the result, it'll return it immediately instead of resetting it to None. If it doesn't find it, it'll still return None at the end.

5 Comments

It finds the key, but that line still resets result to None.
What if you remove result = None and only assign result inside of your if statement? See my edit.
@Liondancer Does it still return None if you replace json_scan(...) in ekrah's code with return json_scan(...)?
@tepples I get UnboundLocalError: local variable 'result' referenced before assignment. It is due to the empty dictionaries in my JSON.
You actually don't really need result. Just return json_obj[element].
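Putting this thread together: return the value as soon as the key matches, and pass a recursive call's value back up only when it actually found something, since an unconditional return json_scan(...) would stop at the first nested dict even when the key is not in it. A sketch, assuming Python 3 (isinstance with dict/list stands in for the question's Python 2 DictType/ListType) and a top-level dict already parsed with json.loads; lists are searched through their dict items rather than by passing the key name as the question's json_scan(element, key) did:

```python
def json_scan(json_obj, key):
    # Sketch: json_obj is an already-parsed dict (e.g. from json.loads).
    for element in json_obj:
        value = json_obj[element]
        if element == key:
            return value  # found at this level: return immediately
        found = None
        if isinstance(value, dict):
            found = json_scan(value, key)
        elif isinstance(value, list):
            # recurse into the dicts inside the list, not into the key name
            for item in value:
                if isinstance(item, dict):
                    found = json_scan(item, key)
                    if found is not None:
                        break
        if found is not None:
            return found  # pass a hit from deeper levels back up
    return None  # key absent or json_obj empty: no UnboundLocalError

sample = {"event_header": {"server_timestamp": 1416958459572},
          "session": {"properties": {}},
          "event": {"timestamp": 1416958459185,
                    "properties": {"event_sequence_number": 97}}}
print(json_scan(sample, "timestamp"))                          # 1416958459185
print(json_scan({"events": [{"timestamp": 1}]}, "timestamp"))  # 1
```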

The problem is that you don't assign the recursive calls to result:

def json_scan(json_obj, key):
    result = None
    for element in json_obj:
        if str(element) == key:
            result = json_obj[element]
        else:
            if type(json_obj[element]) == DictType:
                result = json_scan(json_obj[element], key)
            elif type(json_obj[element]) == ListType:
                result = json_scan(element, key)
    return result

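Another problem is that your scan doesn't work for lists: json_obj[element] only works for dicts (when json_obj is a list, element is an item, not an index), and json_scan(element, key) passes the key name rather than the nested list. Since your data doesn't have lists, that branch is never reached for now. You should remove the list processing completely (unless you really have lists, in which case the algorithm needs to change).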
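There is a further catch with assigning every recursive call to result: a later sibling that does not contain the key returns None and overwrites a value found earlier, so the outcome can depend on key order. A minimal reproduction of this answer's logic, with Python 3 isinstance checks in place of DictType and the list branch left out as suggested (json_scan_overwrite is a hypothetical name):

```python
def json_scan_overwrite(json_obj, key):
    # Same logic as this answer, minus the list branch
    result = None
    for element in json_obj:
        if str(element) == key:
            result = json_obj[element]
        elif isinstance(json_obj[element], dict):
            # may overwrite an earlier hit with None
            result = json_scan_overwrite(json_obj[element], key)
    return result

# "properties" follows "timestamp", so its None replaces the match
event_part = {"event": {"timestamp": 1416958459185,
                        "properties": {"event_sequence_number": 97}}}
print(json_scan_overwrite(event_part, "timestamp"))  # None (Python 3.7+ key order)
```

Keeping the first non-None result and returning it straight away avoids the overwrite.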
