41

When screen-scraping a website, I extract data from <script> tags.
The data is not standard JSON, so I cannot use json.loads().

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = {'x':1, 'y':2, 'z':3}

Currently, I use regex to transform the raw data into JSON.
But this approach becomes painful when I encounter complicated data structures.

Is there a better solution?

5
  • What is non-standard about the data you want to parse? Commented Jun 4, 2014 at 1:29
  • @HuuNguyen I want to parse a plain old JavaScript data structure into a Python object. Commented Jun 4, 2014 at 1:32
  • Oh I didn't see that js_obj didn't have quotes around the keys. How complicated would your data structures get? It's hard to suggest anything without knowing the cases you're trying to solve for. Commented Jun 4, 2014 at 1:34
  • @HuuNguyen js_obj may be nested Commented Jun 4, 2014 at 1:37
  • there are similar questions on SO already: stackoverflow.com/a/10057449/384442 none of them is offering any ready to use solution Commented Jun 4, 2014 at 1:49

7 Answers

62

demjson.decode()

import demjson

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = demjson.decode(js_obj)

chompjs.parse_js_object()

import chompjs

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = chompjs.parse_js_object(js_obj)

jsonnet.evaluate_snippet()

import json, _jsonnet

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))

ast.literal_eval()

import ast

# from (keys must already be quoted: ast.literal_eval parses Python literals only)
js_obj = "{'x':1, 'y':2, 'z':3}"

# to
py_obj = ast.literal_eval(js_obj)

6 Comments

JSON does not support circular objects
Links to demjson and jsonnet are dead
demjson is giving problems with Python 3 because Setuptools has removed support for 2to3. So it may not be a valid alternative for those using python 3.X right now.
For jsonnet I am getting: /home/hafiz031/anaconda3/envs/py38/lib/python3.8/site-packages/_jsonnet.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
ModuleNotFoundError: No module named '_jsonnet'
17

Use json5

import json5

js_obj = '{x:1, y:2, z:3}'

py_obj = json5.loads(js_obj)

print(py_obj)

# output
# {'x': 1, 'y': 2, 'z': 3}

2 Comments

This is the best one :)
Caution: unless you have a very small object, don't use JSON5. Its documentation explicitly states that it is slow, and that is no exaggeration: it is very slow even on average-size JSON. Test it on a real use case before adopting it. (I tested version 0.9.8)
9

I faced the same problem this afternoon and finally found a good solution: JSON5.

The syntax of JSON5 is more similar to native JavaScript, so it can help you parse non-standard JSON objects.

You might want to check pyjson5 out.

2 Comments

This is the fastest library. With demjson, my script ran for 10.5s. Pyjson5 completes the same task in 0.004s.
@Inventor can you post an example of how this works?
4

If you have node available on the system, you can ask it to evaluate the javascript expression for you, and print the stringified result. The resulting JSON can then be fed to json.loads:

import json
from subprocess import PIPE, Popen


def evaluate_javascript(s):
    """Evaluate and stringify a javascript expression in node.js, and convert the
    resulting JSON to a Python object"""
    # node executes s as code, so only pass input you trust
    node = Popen(['node', '-'], stdin=PIPE, stdout=PIPE)
    stdout, _ = node.communicate(f'console.log(JSON.stringify({s}))'.encode('utf8'))
    return json.loads(stdout.decode('utf8'))

1 Comment

After trying other suggestions, I finally solved my problem with this solution. Thank you very much!
3

This will likely not work everywhere, but as a start, here's a simple regex that should convert the keys into quoted strings so you can pass the result to json.loads. Or is this what you're already doing?

In[70] : quote_keys_regex = r'([\{\s,])(\w+)(:)'

In[71] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj)
Out[71]: '{"x":1, "y":2, "z":3}'

In[72] : js_obj_2 = '{x:1, y:2, z:{k:3,j:2}}'

In[73] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj_2)
Out[73]: '{"x":1, "y":2, "z":{"k":3,"j":2}}'
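The substitution above can be wrapped into a small helper that feeds the result straight to json.loads. This is only a sketch: the name loads_unquoted_keys is hypothetical, and the helper inherits the regex's limits. It also rewrites text inside string values that happens to look like a key, in which case json.loads usually raises an error.

```python
import json
import re

# Same pattern as in the answer: a bare word preceded by "{", whitespace,
# or "," and followed directly by ":" is treated as an unquoted key.
QUOTE_KEYS_REGEX = re.compile(r'([\{\s,])(\w+)(:)')


def loads_unquoted_keys(js_obj):
    """Quote bare keys, then parse the result as ordinary JSON.

    Hypothetical helper for illustration. It does not handle single-quoted
    strings, trailing commas, comments, or values such as undefined.
    """
    return json.loads(QUOTE_KEYS_REGEX.sub(r'\1"\2"\3', js_obj))


print(loads_unquoted_keys('{x:1, y:2, z:{k:3,j:2}}'))

# A string value that contains key-like text gets corrupted by the regex,
# so the converted text is no longer valid JSON.
try:
    loads_unquoted_keys('{a:"x, y:z"}')
except ValueError as exc:
    print('failed:', exc)
```

For anything beyond flat or lightly nested objects, one of the dedicated parsers from the other answers is probably the safer choice.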

2

Not including objects

json.loads()

  • json.loads() doesn't accept undefined; you have to change it to null
  • json.loads() only accepts double quotes
    • {"foo": 1, "bar": null}

Use this if you are sure that your JavaScript code only has double quotes around key names.

import json
import re

json_text = """{"foo": 1, "bar": undefined}"""
json_text = re.sub(r'("\s*:\s*)undefined(\s*[,}])', '\\1null\\2', json_text)

py_obj = json.loads(json_text)

ast.literal_eval()

  • ast.literal_eval() doesn't accept undefined; you have to change it to None
  • ast.literal_eval() doesn't accept null; you have to change it to None
  • ast.literal_eval() doesn't accept NaN; the code below changes it to None
  • ast.literal_eval() doesn't accept true; you have to change it to True
  • ast.literal_eval() doesn't accept false; you have to change it to False
  • ast.literal_eval() accepts single and double quotes
    • {"foo": 1, "bar": None} or {'foo': 1, 'bar': None}

import ast
import re

js_obj = """{'foo': 1, 'bar': undefined}"""
js_obj = re.sub(r'([\'\"]\s*:\s*)undefined(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)null(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)NaN(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)true(\s*[,}])', '\\1True\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)false(\s*[,}])', '\\1False\\2', js_obj)

py_obj = ast.literal_eval(js_obj)
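The five substitutions can also be folded into one pass with a lookup table. The sketch below assumes the same input shape as the code above (quoted keys, literals directly in value position); the names JS_TO_PY, VALUE_REGEX, and literal_eval_js are hypothetical. Literals inside arrays are not converted, just as with the separate re.sub calls.

```python
import ast
import re

# JavaScript literal -> Python literal. Mapping NaN and undefined to None
# follows the answer above and loses the distinction between them.
JS_TO_PY = {
    'undefined': 'None',
    'null': 'None',
    'NaN': 'None',
    'true': 'True',
    'false': 'False',
}

# A literal is replaced only in value position: after a quoted key and a
# colon, and before a comma or a closing brace.
VALUE_REGEX = re.compile(
    r'([\'"]\s*:\s*)(undefined|null|NaN|true|false)(?=\s*[,}])'
)


def literal_eval_js(js_obj):
    """Convert JavaScript literals to Python ones, then evaluate safely."""
    converted = VALUE_REGEX.sub(
        lambda m: m.group(1) + JS_TO_PY[m.group(2)], js_obj
    )
    return ast.literal_eval(converted)


print(literal_eval_js("""{'foo': 1, 'bar': undefined, "ok": true, 'n': null}"""))
```

Keeping the mapping in a dictionary means a new literal needs one new entry plus one alternative in the pattern, rather than another re.sub line.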

2

Some answers here are outdated, so here's a speed comparison between json5, hjson, and chompjs (ast.literal_eval and json.loads failed). I evaluated each function on a 1 MB JavaScript object to get a good sample. All three successful parsers produced identical dictionaries.

#   100.00% - reference time
chompjs.parse_js_object(text)

#   666.65% - 7 times slower
hjson.loads(text)

# 60460.57% - 605 times slower
json5.loads(text)

# fail
ast.literal_eval(text)
json.loads(text)

# won't install on Python 3.11.9
demjson
jsonnet
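A comparison like this could be reproduced with a small timing harness. The sketch below is not the script used for the numbers above: compare_parsers is a hypothetical name, and the demo only times standard-library parsers on a small valid-JSON input. To repeat the comparison, pass the third-party functions (for example chompjs.parse_js_object) in the same dictionary along with your own text.

```python
import ast
import json
import timeit


def compare_parsers(parsers, text, number=10, repeat=3):
    """Return {name: percent of the fastest time}; None marks a parser that raised."""
    timings = {}
    for name, parse in parsers.items():
        try:
            parse(text)  # check once that the parser accepts the input
        except Exception:
            timings[name] = None
            continue
        # Best of several runs, to reduce noise from other processes.
        timings[name] = min(
            timeit.repeat(lambda: parse(text), number=number, repeat=repeat)
        )
    successes = [t for t in timings.values() if t is not None]
    if not successes:
        return timings
    fastest = min(successes)
    return {
        name: None if t is None else 100.0 * (t / fastest)
        for name, t in timings.items()
    }


text = json.dumps({'x': list(range(1000)), 'y': {'z': 'abc'}})
results = compare_parsers(
    {'json.loads': json.loads, 'ast.literal_eval': ast.literal_eval}, text
)
for name, percent in results.items():
    print(name, 'fail' if percent is None else f'{percent:.2f}%')
```

The fastest parser is reported as 100%, matching the "reference time" convention in the listing above.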
