I'm trying to parse the source code of JavaScript objects that hold huge JavaScript arrays and convert it to a Python dictionary with lists.

At the moment I'm using PyYAML, but it doesn't work directly because it can't handle consecutive commas (e.g. it breaks on '[,,,0,]' with: expected the node content, but found ','). So I substitute these out with a regex first, but this is all very slow. Does anyone know of a better and faster way to do this? Decoding it as JSON doesn't work either, because the JavaScript code isn't valid JSON.

This is the code I'm using, as described above, with js_obj as an example:

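import re
import yaml  # PyYAML

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"

def repl(match):
    # Rewrite a run of commas so every empty slot becomes an explicit "".
    content = match.group(0).replace(" ", "")
    length = len(content) - 1
    result = ''
    if content[0] == '[':
        # The run starts the array: the first slot is empty.
        result = '[""'
        length -= 1

    after = ','
    if content[-1] == ']':
        # The run ends the array: fill the last slot and close the bracket.
        length -= 1
        after += '""]'

    return result + (',""' * length) + after

# safe_load avoids constructing arbitrary Python objects from the input.
py_dict = yaml.safe_load(re.sub(r'\[? *(, *)+\]?', repl, js_obj))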
1 Answer

You should probably write the data out from JavaScript as JSON and then read it into Python as JSON. YAML is OK, but I tend to prefer JSON over YAML; JSON is more consistent.
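As a rough illustration of the Python side, assuming the JavaScript side can be changed to emit JSON.stringify(obj): array holes are serialized as null, so the example object from the question would arrive as the string below. The payload is written by hand here, not taken from a real page.

```python
import json

# Assumed JSON.stringify output for {index: '37', data: [, 1, 2, 3,,,]}.
# Holes in the JavaScript array become null, and the trailing comma is dropped.
payload = '{"index": "37", "data": [null, 1, 2, 3, null, null]}'

# json.loads maps JSON objects to dicts, arrays to lists and null to None.
py_dict = json.loads(payload)
print(py_dict["data"])
```

The standard json module needs no preprocessing step, which is the main reason to prefer this route when it is available.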

If you must parse the JavaScript, you might want to look into pyparsing or similar.
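To give an idea of what such a parser involves, here is a minimal hand-written recursive-descent sketch in plain Python. It is not pyparsing and not a full JavaScript parser: it assumes the input is a single literal built from objects, arrays, strings, numbers and true/false/null/undefined, with no comments, expressions or function calls. The names parse_js_literal, tokenize and the hole parameter are made up for this example.

```python
import re

# One token: punctuation, a quoted string, a number, or a bare identifier.
_TOKEN = re.compile(r"""
    \s*(?:
        (?P<punct>[{}\[\]:,])
      | (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<name>[A-Za-z_$][\w$]*)
    )""", re.VERBOSE)
_NAMES = {"true": True, "false": False, "null": None, "undefined": None}
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}  # simplification: other escapes keep the char


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("end", ""))  # sentinel so the parser never runs off the list
    return tokens


def _unquote(tok):
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), tok[1:-1])


def _value(tokens, pos, hole):
    kind, tok = tokens[pos]
    if tok == "{":
        return _object(tokens, pos + 1, hole)
    if tok == "[":
        return _array(tokens, pos + 1, hole)
    if kind == "string":
        return _unquote(tok), pos + 1
    if kind == "number":
        is_float = any(c in tok for c in ".eE")
        return (float(tok) if is_float else int(tok)), pos + 1
    if kind == "name" and tok in _NAMES:
        return _NAMES[tok], pos + 1
    raise ValueError(f"unexpected token {tok!r}")


def _array(tokens, pos, hole):
    items = []
    while tokens[pos][1] != "]":
        if tokens[pos][1] == ",":
            items.append(hole)  # a comma with no element before it is a hole
            pos += 1
            continue
        value, pos = _value(tokens, pos, hole)
        items.append(value)
        if tokens[pos][1] == ",":
            pos += 1  # separator after an element; a trailing one adds nothing
        elif tokens[pos][1] != "]":
            raise ValueError("expected ',' or ']' in array")
    return items, pos + 1


def _object(tokens, pos, hole):
    result = {}
    while tokens[pos][1] != "}":
        kind, tok = tokens[pos]
        if kind == "string":
            key = _unquote(tok)
        elif kind in ("name", "number"):
            key = tok  # unquoted keys are kept as strings
        else:
            raise ValueError(f"bad object key {tok!r}")
        if tokens[pos + 1][1] != ":":
            raise ValueError("expected ':' after object key")
        result[key], pos = _value(tokens, pos + 2, hole)
        if tokens[pos][1] == ",":
            pos += 1
        elif tokens[pos][1] != "}":
            raise ValueError("expected ',' or '}' in object")
    return result, pos + 1


def parse_js_literal(text, hole=None):
    """Parse one JavaScript literal; array holes are filled with `hole`."""
    tokens = tokenize(text)
    value, pos = _value(tokens, 0, hole)
    if tokens[pos][0] != "end":
        raise ValueError("trailing tokens after the literal")
    return value


py_dict = parse_js_literal("{index: '37',data: [, 1, 2, 3,,,]}")
```

This version follows JavaScript's rule that a trailing comma does not add an element, so '[,,,0,]' yields four items. Whether it is actually faster than the regex plus YAML approach on huge arrays would need to be measured; it makes a single pass over the tokens, but it is pure Python.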

1 Comment

I have no access to the JavaScript side, but I'll have a look at pyparsing.
