I have scraped a JavaScript object which I can't parse as JSON because it has unquoted keys.

I found a solution here which says to load the object as a Python data structure using the PyYAML library, and then write it back out as valid JSON:

https://stackoverflow.com/a/31030022/10601287

This would be a great solution for me; however, yaml.load(js_obj) merges each key and its value into a single key, and sets every value to None. This is my code snippet:

import yaml

yaml_obj = yaml.safe_load(js_obj)

print(yaml_obj)

Example of the JavaScript object before it is loaded as YAML (in reality it is much bigger than this):

{
  path:"1/83656/83659/83669/83670",
  is_active:!0,
  level:4,
  children_count:0,
  product_count:59,
  parent_id:83669,
  name:"Red Wine",
  position:1,
  id:83670,
  include_in_menu:1,
  url_key:"red-wine-83670",
  url_path:"liquor/wine/red-wine.html",
  _score:null,
  slug:"red-wine-83670"
}

After yaml.load(js_obj):

{
  'path:"1/83656/83659/83669/83670"': None, 
  'is_active:!0': None, 
  'level:4': None, 
  'children_count:0': None, 
  'product_count:59': None, 
  'parent_id:83669': None, 
  'name:"Red Wine"': None, 
  'position:1': None, 
  'id:83670': None, 
  'include_in_menu:1': None, 
  'url_key:"red-wine-83670"': None, 
  'url_path:"liquor/wine/red-wine.html"': None, 
  '_score:null': None, 
  'slug:"red-wine-83670"': None
}

Any advice would be greatly appreciated.

  • You need to do this in Python? Commented Apr 19, 2021 at 14:24
  • Yes, unfortunately I do, because I am actually using an analytics software called Knime, where I can only use Python scripts. Commented Apr 19, 2021 at 14:44

1 Answer

YAML requires the colon in a mapping to be followed by at least one space character, so your input isn't valid YAML either. If the format is as simple as your example indicates, you could preprocess it into YAML by searching for a word at the beginning of a line followed by a colon and inserting a space after the colon. (Or you could insert quotes around the word to make it JSON, but you're going to have a problem with is_active:!0, because !0 isn't a JSON value.)

So you could try something like:

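import re
import yaml

# re.MULTILINE makes ^ match at the start of every line, not only the first.
first_word = re.compile(r"^\s*[_a-zA-Z]\w*:", re.MULTILINE)

# ...
    # Compiled patterns have sub(), not replace(); arguments are (replacement, string).
    yaml_obj = yaml.safe_load(first_word.sub(r"\g<0> ", js_obj))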

Of course, if the input is less regular, that could fail horribly.
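
The other route mentioned above, quoting the keys to produce JSON, can be sketched with only the standard library. This is a rough sketch, not a general JavaScript parser: it assumes one key per line, as in the question's example, and it assumes that !0 and !1 are minified JavaScript for true and false and never appear inside a string value. (A leading ! is not a boolean in YAML either, where it introduces a tag, so the same substitution would likely be needed on the YAML route.) The names js_object_to_python, KEY_AT_LINE_START and MINIFIED_BOOL are made up for illustration.

```python
import json
import re

# Unquoted identifier key at the start of a line, followed by a colon.
KEY_AT_LINE_START = re.compile(r"^(\s*)([_a-zA-Z]\w*):", re.MULTILINE)
# Minified booleans (!0 is true, !1 is false) directly after a colon.
MINIFIED_BOOL = re.compile(r":\s*!([01])(?=\s*[,}\n])")


def js_object_to_python(js_obj):
    """Convert a simple one-key-per-line JavaScript object literal to a dict."""
    # Quote each key: name:"Red Wine" becomes "name":"Red Wine".
    text = KEY_AT_LINE_START.sub(r'\1"\2":', js_obj)
    # Replace !0 / !1 with the JSON literals true / false.
    text = MINIFIED_BOOL.sub(
        lambda m: ": true" if m.group(1) == "0" else ": false", text
    )
    return json.loads(text)


# Shortened version of the object from the question.
sample = """{
  path:"1/83656/83659/83669/83670",
  is_active:!0,
  level:4,
  name:"Red Wine",
  _score:null
}"""

result = js_object_to_python(sample)
print(json.dumps(result, indent=2))
```

Because the result is an ordinary dict, json.dumps(result) gives valid JSON without going through YAML at all. As with the YAML version, nested objects on a single line or colons in unexpected places would need a real parser.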
