13

I understand that comments aren't valid JSON. That said, for some reason the .json file I have to process has comments at the start of lines and at the end of lines.

How can I handle this in Python, that is, load the .json file while ignoring the comments so that I can process it? I am currently doing the following:

import json

with open('/home/sam/Lean/Launcher/bin/Debug/config.json', 'r') as f:
    config_data = json.load(f)

But this crashes at the json.load(f) command because the file f has comments in it.

I thought this would be a common problem, but I can't find much online about how to handle it in Python. Someone suggested commentjson, but importing it makes my script crash with:

ImportError: cannot import name 'dump'

Thoughts?

Edit: Here is a snippet of the JSON file I must process.

{
  // this configuration file works by first loading all top-level
  // configuration items and then will load the specified environment
  // on top, this provides a layering affect. environment names can be
  // anything, and just require definition in this file. There's
  // two predefined environments, 'backtesting' and 'live', feel free
  // to add more!

  "environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"

  // algorithm class selector
  "algorithm-type-name": "BasicTemplateAlgorithm",

  // Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
  "algorithm-language": "CSharp"
}

5 Answers

12

Switch to json5. JSON5 is a small superset of JSON that supports comments and a few other features you can simply ignore.

import json5 as json
# and the rest is the same

The package is in beta and it is slower, but if you only need to read a short configuration file once when the program starts, it is probably a reasonable option. It is better to switch to another standard than to follow none at all.

3 Comments

Is this module included in the standard library? If so, at what version?
Appears to be a package - see pypi.org/project/json5
It is noticeably slower than the standard json module.
8

Kind of a hack (it will fail if // appears within the JSON data itself), but simple enough for most cases:

import json
import re

s = """{
  // this configuration file works by first loading all top-level
  // configuration items and then will load the specified environment
  // on top, this provides a layering affect. environment names can be
  // anything, and just require definition in this file. There's
  // two predefined environments, 'backtesting' and 'live', feel free
  // to add more!

  "environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"

  // algorithm class selector
  "algorithm-type-name": "BasicTemplateAlgorithm",

  // Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
  "algorithm-language": "CSharp"
}
"""

result = json.loads(re.sub(r"//.*", "", s, flags=re.MULTILINE))

print(result)

gives:

{'environment': 'backtesting', 'algorithm-type-name': 'BasicTemplateAlgorithm', 'algorithm-language': 'CSharp'}

This applies a regular expression to every line, removing the double slash and everything that follows it.

Maybe a state machine parsing each line would be better, to make sure the // isn't inside quotes, but that's slightly more complex (though doable).
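
As a rough illustration of the state-machine idea mentioned above, here is a minimal sketch using only the standard library. The function name strip_line_comments and the sample text are made up for this example; it only handles // comments and assumes the input is otherwise valid JSON.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    out = []
    in_string = False  # currently inside a "..." literal
    escaped = False    # previous character was a backslash inside a string
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical sample: the URL contains // inside a string and must survive.
sample = """{
  // a comment on its own line
  "environment": "backtesting",// trailing comment
  "url": "http://example.com/a//b"
}"""

config = json.loads(strip_line_comments(sample))
print(config)
```

Because the scanner tracks whether it is inside a quoted string, a // within a value such as a URL is left alone, which is the case the plain regex gets wrong.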

3 Comments

Beware that this will fail if any of the config values contain a URL with //. To account for that, require whitespace before the slashes, for example: json.loads(re.sub(r"\s//.*", "", s, flags=re.MULTILINE))
OK, but requiring a space won't work for the first line if it starts with //. You would need something like ^|\s, wrapped in a non-capturing group. Oh well...
Given just how common it is to find comments in JSON (official or not, they are not uncommon), I figured the loads function would have an option which would be false by default (to follow the spec) but could be enabled to strip them out. Sadly there's no such option in the documentation. docs.python.org/3/library/json.html
2

I haven't used it personally, but you can have a look at the JSONComment Python package, which supports parsing JSON files with comments. Use it in place of the standard JSON parser:

import json
from jsoncomment import JsonComment

parser = JsonComment(json)
parsed_object = parser.loads(jsonString)

0

We use a powerful JSON preprocessor to solve this problem. Besides comments, it also supports:

  • importing (nested) JSON files
  • ${variable} syntax to reference previously defined variables
  • Python syntax (True, False, None, …)
  • . (dot) syntax for dictionary objects

Download: JsonPreprocessor (PyPI)

This allows common definitions and hierarchical structures for huge projects.

We also use a VSCode plugin for JSONP syntax: test-fullautomation/vscode-jsonp (github.com)

-1

You can take out the comments with the following:

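import re

data = re.sub(r"//.*", "", data)  # line comments, up to the end of the line
data = re.sub(r"/\*.*?\*/", "", data, flags=re.DOTALL)  # block comments, possibly spanning lines

This should remove all comments from the data. It could cause problems if there are // or /* inside your strings.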
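
One possible way around the strings problem, sketched here under the assumption that the input is otherwise valid JSON: let a single regular expression match either a complete string literal or a comment, and keep only the strings. The names TOKEN and strip_comments and the sample text are invented for this example.

```python
import json
import re

# Match either a complete JSON string literal (group 1, kept as-is)
# or a // line comment or /* ... */ block comment (dropped).
TOKEN = re.compile(
    r'("(?:\\.|[^"\\])*")'  # group 1: string literal, including escapes
    r'|//[^\n]*'            # line comment, up to but not including the newline
    r'|/\*.*?\*/',          # block comment, non-greedy, may span lines
    re.DOTALL,
)


def strip_comments(text):
    # String literals are put back unchanged; comments become empty text.
    return TOKEN.sub(lambda m: m.group(1) or "", text)


# Hypothetical sample with both comment styles and comment-like text in a string.
sample = """{
  /* block
     comment */
  "environment": "backtesting", // trailing
  "url": "http://example.com/*not-a-comment*/"
}"""

result = json.loads(strip_comments(sample))
print(result)
```

Since the regex engine scans left to right, a string literal is consumed whole before any // or /* inside it can be mistaken for the start of a comment.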
