I am parsing a scraped HTML page that contains a script with JSON inside. This JSON contains all the info I am looking for, but I can't figure out how to extract it as valid JSON.
Minimal example:
my_string = '''
(function(){
window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
window.__PRELOADED_STATE__.push(
{ *placeholder representing valid JSON inside* }
);
})()
'''
The JSON inside is valid according to jsonlinter.
The result should be loaded into a dictionary:
import json
import re
my_json = re.findall(r'.*(?={\").*', my_string)[0]  # extract JSON
data = json.loads(my_json)
# print(data)
regex: https://regex101.com/r/r0OYZ0/1
This attempt results in:
>>> data = json.loads(my_json)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 7 (char 6)
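The error means the text handed to json.loads is not valid JSON at index 6. One likely contributor is the pattern itself: without re.DOTALL, `.` does not match newlines, so `.*(?={\").*` matches within a single line and also takes in any JavaScript that shares that line with the object (or only part of an object spread over several lines). A minimal sketch, assuming the object is the single argument of `window.__PRELOADED_STATE__.push(...)` and is followed by `);`, captures only that argument. The sample JSON is a hypothetical stand-in for the placeholder.

```python
import json
import re

# Hypothetical page script; the JSON object stands in for the placeholder.
my_string = '''
(function(){
window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
window.__PRELOADED_STATE__.push(
{"title": "Example", "items": [1, 2, 3], "nested": {"ok": true}}
);
})()
'''

# re.DOTALL lets "." cross newlines, so a multi-line object is captured whole.
# The group runs from the first "{" after ".push(" to the last "}" that is
# followed by optional whitespace and ");" (greedy match, then backtracking).
pattern = re.compile(r'\.push\(\s*(\{.*\})\s*\);', re.DOTALL)

match = pattern.search(my_string)
if match is None:
    raise ValueError("no .push({...}); call found")

data = json.loads(match.group(1))
print(data["title"])
```

This still relies on the text after the object looking like `);`; if the captured object contains JavaScript-only syntax, json.loads will fail again.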
How can the JSON be extracted and loaded from the string with Python 3.7.x?
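An alternative that does not depend on what follows the object: locate the `{` after `.push(` and let `json.JSONDecoder().raw_decode` parse exactly one JSON value from there. raw_decode returns the value together with the index where it ended, so the trailing `);` and `})()` are simply ignored. A sketch using only the standard library; the helper name `extract_pushed_json`, its `marker` parameter, and the sample data are hypothetical, and it still assumes the pushed object is strict JSON.

```python
import json


def extract_pushed_json(script, marker=".push("):
    """Hypothetical helper: return the first JSON object passed to `marker`."""
    call = script.find(marker)
    if call == -1:
        raise ValueError("marker not found: " + marker)
    start = script.find("{", call + len(marker))
    if start == -1:
        raise ValueError("no '{' after marker")
    # raw_decode parses one JSON value at the start of the string and returns
    # (value, end_index); any JavaScript after that value is not examined.
    value, _end = json.JSONDecoder().raw_decode(script[start:])
    return value


my_string = '''
(function(){
window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
window.__PRELOADED_STATE__.push(
{"title": "Example", "items": [1, 2, 3]}
);
})()
'''

data = extract_pushed_json(my_string)
print(type(data).__name__, data["items"])  # dict [1, 2, 3]
```

Because the decoder finds the object's end itself, this also copes with braces or `);` sequences that occur inside JSON string values, which a regex would have to special-case.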
json.loads method (don't forget the s). But your string doesn't seem to be valid JSON: {"publicRuntimeConfig":{"public }
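Whether the extracted text is valid JSON can be checked directly: json.JSONDecodeError exposes `msg`, `pos`, `lineno` and `colno`, so a small helper (hypothetical, not part of the json module) can report where parsing stops and show the surrounding text. For example, `{"a": undefined}` (JavaScript's `undefined` is not a JSON value) fails at line 1, column 7 (char 6), the same position as in the traceback above, although that alone does not show it is the cause here.

```python
import json


def describe_json_error(text, context=20):
    """Return None if `text` parses as JSON, else a short description of the
    first error plus a snippet of the text around the failing position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        lo = max(err.pos - context, 0)
        snippet = text[lo:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: {snippet!r}"
    return None


print(describe_json_error('{"a": 1}'))          # None
print(describe_json_error('{"a": undefined}'))  # fails at column 7
```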