Update: Please disregard this solution; it is flawed. Thanks to @jhnc for providing the sample that exposes the flaws. Errant unescaped quotes inside the strings cannot be reliably found this way and, in my estimation, not by any other method either.
This might work. It uses PCRE backtracking verbs to skip past valid syntax.
It can also work with the third-party Python regex module ("import regex"),
which must be installed first.
This is tested and works on your sample, but that doesn't mean it will work
in all cases. How it behaves on every possible input is unknown.
Unfortunately, these are the only engines that would support these regexes:
PCRE/Perl/Python regex (the (*SKIP)(*FAIL) form) and ECMAScript/C# (all .NET) (the variable-length lookbehind form).
Otherwise a (JSON) descent parser is needed.
(?:[\[{,:\]}]+[\[{,:\]}\s]*"(*SKIP)(*FAIL)|"(?!\s*[\[{,:\]}]))
replace \\$0
https://regex101.com/r/N13TgZ/1
(?:
[\[{,:\]}]+
[\[{,:\]}\s]*
"
(*SKIP) (*FAIL)
|
"
(?! \s* [\[{,:\]}] )
)
ECMAScript version
(?<![\[{,:\]}]+[\[{,:\]}\s]*)"(?!\s*[\[{,:\]}])
replace \\$&
https://regex101.com/r/ZdHaDw/1
These regexes work on minimized JSON as well.
Minimized source:
{"Equipment Contents":["XL APPEARANCE GROUP","body-color grille","body-color body-side molding with chrome insert","luggage rack","15" deep-dish cast aluminum wheels","P235/75R15SL all-terrain OWL tires","power door locks","power windows","power mirrors","cloth captains chairs"]}
PCRE/Perl: https://regex101.com/r/qSTp8S/1
ECMAScript: https://regex101.com/r/CG74kA/1
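For an engine that has neither (*SKIP)(*FAIL) nor variable-length lookbehind, such as Python's built-in re module, the same skip-then-match idea can probably be approximated with a capture group and a replacement callback. This is only a sketch: the function name escape_inner_quotes is made up for illustration, the sample is a shortened copy of the minimized source above, and the "ambiguous" input is my own hypothetical case (not the sample from @jhnc) showing the kind of string this approach cannot repair.

```python
import json
import re

# Group 1 stands in for the (*SKIP)(*FAIL) branch: structural characters
# followed by a quote are matched and then put back unchanged.
# The second branch is the quote that is NOT followed by a structural character.
PATTERN = re.compile(r'([\[{,:\]}]+[\[{,:\]}\s]*")|"(?!\s*[\[{,:\]}])')


def escape_inner_quotes(text):
    """Hypothetical helper: escape quotes that look like they sit inside a string."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)   # valid syntax, leave as-is
        return '\\"'                # errant quote, escape it
    return PATTERN.sub(repl, text)


# Shortened copy of the minimized source.
sample = ('{"Equipment Contents":["luggage rack",'
          '"15" deep-dish cast aluminum wheels","power windows"]}')
fixed = escape_inner_quotes(sample)
print(fixed)
print(json.loads(fixed)["Equipment Contents"][1])

# Hypothetical input: the inner quote after hi is followed by a comma,
# so it is indistinguishable from a real string terminator.
ambiguous = '["he said "hi", then left"]'
try:
    json.loads(escape_inner_quotes(ambiguous))
    still_broken = False
except json.JSONDecodeError:
    still_broken = True
print(still_broken)
```

The callback is needed because a plain replacement string cannot return one branch untouched while rewriting the other. The second input is left invalid, which is consistent with the update at the top of this answer.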
To use the third-party Python regex module, it has to be installed into your Python setup.
I have a Windows box with Python 3.7 installed along with the regex module.
Install it with the command pip install regex. The package is hosted at
https://pypi.org/project/regex.
This is what it looks like when run on my box:
>>> import regex
>>>
>>> pattern = regex.compile(r'''(?:[\[{,:\]}]+[\[{,:\]}\s]*"(*SKIP)(*FAIL)|"(?!\s*[\[{,:\]}]))''')
>>>
>>> data = r'''{
... "Equipment Contents": [
... "XL APPEARANCE GROUP",
... "body-color grille",
... "body-color body-side molding with chrome insert",
... "luggage rack",
... "15" deep-dish cast aluminum wheels",
... "16" x 6.5" Chrome Wheels",
... "P235/75R15SL all-terrain OWL tires",
... "power door locks",
... "power windows",
... "power mirrors",
... "cloth captains chairs"
... ]
... }
... '''
>>>
>>> result_string = regex.sub(pattern, '\\"', data)
>>>
>>> print(result_string)
{
"Equipment Contents": [
"XL APPEARANCE GROUP",
"body-color grille",
"body-color body-side molding with chrome insert",
"luggage rack",
"15\" deep-dish cast aluminum wheels",
"16\" x 6.5\" Chrome Wheels",
"P235/75R15SL all-terrain OWL tires",
"power door locks",
"power windows",
"power mirrors",
"cloth captains chairs"
]
}
>>>
>>>
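Since this is only known to work on the sample, it seems wise to check every repaired string before trusting it. A minimal sketch using the standard json module follows; is_valid_json is a hypothetical helper name, and the two strings are cut down from the session above.

```python
import json


def is_valid_json(text):
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Two entries from the sample, before and after the replacement.
before = '["15" deep-dish cast aluminum wheels", "16" x 6.5" Chrome Wheels"]'
after = r'["15\" deep-dish cast aluminum wheels", "16\" x 6.5\" Chrome Wheels"]'

print(is_valid_json(before))   # False
print(is_valid_json(after))    # True
```

A check like this only says that the result parses. It cannot tell whether the quotes were escaped in the places the original author of the data intended.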