I have a problem with a DataStage Parallel Job that doesn't parse strings properly. I have a CSV file with two columns containing JSON-shaped strings. The job reads it with a Sequential File stage, where the delimiter is set to comma and the quote character to double quote. I don't produce the file and can't ask for it to be modified. The strings have this form:
"{""key1"":""value1"",""key2"":""value2""}"
DataStage reads "{" and interprets it as the string that has to be inserted into the corresponding field, ignoring the rest of the string: "key1"":""value1"",""key2"":""value2""}". I tried to run the job with a test file where those strings were replaced with strings of the following form:
'{"key1":"value1","key2":"value2"}'
and it worked. I know the reason is that replacing some of the double quotes with single quotes allows DataStage to recognize where the string begins and ends. But I would like to know whether there is any way to solve the problem while keeping those strings in their original form.
Thanks in advance to anyone who can help.
You could preprocess the file with something like `sed -E -e "s/\"{2}/'/g"`, but this could cause problems if a string inside the JSON (like "value1") itself contains single quotes. If you know your source data will never contain such strings, it could work.
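A minimal sketch of that preprocessing step, using a made-up one-line sample file (the file name and content are illustrative, not from your actual data). Each pair of consecutive double quotes is collapsed into one single quote, so the outer field quotes survive while the embedded quoting no longer confuses the stage:

```shell
#!/bin/sh
# Create a sample line shaped like the problematic CSV field (hypothetical data)
printf '%s\n' '"{""key1"":""value1"",""key2"":""value2""}"' > sample.csv

# Replace every pair of consecutive double quotes with a single quote
sed -E "s/\"{2}/'/g" sample.csv
# The line becomes: "{'key1':'value1','key2':'value2'}"
```

Note that this leaves the outer quotes double (unlike your hand-edited test file, which used single outer quotes), but since no doubled quotes remain inside the field, the Sequential File stage should read the whole field. You would run this as a before-job step and point the job at the transformed copy.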