Unicode escape sequence in NIFI flow to convert JSON to XML

Question

Using a sequence of GenerateTableFetch, ExecuteSQL, SplitAvro, and ConvertAvroToJSON processors, I am fetching a JSON field from a MySQL view that has this content:
"A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D) is used.."

If I view the content of the file in a queue and choose the formatted view (as opposed to original), I get this:
"A 7-point scale (1=“not at all” to 7=“extremely”) is used..."

This unescaped string is what I would like to store in a NoSQL db. Is the built-in NiFi viewer using a function that I can tap into?

I am asking because later in the flow I wrap the JSON in an XML tag in order to transform it to XML using an XSLT stylesheet. After the transformation I end up with the Unicode escape sequences, and I would like to get back the original unescaped JSON (before I store it in the NoSQL db).

I am trying to avoid working with FlowFile attributes. I would like to work with the content.
– FernOfTheAndes, Commented May 27, 2020 at 17:56
\u201C is the JSON escape sequence for the character “, so the JSON-formatted view in the NiFi viewer decodes it for display.
– daggett, Commented May 27, 2020 at 20:48
@daggett Thanks. From this comment and Andy's answer, the direction seems to be that I need to provide the conversion mappings myself. There are many other occurrences of quotes and em-dashes that I will need to preserve due to the authors' requirements.
– FernOfTheAndes, Commented May 27, 2020 at 21:11
But this is a correct JSON value. Why do you need to replace it? Are you going to store the JSON itself in the NoSQL db, or a text value extracted from it?
– daggett, Commented May 27, 2020 at 21:56
@daggett The string \u201C is showing up in the NoSQL content. I am using the XSLT transform from stackoverflow.com/questions/13007280/… (the one that supports null), and I suspect it is converting the Unicode escape sequence into a literal string with the characters \u201C; otherwise the NoSQL db would have been able to unescape it properly.
– FernOfTheAndes, Commented May 28, 2020 at 0:11
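
As a rough illustration of daggett's point: a standard JSON parser decodes \u201C when it reads the string, and a serializer can be told not to re-escape non-ASCII characters on the way out. The sketch below is plain Python 3 run outside NiFi, not the viewer's own code; the record and the field name `description` are hypothetical.

```python
import json

# Raw content as it might appear in the flowfile: the backslash-u
# sequences are literal text here (raw string, so Python leaves them alone).
raw = r'{"description": "A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D) is used."}'

# Parsing decodes the JSON escapes into real characters.
record = json.loads(raw)
text = record["description"]

# ensure_ascii=False keeps non-ASCII characters as they are
# instead of writing them back out as \uXXXX escapes.
unescaped_json = json.dumps(record, ensure_ascii=False)
```

If the escapes survive into the database, one possible reading is that some step is treating the JSON text as an opaque string rather than parsing it, which matches the suspicion about the XSLT step above.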

Andy · Accepted Answer · 2020-05-27 20:58:25Z

You can use a ReplaceText processor to replace all instances of the literal escape sequence \u201C in the flowfile content with “. If the opening and closing quotes need different replacements, you can use ReplaceTextWithMapping to associate each escape sequence with its specific replacement value. If they don't, you can use the generic ReplaceText with a regex such as \\u201[CD] (the backslash is escaped so that it matches literally) and replace each match with ".
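
The two replacement strategies described in this answer can be sketched in plain Python 3. This only mimics the idea with the `re` module and is not the processors' implementation; the sample string and the `MAPPING` table are assumptions made for illustration.

```python
import re

# Sample content containing literal backslash-u escape sequences.
content = r'(1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D)'

# The backslash is escaped in the pattern so that it matches a
# literal "\u201C" or "\u201D" in the text.
PATTERN = re.compile(r'\\u201[CD]')

# Option 1: collapse both curly quotes to a straight double quote.
straight = PATTERN.sub('"', content)

# Option 2: keep opening and closing quotes distinct via a lookup table
# (keys are the literal escape text, values are the real characters).
MAPPING = {r'\u201C': '\u201c', r'\u201D': '\u201d'}
curly = PATTERN.sub(lambda m: MAPPING[m.group(0)], content)
```

Option 1 inserts a bare " into the text. If the content is still JSON at that point, the quote would need to be written as \" to keep the document valid.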

2 Comments

FernOfTheAndes Over a year ago

Thanks, Andy. I will use an ExecuteScript processor with a Python script to global-replace all the Unicode escape sequences I can identify. I really don't like having to take this approach since it can be rather tedious. I was hoping to borrow the services of the NiFi viewer.
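
A minimal sketch of the global-replace approach mentioned in this comment, written as standalone Python 3 rather than as an ExecuteScript body (the NiFi session and flowfile handling are left out). The function name `unescape_unicode` is hypothetical. Rather than listing each sequence by hand, it decodes every literal \uXXXX sequence generically, leaves escaped backslashes alone, and recombines surrogate pairs.

```python
import re

# Matches either an escaped backslash (left untouched) or a \uXXXX escape.
_ESCAPE = re.compile(r'\\\\|\\u([0-9a-fA-F]{4})')


def unescape_unicode(text):
    # Replace each literal \uXXXX sequence with the character it encodes.
    def _sub(match):
        if match.group(1) is None:
            # An escaped backslash pair: not a Unicode escape, keep as-is.
            return match.group(0)
        return chr(int(match.group(1), 16))

    decoded = _ESCAPE.sub(_sub, text)
    # Round-trip through UTF-16 so that surrogate pairs (used for
    # characters outside the Basic Multilingual Plane) become one code point.
    return decoded.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')
```

Applied to raw JSON text, this could produce invalid JSON if an escape decodes to a quote or a control character, so it is probably safer on the extracted string value than on the whole document.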

FernOfTheAndes Over a year ago

Andy, your response is appropriate within the context of my question. Because I should have been more specific in formulating my question, I am referring anyone who comes here to my last comment to daggett above.
