Using a sequence of GenerateTableFetch, ExecuteSQL, SplitAvro, and ConvertAvroToJSON processors, I am fetching a JSON field from a MySQL view that has this content:
"A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D) is used.."

If I view the content of the file in a queue and choose the "formatted" option (as opposed to "original"), I get this:
"A 7-point scale (1=“not at all” to 7=“extremely”) is used..."

This unescaped string is what I would like to store in a NoSQL db. Is the built-in NiFi viewer using a function that I can tap into?

I am asking because later in the flow, I wrap the JSON within an XML tag in order to transform it to XML using an XSLT stylesheet. After the transformation I end up with the Unicode escape sequences, and I would like to get back the original unescaped JSON before I store it in the NoSQL db.

  • I am trying to avoid working with FlowFile attributes. I would like to work with the content. Commented May 27, 2020 at 17:56
  • \u201C is a JSON-encoded character in a string that represents “, so the JSON-formatted NiFi viewer decodes it for display. Commented May 27, 2020 at 20:48
  • @daggett Thanks. From this comment and the answer from Andy, the direction seems to be that I need to provide the conversion mappings myself. There are many other occurrences of quotes and em-dashes that I will need to preserve due to the authors' requirements. Commented May 27, 2020 at 21:11
  • But this is a correct JSON value. Why do you need to replace it? Are you going to store the JSON in the NoSQL db, or a text value taken from this JSON? Commented May 27, 2020 at 21:56
  • @daggett The string \u201C is showing in the NoSQL content. I am using the XSLT transform from stackoverflow.com/questions/13007280/… (the one that supports null), and I suspect it is converting the Unicode escape sequence to a literal string with the characters \u201C; otherwise the NoSQL db would have been able to unescape it properly. Commented May 28, 2020 at 0:11

1 Answer

You can use a ReplaceText processor to replace all instances of a byte sequence (\u201C) in the flowfile content with “. If you need the leading and trailing quotes to be different, you can use ReplaceTextWithMapping to associate the different Unicode code points with the specific replacement value. If you don't, you can just use the generic ReplaceText, match \u201[CD], and replace it with ".
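For readers who want to see the replacement logic outside of NiFi, here is a minimal Python sketch of the same idea. It is not NiFi code: the function name `replace_escaped_quotes`, the `QUOTE_BY_SUFFIX` mapping, and the `straight` flag are hypothetical names chosen for illustration. The pattern escapes the backslash so that it matches the literal six-character text `\u201C` or `\u201D` rather than the quote character itself.

```python
import re

# Hypothetical mapping: last hex digit of the escape -> replacement character.
QUOTE_BY_SUFFIX = {
    "C": "\u201c",  # left double quotation mark
    "D": "\u201d",  # right double quotation mark
}

# Matches the literal text \u201C or \u201D (backslash, 'u', four hex digits).
ESCAPED_QUOTE = re.compile(r"\\u201[CDcd]")


def replace_escaped_quotes(text, straight=False):
    """Replace literal \\u201C / \\u201D escapes in text.

    With straight=False, each escape becomes its own curly quote (the
    mapping approach). With straight=True, both become a plain " character.
    """
    def substitute(match):
        if straight:
            return '"'
        return QUOTE_BY_SUFFIX[match.group(0)[-1].upper()]

    return ESCAPED_QUOTE.sub(substitute, text)


if __name__ == "__main__":
    sample = r"A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D) is used.."
    print(replace_escaped_quotes(sample))
```

Note that replacing the escapes with a plain `"` inside a JSON string value would leave unescaped quotes in the middle of the string, so the `straight=True` variant is only safe on text that is no longer going to be parsed as JSON.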

2 Comments

Thanks Andy. I will use an ExecuteScript processor with a Python script to globally replace all the Unicode escape sequences I can identify. I really don't like having to take this approach since it can be rather tedious. I was hoping to borrow the services of the NiFi viewer.
Andy, your response is appropriate within the context of my question. Because I should have been more specific in the formulation of my question, I am referring anyone that comes here to my last comment to daggett above.
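
As an alternative to listing every escape sequence by hand, one possible sketch is to let a JSON parser decode all `\uXXXX` escapes and then serialize the result again without re-escaping non-ASCII characters. This assumes the content at that point in the flow is still valid JSON, and it uses only the Python standard library. The function name `unescape_json_text` is hypothetical, and the wiring into an ExecuteScript processor (reading and writing the flowfile content) is deliberately not shown.

```python
import json


def unescape_json_text(raw):
    """Parse a JSON document and re-serialize it with literal Unicode.

    json.loads decodes every \\uXXXX escape in string values, and
    ensure_ascii=False tells json.dumps to emit those characters as-is
    instead of escaping them again.
    """
    parsed = json.loads(raw)
    return json.dumps(parsed, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical sample document; the key name "note" is made up.
    raw = r'{"note": "A 7-point scale (1=\u201Cnot at all\u201D to 7=\u201Cextremely\u201D)"}'
    print(unescape_json_text(raw))
```

This covers curly quotes, em-dashes, and any other escaped code point in one pass. Whitespace and separators in the output follow `json.dumps` defaults, so the result may not be byte-for-byte identical to the input layout.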
