My Hive table test_tbl has a string field tag_score, which contains several JSON objects separated by ";":

----------
tag_score |
--------------------------------------------------------------------------------
{"keyword":"abc","score": "0.6"};{"keyword":"烟花","score":"0.516409816917747"} |
--------------------------------------------------------------------------------

How can I turn this into valid JSON? The resulting string should look like this:

[{"keyword":"abc","score": "0.6"},{"keyword":"烟花","score":"0.516409816917747"}]

I've tried this Hive SQL:

select split(tag_score, ";") from test_tbl;

But I got an array of strings, not the desired result:

["{"keyword":"abc","score": "0.6"}","{"keyword":"烟花","score":"0.516409816917747"}"]

1 Answer

If you want an array<struct<...>> type, you would need to split the string into an array and parse each element into a struct. If you just want a JSON string, string manipulation is enough: replace and concat.

Replace the semicolon between the closing and opening curly brackets with a comma, then wrap the result in square brackets:

concat('[',regexp_replace(tag_score ,'\\}\073\\{','},{'),']')

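\073 is the octal escape for a semicolon (ASCII 59). Writing it this way keeps a literal ; out of the query text, which some Hive clients treat as a statement terminator.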
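
Put into a full query, the expression above might look like the following. This is a minimal sketch that assumes the table test_tbl and column tag_score from the question; the output alias tag_score_json is made up for illustration. For the sample row it should produce the bracketed, comma-separated string asked for in the question.

```sql
-- Sketch: test_tbl and tag_score come from the question;
-- tag_score_json is a hypothetical alias.
SELECT concat(
         '[',
         -- replace each "};{" boundary with "},{"
         regexp_replace(tag_score, '\\}\073\\{', '},{'),
         ']'
       ) AS tag_score_json
FROM test_tbl;
```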

If there may be spaces between the curly brackets and the semicolon, use the regexp '\\}\\s*\073\\s*\\{' instead. It works the same way with any number of spaces, for example: } ; {
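
For the other route mentioned at the start (splitting and parsing the elements), one possible sketch is shown below. It is not a full array<struct<...>> conversion: it only explodes the string into one row per JSON object and reads the two fields with get_json_object. The lookbehind/lookahead pattern is an assumption chosen so that the split keeps the curly brackets on each piece; the aliases t, item, keyword and score are illustrative.

```sql
-- Sketch only: assumes test_tbl.tag_score from the question.
-- (?<=\}) and (?=\{) match the boundary without consuming the brackets,
-- so every piece is still a complete JSON object.
SELECT get_json_object(t.item, '$.keyword')               AS keyword,
       CAST(get_json_object(t.item, '$.score') AS DOUBLE) AS score
FROM test_tbl
LATERAL VIEW explode(
    split(tag_score, '(?<=\\})\\s*\073\\s*(?=\\{)')
) t AS item;
```

From there the rows could be aggregated back into an array of structs if that type is really needed, but for producing a JSON string the regexp_replace approach is simpler.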
