6

From a CSV file (with a header and a pipe delimiter) I've got the following content which contains a JSON column (with a collection inside), like this:

ProductId|IngestTime|ProductOrders
9180|20171025145034|[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251|20171026114034|[{"OrderId":"1799","Location":"London"}]

What I need is to create a SELECT Hive query which returns:

ProductId  IngestTime      OrderId        OrderLocation
9180       20171025145034  299            NY
9180       20171025145034  499            LA
8251       20171026114034  1799           London

So far, I tried many combinations by using 'explode', 'get_json_object' and so on, but I still haven't found the right SQL query.

Have you got a solution ?

Thanks a lot for your help :-)

1 Answer 1

1

I was having similar kind of requirement. The solution from this link helped me solve it. BTW, below is the query for your requirement assuming all the columns in your DB_TABLE are of type 'String'.

    SELECT ProductId,
       IngestTime,
       split(split(results,",")[0],':')[1] AS OrderId,
       regexp_replace(split(split(results,",")[1],':')[1], "[\\]|}]", "") AS OrderLocation
    FROM
       (SELECT ProductId,
             IngestTime,
             split(translate(ProductOrders, '"\\[|]|\""',''), "},") AS r
       FROM DB_TABLE) t1 LATERAL VIEW explode(r) rr AS results
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.