2

I have data in below format in hive. table test(seq string, result string);

|seq  | result                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|0001 | [{"offerId":"Default_XYZ","businessName":"Apple","businessGroup":"Default","businessIssue":"Default","interactionId":"-4930126168287369915","campaignID":"P-1","rank":"1"},{"offerId":"Default_NAV","businessName":"Orange","businessGroup":"Default","businessIssue":"Default","interactionId":"-7830126168223452134","campaignID":"P-1","rank":"2"}] |

Output should be like

|seq  | offerId     | businessName   | businsesGroup| businessIssue | interactionId        | campaignId | rank |
----------------------------------------------------------------------------------------------------------------
|0001 | Default_XYZ | Apple          | Default      | Default       | -4930126168287369915 | P-1        | 1    |
|0001 | Default_NAV | Orange         | Default      | Default       | -7830126168223452134 | P-1        | 2    |

I tried to convert string to Array of struct, but it didnt work with direct CAST.

Any help please?

[EDIT - Tried below query]

 select sequenceNumber, offerId, businessName, rank from (

 select sequenceNumber,
        collect_list(oid['offerId']) as offerid_list
       , collect_list(oid['businessName']) as businessName_list
        ,collect_list(oid['rank']) as rank_list
  from (
 select sequenceNumber,
        str_to_map(translate(offer_Id,'{}','')) as oid

        from test
        lateral view explode (split(translate(result, '[]"',''),"\\},")) oid as offer_id
    ) x
    group by sequenceNumber

      ) y lateral view explode(offerid_list) olist as offerId
      lateral view explode(businessName_list) olist as businessName
      lateral view explode(rank_list) rlist as rank
3
  • any queries that you have tried so far? Commented Mar 1, 2018 at 5:58
  • Yes. Tried some, but not getting required results. Edited my question with that query. Commented Mar 1, 2018 at 6:36
  • I just want make sure that I understand, but it looks like you have 2 string columns, one of those is a json. You should be able to use a json serde if you are able to add your seq as prat of the json. Commented Mar 1, 2018 at 6:53

2 Answers 2

2

Found one solution to my question:

select                                                   
seq, 
split(split(results,",")[0],':')[1] as offerId,
split(split(results,",")[1],':')[1] as businessName,
split(split(results,",")[2],':')[1] as businessGroup,
split(split(results,",")[3],':')[1] as businessIssue,
split(split(results,",")[4],':')[1] as interactionId,
split(split(results,",")[5],':')[1] as campignId
regexp_replace(split(split(results,",")[6],":")[1], "[\\]|}]", "") as  rank

from
(
  select seq,
     split(translate(result), '"\\[|]|\""',''), "},") as r
      from test  
) t1
LATERAL VIEW explode(r) rr AS results
Sign up to request clarification or add additional context in comments.

Comments

1

You may try get_json_object function.

select seq, get_json_object(result,'$\[0].offerId') as offerId,
            get_json_object(result,'$\[0].businessName') as businessName,
            get_json_object(result,'$\[0].businsesGroup') as businsesGroup,
            get_json_object(result,'$\[0].businessIssue') as businessIssue,
            get_json_object(result,'$\[0].interactionId') as interactionId,
            get_json_object(result,'$\[0].campaignId') as campaignId,
            get_json_object(result,'$\[0].rank') as rank
    from t
    UNION ALL
select seq, get_json_object(result,'$\[1].offerId') as offerId,
            get_json_object(result,'$\[1].businessName') as businessName,
            get_json_object(result,'$\[1].businsesGroup') as businsesGroup,
            get_json_object(result,'$\[1].businessIssue') as businessIssue,
            get_json_object(result,'$\[1].interactionId') as interactionId,
            get_json_object(result,'$\[1].campaignId') as campaignId,
            get_json_object(result,'$\[1].rank') as rank
    from t

2 Comments

Thanks Kaushik. What if result array contians more than 2 elements? We cannot dynamically change the hive query.
@Naveen : I am not sure if that could be done. But, as hlagos suggested, you may use a JSON serde for such purpose. Even your solution would not be perfoming so well for large datasets, due to multiple function calls. So, you would probably need a different approach. Wish you all the best.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.