How to parse a string to an array of maps in Hive

Question

I have a Hive table that is ingested from system logs. The data is encoded in an unusual format (an array of maps) in which each element of the array contains a field name and its value. The column type is STRING, as in the example below:

select 1 as user_id, '[{"field":"name", "value":"Bob"}, {"field":"gender", "value":"M"}]' as user_info
union all
select 2 as user_id, '[{"field":"gender", "value":"F"}, {"field":"age", "value":22}, {"field":"name", "value":"Ana"}]' as user_info;

This produces something like:

user_id	user_info
1	[{"field":"name", "value":"Bob"}, {"field":"gender", "value":"M"}]
2	[{"field":"gender", "value":"F"}, {"field":"age", "value":22}, {"field":"name", "value":"Ana"}]

Notice that the array size is not always the same. I'm trying to convert the array of maps to a single map. This is the result I expect:

user_id	user_info
1	{"name":"Bob", "gender":"M"}
2	{"name":"Ana", "gender":"F", "age":22}

I was planning to do this in 3 steps: (1) parse the string column to create an array of maps, (2) explode the array (using lateral view), (3) collect the list of fields and group them by user_id.

I'm struggling to complete the first step: parse the string column to create an array of maps. Any help would be much appreciated :D

leftjoin · Accepted Answer · 2021-09-09 19:31:24Z

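The array of strings to be converted to maps is produced by split(user_info, '(?<=\\}) *, *(?=\\{)'), which splits on the comma between } and { after the outer [] have been removed. The array is then exploded and each element is converted to a map with str_to_map; see the comments in the full query. Note that str_to_map returns map<string,string>, so the age comes back as the string "22" rather than the number 22.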
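As an intermediate check before the full query, the split stage can be run on its own. This is only a sketch that reuses the sample row for user 1 and the same regular expressions as the answer. It is expected to return one row per array element, each still in its original {"field":..., "value":...} form.

```sql
-- Sketch: show the pieces produced by the split, one row per array element.
with mydata as (
  select 1 as user_id, '[{"field":"name", "value":"Bob"}, {"field":"gender", "value":"M"}]' as user_info
)
select s.user_id, e.element
from (
  -- remove the outer [ and ]
  select user_id, regexp_replace(user_info, '^\\[|\\]$', '') as user_info
  from mydata
) s
-- split on the comma that sits between } and {, with optional spaces around it
lateral view explode(split(s.user_info, '(?<=\\}) *, *(?=\\{)')) e as element;
```

The lookbehind and lookahead keep the braces in each element, so only the separating comma is consumed and the commas inside each {...} are left alone.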

with mydata as (
  select 1 as user_id, '[{"field":"name", "value":"Bob"}, {"field":"gender", "value":"M"}]' as user_info
  union all
  select 2 as user_id, '[{"field":"gender", "value":"F"}, {"field":"age", "value":22}, {"field":"name", "value":"Ana"}]' as user_info
)
select user_id,
       -- build the new map; gender and age are appended only when they are not NULL
       str_to_map(concat('name:', name, nvl(concat(',', 'gender:', gender), ''), nvl(concat(',', 'age:', age), ''))) as user_info
from (
  select user_id,
         -- get name, gender, age; aggregate by user_id
         max(case when user_info['field'] = 'name' then user_info['value'] end) as name,
         max(case when user_info['field'] = 'gender' then user_info['value'] end) as gender,
         max(case when user_info['field'] = 'age' then user_info['value'] end) as age
  from (
    select s.user_id,
           -- remove {} and ", convert to map
           str_to_map(regexp_replace(e.element, '^\\{| *"|\\}$', '')) as user_info
    from (
      select user_id,
             regexp_replace(user_info, '^\\[|\\]$', '') as user_info -- remove []
      from mydata
    ) s
    -- split by the comma between } and {, with optional spaces around it
    lateral view outer explode(split(user_info, '(?<=\\}) *, *(?=\\{)')) e as element
  ) exploded
  group by user_id
) pivoted;

Result:

user_id   user_info 
1        {"name":"Bob","gender":"M"}
2        {"name":"Ana","gender":"F","age":"22"}
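The query above hardcodes the field names name, gender and age. If the set of fields is not known in advance, step (3) from the question (collect the fields and group by user_id) can be written generically. The following is a minimal sketch, not part of the original answer: the CTE name parsed and the column name kv are illustrative, and it assumes that field names and values contain no commas or colons, since those are the default delimiters of str_to_map.

```sql
-- Sketch: build the map from whatever fields are present, without naming them.
with mydata as (
  select 1 as user_id, '[{"field":"name", "value":"Bob"}, {"field":"gender", "value":"M"}]' as user_info
  union all
  select 2 as user_id, '[{"field":"gender", "value":"F"}, {"field":"age", "value":22}, {"field":"name", "value":"Ana"}]' as user_info
),
parsed as (  -- illustrative name: one row per array element, as a small map
  select s.user_id,
         -- remove {} and ", convert to a map with the keys 'field' and 'value'
         str_to_map(regexp_replace(e.element, '^\\{| *"|\\}$', '')) as kv
  from (
    select user_id, regexp_replace(user_info, '^\\[|\\]$', '') as user_info -- remove []
    from mydata
  ) s
  lateral view outer explode(split(s.user_info, '(?<=\\}) *, *(?=\\{)')) e as element
)
select user_id,
       -- turn each element into 'field:value', join with commas, parse back into one map
       str_to_map(concat_ws(',', collect_list(concat(kv['field'], ':', kv['value'])))) as user_info
from parsed
group by user_id;
```

As with the original query, every value ends up as a string. The order of keys in the resulting map is not guaranteed, and rows whose user_info is NULL or empty may need separate handling.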
