I have a database column that stores JSON-formatted strings. Each string contains multiple elements, like an array. Each element contains multiple key-value pairs, and some values may themselves contain multiple key-value pairs, for example the "address" attribute below.

[{"name":"abc", 
  "address":{"street":"str1", "city":"c1"},
  "phone":"1234567"
 },
 {"name":"def", 
  "address":{"street":"str2", "city":"c1"},
  "phone":"7145895"
 }
]

My ultimate goal is to extract the individual value of each field in the JSON string. I will probably use explode() for that, but explode() requires an array as input, not a string. So my first goal is to convert the JSON string into an array. Can someone please let me know how to do it? Many thanks.

2 Answers

You can start with this:

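select concat('{"name"', data_json) as record  -- re-attach the {"name" marker that split() consumed
from your_table q1
lateral view explode(split(q1.json_data, '\\{"name"')) json_splits as data_json;  -- split at each {"name" marker (brace escaped, since split() takes a regex), then explode
-- the first fragment is just "[", and each record keeps its trailing "," or "]", so those still need trimming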

Note: this code is untested, as I don't currently have access to Hive, but it should give you a good start. Alternatively, you can always go with a Hive SerDe for JSON, such as com.cloudera.hive.serde.JSONSerDe.
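As a possible variation on the split-and-explode idea above, here is a minimal sketch that cuts the array at the boundary between top-level objects and then reads each field with Hive's built-in get_json_object(). It reuses the your_table and json_data names from the query above. The "~~~" separator is an arbitrary choice and is assumed never to occur in the data. The sketch also assumes that no object contains a nested array of objects, since a "},{" inside one would be split as well. Like the query above, it has not been run against a real cluster.

```sql
-- Sketch only: json_data is assumed to be a STRING column holding a JSON array of objects.
SELECT
  get_json_object(elem, '$.name')           AS name,
  get_json_object(elem, '$.address.street') AS street,
  get_json_object(elem, '$.address.city')   AS city,
  get_json_object(elem, '$.phone')          AS phone
FROM your_table t
LATERAL VIEW explode(
  split(
    regexp_replace(
      -- strip the outer [ and ] of the array
      regexp_replace(t.json_data, '^\\s*\\[|\\]\\s*$', ''),
      -- mark the boundary between two top-level objects with ~~~ (assumed absent from the data)
      '\\}\\s*,\\s*\\{', '}~~~{'),
    '~~~')
) e AS elem;
```

Because this works on an ordinary STRING column, it does not require the whole table to be stored as JSON.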

As suggested by @ruben123, go with a Hive SerDe for JSON, especially when your JSON is complex. Several JSON SerDes are available, e.g. com.cloudera.hive.serde.JSONSerDe and org.openx.data.jsonserde.JsonSerDe.

Make sure the JSON is properly formatted, with one single-line JSON object per record. So, your JSON should be:

{"name":"abc", "address":{"street":"str1", "city":"c1"}, "phone":"1234567"}
{"name":"def", "address":{"street":"str2", "city":"c1"}, "phone":"7145895"} 

Create the Hive table:

CREATE TABLE sample_json (
   name STRING,
   address STRUCT<
     street: STRING,
     city: STRING>,
   phone STRING )  -- the phone values are quoted strings in the JSON; STRING also preserves leading zeros
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/your/hdfs/directory';

To access individual fields, simply select them:

select name, address.street, address.city, phone from sample_json;

abc   str1  c1  1234567
def   str2  c1  7145895

Note: if the JSON SerDe is not installed yet, you must first register its JAR with ADD JAR.
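For illustration, registering the SerDe in a Hive session might look like the following. The JAR path and file name are hypothetical placeholders, so substitute the actual location of the SerDe JAR on your system.

```sql
-- Hypothetical path: replace with the real location of your JSON SerDe JAR.
ADD JAR /path/to/json-serde.jar;
```

The statement applies to the current session only, so it has to be run again in each new session before creating or querying the table.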

2 Comments

Thank you very much for the reply
@jlp posted that only one of his columns contains JSON strings. Since a SerDe is essentially a serialization/deserialization framework, JsonSerDe can be applied only to the whole table, right? What if the user has a mix of JSON and non-JSON, pipe-delimited data? Even if the user uses a RegexSerDe, the JSON part will still be a string. In that case, how should we go about exploding the JSON string?
