I have a table in Hive that is generated by reading from sequence files in my HDFS. Those sequence files contain JSON and look like this:

{"Activity":"Started","CustomerName":"CustomerName3","DeviceID":"StationRoboter","OrderID":"CustomerOrderID3","DateTime":"2018-11-27T12:56:47Z+0100","Color":[{"Name":"red","Amount":1},{"Name":"green","Amount":1},{"Name":"blue","Amount":1}],"BrickTotalAmount":3}

They record the colours of product parts and the amount of each colour counted in one service process run.

Please note the JSON array in Color.

Therefore my code to create the table is:

CREATE EXTERNAL TABLE iotdata(
  activity              STRING,
  customername          STRING,
  deviceid              STRING,
  orderid               STRING,
  datetime              STRING,
  color                 ARRAY<MAP<String,String>>,
  bricktotalamount      STRING
)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION '/IoTData/scray-data-000-v0';

This works, and if I do a select * on that table it looks like this:

[Image: result of select * on the iotdata table]

But my problem is that I have to access the data inside the color column for analysis. For example, I want to sum all red values in the table.

This leads to several questions: how can I cast the amount string to an integer?

How can I access the data in my color-column via select?

Or is it possible to change my table schema right at the beginning to get 4 extra columns for my 4 colours and 4 extra columns for the related colour amounts?

I also tried reading the whole JSON as a string into one column and selecting the sub-content from there, but importing the JSON array into Hive this way gives me only NULL values, probably because my JSON file is not 100% well-formed.

2 Answers

You can do this in two steps.

Step 1: Create a proper JSON table

CREATE external TABLE temp.test_json (
  activity string,
  bricktotalamount int,
  color array<struct<amount:int, name:string>>,
  customername string,
  datetime string,
  deviceid string,
  orderid string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
location '/tmp/test_json/table';

Step 2: Explode the color array in the select statement

select activity, bricktotalamount, customername, datetime, deviceid, orderid, name, amount from temp.test_json
lateral view inline(color) c as amount,name
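
Building on the same lateral view, the total for a single colour (the "sum all red values" case from the question) could be computed roughly as below. This is a sketch that assumes the temp.test_json table defined above, with amount already typed as int; the alias red_total is only illustrative.

```sql
-- Sum the amount of every 'red' entry across all rows.
-- inline() emits one row per struct in the color array,
-- with columns in the struct's declared order (amount, name).
SELECT SUM(c.amount) AS red_total
FROM temp.test_json t
LATERAL VIEW inline(t.color) c AS amount, name
WHERE c.name = 'red';
```

Because amount is declared as int in the struct, no cast is needed here. If it were read as a string, CAST(c.amount AS INT) inside the SUM would be one way to convert it.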

2 Comments

Hey Gaurang, this helped me a lot. Thank you for your fast and understandable support! But I need to have the 4 colours and their amounts in separate columns. In the end it should look like: imgur.com/a/L4cq8LE - Can you please help me out again :-) ? That would be very nice.
What you are looking for is pivoting. There is no built-in mechanism for it in Hive; however, there are a few third-party libraries. Or, if orderid is unique, you can group by it and then write CASE statements.
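
The group-by-plus-CASE approach mentioned in the comment might look like the sketch below. It assumes the temp.test_json table from this answer and that orderid identifies one row; the output column names are made up for illustration, and only the three colours visible in the sample record are listed, so a fourth colour would need its own CASE line.

```sql
-- Manual pivot: one output row per order, one column per colour.
-- Each CASE keeps the amount only for the matching colour name,
-- and SUM collapses the exploded rows back to a single row.
SELECT
  t.orderid,
  SUM(CASE WHEN c.name = 'red'   THEN c.amount ELSE 0 END) AS red_amount,
  SUM(CASE WHEN c.name = 'green' THEN c.amount ELSE 0 END) AS green_amount,
  SUM(CASE WHEN c.name = 'blue'  THEN c.amount ELSE 0 END) AS blue_amount
FROM temp.test_json t
LATERAL VIEW inline(t.color) c AS amount, name
GROUP BY t.orderid;
```

If orderid is not unique, the amounts of all rows sharing an orderid are added together, so a different grouping key would be needed in that case.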

The data inside your array is definitely not a map as far as Hive is concerned; you need to specify its structure. I would recommend redefining your table so that it declares the structure of the array's elements, like this:

CREATE EXTERNAL TABLE iotdata(
  activity              STRING,
  customername          STRING,
  deviceid              STRING,
  orderid               STRING,
  datetime              STRING,
  color                 ARRAY<STRUCT<name:STRING,amount:BIGINT>>,
  bricktotalamount      STRING
)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION '/IoTData/scray-data-000-v0';

That way you should be able to access the structure itself.
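
As a rough illustration of what accessing the structure can look like, the queries below assume the iotdata table is defined with color as an array of structs with fields name and amount. The output aliases and the lateral view aliases c and entry are hypothetical names chosen for this sketch.

```sql
-- Index into the array and read struct fields with dot notation.
-- bricktotalamount is declared as STRING, so it is cast for arithmetic.
SELECT orderid,
       color[0].name                 AS first_colour_name,
       color[0].amount               AS first_colour_amount,
       CAST(bricktotalamount AS INT) AS brick_total
FROM iotdata;

-- Alternatively, produce one row per array element with explode();
-- each exploded value is a struct whose fields are read by name.
SELECT i.orderid, entry.name, entry.amount
FROM iotdata i
LATERAL VIEW explode(i.color) c AS entry;
```

Reading fields by name after explode() avoids depending on the order in which the struct fields were declared.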

1 Comment

Thank you! This also works. I now have to see how I can get the 4 colours into separate columns, as I mentioned in the comments on the answer above yours. Maybe you can help me here.
