I have a Hive table with the following schema:

hive>desc books;
gen_id                  int                                         
author                  array<string>                               
rating                  double                               
genres                  array<string>  

hive>select * from books;

| gen_id | rating | author    | genres    |
+--------+--------+-----------+-----------+
| 1      | 10     | ["A","B"] | ["X","Y"] |
| 2      | 20     | ["C","A"] | ["Z","X"] |
| 3      | 30     | ["D"]     | ["X"]     |

Is there a SELECT query that returns rows like the following, with the author and genres arrays combined into a single column?

| gen_id      |  rating        | JoinData
+-------------+---------------+-------------
| 1           | 10            | ["A","B","X","Y"]
| 2           | 20            | ["C","A","Z","X"]
| 3           | 30            | ["D","X"]
| 1           | 10            | "Y"

Can someone show me how to get this result? Thanks in advance for any help.

1 Answer

The answer is in this post: http://stackoverflow.com/questions/21578477/array-intersect-hive

For those who don't want to read the thread:

1) Create a temporary function from the Brickhouse UDF:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

2) Run a SELECT statement that calls it:

select gen_id
    , rating
    , combine(author, genres) as JoinData 
from books
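
For completeness, the two steps can be put together in one script. This is only a minimal sketch: it assumes the Brickhouse jar is already available on the machine running Hive, and the jar path below is a placeholder rather than a real location. Registering the jar with ADD JAR is one common way to put the UDF class on the session classpath before the function is created.

```sql
-- Assumption: /path/to/brickhouse.jar is a hypothetical path; replace it
-- with wherever the Brickhouse jar actually lives on your system.
ADD JAR /path/to/brickhouse.jar;

-- Register the UDF under the name "combine" for the current session only.
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

-- Merge the two array<string> columns into a single array column.
SELECT gen_id
     , rating
     , combine(author, genres) AS JoinData
FROM books;
```

Because the function is temporary, it has to be created again in every new Hive session that needs it.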