How do I list all the movies for each year using spark.sql, so that I get the output below?

Desired output:
(1988,{(Rain Man),(Die Hard)})
(1990,{(The Godfather: Part III),(Die Hard 2),(The Silence of the Lambs),(King of New York)})
(1992,{(Unforgiven),(Bad Lieutenant),(Reservoir Dogs)})
(1994,{(Pulp Fiction)})

This is the JSON data:

{ "id": "movie:1", "title": "Vertigo", "year": 1958, "genre": "Drama", "summary": "A retired San Francisco detective suffering from acrophobia investigates the strange activities of an old friend's wife, all the while becoming dangerously obsessed with her.", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }

Here is the code I have tried:

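val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
// jsonFile was removed in Spark 2.0; read.json is the replacement and works alongside createOrReplaceTempView
val movies = hiveCtx.read.json("movies.json")
movies.createOrReplaceTempView("movies")
val ty = hiveCtx.sql("SELECT year, title FROM movies")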

Please help me find the correct query.

Thanks for your help.

  • How are you storing this data? Can you please include all code you have used to get to this point? Commented Jun 17, 2019 at 16:56
  • I create the Hive context and register the data as a view: val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc); val movies = hiveCtx.read.json("movies.json"); movies.createOrReplaceTempView("movies"). Now I need a SQL query that lists all movies for each year. So far I have: val ty = hiveCtx.sql("SELECT year, title FROM movies") Commented Jun 17, 2019 at 17:46

1 Answer

You can get something similar without using spark.sql by grouping and aggregating on the DataFrame itself. collect_list gathers the titles for each year, and concat_ws joins them into a single string:

movies.groupBy($"year").agg(concat_ws("; ", collect_list($"title"))).show

Dataset used:

{ "id": "movie:1", "title": "Vertigo", "year": 1958, "genre": "Drama", "summary": "A retired San Francisco detective suffering from acrophobia investigates the strange activities of an old friend's wife, all the while becoming dangerously obsessed with her.", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }
{ "id": "movie:2", "title": "The Blob", "year": 1958, "genre": "Drama", "summary": "The Blob", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }

Output:

+----+----------------------------------+
|year|concat_ws(; , collect_list(title))|
+----+----------------------------------+
|1958|                 Vertigo; The Blob|
+----+----------------------------------+
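Since the question asks specifically for spark.sql, the same grouping can also be written as a SQL query against the temp view. The following is a minimal sketch, assuming Spark 2.x or later (where collect_list is available as a built-in SQL function) and a movies.json file with one JSON object per line; the app name, the local master setting, and the column alias titles are illustrative choices, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+, run as a standalone local session.
// In spark-shell a session named `spark` already exists, so this builder can be skipped.
val spark = SparkSession.builder()
  .appName("movies-by-year") // hypothetical app name
  .master("local[*]")
  .getOrCreate()

// Assumption: movies.json contains one JSON object per line, as in the dataset above.
val movies = spark.read.json("movies.json")
movies.createOrReplaceTempView("movies")

// collect_list gathers every title in a year group into one array column.
// "titles" is an alias chosen here for readability.
val moviesByYear = spark.sql(
  """SELECT year, collect_list(title) AS titles
    |FROM movies
    |GROUP BY year
    |ORDER BY year""".stripMargin)

// Pass false so long title lists are not truncated in the printed table.
moviesByYear.show(false)
```

This returns one row per year with an array of titles, which is close to the (year, {(title), ...}) shape in the question. Wrapping the aggregate as concat_ws('; ', collect_list(title)) in the query should give the single-string form shown in the output above.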