I need to write a query:
Find the difference between the average rating of movies released before 1980 and the average rating of movies released after 1980. (Make sure to calculate the average rating for each movie, then the average of those averages for movies before 1980 and movies after. Don't just calculate the overall average rating before and after 1980.)
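To make the distinction concrete, here is a small arithmetic sketch in Python. Python is used only so the numbers can be checked, and the ratings are hypothetical. A movie with one 5-star review and a movie with three 1-star reviews average to 3 when each movie counts once. The pooled average of all four ratings is 2, because the heavily reviewed movie dominates.

```python
from statistics import mean

# Hypothetical ratings for two movies released before 1980 (toy data).
ratings_by_movie = {
    "movie_a": [5],        # one 5-star review
    "movie_b": [1, 1, 1],  # three 1-star reviews
}

# What the task asks for: average each movie first, then average those averages,
# so every movie counts once regardless of how many reviews it has.
per_movie_avgs = [mean(stars) for stars in ratings_by_movie.values()]
avg_of_avgs = mean(per_movie_avgs)

# What the task rules out: pooling every individual rating together,
# which lets the heavily reviewed movie dominate.
all_stars = [s for stars in ratings_by_movie.values() for s in stars]
overall_avg = mean(all_stars)

print("average of per-movie averages:", avg_of_avgs)
print("overall average of all ratings:", overall_avg)
```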
The schema is as follows:

Movie ( mID, title, year, director )
English: There is a movie with ID number mID, a title, a release year, and a director.

Reviewer ( rID, name )
English: The reviewer with ID number rID has a certain name.

Rating ( rID, mID, stars, ratingDate )
English: The reviewer rID gave the movie mID a rating of 1-5 stars on a certain ratingDate.
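To experiment with the queries, you can reproduce the schema in an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under assumptions. The problem lists only column names, so the column types, keys, and the `CHECK` on `stars` are guesses, and every row is invented toy data.

```python
import sqlite3

# Column types, keys, and the CHECK constraint are assumptions;
# the problem statement lists only column names.
SCHEMA = """
CREATE TABLE Movie    (mID INTEGER PRIMARY KEY, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Reviewer (rID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Rating   (rID INTEGER REFERENCES Reviewer(rID),
                       mID INTEGER REFERENCES Movie(mID),
                       stars INTEGER CHECK (stars BETWEEN 1 AND 5),
                       ratingDate TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented toy rows: movies 101-102 predate 1980, 103-104 do not.
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?, ?)", [
    (101, "Movie A", 1970, "Director A"),
    (102, "Movie B", 1975, "Director B"),
    (103, "Movie C", 1985, "Director C"),
    (104, "Movie D", 1990, "Director D"),
])
conn.executemany("INSERT INTO Reviewer VALUES (?, ?)", [
    (201, "Reviewer 1"), (202, "Reviewer 2"), (203, "Reviewer 3"),
])
conn.executemany("INSERT INTO Rating VALUES (?, ?, ?, ?)", [
    (201, 101, 5, "2011-01-01"),
    (201, 102, 1, "2011-01-02"),
    (202, 102, 1, "2011-01-03"),
    (203, 102, 1, None),          # toy row with no ratingDate
    (202, 103, 4, "2011-01-05"),
    (203, 103, 2, "2011-01-06"),
    (201, 104, 4, None),
])
conn.commit()

# Sanity checks: list the created tables and count the ratings.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
rating_count = conn.execute("SELECT COUNT(*) FROM Rating").fetchone()[0]
print(tables, rating_count)
```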
Here is the query I came up with. It returns the correct result, but it is definitely not a good query:
```sql
select t1.p1 - t2.p2 from
  (select avg(average) as p1 from
     (select g.mid, g.average, year from
        (select mid, avg(stars) as average from rating
         group by mid) g, movie
      where g.mid = movie.mid) j
   where year >= 1980) t1,
  (select avg(average) as p2 from
     (select g.mid, g.average, year from
        (select mid, avg(stars) as average from rating
         group by mid) g, movie
      where g.mid = movie.mid) j
   where year < 1980) t2;
```
Here is how I arrived at this query. First, I wrote this subquery, which retrieves each movie's ID, its average rating, and its release year:
```sql
select g.mid, g.average, year from
  (select mid, avg(stars) as average from rating
   group by mid) g, movie
where g.mid = movie.mid
```
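The same subquery can also be written with an explicit `JOIN ... ON` instead of a comma join plus `WHERE`. The result is the same, and the join condition sits next to the tables it relates. The sketch below runs it in SQLite through Python's `sqlite3` against hypothetical toy data (two movies before 1980, two after). The `ORDER BY` is added only to make the output deterministic, and only `Movie` and `Rating` are created because `Reviewer` is not used.

```python
import sqlite3

# Invented toy data; Reviewer is omitted because the query does not use it.
SETUP = """
CREATE TABLE Movie  (mID INTEGER PRIMARY KEY, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
INSERT INTO Movie VALUES
    (101, 'Movie A', 1970, 'Director A'),
    (102, 'Movie B', 1975, 'Director B'),
    (103, 'Movie C', 1985, 'Director C'),
    (104, 'Movie D', 1990, 'Director D');
INSERT INTO Rating VALUES
    (201, 101, 5, '2011-01-01'),
    (201, 102, 1, '2011-01-02'),
    (202, 102, 1, '2011-01-03'),
    (203, 102, 1, NULL),
    (202, 103, 4, '2011-01-05'),
    (203, 103, 2, '2011-01-06'),
    (201, 104, 4, NULL);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Per-movie average rating plus release year, using JOIN ... ON
# in place of the comma join and WHERE condition. ORDER BY is only for stable output.
PER_MOVIE_AVG = """
SELECT g.mID, g.average, m.year
FROM (SELECT mID, AVG(stars) AS average
      FROM Rating
      GROUP BY mID) AS g
JOIN Movie AS m ON m.mID = g.mID
ORDER BY g.mID
"""

rows = conn.execute(PER_MOVIE_AVG).fetchall()
for mid, average, year in rows:
    print(mid, average, year)
```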
Next, I use this same subquery twice to build two derived tables. The first (t1) holds the average of the per-movie averages for movies released in 1980 or later (year >= 1980). The second (t2) holds the same value for movies released before 1980 (year < 1980). The top-level query subtracts the second value from the first.
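One way to reuse the subquery without writing it twice is a common table expression (`WITH`), which SQLite, PostgreSQL, MySQL 8+, and SQL Server support, among others. The sketch below names the per-movie averages once as `movie_avg` and reads that CTE in two scalar subqueries. The name `movie_avg` is an illustrative choice. The sketch keeps the original query's sign convention (1980 and later minus before 1980) and runs in SQLite through Python's `sqlite3` on invented toy data. The CTE removes the textual duplication, but whether the engine computes it once or once per reference depends on the database.

```python
import sqlite3

# Invented toy data: movies 101-102 predate 1980, 103-104 do not.
SETUP = """
CREATE TABLE Movie  (mID INTEGER PRIMARY KEY, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
INSERT INTO Movie VALUES
    (101, 'Movie A', 1970, 'Director A'),
    (102, 'Movie B', 1975, 'Director B'),
    (103, 'Movie C', 1985, 'Director C'),
    (104, 'Movie D', 1990, 'Director D');
INSERT INTO Rating VALUES
    (201, 101, 5, '2011-01-01'),
    (201, 102, 1, '2011-01-02'),
    (202, 102, 1, '2011-01-03'),
    (203, 102, 1, NULL),
    (202, 103, 4, '2011-01-05'),
    (203, 103, 2, '2011-01-06'),
    (201, 104, 4, NULL);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# The per-movie averages are written once, as the CTE movie_avg,
# and referenced twice: once for each period.
QUERY = """
WITH movie_avg AS (
    SELECT g.mID, g.average, m.year
    FROM (SELECT mID, AVG(stars) AS average
          FROM Rating
          GROUP BY mID) AS g
    JOIN Movie AS m ON m.mID = g.mID
)
SELECT (SELECT AVG(average) FROM movie_avg WHERE year >= 1980)
     - (SELECT AVG(average) FROM movie_avg WHERE year < 1980) AS diff
"""

diff = conn.execute(QUERY).fetchone()[0]
print(diff)
```

With this toy data, the pre-1980 per-movie averages are 5 and 1 (mean 3), and the later ones are 3 and 4 (mean 3.5), so `diff` is 0.5.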
The problem is that I am duplicating the same subquery in two places. Can you help me optimize this query, both to remove the duplication and to improve performance?
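Conditional aggregation addresses both the duplication and the number of passes. The per-movie averages are computed once, and then `AVG(CASE WHEN ... END)` averages only the rows for each period. This works because a `CASE` without `ELSE` yields `NULL` and `AVG` ignores `NULL`s. The sketch again uses Python's `sqlite3` with invented toy data. It also runs the original query so the two results can be compared. It keeps the original sign (1980 and later minus before 1980). Swap the two `CASE` expressions to get "before minus after", which matches the wording of the problem statement.

```python
import sqlite3

# Invented toy data: movies 101-102 predate 1980, 103-104 do not.
SETUP = """
CREATE TABLE Movie  (mID INTEGER PRIMARY KEY, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
INSERT INTO Movie VALUES
    (101, 'Movie A', 1970, 'Director A'),
    (102, 'Movie B', 1975, 'Director B'),
    (103, 'Movie C', 1985, 'Director C'),
    (104, 'Movie D', 1990, 'Director D');
INSERT INTO Rating VALUES
    (201, 101, 5, '2011-01-01'),
    (201, 102, 1, '2011-01-02'),
    (202, 102, 1, '2011-01-03'),
    (203, 102, 1, NULL),
    (202, 103, 4, '2011-01-05'),
    (203, 103, 2, '2011-01-06'),
    (201, 104, 4, NULL);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# The query from the post, unchanged apart from layout, for comparison.
ORIGINAL = """
select t1.p1 - t2.p2 from
  (select avg(average) as p1 from
     (select g.mid, g.average, year from
        (select mid, avg(stars) as average from rating
         group by mid) g, movie
      where g.mid = movie.mid) j
   where year >= 1980) t1,
  (select avg(average) as p2 from
     (select g.mid, g.average, year from
        (select mid, avg(stars) as average from rating
         group by mid) g, movie
      where g.mid = movie.mid) j
   where year < 1980) t2
"""

# Conditional aggregation: one pass over the per-movie averages.
# CASE without ELSE yields NULL for the other period, and AVG skips NULLs.
REWRITE = """
SELECT AVG(CASE WHEN m.year >= 1980 THEN g.average END)
     - AVG(CASE WHEN m.year <  1980 THEN g.average END) AS diff
FROM (SELECT mID, AVG(stars) AS average
      FROM Rating
      GROUP BY mID) AS g
JOIN Movie AS m ON m.mID = g.mID
"""

original_diff = conn.execute(ORIGINAL).fetchone()[0]
rewrite_diff = conn.execute(REWRITE).fetchone()[0]
print(original_diff, rewrite_diff)
```

Whether this is measurably faster depends on the engine and the data volume. On a large `Rating` table, an index on `Rating(mID)` may help the `GROUP BY`, but confirm this with the database's query plan (for example `EXPLAIN`) rather than assuming it.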