A Comprehensive Introduction to Window Functions in MySQL
Do a deep dive into the working principles, syntax, and applications of various MySQL window functions.
There are many open source databases you could choose; MySQL is consistently one of the most popular. But until 2018, there was no provision for including window functions in MySQL. Fortunately, all that has changed – starting from MySQL 8.0, SQL window functions were now available to MySQL users.
Please note that window functions are available only in MySQL 8.0. As of the time of this writing, there is no plan to add SQL window functions to MySQL’s earlier releases.
Why is the advent of window functions in MySQL so important? They are a powerful data analysis and reporting tool. It’s essential to learn window functions, especially if you work with earlier releases of MySQL and are planning to upgrade to MySQL 8.0.
In this article, we’ll go through MySQL’s available window functions and explain their working principles. Next, we’ll compare window functions and aggregate functions. Finally, we’ll examine various sample queries (including their input and output tables) so you can see for yourself how window functions work.
Overview of MySQL Window Functions
The table presents the available window functions in MySQL along with a brief explanation of each:
| Window Function | Description |
|---|---|
| RANK() | Returns the rank of the current row within a defined partition. If one or more rows share the same ranking value, some rank numbers will be omitted from the sequence (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 4…). |
| DENSE_RANK() | Returns the rank of the current row within a defined partition. The difference between DENSE_RANK() and RANK() is that DENSE_RANK() does not skip any sequence numbers. (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 3, 4…). |
| ROW_NUMBER() | Returns the number of the current row within a defined partition. |
| PERCENT_RANK() | Returns a number between 0 and 1 (both inclusive) that indicates the percentage of the current partition values that are less than the current row value of that particular partition. |
| LEAD() | Looks up a subsequent row value for the specified column within the current partition. |
| LAG() | Looks up a previous row value for a specified column within the current partition. |
| FIRST_VALUE() | Returns the value of a specified column for the first row within the current partition. |
| LAST_VALUE() | Returns the value of a specified column for the last row within the current partition. |
| NTH_VALUE() | Returns the value of a specified column for the nth row within the current partition, where n is defined by the user. |
| NTILE() | Divides rows within the current partition into buckets. The number of buckets is specified by the user. The NTILE() function then assigns the number of the bucket to each row. |
| CUME_DIST() | Returns the cumulative distribution of a value within a defined partition. |
For a quick overview of SQL window functions, please see our Window Functions Cheat Sheet.
MySQL Window Function Syntax
Window function syntax is as follows:
The window function (red box) is declared first, followed by the OVER clause (blue box). The OVER clause determines how the rows are arranged and then processed by the window function.
Inside the OVER clause are the PARTITION BY and ORDER BY clauses (green boxes). The optional PARTITION BY clause divides window columns into groups (partitions), as you’ll see in subsequent examples. The ORDER BY clause – also optional – orders the resulting table by the values in a user-selected column.
Depending on the particular window function you choose, you’ll use either PARTITION BY, ORDER BY, neither, or both.
Finally, there’s the closing OVER bracket (blue box).
Here’s an example of the above syntax at work:
Next, we’ll compare the window functions with the GROUP BY clause. This will help you get a better general understanding of the concept behind window functions.
Window Functions vs. GROUP BY
To truly understand window functions, let’s briefly compare window functions and aggregate functions.
In aggregate functions, input table rows are summarized, so several (or many) rows are collapsed into one summary row. Let’s see an example. To do this, we'll need the table CARS_SOLD:
| SaleId | Car | Country | Sold |
|---|---|---|---|
| 1 | Audi | Germany | 120 |
| 2 | Audi | USA | 110 |
| 3 | Audi | Japan | 100 |
| 4 | BMW | Germany | 250 |
| 5 | BMW | USA | 200 |
| 6 | BMW | Japan | 200 |
| 7 | Ford | Germany | 260 |
| 8 | Ford | USA | 300 |
| 9 | Ford | Japan | 200 |
When running this query, which uses GROUP BY with the aggregate function SUM(), the number of rows in the result table is less than in the original input table. This is because the rows are treated as the operands in the addition operation. It returns one summary row for each car type.
SELECT Car, SUM(Sold) FROM CARS_SOLD GROUP BY Car;
And the result table:
| Car | SUM(Sold) |
|---|---|
| Audi | 330 |
| BMW | 650 |
| Ford | 760 |
On the other hand, the number of rows in the input and output tables is the same when we use a window function:
SELECT Car, Country, Sold, SUM(Sold) OVER (PARTITION BY Car) as SoldBrand FROM CARS_SOLD;
The result is:
| Car | Country | Sold | SoldBrand |
|---|---|---|---|
| Audi | Germany | 120 | 330 |
| Audi | USA | 110 | 330 |
| Audi | Japan | 100 | 330 |
| BMW | Germany | 250 | 650 |
| BMW | USA | 200 | 650 |
| BMW | Japan | 200 | 650 |
| Ford | Germany | 260 | 760 |
| Ford | USA | 300 | 760 |
| Ford | Japan | 200 | 760 |
The summary is returned in the SoldBrand column, where the total value of sold cars per brand is presented. Thus, the returned number of records equals the total number of rows in the input table.
You may notice that the PARTITION BY clause in this example plays the same role as the GROUP BY clause in the previous query: it groups rows that have the same values in the Car column.
For more information on the value of window functions, see our article Why Should I Learn SQL Window Functions? or check out our course on Window Functions. Be sure to read our article SQL Course of the Month - Window Functions first to find out what to expect from the course.
Now that we’ve seen the difference between window and aggregate functions, let’s start exploring MySQL window functions.
Learn MySQL Window Functions by Example
This section will introduce you to the power of MySQL window functions. We’ll examine each function on its own and explain what it does. The tables used throughout the examples are presented below.
Table CONTINENTS
| ContId | Continent |
|---|---|
| 1 | America |
| 2 | Europe |
| 3 | Asia |
Table COUNTRIES
| CountryId | CountryName | Continent | TotalCountrySales |
|---|---|---|---|
| 1 | USA | 1 | 90 |
| 2 | Germany | 2 | 95 |
| 3 | France | 2 | 80 |
| 4 | Japan | 3 | 60 |
| 5 | Italy | 2 | 75 |
| 6 | Sweden | 2 | 85 |
Table CAR_MAKERS
| Id | Maker | ProductionYear | Country | Price |
|---|---|---|---|---|
| 2 | Volkswagen | 2015 | 2 | 70000 |
| 3 | BMW | 2015 | 2 | 100000 |
| 5 | Ford Motor Company | 2015 | 1 | 80000 |
| 7 | Citroen | 2015 | 3 | 50000 |
| 8 | Nissan Motors | 2017 | 4 | 50000 |
| 9 | Fiat | 2017 | 5 | 40000 |
| 11 | Honda | 2017 | 4 | 40000 |
| 13 | Daimler Benz | 2017 | 2 | 110000 |
| 14 | Opel | 2017 | 2 | 65000 |
| 15 | Peugeot | 2019 | 3 | 55000 |
| 16 | Renault | 2019 | 3 | 60000 |
| 19 | Toyota | 2019 | 4 | 45000 |
| 21 | Volvo | 2019 | 6 | 75000 |
Example 1: The Ranking Window Functions
Let’s look at some examples presenting the ranking window functions.
RANK() Window Function
The RANK() window function assigns a ranking value to each row within the defined partition. It reinitializes the rank to start from 1 when the partition is switched.
RANK() skips sequence numbers if the row value is repeated, i.e. the same rank is given to rows with the same values.
It’s easier to understand the working principle of RANK() when we look at the example below.
SELECT Maker, Price, RANK() OVER (ORDER BY Price) as RankValue FROM CAR_MAKERS;
| Maker | Price | RankValue |
|---|---|---|
| Fiat | 40000 | 1 |
| Honda | 40000 | 1 |
| Toyota | 45000 | 3 |
| Citroen | 50000 | 4 |
| Nissan Motors | 50000 | 4 |
| Peugeot | 55000 | 6 |
| Renault | 60000 | 7 |
| Opel | 65000 | 8 |
| Volkswagen | 70000 | 9 |
| Volvo | 75000 | 10 |
| Ford Motor Company | 80000 | 11 |
| BMW | 100000 | 12 |
| Daimler Benz | 110000 | 13 |
In this example, we are ordering the result using the Price column and treating all table rows as one partition. The first row is assigned to the rank value 1. In the second row, the rank value is 1 again because the Price column value is the same for the first and second row.
In the third row, the rank value is 3 (because of the skipped number due to the repeated Price column values); the value is 3 because this is the third row. In the fourth row, the rank value is 4 as expected. Once again, two rows share the same Price value, and thus the rank 4 is repeated. This principle is applied throughout all the rows in the example.
Next, let’s look at an example of the RANK() window function that partitions the result by the ProductionYear column and then orders each partition according to the Country column:
SELECT Maker, ProductionYear, Price, RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as RankValue FROM CAR_MAKERS;
| Maker | ProductionYear | Price | RankValue |
|---|---|---|---|
| Citroen | 2015 | 50000 | 1 |
| Volkswagen | 2015 | 70000 | 2 |
| Ford Motor Company | 2015 | 80000 | 3 |
| BMW | 2015 | 100000 | 4 |
| Fiat | 2017 | 40000 | 1 |
| Honda | 2017 | 40000 | 1 |
| Nissan Motors | 2017 | 50000 | 3 |
| Opel | 2017 | 65000 | 4 |
| Daimler Benz | 2017 | 110000 | 5 |
| Toyota | 2019 | 45000 | 1 |
| Peugeot | 2019 | 55000 | 2 |
| Renault | 2019 | 60000 | 3 |
| Volvo | 2019 | 75000 | 4 |
We placed the PARTITION BY clause inside the OVER() clause. This divides the result into 3 groups:
- Group 1 with a ProductionYear of 2015.
- Group 2 with a ProductionYear of 2017.
- Group 3 with a ProductionYear of 2019.
Similar partitioning by the ProductionYear column takes place in the upcoming examples of this article.
In this example, each group (partition) has its rank values started from 1. The rank values are assigned using the same working principle as described in the previous example.
DENSE_RANK() Window Function
The DENSE_RANK() window function is very similar to the RANK() function. The only difference is that it does not skip any numbers in the rank sequence.
Let’s examine the example below to see the difference:
SELECT Maker, ProductionYear, Price, DENSE_RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as DenseRankValue FROM CAR_MAKERS;
| Maker | ProductionYear | Price | DenseRankValue |
|---|---|---|---|
| Citroen | 2015 | 50000 | 1 |
| Volkswagen | 2015 | 70000 | 2 |
| Ford Motor Company | 2015 | 80000 | 3 |
| BMW | 2015 | 100000 | 4 |
| Fiat | 2017 | 40000 | 1 |
| Honda | 2017 | 40000 | 1 |
| Nissan Motors | 2017 | 50000 | 2 |
| Opel | 2017 | 65000 | 3 |
| Daimler Benz | 2017 | 110000 | 4 |
| Toyota | 2019 | 45000 | 1 |
| Peugeot | 2019 | 55000 | 2 |
| Renault | 2019 | 60000 | 3 |
| Volvo | 2019 | 75000 | 4 |
In the seventh row of the above output table, the DenseRankValue column value is 2 – although it is in the third row of its partition. The DENSE_RANK() function does not skip the rank value 2 , which was not used in the second row due to row value repetitions. On the other hand, RANK() would use the rank value 3 in the third row of the red partition.
ROW_NUMBER() Window Function
The ROW_NUMBER() window function does exactly what its name says. It assigns a row number to each record within the partition; it reinitializes row numbers to start from 1 when the partition is switched.
First, let’s look at an example that does not use the PARTITION BY clause within the OVER() clause. Please note that although the OVER() clause is empty, you still have to define it with the window function:
SELECT CountryName, Continent, ROW_NUMBER() OVER () as RowNumberValue FROM COUNTRIES;
| CountryName | Continent | RowNumberValue |
|---|---|---|
| USA | 1 | 1 |
| Germany | 2 | 2 |
| France | 2 | 3 |
| Japan | 3 | 4 |
| Italy | 2 | 5 |
| Sweden | 2 | 6 |
As we do not partition the result set, the output is very straightforward. The rows are numbered from 1 to 6.
Once we partition the output table, there will be more than one set of row numbers. Please note that the ORDER BY clause is optional in both of these examples.
SELECT Maker, ProductionYear, Price, ROW_NUMBER() OVER (PARTITION BY ProductionYear ORDER BY Price) as RowNumberValue FROM CAR_MAKERS;
| Maker | ProductionYear | Price | RowNumberValue |
|---|---|---|---|
| Citroen | 2015 | 50000 | 1 |
| Volkswagen | 2015 | 70000 | 2 |
| Ford Motor Company | 2015 | 80000 | 3 |
| BMW | 2015 | 100000 | 4 |
| Fiat | 2017 | 40000 | 1 |
| Honda | 2017 | 40000 | 2 |
| Nissan Motors | 2017 | 50000 | 3 |
| Opel | 2017 | 65000 | 4 |
| Daimler Benz | 2017 | 110000 | 5 |
| Toyota | 2019 | 45000 | 1 |
| Peugeot | 2019 | 55000 | 2 |
| Renault | 2019 | 60000 | 3 |
| Volvo | 2019 | 75000 | 4 |
Notice that rows are numbered from 1 for each of the partitioned groups.
PERCENT_RANK() Window Function
The PERCENT_RANK() window function returns the value from 0 to 1 (both inclusive), which indicates the percentage of current partition rows with a value less than the current row value. This will become clearer with an example. Have a look:
SELECT Maker, ProductionYear, Price, PERCENT_RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as PercentValue FROM CAR_MAKERS;
| Maker | ProductionYear | Price | PercentValue |
|---|---|---|---|
| Citroen | 2015 | 50000 | 0 |
| Volkswagen | 2015 | 70000 | 0.3333333333333333 |
| Ford Motor Company | 2015 | 80000 | 0.6666666666666666 |
| BMW | 2015 | 100000 | 1 |
| Fiat | 2017 | 40000 | 0 |
| Honda | 2017 | 40000 | 0 |
| Nissan Motors | 2017 | 50000 | 0.5 |
| Opel | 2017 | 65000 | 0.75 |
| Daimler Benz | 2017 | 110000 | 1 |
| Toyota | 2019 | 45000 | 0 |
| Peugeot | 2019 | 55000 | 0.3333333333333333 |
| Renault | 2019 | 60000 | 0.6666666666666666 |
| Volvo | 2019 | 75000 | 1 |
Let’s go through the output table starting from the first row. In the first partition, there is no Price column value that is smaller than 50000. Hence, the PercentValue column value is 0, i.e. 0% of the current partition’s Price values are smaller than the current row’s Price value.
In the case of the second row, 1/3 of the Price values in the current partition are smaller than 70000. So, the PercentValue value of this row is 0.333(or 33%).
In the last row of this partition, we see that all the Price values are smaller than 100000. Thus, the PercentValue value is 1, i.e. 100% of the current partition’s Price values are smaller than the current Price value.
You can apply the same working principle to go through the remaining two partitions of the above example.
Want to know more about ranking window functions? The article Overview of Ranking Functions in SQL will help you out.
Example 2: The Value Window Functions
Below are some usage examples of MySQL’s value window functions, which are very useful for data analysis.
LEAD() Window Function
The LEAD() window function allows us to look up the values of subsequent rows in the current partition. It is commonly used to calculate the difference between current and following row values.
First, let’s look at the example that simply uses the LEAD() function to output the value of the row after the current one:
SELECT Maker, ProductionYear, Price, TotalCountrySales, LEAD(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LeadValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Price | TotalCountrySales | LeadValue |
|---|---|---|---|---|
| Citroen | 2015 | 50000 | 80 | 95 |
| Volkswagen | 2015 | 70000 | 95 | 90 |
| Ford Motor Company | 2015 | 80000 | 90 | 95 |
| BMW | 2015 | 100000 | 95 | 0 |
| Honda | 2017 | 40000 | 60 | 75 |
| Fiat | 2017 | 40000 | 75 | 60 |
| Nissan Motors | 2017 | 50000 | 60 | 95 |
| Opel | 2017 | 65000 | 95 | 95 |
| Daimler Benz | 2017 | 110000 | 95 | 0 |
| Toyota | 2019 | 45000 | 60 | 80 |
| Peugeot | 2019 | 55000 | 80 | 80 |
| Renault | 2019 | 60000 | 80 | 85 |
| Volvo | 2019 | 75000 | 85 | 0 |
The LEAD() window function takes three arguments:
- The column value to be returned – in this case, it is the
TotalCountrySalescolumn. - The row number (relative to the current row) to be looked up – here,
LEAD()looks up the value of the row immediately after the current row. - A default value that’s returned if there is no subsequent row – in the current example, the
LeadValuein the fourth row is 0 because there are no more rows in this partition.
In the output, the first row has a LeadValue equal to 95 – the TotalCountrySales of the second row. The second row has a LeadValue of 90, the TotalCountrySales of the third row, and so on. As I already mentioned, the LeadValue of the fourth row is 0 (the default value we defined); although there are subsequent rows in the table, the fourth row is the last row in the current partition.
This process is repeated for each partition of the resulted table.
Now let’s try to calculate the TotalCountrySales difference between the current and subsequent rows.
SELECT Maker, ProductionYear, Price, TotalCountrySales, LEAD(TotalCountrySales , 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LeadValue, TotalCountrySales - (LEAD(TotalCountrySales , 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price)) as Difference FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Country | TotalCountrySales | LeadValue | Difference |
|---|---|---|---|---|---|
| Citroen | 2015 | 50000 | 80 | 95 | -15 |
| Volkswagen | 2015 | 70000 | 95 | 90 | 5 |
| Ford Motor Company | 2015 | 80000 | 90 | 95 | -5 |
| BMW | 2015 | 100000 | 95 | 0 | 95 |
| Fiat | 2017 | 40000 | 75 | 60 | 15 |
| Honda | 2017 | 40000 | 60 | 60 | 0 |
| Nissan Motors | 2017 | 50000 | 60 | 95 | -35 |
| Opel | 2017 | 65000 | 95 | 95 | 0 |
| Daimler Benz | 2017 | 110000 | 95 | 0 | 95 |
| Toyota | 2019 | 45000 | 60 | 80 | -20 |
| Peugeot | 2019 | 55000 | 80 | 80 | 0 |
| Renault | 2019 | 60000 | 80 | 85 | -5 |
| Volvo | 2019 | 75000 | 85 | 0 | 85 |
Except for the Difference column, this example is similar to the previous one. This column stores the difference between the TotalCountrySales value of the current row (column TotalCountrySales) and the TotalCountrySales value of the next row (column LeadValue).
Please note that the LEAD() window function is not limited to getting the value from the immediate next row. Let’s see an example where LEAD() is used to get a value from the third row after the current row. In this example, we do not partition the result table.
SELECT Maker, ProductionYear, Price, CountrySalesIndicator, LEAD(TotalCountrySales, 3) OVER (ORDER BY Price) as LeadValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Price | TotalCountrySales | LeadValue |
|---|---|---|---|---|
| Fiat | 2017 | 40000 | 75 | 80 |
| Honda | 2017 | 40000 | 60 | 60 |
| Toyota | 2019 | 45000 | 60 | 80 |
| Citroen | 2015 | 50000 | 80 | 80 |
| Nissan Motors | 2017 | 50000 | 60 | 95 |
| Peugeot | 2019 | 55000 | 80 | 95 |
| Renault | 2019 | 60000 | 80 | 85 |
| Opel | 2017 | 65000 | 95 | 90 |
| Volkswagen | 2015 | 70000 | 95 | 95 |
| Volvo | 2019 | 75000 | 85 | 95 |
| Ford Motor Company | 2015 | 80000 | 90 | 80 |
| BMW | 2015 | 100000 | 95 | 85 |
| Daimler Benz | 2017 | 110000 | 95 | 0 |
Starting from the first row of the result table, each row gets a value of the 3rd row following it. Please note that the last 3 rows values are null because there is no value for the 3rd row following them.
LAG() Window Function
The LAG() window function is the opposite of the LEAD() function. It returns the value from a previous row.
Let’s see it in action. The following example returns a LagValue and calculates the difference between the current row and the previous row:
SELECT Maker, ProductionYear, Price, TotalCountrySales, LAG(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LagValue, TotalCountrySales - (LAG(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price)) as Difference FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Price | TotalCountrySales | LagValue | Difference |
|---|---|---|---|---|---|
| Citroen | 2015 | 50000 | 80 | 0 | 80 |
| Volkswagen | 2015 | 70000 | 95 | 80 | 15 |
| Ford Motor Company | 2015 | 80000 | 90 | 95 | -5 |
| BMW | 2015 | 100000 | 95 | 90 | 5 |
| Fiat | 2017 | 40000 | 75 | 0 | 75 |
| Honda | 2017 | 40000 | 60 | 75 | -15 |
| Nissan Motors | 2017 | 50000 | 60 | 60 | 0 |
| Opel | 2017 | 65000 | 95 | 60 | 35 |
| Daimler Benz | 2017 | 110000 | 95 | 95 | 0 |
| Toyota | 2019 | 45000 | 60 | 0 | 60 |
| Peugeot | 2019 | 55000 | 80 | 60 | 20 |
| Renault | 2019 | 60000 | 80 | 80 | 0 |
| Volvo | 2019 | 75000 | 85 | 80 | 5 |
The example should speak for itself, as it is analogous to the example used for LEAD(). Here the values are looked up from the previous row; this is opposite of the LEAD() function, where the values are looked up from the subsequent row.
Like LEAD(), the LAG() window function can fetch the value of any row preceding it – not just the immediately preceding row.
FIRST_VALUE() Window Functions
This window function outputs the first value of the current partition. This will be clear after looking at the example below.
SELECT Maker, ProductionYear, Price, TotalCountrySales, FIRST_VALUE(TotalCountrySales) OVER (PARTITION BY ProductionYear ORDER BY Price) as FirstValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Price | TotalCountrySales | FirstValue |
|---|---|---|---|---|
| Citroen | 2015 | 50000 | 80 | 80 |
| Volkswagen | 2015 | 70000 | 95 | 80 |
| Ford Motor Company | 2015 | 80000 | 90 | 80 |
| BMW | 2015 | 100000 | 95 | 80 |
| Fiat | 2017 | 40000 | 75 | 75 |
| Honda | 2017 | 40000 | 60 | 75 |
| Nissan Motors | 2017 | 50000 | 60 | 75 |
| Opel | 2017 | 65000 | 95 | 75 |
| Daimler Benz | 2017 | 110000 | 95 | 75 |
| Toyota | 2019 | 45000 | 60 | 60 |
| Peugeot | 2019 | 55000 | 80 | 60 |
| Renault | 2019 | 60000 | 80 | 60 |
| Volvo | 2019 | 75000 | 85 | 60 |
Let’s analyze the first partition in this output table. The FirstValue column contains an 80 for all rows; this is the TotalCountrySales column value of the first row in this partition.
Example 3: Using Aggregate Functions with OVER()
Our next example demonstrates the usage of the aggregate functions SUM(), COUNT(), and AVG() with the OVER() clause. Using these functions with OVER() turns them into window functions.
Please note that the OVER() clause is mandatory for all window functions, regardless of whether it is empty or not.
SELECT Maker, ProductionYear, Country, TotalCountrySales, SUM(TotalCountrySales) OVER (PARTITION BY ProductionYear) as SumValue, COUNT(TotalCountrySales) OVER (PARTITION BY ProductionYear) as CountValue, AVG(TotalCountrySales) OVER (PARTITION BY ProductionYear) as AvgValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Country | TotalCountrySales | SumValue | CountValue | AvgValue |
|---|---|---|---|---|---|---|
| Volkswagen | 2015 | 2 | 95 | 360 | 4 | 90.0000 |
| BMW | 2015 | 2 | 95 | 360 | 4 | 90.0000 |
| Ford Motor Company | 2015 | 1 | 90 | 360 | 4 | 90.0000 |
| Citroen | 2015 | 3 | 80 | 360 | 4 | 90.0000 |
| Nissan Motors | 2017 | 4 | 60 | 385 | 5 | 77.0000 |
| Fiat | 2017 | 5 | 75 | 385 | 5 | 77.0000 |
| Honda | 2017 | 4 | 60 | 385 | 5 | 77.0000 |
| Daimler Benz | 2017 | 2 | 95 | 385 | 5 | 77.0000 |
| Opel | 2017 | 2 | 95 | 385 | 5 | 77.0000 |
| Peugeot | 2019 | 3 | 80 | 305 | 4 | 76.2500 |
| Renault | 2019 | 3 | 80 | 305 | 4 | 76.2500 |
| Toyota | 2019 | 4 | 60 | 305 | 4 | 76.2500 |
| Volvo | 2019 | 6 | 85 | 305 | 4 | 76.2500 |
In the above example, the sum, count, and average values are calculated using the values in the current partition. It is also worth mentioning that they are calculated for each partition separately.
Let’s analyze the first partition to see what’s going on:
- The
SumValuecolumn stores the sum of allTotalCountrySalesvalues in the current partition (i.e. 95+95+90+80=360). - The
CountValuecolumn stores the number of records in the current partition (i.e. 4). - The
AvgValuecolumn stores the average ofTotalCountrySalescolumn values in the current partition (i.e. (95+95+90+80)/4=90).
A good example of ‘aggregate function + OVER() clause’ usage is the SQL running total. If you want to learn more, please see the article What Is a SQL Running Total and How Do You Compute It?.
Example 4: NTILE() and CUME_DIST() Window Functions
The NTILE() Window Function
The NTILE() window function divides partition rows into buckets. It takes the parameter n that the user places in the function brackets and creates that number of buckets. Thus, n groups/buckets are created for each partition and the appropriate bucket number is assigned to each row.
Like a few of the other concepts we’ve talked about, this will be clearer after you see an example. Suppose you want to divide car makers into two groups for each production year. This is the query you’d use:
SELECT Maker, ProductionYear, NTILE(2) OVER (PARTITION BY ProductionYear ORDER BY Maker) as NtileValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | NtileValue |
|---|---|---|
| BMW | 2015 | 1 |
| Citroen | 2015 | 1 |
| Ford Motor Company | 2015 | 2 |
| Volkswagen | 2015 | 2 |
| Daimler Benz | 2017 | 1 |
| Fiat | 2017 | 1 |
| Honda | 2017 | 1 |
| Nissan Motors | 2017 | 2 |
| Opel | 2017 | 2 |
| Peugeot | 2019 | 1 |
| Renault | 2019 | 1 |
| Toyota | 2019 | 2 |
| Volvo | 2019 | 2 |
The column NtileValue contains the bucket number – which is either 1 or 2 – for each row in each partition.
The NTILE window function is used to divide the result table rows into groups. In the example above, we partition the result table into 3 partitions and each partition’s rows are divided into 2 groups because the query uses NTILE(2).
The CUME_DIST() Window Function
The CUME_DIST() window function – as its name indicates – calculates the cumulative distribution value for each row in a partition.
Cumulative distribution is the number of rows that have a value that’s less than or equal to the current row value divided by the number of all rows within the partition.
The example below will help clear up this concept:
SELECT Maker, ProductionYear, Country, TotalCountrySales, CUME_DIST() OVER (PARTITION BY ProductionYear ORDER BY TotalCountrySales) as CumeDistValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;
| Maker | ProductionYear | Country | TotalCountrySales | CumeDistValue |
|---|---|---|---|---|
| Citroen | 2015 | 3 | 80 | 0.25 |
| Ford Motor Company | 2015 | 1 | 90 | 0.5 |
| Volkswagen | 2015 | 2 | 95 | 1 |
| BMW | 2015 | 2 | 95 | 1 |
| Nissan Motors | 2017 | 4 | 60 | 0.4 |
| Honda | 2017 | 4 | 60 | 0.4 |
| Fiat | 2017 | 5 | 75 | 0.6 |
| Daimler Benz | 2017 | 2 | 95 | 1 |
| Opel | 2017 | 2 | 95 | 1 |
| Toyota | 2019 | 4 | 60 | 0.25 |
| Peugeot | 2019 | 3 | 80 | 0.75 |
| Renault | 2019 | 3 | 80 | 0.75 |
| Volvo | 2019 | 6 | 85 | 1 |
Let’s analyze the first partition as an example. It includes TotalCountrySales column values of 80, 90, 95, and 95. The CumeDistValue column values of the first four rows can be explained as follows:
- The first row has a
TotalCountrySalescolumn value of 80. This is the smallestTotalCountrySalesvalue in the current partition; hence, 1/4=0.25. - The second row has a
TotalCountrySalesvalue of 90. There are two values less than or equal to 90; hence, 2/4=0.5. - The third and fourth rows have a
TotalCountrySalesvalue of 95. This is the highest in this partition, so 4/4=1.
The CUME_DIST window function returns the cumulative distribution value. It is used mostly for data analysis.
Going Deeper into MySQL Window Functions
SQL window functions are similar to aggregate functions, but with a major difference. Aggregate functions create one summary row for a group of rows, while window functions assign summary values to each row. MySQL window functions are very useful for thorough data analysis: they allow the user to analyze each data group without losing the details in each row.
This article was meant to be an introduction to MySQL window functions. To really learn them, you should check out the LearnSQL.com course on Window Functions. It’ll give you a detailed explanation of each function and lots of opportunities to practice all of them.
