Ranking functions are incredibly useful in SQL when you want to rank rows based on specific criteria. Let's dive into the world of ranking and learn how these functions work!
In SQL, ranking functions help us assign ranks to rows based on partitioning and order expressions. Think of these ranks as numbers that indicate the position of a row within a specific "window" of data. A window is just a set of rows that meet certain conditions - hence the name "window function"! π
Before we delve into the differences, let's look at the generic syntax of ranking functions:
ROW_NUMBER(): This function assigns a unique sequential number to each row within a window. It's like numbering the rows in order.
RANK(): The function handles tied values by assigning the same rank to them. However, it may skip subsequent ranks, leaving gaps in the sequence.
DENSE_RANK(): Similar to , also handles tied values by assigning the same rank. However, it does not skip ranks, resulting in no gaps in the sequence.
Let's use a dataset containing information about artists and their concert revenue and see how the ranking functions differ.
Schema:
Column Name | Type | Description |
---|---|---|
artist_id | integer | A unique identifier for each artist or band performing in the concert. |
artist_name | varchar(100) | The name of the artist or band performing in the concert. |
genre | varchar(50) | The music genre associated with the concert. |
concert_revenue | integer | The total revenue generated from the concert. |
year_of_formation | integer | The year that the artist or band was formed. |
country | varchar(50) | The country of origin or residence of the artist or band. |
number_of_members | integer | The number of members in the band. |
album_released | integer | The total number of albums released by the artist or band. |
label | varchar(100) | The record label or music company associated with the artist or band. |
Table:
artist_id | artist_name | genre | concert_revenue | year_of_formation | country | number_of_members | album_released | label |
---|---|---|---|---|---|---|---|---|
103 | Taylor Swift | Pop | 700000 | 2004 | United States | 1 | 9 | Republic Records |
104 | BTS | K-Pop | 800000 | 2013 | South Korea | 7 | 7 | Big Hit Music |
105 | Adele | Pop | 600000 | 2006 | United Kingdom | 1 | 3 | Columbia Records |
109 | Blackpink | K-Pop | 450000 | 2016 | South Korea | 4 | 5 | YG Entertainment |
110 | Maroon 5 | Pop | 550000 | 1994 | United States | 5 | 7 | Interscope Records |
Write a query to display the list of artists sorted in descending order of their concert revenue.
Results:
artist_name | concert_revenue | row_num | rank_num | dense_rank_num |
---|---|---|---|---|
BTS | 800000 | 1 | 1 | 1 |
Beyonce | 750000 | 2 | 2 | 2 |
Ariana Grande | 720000 | 3 | 3 | 3 |
Bruno Mars | 700000 | 4 | 4 | 4 |
Taylor Swift | 700000 | 5 | 4 | 4 |
Justin Bieber | 680000 | 6 | 6 | 5 |
Pink Floyd | 670000 | 7 | 7 | 6 |
Red Hot Chili Peppers | 660000 | 8 | 8 | 7 |
U2 | 650000 | 9 | 9 | 8 |
Ed Sheeran | 650000 | 10 | 9 | 8 |
Interpreting the results:
Still curious about how the ranking functions work? Let's dive into another example.
In this example, we'll list the artists by country in the chronological order of their year of formation. To achieve this, we'll use the expression along with the clause.
In this query, each artist is ranked within their respective country based on the descending order of their year of formation.
The results look like this:
artist_name | country | concert_revenue | row_num | rank_num | dense_rank_num |
---|---|---|---|---|---|
Rihanna | Barbados | 550000 | 1 | 1 | 1 |
Justin Bieber | Canada | 680000 | 1 | 1 | 1 |
The Weeknd | Canada | 620000 | 2 | 2 | 2 |
BTS | South Korea | 800000 | 1 | 1 | 1 |
Super Junior | South Korea | 490000 | 2 | 2 | 2 |
EXO | South Korea | 480000 | 3 | 3 | 3 |
Got7 | South Korea | 470000 | 4 | 4 | 4 |
SHINee | South Korea | 470000 | 5 | 4 | 4 |
Ed Sheeran | United Kingdom | 650000 | 1 | 1 | 1 |
Adele | United Kingdom | 600000 | 2 | 2 | 2 |
One Direction | United Kingdom | 580000 | 3 | 3 | 3 |
... | ... | ... | ... | ... | ... |
The table is partitioned by country, meaning the ranking is reset for each country group. This is why the same rank may appear for different artists from different countries.
Let's explain how the functions differ in scenarios where there are differences in ranking:
Now, let's put these window function skills to the test, by solving multiple real SQL interview questions. Because of how often shows up in interviews, you've got a lot of work in front of you to do!
In this real Spotify Rank SQL Interview Question, you're asked to find the top 5 artists whose songs appear most frequently in the Top 10 of the global_song_rank table.
If two or more artists have the same number of song appearances, they should be assigned the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5). The expected rankings should look something like this:
artist_name | artist_rank |
---|---|
Taylor Swift | 1 |
Bad Bunny | 2 |
Drake | 2 |
Ed Sheeran | 3 |
Adele | 3 |
Lady Gaga | 4 |
Assume you're given a table on Walmart user transactions. Based on their most recent transaction date, write a query that retrieve the users along with the number of products they bought.
Output the user's most recent transaction date, user ID, and the number of products, sorted in chronological order by the transaction date.
Here's a fun Google SQL interview question to try, about finding the sum of odd and even measurements. Don't forget the () SQL math function for the intermediate SQL tutorial β it'll come in handy!
Next up, we'll look at two more popular window functions: LEAD and LAG
Next Lesson
SQL LEAD LAG π’