Back to questions
Assume there are three Spotify tables containing information about the artists, songs, and music charts. Write a query to determine the top 5 artists whose songs appear in the Top 10 of the table the highest number of times. From now on, we'll refer to this ranking number as "song appearances".
Output the top 5 artist names in ascending order along with their song appearances ranking (not the number of song appearances, but the rank of who has the most appearances). The order of the rank should take precedence.
For example, Ed Sheeran's songs appeared 5 times in Top 10 list of the global song rank table; this is the highest number of appearances, so he is ranked 1st. Bad Bunny's songs appeared in the list 4, so he comes in at a close 2nd.
Assumptions:
Column Name | Type |
---|---|
artist_id | integer |
artist_name | varchar |
artist_id | artist_name |
---|---|
101 | Ed Sheeran |
120 | Drake |
Column Name | Type |
---|---|
song_id | integer |
artist_id | integer |
song_id | artist_id |
---|---|
45202 | 101 |
19960 | 120 |
Column Name | Type |
---|---|
day | integer (1-52) |
song_id | integer |
rank | integer (1-1,000,000) |
day | song_id | rank |
---|---|---|
1 | 45202 | 5 |
3 | 45202 | 2 |
1 | 19960 | 3 |
9 | 19960 | 15 |
artist_name | artist_rank |
---|---|
Ed Sheeran | 1 |
Drake | 2 |
Ed Sheeran's song appeared twice in the Top 10 list of global song rank while Drake's song is only listed once. Therefore, Ed is ranked #1 and Drake is ranked #2.
The dataset you are querying against may have different input & output - this is just an example!
Let's break this question down into 3 steps:
Step 1
First, join the and tables to get a table with artist and their song rankings. This link has more information about using .
Count [1] the number of times an artist's song/s appear in the table and group by the artist [2].
Don't forget to do the most crucial step: filtering for top 10 ranks. Let's call this output the Top 10 Chart. The query should look something like this:
The first 5 rows of the output should look like this:
artist_id | song_count |
---|---|
101 | 5 |
200 | 4 |
125 | 6 |
240 | 3 |
120 | 2 |
To interpret the output, we can say that the songs of artist id 101 have appeared in the Top 10 of the global song rank chart 5 times.
Step 2
Next, we rank the artists based on the number of times their songs appeared in the Top 10 Chart.
We can accomplish this by using the above as a subquery [3], and implementing a [4] window function. will help us create a ranking based on the number of times the artists' songs appear in descending order.
Do you know why we use and not ?
Both the and functions assign the same rank to duplicates. However, skips the next number in the ranking (see the last row in the output below), which is not what we want. For this question, we will use , which doesn't skip any rank numbers.
Check out the difference between and here:
artist_id | song_count | dense_rank_num | rank_num |
---|---|---|---|
125 | 6 | 1 | 1 |
101 | 5 | 2 | 2 |
145 | 4 | 3 | 3 |
200 | 4 | 3 | 3 |
240 | 3 | 4 | 5 |
Notice the difference between the and columns? Artist 145 and 200 have the same of 4, so both are ranked 3rd. Using , the immediate next artist 240 is correctly ranked as 4th, but skips the 4th rank and labels artist 240 as the 5th rank.
Note that we want the ranking to be continuous (1, 2, 2, 3, 4, 5) and not skipped (1, 2, 2, 4, 5). Did you get the difference? ;)
Once we implement , our query looks like this:
Let's convert this query to a Common Table Expression (CTE) called so we can reuse it in the final step.
Again, we're only showing the first 5 rows of output:
artist_id | artist_rank |
---|---|
125 | 1 |
101 | 2 |
145 | 3 |
200 | 3 |
240 | 4 |
Here, artist 101 is ranked 2nd, which means that the songs of artist id 125 appeared in the Top 10 Chart more often than those of artist 101!
Step 3
For the final step, we'll use a CTE to find the top 5 artists, then it with the table to get their names.
A Common Table Expression (CTE) is a temporary data set that exists during the entire query session. Click here to learn more about CTEs.
Our full solution looks like this:
Results:
artist_name | artist_rank |
---|---|
Bad Bunny | 1 |
Ed Sheeran | 2 |
Adele | 3 |
Lady Gaga | 3 |
Katy Perry | 4 |
Drake | 5 |
Full Solution:
PostgreSQL 14