Back to questions
Assume there are three Spotify tables containing information about the artists, songs, and music charts. Write a query to find the top 5 artists whose songs appear most frequently in the Top 10 of the table.
Display the top 5 artist names in ascending order, along with their song appearance ranking. Note that if two artists have the same number of song appearances, they should have the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5).
For instance, if Ed Sheeran appears in the Top 10 five times and Bad Bunning four times, Ed Sheeran should be ranked 1st, and Bad Bunny should be ranked 2nd.
Column Name | Type |
---|---|
artist_id | integer |
artist_name | varchar |
artist_id | artist_name |
---|---|
101 | Ed Sheeran |
120 | Drake |
Column Name | Type |
---|---|
song_id | integer |
artist_id | integer |
song_id | artist_id |
---|---|
45202 | 101 |
19960 | 120 |
Column Name | Type |
---|---|
day | integer (1-52) |
song_id | integer |
rank | integer (1-1,000,000) |
day | song_id | rank |
---|---|---|
1 | 45202 | 5 |
3 | 45202 | 2 |
1 | 19960 | 3 |
9 | 19960 | 15 |
artist_name | artist_rank |
---|---|
Ed Sheeran | 1 |
Drake | 2 |
Ed Sheeran's song appeared twice in the Top 10 list of global song rank while Drake's song is only listed once. Therefore, Ed is ranked #1 and Drake is ranked #2.
The dataset you are querying against may have different input & output - this is just an example!
The task can be broken down into 3 steps:
Step 1: Find the top 10 artists with the most songs in the global top 10 chart
First, join the and tables to get a table with artist and their song rankings using .
Then, count the number of times an artist's song appears in the top 10 of the chart, grouping by the artist. The resulting output should give us a list of artists with their respective song counts.
Here's the query:
Here's the output for the first 5 rows:
artist_id | song_count |
---|---|
101 | 5 |
200 | 4 |
125 | 6 |
240 | 3 |
120 | 2 |
Based on the first row, artist ID 101 whose songs have appeared in the Top 10 of the chart 5 times.
Step 2: Rank Artists by Song Appearances in Top 10 Chart
Now that we have the artists and their song counts, we need to rank them according to their number of song appearances in descending order.
We can use a subquery to get the song counts and then apply a window function to assign a rank to each artist based on their song count. The resulting output should give us a list of artists with their respective ranks.
Do you know why we're using instead of ?
While both functions assign the same rank to duplicates, actually skips the next number in the ranking, which is not what we want for this question. That's why we'll be using instead, which doesn't skip any rank numbers.
Here's an example to illustrate the difference between DENSE_RANK and RANK:
artist_id | song_count | dense_rank_num | rank_num |
---|---|---|---|
125 | 6 | 1 | 1 |
101 | 5 | 2 | 2 |
145 | 4 | 3 | 3 |
200 | 4 | 3 | 3 |
240 | 3 | 4 | 5 |
If we compare the and columns in the table, we can see a difference in the ranking of artists with the same number of songs.
For example, artist ID 145 and 200 both have 4 songs, so they should have the same rank. With , both artists are correctly ranked as 3rd, and the next artist, ID 240, is ranked 4th. However, with , the 4th rank is skipped, so artist ID 240 is labeled as the 5th rank. This is not what we want, as it breaks the continuity of the ranking. Therefore, we use to ensure that all ranks are assigned without skipping any numbers.
It's important that the ranking is continuous (1, 2, 2, 3, 4, 5) and not skipped (1, 2, 2, 4, 5). So in this case, we want to use . Do you see the difference now? ;)
Step 3: Limit Results to Top 5 Ranked Artists and Join with Artist Names
Finally, we need to filter the results to the top 5 artists by their rank and join the result with the table to get their names.
Here's the query:
PostgreSQL 14