Back to questions
Assume there are three Spotify tables: , , and , which contain information about the artists, songs, and music charts, respectively.
Write a query to find the top 5 artists whose songs appear most frequently in the Top 10 of the table. Display the top 5 artist names in ascending order, along with their song appearance ranking.
If two or more artists have the same number of song appearances, they should be assigned the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5). If you've never seen a rank order like this before, do the rank window function tutorial.
Column Name | Type |
---|---|
artist_id | integer |
artist_name | varchar |
label_owner | varchar |
artist_id | artist_name | label_owner |
---|---|---|
101 | Ed Sheeran | Warner Music Group |
120 | Drake | Warner Music Group |
125 | Bad Bunny | Rimas Entertainment |
Column Name | Type |
---|---|
song_id | integer |
artist_id | integer |
name | varchar |
song_id | artist_id | name |
---|---|---|
55511 | 101 | Perfect |
45202 | 101 | Shape of You |
22222 | 120 | One Dance |
19960 | 120 | Hotline Bling |
Column Name | Type |
---|---|
day | integer (1-52) |
song_id | integer |
rank | integer (1-1,000,000) |
day | song_id | rank |
---|---|---|
1 | 45202 | 5 |
3 | 45202 | 2 |
1 | 19960 | 3 |
9 | 19960 | 15 |
artist_name | artist_rank |
---|---|
Ed Sheeran | 1 |
Drake | 2 |
Ed Sheeran's song appeared twice in the Top 10 list of global song rank while Drake's song is only listed once. Therefore, Ed is ranked #1 and Drake is ranked #2.
The dataset you are querying against may have different input & output - this is just an example!
Step 1: Retrieve the song counts for the 10 artists
To start, we combine the table with the and tables using appropriate joins.
Then, we calculate the number of times each artist's song appears in the top 10 of the chart, grouping the results by the artist. This query provides us with a list of artists and their corresponding song counts.
Step 2: Rank the artists based on song appearances in the top 10 chart
Now that we have the artists and their song counts, we need to assign ranks to them based on the number of times their songs appear in the top 10 chart. We can achieve this using the window function.
The resulting query provides a list of artists, their song counts, and their respective ranks in descending order of song appearances.
Do you know why we're using instead of ?
While both functions assign the same rank to duplicates, skips the next number in the ranking, which is not suitable for this question. Hence, we use , which ensures that all ranks are assigned without skipping any numbers.
Let's consider the following example:
artist_name | song_count | dense_rank_num | rank_num |
---|---|---|---|
Bad Bunny | 6 | 1 | 1 |
Ed Sheeran | 5 | 2 | 2 |
Lady Gaga | 4 | 3 | 3 |
Adele | 4 | 3 | 3 |
Katy Perry | 3 | 4 | 5 |
In this example, Lady Gaga and Adele both have 4 songs, so they should have the same rank. With , both artists are correctly ranked as 3rd, and the next artist, Katy Perry is ranked 4th. However, with , the 4th rank is skipped, so Katy Perry is labeled as the 5th rank. This breaks the continuity of the ranking.
For our question, it's important to have a continuous ranking without skipping any numbers. Therefore, we use to ensure all ranks are assigned correctly.
Step 3: Retrieve the top 5 ranked artists along with their ranks
Finally, we need to filter the results to include only the top 5 ranked artists.