Back to questions
CVS Health is trying to better understand its pharmacy sales, and how well different drugs are selling.
Write a query to find the top 2 drugs sold, in terms of units sold, for each manufacturer. List your results in alphabetical order by manufacturer.
Column Name | Type |
---|---|
product_id | integer |
units_sold | integer |
total_sales | decimal |
cogs | decimal |
manufacturer | varchar |
drug | varchar |
product_id | units_sold | total_sales | cogs | manufacturer | drug |
---|---|---|---|---|---|
94 | 132362 | 2041758.41 | 1373721.70 | Biogen | UP and UP |
9 | 37410 | 293452.54 | 208876.01 | Eli Lilly | Zyprexa |
50 | 90484 | 2521023.73 | 2742445.9 | Eli Lilly | Dermasorb |
61 | 77023 | 500101.61 | 419174.97 | Biogen | Varicose Relief |
136 | 144814 | 1084258.00 | 1006447.73 | Biogen | Burkhart |
109 | 118696 | 1433109.50 | 263857.96 | Eli Lilly | Tizanidine Hydrochloride |
manufacturer | top_drugs |
---|---|
Biogen | Burkhart |
Biogen | UP and UP |
Eli Lilly | Tizanidine Hydrochloride |
Eli Lilly | TA Complete Kit |
Biogen sold 144,814 units of Burkhart drug (ranked 1) followed by the second highest with 132,362 units of UP and UP drug (ranked 2).
Eli Lilly sold 118,696 units of Tizanidine Hydrochloride drug (ranked 1) followed by the second highest with 90,484 units of TA Complete Kit drug (ranked 2).
The dataset you are querying against may have different input & output - this is just an example!
Let's divide this question into these 3 steps:
The data will first be grouped according to manufacturers, and the ranks will be determined by the units sold.
We can implement this by using the window function to partition all drug-related information by manufacturers. The clause sorts the rows in each manufacturer-partition by the column in descending order.
In each partition, the drug with the highest number of units sold will be positioned as no. 1, followed by the drug with the second-highest units sold at no. 2, and so on.
Showing the top 3 records for Biogen and 2 records for Eli Lilly:
manufacturer | drug | units_sold | drug_position |
---|---|---|---|
Biogen | DIPHENHYDRAMINE HYDROCHLORIDE | 231084 | 1 |
Biogen | Nefazodone Hydrochloride | 201167 | 2 |
Biogen | Non Aspirin PM | 197527 | 3 |
Eli Lilly | Cialis | 498342 | 1 |
Eli Lilly | Metoclopramide | 260823 | 2 |
Let's examine the implementation of the window function here:
For Biogen, DIPHENHYDRAMINE HYDROCHLORIDE and Nefazodone Hydrochloride drugs are ranked 1 and 2 as both drugs reported the highest units sold among the 3 drugs by Biogen.
From there on, we can simply wrap the query into a common table expression (CTE) and filter for the top 2 drugs sold (denoted as ) by each manufacturer.
A CTE is a temporary data set to be used as part of a query and it exists during the entire query session. Click here to read about it in detail.
Query output:
manufacturer | drug | drug_position |
---|---|---|
Biogen | DIPHENHYDRAMINE HYDROCHLORIDE | 1 |
Biogen | Nefazodone Hydrochloride | 2 |
Eli Lilly | Cialis | 1 |
Eli Lilly | Metoclopramide | 2 |
As the final step, we will sort the result in alphabetical order by the manufacturer using the clause. Additionally, we will eliminate the unnecessary columns from the final solution.
We have completed all three steps to address the problem and concluded the final query.
Solution #1: Using CTE
Solution #2: Using Subquery