Back to questions
This is the same question as problem #32 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given a table containing information about Wayfair user transactions for different products. Write a query to calculate the year-on-year growth rate for the total spend of each product, grouping the results by product ID.
The output should include the year in ascending order, product ID, current year's spend, previous year's spend and year-on-year growth percentage, rounded to 2 decimal places.
Column Name | Type |
---|---|
transaction_id | integer |
product_id | integer |
spend | decimal |
transaction_date | datetime |
transaction_id | product_id | spend | transaction_date |
---|---|---|---|
1341 | 123424 | 1500.60 | 12/31/2019 12:00:00 |
1423 | 123424 | 1000.20 | 12/31/2020 12:00:00 |
1623 | 123424 | 1246.44 | 12/31/2021 12:00:00 |
1322 | 123424 | 2145.32 | 12/31/2022 12:00:00 |
year | product_id | curr_year_spend | prev_year_spend | yoy_rate |
---|---|---|---|---|
2019 | 123424 | 1500.60 | NULL | NULL |
2020 | 123424 | 1000.20 | 1500.60 | -33.35 |
2021 | 123424 | 1246.44 | 1000.20 | 24.62 |
2022 | 123424 | 2145.32 | 1246.44 | 72.12 |
Product ID 123424 is analyzed for multiple years: 2019, 2020, 2021, and 2022.
To calculate the year-on-year growth rate, we compare the current year's spend with the previous year's spend.For instance, the spend grew by 24.62% from 2020 to 2021, indicating a positive growth rate.
The dataset you are querying against may have different input & output - this is just an example!
First, we need to summarize the user transactions table to obtain the yearly spend information for each product. We'll use the function on the transaction date to extract the year and select the product ID and spend for each year.
Here's the query for this step:
This query will generate a table that shows the product ID, the year, and the spend for each product in each year.
Here's the yearly spend for product ID 234412:
year | product_id | curr_year_spend |
---|---|---|
2019 | 234412 | 1800.00 |
2020 | 234412 | 1234.00 |
2021 | 234412 | 889.50 |
2022 | 234412 | 2900.00 |
Next, we'll calculate the previous year's spend for each product using the window function, which you can understand more here. The LAG function allows us to access the spend of the previous year based on the product ID.
Here's an example output table for product ID 234412:
year | product_id | curr_year_spend | prev_year_spend |
---|---|---|---|
2019 | 234412 | 1800.00 | NULL |
2020 | 234412 | 1234.00 | 1800.00 |
2021 | 234412 | 889.50 | 1234.00 |
2022 | 234412 | 2900.00 | 889.50 |
In this table, you can see that the previous year's spend is displayed in the column, which is populated based on the product ID and the order of the years.
For example, in the year 2020, the previous year's spend is 1800.00, which is the spend for the year 2019. Similarly, in the year 2021, the previous year's spend is 1234.00, which is the spend for the year 2020.
In the final step, we'll wrap the query from Step 2 in CTE called .
Within this CTE, we'll apply the year-on-year (y-o-y) growth rate formula to calculate the growth rate between the current year's spend and the previous year's spend.
Year-on-Year Growth Rate = ((Current Year's Spend - Previous Year’s Spend) / Previous Year’s Spend) x 100
We'll also round the growth rate to 2 decimal places.
Here's the final output table for product id 234412:
year | product_id | curr_year_spend | prev_year_spend | yoy_rate |
---|---|---|---|---|
2019 | 234412 | 1800.00 | NULL | NULL |
2020 | 234412 | 1234.00 | 1800.00 | -31.44 |
2021 | 234412 | 889.50 | 1234.00 | -27.92 |
2022 | 234412 | 2900.00 | 889.50 | 226.03 |
PostgreSQL 14