Back to questions
Given the reviews table, write a query to retrieve the average star rating for each product, grouped by month. The output should display the month as a numerical value, product ID, and average star rating rounded to two decimal places. Sort the output first by month and then by product ID.
P.S. If you've read the Ace the Data Science Interview, and liked it, consider writing us a review?
Column Name | Type |
---|---|
review_id | integer |
user_id | integer |
submit_date | datetime |
product_id | integer |
stars | integer (1-5) |
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
Product 50001 received two ratings of 4 and 3 in the month of June (6th month), resulting in an average star rating of 3.5.
The dataset you are querying against may have different input & output - this is just an example!
Here's some more Amazon SQL Interview Questions:
As observed, the table does not have a separate column for month. Therefore, we need to extract the month from the column using the function, which returns the month in numerical format.
Refer to this SQL date/time tutorial for more explanation on the function for dates.
To calculate the average star ratings per month for each product, we can use the aggregate function to calculate the mean of the stars column and the function to round the result to two decimal places for accuracy.
The query would be as follows:
In SQL, the order of query execution is important to understand. In the given solution's query, the sequence of execution is as follows:
It's important to note that the clause is executed before the statement. Therefore, we cannot use the alias in the clause, as the column is created after the statement is executed. However, we can use the mth alias in the ORDER BY clause, as it is executed after the statement, and the mth column has been created by then.
Understanding the order of SQL execution is crucial, as it is a common topic in technical interviews and is a common source of errors (since it's just too easy to reference an aliased column that the SQL query execution engine doesn't know about yet). For more details, refer to this SQL tutorial on Query Execution Order.