The Amazon interview process is notoriously challenging, featuring multiple SQL rounds for most Business Analyst, Data Analyst, Data Science, Data Engineering, and Business Intelligence Engineer (BIE) roles. And while the work-life balance at Amazon isn't to die for, and they've been doing layoffs in 2024, and the perks are lacking compared to other FAANG companies, here's the silver lining: the compensation IS good at Amazon.
So, to help you get that Jeff Bezos money, here's 6 real Amazon SQL interview questions to practice so you can Ace the SQL interview!
In a recent Amazon data analyst interview, the candidate was given the reviews table, and asked to write a SQL query to get the average stars for each product every month.
The output should include the month in numerical value, product id, and average star rating rounded to two decimal places. Sort the output based on month followed by the product id.
reviews
Example Input:review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
Before peaking at the solution, you can try this real Amazon Data Analyst SQL interview question online in our interactive SQL code editor:
As we can see, there is no month column in the reviews
table. First, we have to extract the month from the submit_date
column.
There is a simple function to extract month from a date. Here's the syntax: EXTRACT(MONTH from column_name)
You can look at this page for more explanation on the EXTRACT
function.
After extracting the month in numerical values, get the average of the star ratings and round them to two decimal places. It can be achieved using the functions AVG()
and ROUND()
. If you're a bit rusty on how to do math in SQL, try this interactive tutorial on ROUND/FLOOR/CEIL/ABS.
Solution:
SELECT EXTRACT(MONTH FROM submit_date) AS mth, product_id, ROUND(AVG(stars), 2) AS avg_stars FROM reviews GROUP BY EXTRACT(MONTH FROM submit_date), product_id ORDER BY mth, product_id;
SELECT *
Note: the interviewer will likely push you for more detail, or ask you about a real example about a time you had to make one of these optimizations yourself.
While this question might be out-of-scope for Business Analysts, Data Analysts, Data Scientists, Amazon expects people interviewing for Data Engineering and Business Intelligence Engineering (BIE) roles to know how databases work internally, and best practices for database design.
That's because when operating on massive Amazon/AWS scale, they expect their Data Engineers/SWEs/BIEs to squeeze out every last performance improvement.
If you aren't familiar with these SQL optimization concepts, check out my article on how to prep for database design interviews and this guide on SQL optimization from ThoughtSpot.
Constraints are simply rules for what data goes into your database. Some SQL constraints are:
NULL
value from being insertedAssume you are given the table containing information on Amazon customers and their spending on products in various categories. Identify the top two highest-grossing products within each category in 2022. Output the category, product, and total spend.
product_spend
Example Input:category | product | user_id | spend | transaction_date |
---|---|---|---|---|
appliance | refrigerator | 165 | 246.00 | 12/26/2021 12:00:00 |
appliance | refrigerator | 123 | 299.99 | 03/02/2022 12:00:00 |
appliance | washing machine | 123 | 219.80 | 03/02/2022 12:00:00 |
electronics | vacuum | 178 | 152.00 | 04/05/2022 12:00:00 |
electronics | wireless headset | 156 | 249.90 | 07/08/2022 12:00:00 |
electronics | vacuum | 145 | 189.00 | 07/15/2022 12:00:00 |
category | product | total_spend |
---|---|---|
appliance | refrigerator | 299.99 |
appliance | washing machine | 219.80 |
electronics | vacuum | 341.00 |
electronics | wireless headset | 249.90 |
Before reading the solution, why don't try the Highest-Grossing Items Amazon SQL Interview Question yourself?
To find the highest-grossing products, we must find the total spend by category and product. Note that we must filter by transactions in 2022.
SELECT category, product, SUM(spend) AS total_spend FROM product_spend WHERE transaction_date >= '2022-01-01' AND transaction_date <= '2022-12-31' GROUP BY category, product;
category | product | total_spend |
---|---|---|
electronics | wireless headset | 447.90 |
appliance | refrigerator | 299.99 |
appliance | washing machine | 439.80 |
electronics | computer mouse | 45.00 |
electronics | vacuum | 486.66 |
The output represents the total spend by category (electronics, appliance) and product.
Then, we reuse the query as a CTE or subquery (in this case, we are using a CTE) and utilize the RANK
window function to calculate the ranking by total spend, partition by category and order by the total spend in descending order.
Read more about SQL Window Functions.
WITH product_category_spend AS ( -- Insert query above) SELECT *, RANK() OVER ( PARTITION BY category ORDER BY total_spend DESC) AS ranking FROM product_category_spend;
category | product | total_spend | ranking |
---|---|---|---|
appliance | washing machine | 439.80 | 1 |
appliance | refrigerator | 299.99 | 2 |
electronics | vacuum | 486.66 | 1 |
electronics | wireless headset | 447.90 | 2 |
electronics | computer mouse | 45.00 | 3 |
Finally, we use this result and filter for a rank less than or equal to 2 as the question asks for top two highest-grossing products only.
WITH product_category_spend AS ( SELECT category, product, SUM(spend) AS total_spend FROM product_spend WHERE transaction_date >= '2022-01-01' AND transaction_date <= '2022-12-31' GROUP BY category, product ), top_spend AS ( SELECT *, RANK() OVER ( PARTITION BY category ORDER BY total_spend DESC) AS ranking FROM product_category_spend) SELECT category, product, total_spend FROM top_spend WHERE ranking <= 2 ORDER BY category, ranking;
RANK()
and DENSE_RANK()
? Essentially RANK
is to SELECT what dense_rank()
is to SELECT DISTINCT.
RANK() gives you the ranking within your ordered partition. Ties have the same rank, with the next ranking(s) skipped. So, if you have 4 items at rank 2, the next rank listed would be ranked 6.
DENSE_RANK() also ranks within your ordered partition, BUT the ranks are consecutive. This means no ranks are skipped if there are ranks with multiple items, and the rank order depends on your ORDER BY
clause.
This question about Amazon orders comes from a real Amazon Data Analyst SQL assessment. It’s a multi-part SQL question, similar to how take-home SQL challenges are structured, and asks increasingly more complex questions about the amazon orders.
Your given an orders
table:
ORDERS
Here’s some sample data from orders
:
order_id | customer_id | order_datetime | item_id | order_quantity |
---|---|---|---|---|
O-001 | 42489 | 2023-06-15 04:35:22 | C004 | 3 |
O-005 | 11733 | 2023-01-12 11:48:35 | C005 | 1 |
O-005 | 11733 | 2023-01-12 11:48:35 | C008 | 1 |
O-006 | 83167 | 2023-01-16 02:52:07 | C012 | 2 |
You are also given an items
table:
ITEMS
Here’s some sample data from items
:
item_id | item_category |
---|---|
C004 | Books |
C005 | Books |
C006 | Apparel |
C007 | Electronics |
C008 | Electronics |
SELECT current_date - INTEGER '1' AS yesterday_date
order_id
for all customer for each date they placed an order. Hint: customers can place multiple orders on a single day!order_id
for each customer for each date they placed two or more orders.We’ve left the answers to this Amazon take-home challenge to the reader, but you can find a similar multi-part SQL assessment from CVS Health on Pharmacy Analytics which comes with full solutions.
Besides solving the Amazon SQL interview questions above, we recommend practicing the bigger list of SQL interview questions from comparable companies like Facebook and Google.
You can also learn what SQL interviews generally cover, and how to best prepare for them in our 6,000 word SQL interview guide.
While Amazon Business Analyst interviews ask easier SQL questions than Data Engineering and BIE interviews, things can still be pretty rough, especially if you're more used to using Excel all day.
As such, we recommend Business Analysts master these simpler SQL topics before an Amazon SQL interview:
If these Amazon questions look too easy, then try some more advanced sql interview questions for a real challenge.
For data roles like data science and data analytics, Amazon interviews also ask statistics interview questions, Machine Learning interview questions, product-sense interview questions, and of course coding interview questions:
Make sure to study up on your stats interview questions as well, as these at vital basics expected from every Data Scientist.
I'm a bit biased, but I also recommend the book Ace the Data Science Interview because it has multiple Amazon Data Science Interview questions with solutions in it.
And if you're looking for a one-page SQL Interview Cheat Sheet that covers everything you need to know for the interview check out this free one from DataLemur!
You should also study the Amazon leadership principles to pass the tricky bar-raiser rounds. For a deep dive into the Amazon 16 leadership principles, along with potential behavioral questions you'll get at Amazon check out our Amazon Behavioral Interview Guide.
You can also watch my mock Amazon behavioral interview video for help with the behavioral aspect!
Also Amazon will be asking more than just SQL questions, read up and prepare for the Python portion with the 9 best books on Python for Data Scientists.