At AMD, SQL is used frequently for analyzing performance data to guide processor enhancements and for managing datasets related to semiconductor manufacturing processes. They even build solutions to speed up SQL queries. Because of this, AMD frequently asks SQL coding questions in interviews for Data Analyst, Data Science, and BI jobs.
So, if you're studying for a SQL Interview, we've collected 10 AMD SQL interview questions to practice – how many can you solve?
AMD produces computer chips and related technologies for business and consumer markets. As an interviewee, you are given a dataset of reviews for AMD products.
The reviews table contains the following columns: review_id (integer), user_id (integer), submit_date (timestamp), product_id (integer), and stars (integer ranging from 1 to 5). submit_date is the date and time when a review was submitted, and stars is the number of stars given by a user.
Write a SQL query to calculate the average ratings (stars) of each product, grouped by month. The results should be ordered by the month and product_id.
reviews Example Input:

| review_id | user_id | submit_date | product_id | stars |
|---|---|---|---|---|
| 6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
| 7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
| 5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
| 6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
| 4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
SELECT DISTINCT EXTRACT(MONTH FROM submit_date) AS mth, product_id, AVG(stars) OVER (PARTITION BY product_id, EXTRACT(MONTH FROM submit_date)) AS avg_stars FROM reviews ORDER BY mth, product_id;
This query uses the EXTRACT function to get the month from the submit_date column. We then compute the average per group of product_id and extracted month by using the PARTITION BY clause inside the AVG window function. Because a window function returns one row per review, DISTINCT collapses the output to a single row per product and month. Finally, we order the result by mth and product_id to get the desired result.
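If the interviewer doesn't specifically want a window function, an equivalent plain GROUP BY version (shown here as an alternative, not the official solution) produces the same rows:

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id,
  AVG(stars) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY mth, product_id;
```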
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
You are a data analyst at AMD, a company that designs and builds processors for computers. AMD has a variety of processors which they sell on their e-commerce platform, and they want to build a report that displays the total sales volume per product type.
Given the following Orders and Products tables:
Orders Sample Input:

| order_id | product_id | quantity | order_date |
|---|---|---|---|
| 1001 | 2001 | 3 | 2020-01-01 |
| 1002 | 2002 | 2 | 2020-02-15 |
| 1003 | 2003 | 1 | 2020-03-01 |
| 1004 | 2002 | 5 | 2020-03-20 |
| 1005 | 2001 | 2 | 2020-04-10 |
Products Sample Input:

| product_id | product_name | product_type |
|---|---|---|
| 2001 | Ryzen 7 | Desktop Processor |
| 2002 | Ryzen 5 | Desktop Processor |
| 2003 | Threadripper | High Performance Desktop Processor |
Write a SQL query that returns a table that displays the total quantity sold for each product type in the year 2020.
SELECT p.product_type, SUM(o.quantity) AS total_quantity_sold FROM Orders o JOIN Products p ON o.product_id = p.product_id WHERE EXTRACT(YEAR FROM o.order_date) = 2020 GROUP BY p.product_type;
Given the sample input above, the query first joins the Orders and Products tables on the product_id field. It then only considers orders that were placed in the year 2020. Finally, it groups the results by the product_type field and calculates the total quantity sold for each product type.
| product_type | total_quantity_sold |
|---|---|
| Desktop Processor | 12 |
| High Performance Desktop Processor | 1 |
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
Denormalization is done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
For these OLAP use cases, the bottleneck is frequently joining multiple tables, and denormalizing your database cuts those joins out.

Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with drawbacks too. By adding redundant columns, you incur more data storage costs, and you need more complex update and delete operations to maintain data integrity across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
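As a small sketch of what this looks like in practice (the order_facts table and its columns here are hypothetical, not from any question above), a denormalized reporting table might copy product_type onto every order row so a per-type rollup no longer needs a join:

```sql
-- Hypothetical denormalized reporting table: product_type is copied onto each
-- order row, so per-type rollups no longer require a join to a products table.
CREATE TABLE order_facts (
  order_id     INT PRIMARY KEY,
  product_id   INT,
  product_type TEXT,   -- redundant copy of products.product_type
  quantity     INT,
  order_date   DATE
);

-- The reporting query now scans a single table instead of joining two.
SELECT product_type, SUM(quantity) AS total_quantity_sold
FROM order_facts
GROUP BY product_type;
```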
At AMD, you may be asked to find the average units sold per month for a specific processor category (let's say "Ryzen") in the past year. This information helps management understand the sales trend and make decisions about future production and inventory management.
sales Example Input:

| sale_id | product_category | product_id | sale_date | units_sold |
|---|---|---|---|---|
| 1001 | Ryzen | R7-3800X | 2021-06-05 | 20 |
| 1023 | Ryzen | R5-5600X | 2021-07-20 | 15 |
| 1109 | Threadripper | TR-3990X | 2021-06-27 | 7 |
| 1156 | Ryzen | R7-3800X | 2021-07-28 | 12 |
| 1234 | Threadripper | TR-3990X | 2021-07-03 | 8 |
| 1290 | Ryzen | R5-5600X | 2021-07-30 | 18 |
| 1310 | Ryzen | R7-3800X | 2021-07-15 | 22 |
| Month | avg_units_sold |
|---|---|
| June | 20.0 |
| July | 16.75 |
Using PostgreSQL, the SQL query for the problem would be:
SELECT TO_CHAR(sale_date, 'Month') AS month, AVG(units_sold) AS avg_units_sold FROM sales WHERE product_category = 'Ryzen' AND sale_date BETWEEN '2021-01-01' AND '2021-12-31' GROUP BY TO_CHAR(sale_date, 'Month'), EXTRACT(MONTH FROM sale_date) ORDER BY EXTRACT(MONTH FROM sale_date);
This SQL query uses the AVG() function to calculate the average units sold per month for the 'Ryzen' category. The WHERE clause filters for records where product_category is 'Ryzen' and the sale happened in 2021. The TO_CHAR(sale_date, 'Month') function converts the sale date to a month name, and GROUP BY separates the data into one group per month. Finally, the ORDER BY clause sorts the results chronologically by month number (ordering by the month name alone would sort the months alphabetically).
To practice a very similar question, try this interactive Amazon Highest-Grossing Items Question, which is similar for aggregating product sales, or this Wayfair Y-on-Y Growth Rate Question, which is similar for analyzing sales over time.
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There are a few reasons to denormalize a database:
Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at AMD. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.
Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.
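To make the update-complexity point concrete, here's a minimal sketch (reusing the hypothetical order_facts and products tables from the earlier denormalization example) of keeping a redundant product_type copy in sync with a PostgreSQL trigger:

```sql
-- Hypothetical setup: order_facts.product_type is a redundant copy of
-- products.product_type, so it must be re-synced whenever the source changes.
CREATE OR REPLACE FUNCTION sync_product_type() RETURNS trigger AS $$
BEGIN
  UPDATE order_facts
  SET product_type = NEW.product_type
  WHERE product_id = NEW.product_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_type_sync
AFTER UPDATE OF product_type ON products
FOR EACH ROW
EXECUTE FUNCTION sync_product_type();
```

Every extra piece of machinery like this is part of the maintenance cost you accept in exchange for faster reads.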
AMD is interested in understanding the click-through conversion rates for their digital product advertisements that lead users to add the product into their cart. Assuming we have two tables:
The first table, ad_clicks, contains information about users who clicked on the ads.
ad_clicks Example Input:

| click_id | user_id | click_timestamp | product_id |
|---|---|---|---|
| 101 | 17 | 2022-03-01 08:15:20 | A1 |
| 102 | 20 | 2022-03-01 08:46:49 | A2 |
| 103 | 24 | 2022-03-02 10:02:15 | A1 |
| 104 | 17 | 2022-03-02 14:20:37 | A3 |
| 105 | 20 | 2022-03-03 07:55:42 | A2 |
The second table, cart_actions, contains information about users who added the product from the ad into their cart.
cart_actions Example Input:

| action_id | user_id | action_timestamp | product_id |
|---|---|---|---|
| 501 | 17 | 2022-03-01 09:10:10 | A1 |
| 502 | 20 | 2022-03-01 22:46:37 | A2 |
| 503 | 24 | 2022-03-03 11:05:12 | A1 |
They want to know the click-through conversion rate for each product based on these actions.
The click-through conversion rate for a product is calculated as the percentage of ad clicks that result in the product being added to the cart. The same user may click the same product ad multiple times and may also add the same product to the cart multiple times.
SELECT ac.product_id, COUNT(DISTINCT ca.action_id) AS Num_Add_to_Cart, COUNT(DISTINCT ac.click_id) AS Num_Ads_Clicks, (COUNT(DISTINCT ca.action_id)::float / NULLIF(COUNT(DISTINCT ac.click_id)::float, 0)) * 100 AS Conversion_Rate FROM ad_clicks ac LEFT JOIN cart_actions ca ON ac.user_id = ca.user_id AND ac.product_id = ca.product_id GROUP BY ac.product_id ORDER BY Conversion_Rate DESC;
This query first joins the ad_clicks table and the cart_actions table on user_id and product_id. The conversion rate is then computed as the ratio of distinct action_id values to distinct click_id values (representing the number of items added to cart and the number of ads clicked, respectively) for each product. The NULLIF function protects against a division-by-zero error. The results are then sorted in descending order of conversion rate.
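One variation an interviewer might ask for is counting only cart adds that happened after the corresponding ad click. That requirement isn't part of the prompt above, but a sketch of the stricter version simply adds a timestamp condition to the join:

```sql
-- Stricter variant: only count a cart add if it occurred at or after the ad click.
SELECT
  ac.product_id,
  COUNT(DISTINCT ca.action_id) AS num_add_to_cart,
  COUNT(DISTINCT ac.click_id)  AS num_ad_clicks,
  COUNT(DISTINCT ca.action_id)::float
    / NULLIF(COUNT(DISTINCT ac.click_id), 0) * 100 AS conversion_rate
FROM ad_clicks ac
LEFT JOIN cart_actions ca
  ON  ac.user_id = ca.user_id
  AND ac.product_id = ca.product_id
  AND ca.action_timestamp >= ac.click_timestamp
GROUP BY ac.product_id
ORDER BY conversion_rate DESC;
```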
To solve another question about calculating rates, try this TikTok SQL Interview Question on DataLemur's online SQL code editor:
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are a few different types of indexes that can be used in a database, such as clustered and non-clustered indexes.
For a concrete example, say you had a table of AMD customer payments with the following columns:

- payment_id
- customer_id
- payment_amount
- payment_date
Here's what a clustered index on the payment_date column would look like:
CREATE CLUSTERED INDEX payment_date_index ON amd_customer_payments (payment_date)
A clustered index on the payment_date column would determine the physical order of the records in the table based on payment_date. This means that the records with the earliest payment_date values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on payment_date, as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
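For contrast, a non-clustered index on the same hypothetical table leaves the physical row order alone and maintains a separate lookup structure; in standard SQL it might look like:

```sql
-- Non-clustered (secondary) index: speeds up lookups by customer_id
-- without changing how the rows are physically stored.
CREATE INDEX customer_id_index ON amd_customer_payments (customer_id);
```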
Given our product and sales data, can you determine the yearly average sales prices for each of the AMD products?
products Example Input:

| product_id | product_name |
|---|---|
| 1 | AMD Ryzen 7 5800X |
| 2 | AMD Radeon RX 6800 |
| 3 | AMD Ryzen 5 3600 |
sales Example Input:

| sale_id | product_id | sale_date | sale_price |
|---|---|---|---|
| 1 | 1 | 2022-02-05 | 329.99 |
| 2 | 2 | 2022-03-28 | 579.99 |
| 3 | 1 | 2022-05-17 | 339.99 |
| 4 | 3 | 2022-07-13 | 199.99 |
| 5 | 2 | 2022-08-25 | 599.99 |
| 6 | 3 | 2023-01-01 | 209.99 |
| 7 | 1 | 2023-02-05 | 329.99 |
| 8 | 2 | 2023-03-28 | 589.99 |
| 9 | 1 | 2023-05-17 | 339.99 |
| 10 | 3 | 2023-07-13 | 209.99 |
| year | product_name | avg_sale_price |
|---|---|---|
| 2022 | AMD Ryzen 7 5800X | 334.99 |
| 2022 | AMD Radeon RX 6800 | 589.99 |
| 2022 | AMD Ryzen 5 3600 | 199.99 |
| 2023 | AMD Ryzen 7 5800X | 334.99 |
| 2023 | AMD Radeon RX 6800 | 589.99 |
| 2023 | AMD Ryzen 5 3600 | 209.99 |
SELECT EXTRACT(YEAR FROM sale_date) AS year, p.product_name, ROUND(AVG(s.sale_price), 2) AS avg_sale_price FROM sales s JOIN products p ON s.product_id = p.product_id GROUP BY year, p.product_name;
The query joins the 'sales' and 'products' tables on 'product_id' to bring the product name and sale details together. It then uses the PostgreSQL EXTRACT function to isolate the year from the sale date, groups the data by the extracted year and product name, and calculates the average sale price of each product for each year, rounded to two decimal places for readability.
You are given a table Customers for a company like AMD. This table includes a customer_name attribute that contains both the first name and the last name, separated by a space.

The company is starting a campaign targeting customers with a specific pattern in their name. Write a SQL query that will help the marketing team find all the customers whose first name starts with 'J' and last name ends with 'son'.
Customers Example Input:

| customer_id | customer_name |
|---|---|
| 125 | John Johnson |
| 837 | Jane Anderson |
| 353 | Samuel Jackson |
| 257 | Deborah Samuelson |
| 761 | Jacob Wilson |
| customer_id | customer_name |
|---|---|
| 125 | John Johnson |
| 837 | Jane Anderson |
| 761 | Jacob Wilson |
SELECT customer_id, customer_name FROM Customers WHERE customer_name LIKE 'J% %son';
This query uses the LIKE operator to filter the customers based on the pattern provided. The '%' is a wildcard character that represents any number of characters. In the pattern 'J% %son', the leading 'J' anchors the start of the first name, the literal space matches the separator between the first and last name, and '%son' requires the last name to end with 'son'. This gives us all customers whose first name starts with 'J' and last name ends with 'son'.
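If you'd rather make the first/last-name split explicit, an alternative sketch uses PostgreSQL's SPLIT_PART function (this assumes every customer_name is exactly 'First Last' with a single space):

```sql
-- Assumes customer_name is always 'First Last' with a single space separator.
SELECT customer_id, customer_name
FROM Customers
WHERE SPLIT_PART(customer_name, ' ', 1) LIKE 'J%'
  AND SPLIT_PART(customer_name, ' ', 2) LIKE '%son';
```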
Another concept question you might get: what do the MINUS / EXCEPT commands do in SQL? Both return the rows from one query's result set that do not appear in a second query's result set.

For a tangible example of EXCEPT in PostgreSQL, suppose you were doing an HR Analytics project for AMD and had access to AMD's contractor and employee data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.

You could use the EXCEPT operator to find all contractors who were never an employee using this query:
SELECT first_name, last_name FROM amd_contractors EXCEPT SELECT first_name, last_name FROM amd_employees
Note that EXCEPT is available in PostgreSQL and SQL Server, while MINUS is the equivalent operator in Oracle (but don't worry about memorizing which RDBMS supports which exact commands, since AMD interviewers aren't trying to trip you up on SQL syntax trivia).
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the AMD SQL interview is to solve as many practice SQL interview questions as you can!
In addition to solving the above AMD SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Netflix, Airbnb, and Amazon.
Each DataLemur SQL question has multiple hints, step-by-step solutions, and most importantly, an online SQL code editor so you can code up your query right in the browser and have it graded.
To prep for the AMD SQL interview it is also wise to practice interview questions from other tech companies like:
But if your SQL query skills are weak, forget about going right into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers things like math functions like CEIL()/FLOOR() and GROUP BY – both of these come up routinely in AMD SQL interviews.
For the AMD Data Science Interview, beyond writing SQL queries, the other types of questions to prepare for are:
The best way to prepare for AMD Data Science interviews is by reading Ace the Data Science Interview. The book's got: