At Coupang, SQL is used across the company for analyzing customer purchasing patterns for targeted marketing campaigns and optimizing inventory management by predicting product demand. That's why Coupang almost always evaluates jobseekers on SQL coding questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
To help you prepare for the Coupang SQL interview, we've collected 10 Coupang SQL interview questions – able to answer them all?
At Coupang, customer feedback via product reviews is an essential indicator of product quality and customer satisfaction. Your task is to write a SQL query to calculate the monthly average review score for each product. You will use the table named , which has the following columns:
Sample table:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
This PostgreSQL query groups reviews by month of submission and product_id, then applies the AVG function to compute the average stars for each month-product group. The result is ordered by month and product_id.
For Coupang, an important task might be to identify customers who have made purchases in certain product categories and who live in particular regions. This information can be used for targeted promotions, personalized recommendations and more. Imagine the company has a specific promotion for customers in the Seoul region who have purchased electronics in the past.
Given two tables, and , write a SQL query that returns the customer_id, customer_name, and region for customers who live in Seoul and have purchased electronics.
The table has the following structure:
customer_id | customer_name | region |
---|---|---|
1 | John Doe | Seoul |
2 | Jane Smith | Busan |
3 | Will Kim | Daegu |
4 | Sarah Park | Seoul |
The table has the following structure:
purchase_id | customer_id | product_category |
---|---|---|
5001 | 1 | electronics |
5002 | 2 | books |
5003 | 3 | electronics |
5004 | 4 | health & beauty |
5005 | 1 | books |
This PostgreSQL query joins the and tables on the field. It then filters for customers in the Seoul region who have made a purchase in the electronics product category. Such a SQL query could be helpful in the scenario described above, where targeted promotions for certain regions and product categories are being planned.
Normalization is the process of organizing fields and tables of a database to minimize redundancy and dependency. While there are technically 5 levels (normal forms), the 3 most important normal forms you need to know about for SQL interviews at Coupang are:
Coupang, a large e-commerce company, runs various ads campaign to boost its sales. Users click on these ads, view the products and then may add those products into their cart. The company wants to measure the efficiency of these campaigns by analyzing their click-through conversion rates, specifically from viewing a product to adding that product to the cart.
Auxiliary tables have been created to help in this analysis. The table contains a record each time a user clicks on an ad. The table records an entry each time a user views a product from the ad click. The table records an entry whenever a product is added to the cart.
ad_id | user_id | clicked_at | product_id |
---|---|---|---|
1001 | 101 | 2022-06-01 10:05:45 | 5001 |
1002 | 102 | 2022-06-01 11:15:36 | 5002 |
1003 | 101 | 2022-06-01 12:45:50 | 5001 |
1004 | 102 | 2022-06-01 13:20:45 | 5002 |
view_id | user_id | viewed_at | product_id |
---|---|---|---|
2010 | 101 | 2022-06-01 10:10:45 | 5001 |
2020 | 102 | 2022-06-01 11:20:36 | 5002 |
addition_id | user_id | added_at | product_id |
---|---|---|---|
4001 | 101 | 2022-06-01 12:05:45 | 5001 |
4002 | 102 | 2022-06-01 14:20:36 | 5002 |
The task is to write a SQL query that will give the number of ads clicked, products viewed, and products added to the cart for each product.
The above query first joins with on and to get number of ad clicks and product views for each product. It then uses a subquery to count the number of additions for each product from table. This count function in the subquery and the main query helps in counting the number of distinct ads clicked, products viewed and added to the cart. The COALESCE function ensures that if there are no cart additions for a given product_id, the num_products_added is returned as 0.
To practice a similar problem about calculating rates, try this TikTok SQL question on DataLemur's online SQL coding environment:
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For example, suppose you had a table of Coupang orders and Coupang customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an between the Orders and Customers tables would retrieve rows where the in the Orders table matches the in the Customers table.
FULL OUTER JOIN: A retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
As a data analyst for Coupang, we have a table named 'sales' that logs every sale made on our platform. Each row represents a transaction with columns for the transaction_id, product_id, customer_id, amount, and transaction_date.
In an effort to understand our daily sales performance better, we want to write a query that gives the average amount of sales of each product on each day.
transaction_id | customer_id | transaction_date | product_id | amount |
---|---|---|---|---|
1001 | 111 | 2020-10-20 | 0001 | 100 |
1002 | 222 | 2020-10-20 | 0002 | 150 |
1003 | 111 | 2020-10-21 | 0001 | 50 |
1004 | 222 | 2020-10-23 | 0002 | 200 |
1005 | 333 | 2020-10-25 | 0001 | 150 |
transaction_date | product_id | avg_sales |
---|---|---|
2020-10-20 | 0001 | 100.00 |
2020-10-20 | 0002 | 150.00 |
2020-10-21 | 0001 | 50.00 |
2020-10-23 | 0002 | 200.00 |
2020-10-25 | 0001 | 150.00 |
This query groups the sales data first by the transaction_date then the product_id. The AVG aggregate function is then used to compute the average amount of sales for each product on each day.
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
Let's examine employee data from Coupang's HR database:
:
+------------+------------+------------+------------+ | employee_id| first_name | last_name | manager_id | +------------+------------+------------+------------+ | 1 | Aubrey | Graham | 3 | | 2 | Marshal | Mathers | 3 | | 3 | Dwayne | Carter | 4 | | 4 | Shawn | Carter | | +------------+------------+------------+------------+
In this table, serves as the primary key and functions as a foreign key because it links to the of the employee's manager. This establishes a relationship between Coupang employees and their managers, allowing for easy querying to find an employee's manager or see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, and foreign keys could be used to connect each employee to their respective department and location.
Suppose Coupang is interested in discovering more about their customer base. A unique marketing initiative is being launched to reach out to customers with unusual names. For this purpose, they are interested in finding any customer whose name contains the word 'star'.
Provide a list of customer IDs and names from their customer database who satisfy this criterion.
customer_id | name |
---|---|
1 | John Doe |
2 | Starla Bright |
3 | Sarah Star |
4 | Michael Jones |
5 | Starson Richards |
customer_id | name |
---|---|
2 | Starla Bright |
3 | Sarah Star |
5 | Starson Richards |
In this query, we're making use of the keyword in SQL and the wildcard character. The character is used to represent any number of characters (including zero characters). Therefore, will match any string that contains 'star' at any position.
This query searches through the column in the table, and selects the and for any name that contains the string 'star'. The names are not case-sensitive. If you need the query to be case-sensitive, use instead of .
In Coupang, the Pricing team would like to understand the impact of discount on sales. They have a table named that records the original price, the discount applied to it (if any), and the quantity sold for each product for each day. They would like you to calculate the total revenue without discount and with discount, then calculate the percentage of revenue retained after applying the discount. Round off the percentage to 2 decimal places.
date | product_id | original_price | discount | quantity_sold |
---|---|---|---|---|
2022-06-01 | 001 | 100 | 20 | 50 |
2022-06-02 | 001 | 100 | 30 | 75 |
2022-06-03 | 001 | 100 | 10 | 25 |
2022-06-01 | 002 | 200 | 10 | 100 |
2022-06-02 | 002 | 200 | 20 | 150 |
2022-06-03 | 002 | 200 | 30 | 50 |
product_id | revenue_without_discount | revenue_with_discount | percentage_revenue_retained |
---|---|---|---|
001 | 12500.00 | 9900.00 | 79.20 |
002 | 58000.00 | 46000.00 | 79.31 |
The above SQL block groups the data by product_id. For each product, it calculates the total revenue without discount by multiplying the original price per unit with the quantity sold. It then calculates the revenue after applying the discount by multiplying the original price per unit (after adjusting for discount) with the quantity sold. The percentage of revenue retained is then calculated by dividing the revenue after discount by the revenue before discount and multiplied by 100. The ROUND() function is used to limit the result to 2 decimal places.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating top grossing items> or this Wayfair Y-on-Y Growth Rate Question which is similar for calculating yearly growth rate sales.
The clause is used to filter the groups created by the clause. It's similar to the clause, but it is used to specify conditions on the groups created by the clause, rather than on the individual rows of the table.
For example, say you were analyzing salaries for analytics employees at Coupang:
This query retrieves the total salary for each Analytics department at Coupang and groups the rows by the specific department (i.e. "Marketing Analytics", "Business Analytics", "Sales Analytics" teams).
The clause then filters the groups to include only Coupang departments where the total salary is greater than $1 million
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Coupang SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier Coupang SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like FAANG and tech startups.
Each SQL question has hints to guide you, step-by-step solutions and best of all, there is an interactive coding environment so you can right online code up your SQL query and have it checked.
To prep for the Coupang SQL interview you can also be useful to practice SQL questions from other tech companies like:
In case your SQL coding skills are weak, don't worry about jumping right into solving questions – go learn SQL with this DataLemur SQL tutorial.
This tutorial covers SQL concepts such as LAG window function and math functions – both of these pop up frequently in Coupang SQL assessments.
Beyond writing SQL queries, the other types of problems covered in the Coupang Data Science Interview are:
To prepare for Coupang Data Science interviews learn more about their internal AI initiatives from the Coupang company blog. Also, read the book Ace the Data Science Interview because it's got: