At Mercari, SQL is often used for extracting and analyzing sales data to spot market trends, and for cleaning and managing product listing data to power accurate recommendations. For this reason, Mercari asks SQL coding questions during interviews for Data Science, Analytics, and Data Engineering jobs.
So, to help you practice for the Mercari SQL interview, we've collected 8 Mercari SQL interview questions. Can you answer them all?
As a data analyst for Mercari, you are required to analyze user activity on the platform. Power users, or VIP users, are individuals who often carry out activities crucial to the business, such as posting listings, making purchases, and providing reviews. Your task is to identify users who have listed more than 50 items and made more than 20 purchases within the last year.
For this SQL question, you are given two tables: `listings` and `purchases`.
`listings` Example Input:

| listing_id | user_id | listing_date | product_id |
|---|---|---|---|
| 6171 | 123 | 06/08/2021 | 50001 |
| 7802 | 123 | 06/10/2021 | 69852 |
| 5293 | 123 | 06/18/2021 | 50001 |
| 6352 | 192 | 07/26/2021 | 69852 |
| 4517 | 981 | 07/05/2021 | 69852 |
`purchases` Example Input:

| purchase_id | user_id | purchase_date | product_id |
|---|---|---|---|
| 8910 | 123 | 06/18/2021 | 50001 |
| 4732 | 192 | 07/26/2021 | 69852 |
| 2365 | 981 | 07/05/2021 | 69852 |
| 7845 | 123 | 06/10/2021 | 69852 |
| 6932 | 123 | 06/08/2021 | 50001 |
Note: Here `user_id` is a unique identifier associated with each user. `listing_date` and `purchase_date` are the dates when a user listed a product and made a purchase, respectively.
```sql
WITH listings_count AS (
  SELECT user_id, COUNT(*) AS num_listings
  FROM listings
  WHERE listing_date >= NOW() - INTERVAL '1 YEAR'
  GROUP BY user_id
),
purchases_count AS (
  SELECT user_id, COUNT(*) AS num_purchases
  FROM purchases
  WHERE purchase_date >= NOW() - INTERVAL '1 YEAR'
  GROUP BY user_id
)
SELECT l.user_id
FROM listings_count l
JOIN purchases_count p ON l.user_id = p.user_id
WHERE l.num_listings > 50
  AND p.num_purchases > 20;
```
This query first defines two CTEs: one that counts the number of products listed by each user in the past year (`listings_count`), and another that counts the number of products purchased by each user in the past year (`purchases_count`). The main query then joins these two CTEs on `user_id` and filters for users who have listed more than 50 items and purchased more than 20 items in the past year.
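If you'd rather avoid the two separate CTEs, one alternative (just a sketch, assuming the same `listings` and `purchases` tables) is to stack both event types with UNION ALL and count each one per user with a FILTER clause:

```sql
-- Single-pass variant: tag each row with its event type, then count each type
-- per user with PostgreSQL's FILTER clause.
SELECT
  user_id,
  COUNT(*) FILTER (WHERE event_type = 'listing')  AS num_listings,
  COUNT(*) FILTER (WHERE event_type = 'purchase') AS num_purchases
FROM (
  SELECT user_id, listing_date AS event_date, 'listing' AS event_type FROM listings
  UNION ALL
  SELECT user_id, purchase_date, 'purchase' FROM purchases
) AS events
WHERE event_date >= NOW() - INTERVAL '1 year'
GROUP BY user_id
HAVING COUNT(*) FILTER (WHERE event_type = 'listing') > 50
   AND COUNT(*) FILTER (WHERE event_type = 'purchase') > 20;
```

FILTER is PostgreSQL syntax; the portable equivalent is `COUNT(CASE WHEN event_type = 'listing' THEN 1 END)`.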
To solve a related super-user data analysis question on DataLemur's free online SQL code editor, try this recently asked Microsoft SQL interview question:
You have been assigned the task of creating an analysis report for Mercari, the largest community-powered shopping app in Japan. Your boss wants to see a monthly ranking of the products based on total revenue, and how the revenue changes compared with the previous month.
You are given the table "sales", which includes the columns "sale_id" (unique identifier for the sale), "product_id" (ID of the product sold), "user_id" (ID of the user who purchased the product), "sale_date" (the date when the sale took place), and "revenue" (the revenue generated from the sale).
`sales` Example Input:

| sale_id | user_id | sale_date | product_id | revenue |
|---|---|---|---|---|
| 1001 | 321 | 06/01/2022 | 8810 | 2500 |
| 1002 | 654 | 06/10/2022 | 8810 | 3200 |
| 1003 | 987 | 06/15/2022 | 3910 | 1000 |
| 1004 | 321 | 07/01/2022 | 8810 | 2700 |
| 1005 | 444 | 07/10/2022 | 3910 | 1200 |
Example Output:

| year_month | product_id | revenue | prev_month_revenue | revenue_change |
|---|---|---|---|---|
| 2022-06 | 8810 | 5700 | null | null |
| 2022-06 | 3910 | 1000 | null | null |
| 2022-07 | 8810 | 2700 | 5700 | -3000 |
| 2022-07 | 3910 | 1200 | 1000 | 200 |
```sql
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('MONTH', sale_date) AS year_month,
    product_id,
    SUM(revenue) AS revenue
  FROM sales
  GROUP BY 1, 2
),
monthly_revenue_with_lag AS (
  SELECT
    *,
    LAG(revenue) OVER (PARTITION BY product_id ORDER BY year_month) AS prev_month_revenue
  FROM monthly_revenue
)
SELECT
  year_month,
  product_id,
  revenue,
  prev_month_revenue,
  revenue - prev_month_revenue AS revenue_change
FROM monthly_revenue_with_lag;
```
This query first calculates the monthly total revenue for each product in a CTE (Common Table Expression), then pulls each product's previous-month revenue with the `LAG` window function. The final SELECT computes the change in revenue from the previous month. For a product's first month there is no previous row, so `prev_month_revenue` and `revenue_change` are simply NULL. (Note that `DATE_TRUNC` returns the first day of the month, e.g. 2022-06-01; you can format it as 2022-06 with `TO_CHAR(year_month, 'YYYY-MM')` if needed.)
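If you prefer to report a 0 instead of NULL for a product's first month, you can wrap `LAG` in `COALESCE`. This is just a stylistic variation on the query above (the 0 convention is an assumption, not something the question specifies):

```sql
-- Variation (same sales table): treat a missing previous month as 0 revenue.
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('MONTH', sale_date) AS year_month,
    product_id,
    SUM(revenue) AS revenue
  FROM sales
  GROUP BY 1, 2
)
SELECT
  year_month,
  product_id,
  revenue,
  -- COALESCE swaps the NULL from LAG for 0 when there is no prior month
  COALESCE(LAG(revenue) OVER (PARTITION BY product_id ORDER BY year_month), 0) AS prev_month_revenue,
  revenue - COALESCE(LAG(revenue) OVER (PARTITION BY product_id ORDER BY year_month), 0) AS revenue_change
FROM monthly_revenue
ORDER BY year_month, product_id;
```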
Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur
No, in 99% of normal cases a `UNION ALL` and a `FULL OUTER JOIN` do NOT produce the same result.

You can think of `UNION ALL` as stacking two result sets on top of each other: the output keeps the same columns and simply gains more rows (duplicates included).

A `FULL OUTER JOIN`, by contrast, matches rows from the two tables on a join condition and keeps the unmatched rows from both sides padded with NULLs, so the output combines the columns of both tables.
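A quick illustration of the difference, using two tiny hypothetical one-column result sets built with VALUES:

```sql
-- UNION ALL stacks the rows: 4 rows, 1 column (1, 2, 2, 3).
WITH jp_users (user_id) AS (VALUES (1), (2)),
     us_users (user_id) AS (VALUES (2), (3))
SELECT user_id FROM jp_users
UNION ALL
SELECT user_id FROM us_users;

-- FULL OUTER JOIN matches on a condition and keeps unmatched rows from both
-- sides: 3 rows, 2 columns -> (1, NULL), (2, 2), (NULL, 3).
WITH jp_users (user_id) AS (VALUES (1), (2)),
     us_users (user_id) AS (VALUES (2), (3))
SELECT j.user_id AS jp_id, u.user_id AS us_id
FROM jp_users j
FULL OUTER JOIN us_users u ON j.user_id = u.user_id;
```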
You are working as a data scientist at Mercari, where you are tasked with analyzing data related to user transactions. The user transaction table includes details such as User ID, Product ID, Transaction time and Transaction price.
Your task is to find out the total transaction price for each user in the month of June.
`user_transactions` Example Input:

| transaction_id | user_id | transaction_time | product_id | transaction_price |
|---|---|---|---|---|
| 111 | 123 | 06/01/2022 00:00:00 | 50001 | 50 |
| 112 | 265 | 06/05/2022 00:00:00 | 69852 | 100 |
| 113 | 123 | 06/10/2022 00:00:00 | 50001 | 75 |
| 114 | 192 | 07/20/2022 00:00:00 | 69852 | 150 |
| 115 | 981 | 07/05/2022 00:00:00 | 69852 | 200 |
```sql
SELECT
  user_id,
  EXTRACT(MONTH FROM transaction_time) AS month,
  SUM(transaction_price) AS total_price
FROM user_transactions
WHERE EXTRACT(MONTH FROM transaction_time) = 6
GROUP BY user_id, month;
```
This query works by extracting the month from the transaction time and only considering the records where the month of the transaction is June. It groups by the user_id and sums up the transaction_price to get the total transaction price for each user for June.
Expected output, given the above table as input:

| user_id | month | total_price |
|---|---|---|
| 123 | 6 | 125 |
| 265 | 6 | 100 |
This question assesses the candidate's ability to extract meaningful information from transaction data, a common task for data scientists working at e-commerce companies. It also tests the candidate's knowledge of SQL functions such as `EXTRACT()`, `SUM()`, and `GROUP BY`, as well as their attention to date-based queries.
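One detail worth calling out: `EXTRACT(MONTH FROM transaction_time) = 6` matches June of every year in the table. If the interviewer means June of a specific year (June 2022 is assumed in the sketch below), a half-open date range makes that explicit and also lets an index on `transaction_time` be used:

```sql
-- Restrict to June 2022 specifically with a half-open date range.
SELECT
  user_id,
  EXTRACT(MONTH FROM transaction_time) AS month,
  SUM(transaction_price) AS total_price
FROM user_transactions
WHERE transaction_time >= DATE '2022-06-01'
  AND transaction_time <  DATE '2022-07-01'
GROUP BY user_id, month;
```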
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While deep knowledge of the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at Mercari should at least be able to recall them: key-value stores, document databases, wide-column (column-family) stores, and graph databases.
Given the Mercari customer database, write a SQL query to find customers who made purchases in the last 6 months and spent over 1000 USD in total. The results should also include only those customers who have given an average review rating of at least 4 for their purchased products.
`customers` Example Input:

| customer_id | first_name | last_name | email | join_date |
|---|---|---|---|---|
| 3221 | John | Doe | john.doe@email.com | 06/13/2020 |
| 9783 | Jane | Smith | jane.smith@email.com | 02/07/2019 |
| 4532 | Emily | Johnson | emily.johnson@email.com | 11/21/2021 |
`purchases` Example Input:

| purchase_id | customer_id | product_id | purchase_date | amount_spent |
|---|---|---|---|---|
| 1101 | 3221 | 58901 | 04/12/2022 | 200.00 |
| 1872 | 9783 | 37842 | 05/16/2022 | 800.00 |
| 2134 | 3221 | 40127 | 06/05/2022 | 890.00 |
`reviews` Example Input:

| review_id | customer_id | product_id | review_date | stars |
|---|---|---|---|---|
| 4132 | 3221 | 58901 | 04/20/2022 | 5 |
| 7890 | 3221 | 40127 | 06/12/2022 | 4 |
| 5678 | 9783 | 37842 | 05/23/2022 | 3 |
```sql
SELECT C.customer_id, C.first_name, C.last_name, C.email
FROM customers C
WHERE C.customer_id IN (
    SELECT P.customer_id
    FROM purchases P
    WHERE P.purchase_date BETWEEN NOW() - INTERVAL '6 months' AND NOW()
    GROUP BY P.customer_id
    HAVING SUM(P.amount_spent) > 1000
  )
  AND C.customer_id IN (
    SELECT R.customer_id
    FROM reviews R
    GROUP BY R.customer_id
    HAVING AVG(R.stars) >= 4
  );
```
This query first filters, in the `purchases` table, for customers who made a purchase within the last 6 months and spent more than 1000 USD in total. From this filtered list, it further keeps only customers whose average review rating is at least 4. Finally, the details of these customers are returned from the `customers` table.
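An equivalent formulation (a sketch against the same three tables, with the derived-table names `big_spenders` and `happy_reviewers` chosen only for illustration) replaces the two `IN` subqueries with joins onto pre-aggregated derived tables, which keeps each aggregation separate so the sums and averages aren't inflated by a row-multiplying join:

```sql
SELECT C.customer_id, C.first_name, C.last_name, C.email
FROM customers C
JOIN (
  -- customers who spent over 1000 USD in the last 6 months
  SELECT customer_id
  FROM purchases
  WHERE purchase_date >= NOW() - INTERVAL '6 months'
  GROUP BY customer_id
  HAVING SUM(amount_spent) > 1000
) big_spenders ON big_spenders.customer_id = C.customer_id
JOIN (
  -- customers with an average review rating of at least 4
  SELECT customer_id
  FROM reviews
  GROUP BY customer_id
  HAVING AVG(stars) >= 4
) happy_reviewers ON happy_reviewers.customer_id = C.customer_id;
```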
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
Denormalization is done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
For these OLAP use cases, the bottleneck is frequently the joins across multiple tables, and denormalizing the database cuts those joins out.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with drawbacks of its own. By adding redundant columns, you incur higher data storage costs, and you need more complex update and delete logic to maintain data integrity across all the duplicated data. Thus, it's important to carefully weigh the trade-offs before denormalizing.
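As a concrete (purely hypothetical) illustration, a normalized schema might keep product attributes in their own table, while a denormalized reporting table copies those attributes onto every sale row so analytics queries can skip the join:

```sql
-- Normalized: product attributes live in exactly one place; reads need a join.
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name TEXT,
  category     TEXT
);

CREATE TABLE product_sales (
  sale_id    INT PRIMARY KEY,
  product_id INT REFERENCES products (product_id),
  sale_date  DATE,
  revenue    NUMERIC
);

-- Denormalized reporting table: product_name and category are copied onto every
-- sale row, so OLAP rollups need no join, at the cost of extra storage and of
-- keeping the copies consistent whenever a product changes.
CREATE TABLE product_sales_denorm (
  sale_id      INT PRIMARY KEY,
  product_id   INT,
  product_name TEXT,
  category     TEXT,
  sale_date    DATE,
  revenue      NUMERIC
);

-- Typical join-free OLAP query against the denormalized table:
-- SELECT category, SUM(revenue) FROM product_sales_denorm GROUP BY category;
```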
Assume we're working with a platform like Mercari which is an online marketplace for buying and selling items. You want to answer the following question: What was the average price of the products sold each month?
`sales` Example Input:

| sale_id | product_id | sell_date | price |
|---|---|---|---|
| 101 | 55 | 06/01/2022 | 700 |
| 202 | 75 | 06/02/2022 | 150 |
| 303 | 67 | 07/20/2022 | 950 |
| 404 | 89 | 07/25/2022 | 500 |
| 505 | 45 | 08/03/2022 | 1200 |
Example Output:

| month | avg_price |
|---|---|
| 6 | 425 |
| 7 | 725 |
| 8 | 1200 |
```sql
SELECT
  EXTRACT(MONTH FROM sell_date) AS month,
  AVG(price) AS avg_price
FROM sales
GROUP BY 1
ORDER BY 1;
```
This query first extracts the month from `sell_date` using the `EXTRACT` function. It then groups by the extracted month and calculates the average `price` for each group, i.e. for each month. The `ORDER BY` sorts the results by month.
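Note that grouping on `EXTRACT(MONTH ...)` folds June 2022 and June 2023 into the same bucket. If the sales data ever spans multiple years, a small variation that groups on the truncated month keeps each calendar month separate:

```sql
-- Group by calendar month (year + month) rather than month number alone.
-- DATE_TRUNC returns the first day of each month, e.g. 2022-06-01.
SELECT
  DATE_TRUNC('month', sell_date) AS month,
  AVG(price) AS avg_price
FROM sales
GROUP BY 1
ORDER BY 1;
```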
The best way to prepare for a Mercari SQL interview is to practice, practice, practice.
Beyond just solving the earlier Mercari SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon.
Each problem on DataLemur has hints to guide you, step-by-step solutions, and best of all, an online SQL code editor so you can instantly run and test your query.
To prep for the Mercari SQL interview, it's also wise to practice SQL problems from other tech companies like:
But if your SQL foundations are weak, don't jump straight into solving questions; first strengthen your fundamentals with this DataLemur SQL tutorial.
This tutorial covers things like handling timestamps and rank window functions – both of these come up frequently in Mercari interviews.
In addition to SQL query questions, the other topics to prepare for the Mercari Data Science Interview are:
To prepare for Mercari Data Science interviews, read the book Ace the Data Science Interview because it's got: