At Mercari, SQL is used often for extracting and analyzing sales data for market trends, and cleaning and managing product listing data for accurate recommendations. For this reason Mercari asks SQL coding questions during interviews for Data Science, Analytics, and & Data Engineering jobs.
Thus, to help you practice for the Mercari SQL interview, we've collected 8 Mercari SQL interview questions – able to answer them all?
As a data analyst for Mercari, you are required to analyze the user activity on the platform. Power users or VIP uses are individuals who often carry out crucial activities for the business, such as posting listings, making purchases, and providing reviews. Consequently, your task is to identify users who have listed more than 50 items and made more than 20 purchases within the last year.
For this SQL question, you are given two tables: and .
listing_id | user_id | listing_date | product_id |
---|---|---|---|
6171 | 123 | 06/08/2021 | 50001 |
7802 | 123 | 06/10/2021 | 69852 |
5293 | 123 | 06/18/2021 | 50001 |
6352 | 192 | 07/26/2021 | 69852 |
4517 | 981 | 07/05/2021 | 69852 |
purchase_id | user_id | purchase_date | product_id |
---|---|---|---|
8910 | 123 | 06/18/2021 | 50001 |
4732 | 192 | 07/26/2021 | 69852 |
2365 | 981 | 07/05/2021 | 69852 |
7845 | 123 | 06/10/2021 | 69852 |
6932 | 123 | 06/08/2021 | 50001 |
Note: Here is a unique identifier associated with each user. and are the dates when a user listed a product and made a purchase, respectively.
This query first creates two subqueries: one to count the number of products listed by each user in the past year (), and another to count the number of products purchased by each user in the past year (). The main query then joins these two subqueries on and filters for users who have listed over 50 items and purchased over 20 items in the past year.
To solve a related super-user data analysis question on DataLemur's free online SQL code editor, try this recently asked Microsoft SQL interview question:
You have been assigned the task of creating an analysis report for Mercari, the largest community-powered shopping app in Japan. Your boss wants to see a monthly ranking of the products based on total revenue, and how the revenue changes compared with the previous month.
You are given the table "sales", including the columns "sale_id"(unique identifier for the sales), "product_id"(id of the product sold), "user_id"(id of the user who purchased the product), "sale_date"(the date when the sale took place), and "revenue"(the revenue generated from the sale).
sale_id | user_id | sale_date | product_id | revenue |
---|---|---|---|---|
1001 | 321 | 06/01/2022 | 8810 | 2500 |
1002 | 654 | 06/10/2022 | 8810 | 3200 |
1003 | 987 | 06/15/2022 | 3910 | 1000 |
1004 | 321 | 07/01/2022 | 8810 | 2700 |
1005 | 444 | 07/10/2022 | 3910 | 1200 |
year_month | product_id | revenue | prev_month_revenue | revenue_change |
---|---|---|---|---|
2022-06 | 8810 | 5700 | null | null |
2022-06 | 3910 | 1000 | null | null |
2022-07 | 8810 | 2700 | 5700 | -3000 |
2022-07 | 3910 | 1200 | 1000 | 200 |
This query first calculates the monthly total revenue for each product in a CTE (Common Table Expression), then it calculates the revenue of the previous month for each product with the LAG window function. In the final SELECT statement, it calculates the change in revenue from the previous month and returns the result. The COALESCE function is used to deal with the null value for the first month's data, replacing it with 'null'.
Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur
No, in 99% of normal cases a and do NOT produce the same result.
You can think of more as set addition when it combines the results of two tables.
You can think of a more as set multiplication, producing all combos of table 1 with combos of table 2.
You are working as a data scientist at Mercari, where you are tasked with analyzing data related to user transactions. The user transaction table includes details such as User ID, Product ID, Transaction time and Transaction price.
Your task is to find out the total transaction price for each user in the month of June.
transaction_id | user_id | transaction_time | product_id | transaction_price |
---|---|---|---|---|
111 | 123 | 06/01/2022 00:00:00 | 50001 | 50 |
112 | 265 | 06/05/2022 00:00:00 | 69852 | 100 |
113 | 123 | 06/10/2022 00:00:00 | 50001 | 75 |
114 | 192 | 07/20/2022 00:00:00 | 69852 | 150 |
115 | 981 | 07/05/2022 00:00:00 | 69852 | 200 |
This query works by extracting the month from the transaction time and only considering the records where the month of the transaction is June. It groups by the user_id and sums up the transaction_price to get the total transaction price for each user for June.
Expected output, given the above table as input:
user_id | month | total_price |
---|---|---|
123 | 6 | 125 |
265 | 6 | 100 |
Then the question assesses the candidate's ability to extract meaningful information from transaction data, a common task for data scientists working in e-commerce companies. It also tests the candidate's knowledge of SQL functions such as , , and attention to date-based queries.
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at Mercari should vaguely refresh these concepts:
Given the Mercari customer database, write a SQL query to filter out customers who made purchases in the last 6 months and spent over 1000 USD in total. Also, the returned results should include only those customers who have given an average review rating of at least 4 for their purchased products.
customer_id | first_name | last_name | join_date | |
---|---|---|---|---|
3221 | John | Doe | john.doe@email.com | 06/13/2020 |
9783 | Jane | Smith | jane.smith@email.com | 02/07/2019 |
4532 | Emily | Johnson | emily.johnson@email.com | 11/21/2021 |
purchase_id | customer_id | product_id | purchase_date | amount_spent |
---|---|---|---|---|
1101 | 3221 | 58901 | 04/12/2022 | 200.00 |
1872 | 9783 | 37842 | 05/16/2022 | 800.00 |
2134 | 3221 | 40127 | 06/05/2022 | 890.00 |
review_id | customer_id | product_id | review_date | stars |
---|---|---|---|---|
4132 | 3221 | 58901 | 04/20/2022 | 5 |
7890 | 3221 | 40127 | 06/12/2022 | 4 |
5678 | 9783 | 37842 | 05/23/2022 | 3 |
This query first filters customers from the table who made a purchase within the last 6 months and spent more than 1000 USD in total. Then from this filtered list, it further filters customers whose average review rating is at least 4. Finally, these filtered customers are returned from the table.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
Denormalization is done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
For these OLAP use cases, you're bottleneck frequently is joining multiple tables, but de-normalizing your database cuts these outs.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with its own drawbacks too. By adding redundant columns, you incur more data storage costs, and there's need for more complex update and delete operations in order to maintain data integrity across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
Assume we're working with a platform like Mercari which is an online marketplace for buying and selling items. You want to answer the following question: What was the average price of the products sold each month?
sale_id | product_id | sell_date | price |
---|---|---|---|
101 | 55 | 06/01/2022 | 700 |
202 | 75 | 06/02/2022 | 150 |
303 | 67 | 07/20/2022 | 950 |
404 | 89 | 07/25/2022 | 500 |
505 | 45 | 08/03/2022 | 1200 |
month | avg_price |
---|---|
6 | 425 |
7 | 725 |
8 | 1200 |
This query first extracts the month from the using the function. It then groups by the extracted month to calculate the average for each group which is each month. The is used to sort the results by the month.
The best way to prepare for a Mercari SQL interview is to practice, practice, practice. Beyond just solving the earlier Mercari SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon.
Each problem on DataLemur has hints to guide you, step-by-step solutions and best of all, there's an online SQL code editor so you can instantly run your query and have it executed.
To prep for the Mercari SQL interview you can also be wise to practice SQL problems from other tech companies like:
But if your SQL foundations are weak, forget about going right into solving questions – improve your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers things like handling timestamps and rank window functions – both of these come up frequently in Mercari interviews.
In addition to SQL query questions, the other topics to prepare for the Mercari Data Science Interview are:
To prepare for Mercari Data Science interviews read the book Ace the Data Science Interview because it's got: