At Jingdong Mall (JD.com), SQL is used across the company for analyzing customer purchase patterns and optimizing inventory management based on real-time sales data. That's why JD.com frequently asks SQL questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
As such, to help you ace the JD.com SQL interview, we've collected 10 Jingdong Mall SQL interview questions – how many can you solve?
JD.com is a huge e-commerce platform that has many products. Each product is reviewed by numerous users. The feedback from these reviews is very crucial to the growth of their product line. As a data analyst, suppose you are asked to provide analysis on JD.com's review data. Write a SQL query to calculate the average star ratings for each product on a monthly basis.
Consider the below table:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
Your output should have columns: month, product_id, and avg_stars.
month | product_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
You can make use of the SQL's to find out the month from the date, and function to calculate the average star rating.
This query aggregates the reviews per product per month and calculates the average star ratings for each of these groupings. The clause is used to order the result set first by month (in ascending order), and then by average star ratings in descending order.
To practice a similar window function question on DataLemur's free online SQL code editor, try this Google SQL Interview Question:
JD.com wants to filter out customer records for their Purchase Records, focusing on customers who made purchases more than three times during the year 2020 and the total value of purchases for each customer is over 10000 RMB.
In the Purchase Records table, every row indicates a purchase made by a customer, with details including CustomerID, PurchaseID, Date (in format YYYY-MM-DD), and Value (of each purchase). Can you help JD.com filter out these specific customer records?
PurchaseID | CustomerID | Date | Value |
---|---|---|---|
1 | 1311 | 2020-05-20 | 3000 |
2 | 4821 | 2020-04-12 | 2000 |
3 | 4541 | 2021-03-30 | 1000 |
4 | 4821 | 2020-09-18 | 5000 |
5 | 4821 | 2020-12-29 | 4000 |
6 | 1311 | 2020-08-21 | 2500 |
7 | 1341 | 2020-09-03 | 6200 |
8 | 1341 | 2020-12-25 | 1500 |
9 | 1311 | 2020-01-21 | 5000 |
10 | 4541 | 2020-02-13 | 4500 |
The output should return the CustomerID, the total amount spent by the customer in 2020, and the total amount of purchases the customer made in 2020.
Example output:
CustomerID | Total Amount | Count of Purchase |
---|---|---|
1311 | 10500 | 3 |
4821 | 11000 | 3 |
This PostgreSQL query uses GROUP BY to group the data by CustomerID and uses SUM and COUNT functions to find out the total value of the purchases and the count of purchases. The HAVING clause checks if a customer made more than three purchases and the total value of the purchases is over 10000 RMB during 2020.
Both and are used to combine the results of two or more SELECT statements into a single result set.
However, only includes one instance of a duplicate, whereas includes duplicates.
Assume that JD.com wants to analyze their digital marketing campaign and look for ways to improve their sales. One method they are using is by studying their click-through conversion rates from viewing a product to adding a product to the cart.
Here's some sample data:
:
view_id | user_id | view_date | product_id |
---|---|---|---|
101 | 255 | 2022-07-01 | 1001 |
102 | 234 | 2022-07-01 | 1002 |
103 | 234 | 2022-07-02 | 1001 |
104 | 970 | 2022-07-02 | 1002 |
105 | 970 | 2022-07-03 | 1001 |
:
cart_id | user_id | cart_date | product_id |
---|---|---|---|
501 | 255 | 2022-07-01 | 1001 |
502 | 234 | 2022-07-01 | 1002 |
We can write a query that calculates the click-through conversion rates for each product based on the views and add to cart events.
In this query, for each product ID, we count the number of add to cart events and divide it by the number of views. We then multiply it by 100 to get the conversion rate as a percentage. The conversion rate is calculated using a left join on the view logs and add to cart logs. This helps us capture the entire universe of views even when they did not translate into an add-to-cart event.
Please note that the conversion rate is being calculated for each product. And since a user may add the product to their cart on the same day as the view event or on any subsequent day, we ensure that we're only counting those cases by adding the condition in the join clause.
Given the sample input data, this query will return the conversion rate for each product. Keep in mind that if the input data had timestamps (not just dates), the query would need modifications to account for that.
To solve a similar problem about calculating rates, try this TikTok SQL Interview Question within DataLemur's online SQL coding environment:
is used to select rows that match a range of values, whereas the operator checks for values in a specified list of values.
For example, say you were a Data Analyst at JD.com and had a table of advertising campaign data.
To find campaigns with between 10k in spend, you could use BETWEEN`:
To find ad campaigns that were run on Facebook and Google's Display Network, you could use :
JD.com wants to conduct marketing campaigns tailored to the users using different email services. For the campaign, they wish to filter out the customer details who are using the email service 'hotmail.com'. Using the SQL keyword , create a query that will return the list of all customers who have registered their email with the domain 'hotmail.com'.
customer_id | name | |
---|---|---|
1001 | John Doe | johndoe@gmail.com |
1002 | Jane Smith | janesmith@hotmail.com |
1003 | Peter Parker | peter_parker@yahoo.com |
1004 | Mary Jane | mary.j@hotmail.com |
1005 | Tony Stark | tony.stark@hotmail.com |
customer_id | name | |
---|---|---|
1002 | Jane Smith | janesmith@hotmail.com |
1004 | Mary Jane | mary.j@hotmail.com |
1005 | Tony Stark | tony.stark@hotmail.com |
This query uses the LIKE keyword to filter the customers table for records where the email field ends with '@hotmail.com'. The '%' symbol is a wildcard that matches any sequence of characters, thus, the condition '%@hotmail.com' matches any email that ends with '@hotmail.com', effectively selecting all customers who use Hotmail as their email service.
"The clause in SQL allows you to select records that are unique, eliminating duplicates.
For a tangible example, say you had a table of JD.com employees:
first_name | job_title |
---|---|
Akash | Data Analyst |
Brittany | Data Scientist |
Carlos | Data Engineer |
Diego | Data Engineer |
Eva | Data Analyst |
If you were doing an HR Analytics project and you wanted to get all the unique job titles that currently worked at the company, you would write the following SQL query:
The output would give you 3 distinct job titles at JD.com:
job_title |
---|
Data Analyst |
Data Scientist |
Data Engineer |
JD.com is interested in knowing which products are most purchased by which customer, and how the customer's reviews are affecting the average rating of these products. They would like to have a user-product table joined with their reviews table.
So, the question is: "Can you write a SQL query to identify the most purchased product by each user and calculate the average star rating of that product by looking at all the reviews?"
purchase_id | user_id | product_id | purchase_date |
---|---|---|---|
1 | 123 | 50001 | 06/01/2022 |
2 | 123 | 50001 | 06/05/2022 |
3 | 265 | 69852 | 06/02/2022 |
4 | 362 | 50001 | 06/15/2022 |
5 | 192 | 69852 | 07/01/2022 |
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
user_id | product_id | avg_stars |
---|---|---|
123 | 50001 | 4.00 |
265 | 69852 | 4.00 |
362 | 50001 | 3.00 |
In this query, first, we create a CTE that aggregates the count of each product bought by a user. Then we select the one which was purchased the most by each user. Next, we calculate the average stars for each product in the table. Finally, we join these two tables using the to get the most purchased product by each user and its average stars.
Because joins come up so often during SQL interviews, try an interactive SQL join question from Spotify:
JD.com is in the process of conducting a sale event where the products in the "books" category having a review score of 4 and above will be sold at a 20% discount. For each product in the "books" category, calculate the discounted price, rounded to the nearest integer. Also, identify if a product is featured in the sale (1 = sale, 0 = non-sale) based on the average review score. Create a column "sale_identity" to reflect this.
product_id | category | price |
---|---|---|
50001 | "books" | 185 |
69852 | "electronics" | 300 |
33421 | "books" | 120 |
85692 | "books" | 200 |
99531 | "books" | 150 |
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 | 50001 | 4 |
7802 | 265 | 06/10/2022 | 69852 | 4 |
8513 | 172 | 06/12/2022 | 33421 | 5 |
5293 | 128 | 06/18/2022 | 50001 | 3 |
6352 | 334 | 07/26/2022 | 85692 | 4 |
4569 | 473 | 08/10/2022 | 99531 | 5 |
9002 | 386 | 08/11/2022 | 50001 | 5 |
product | rounded_price | sale_identity |
---|---|---|
50001 | 148 | 1 |
33421 | 96 | 1 |
85692 | 160 | 0 |
99531 | 120 | 1 |
This query first performs a group by operation on the reviews table to compute the average rating for each product. If the average rating is 4 or above, a discounted rate of 20% is applied. This information is used to compute and fields. The query then joins the products table with this result, based on product_id, to calculate the rounded price and provide the sale identity for each product in the 'books' category.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating averages based on product reviews or this McKinsey 3-Topping Pizzas Question which is similar for discount pricing calculation.
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables. The table with the foreign key is called the child table, while the table with the candidate key is called the parent or referenced table.
For example, consider a database with two tables: and customer_idcustomer_idcustomer_id` column in JD.com customers table.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the JD.com SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the above JD.com SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each problem on DataLemur has hints to guide you, full answers and most importantly, there is an online SQL coding environment so you can right online code up your SQL query answer and have it executed.
To prep for the JD.com SQL interview you can also be wise to practice SQL problems from other tech companies like:
But if your SQL foundations are weak, don't worry about going right into solving questions – refresh your SQL knowledge with this DataLemur SQL tutorial.
This tutorial covers SQL concepts such as using ORDER BY and creating pairs via SELF-JOINs – both of which show up frequently in JD.com interviews.
In addition to SQL interview questions, the other types of questions to prepare for the JD.com Data Science Interview are:
To prepare for JD.com Data Science interviews read the book Ace the Data Science Interview because it's got: