At Alibaba and its many subsidiaries, SQL is used for analyzing e-commerce trends, studying customer behavior patterns, and optimizing the performance of databases used in Alibaba Cloud. That's why Alibaba asks SQL coding questions during interviews for Data Science, Data Engineering, and Data Analytics jobs. To help you prepare for the Alibaba SQL interview, here are 10 Alibaba SQL interview questions to practice. Can you solve them?
You're given a table containing the item count for each order on Alibaba, along with the frequency of orders that have the same item count. Write a query to retrieve the mode of the order occurrences. If multiple item counts tie for the mode, return them sorted in ascending order.
Column Name | Type |
---|---|
item_count | integer |
order_occurrences | integer |
item_count | order_occurrences |
---|---|
1 | 500 |
2 | 1000 |
3 | 800 |
mode |
---|
2 |
Based on the example output, the value of 1000 is the highest frequency among all item counts. This means that an item count of 2 occurred 1000 times, making it the mode of order occurrences.
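One way to approach this is to keep only the rows whose frequency equals the maximum frequency. The sketch below assumes the summary table is named items_per_order, which is a placeholder since the table name is not given above.

```sql
-- Assumption: the summary table is called items_per_order (hypothetical name).
SELECT item_count AS mode
FROM items_per_order
-- Keep only the item counts whose frequency equals the highest frequency.
WHERE order_occurrences = (
  SELECT MAX(order_occurrences)
  FROM items_per_order
)
-- If several item counts tie for the mode, list them in ascending order.
ORDER BY item_count;
```

On the sample rows above, the subquery evaluates to 1000, so only the row with an item count of 2 is returned.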
To practice this question on our interactive coding platform AND see an alternate answer, try this Alibaba Labs SQL Interview Question:
As a data analyst at Alibaba, your manager is interested in the performance of products based on customer reviews. You are tasked with analyzing the reviews data: for each product_id, calculate the monthly average star rating.
The reviews table schema is provided below:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
8401 | 341 | 2022-06-14 | 99741 | 4 |
5782 | 225 | 2022-06-24 | 93322 | 5 |
4943 | 378 | 2022-07-10 | 99741 | 3 |
6722 | 151 | 2022-07-22 | 93322 | 5 |
4017 | 980 | 2022-07-25 | 93322 | 2 |
The dataset provides the review_id (unique identifier for each review), user_id (identifier for each user), submit_date (the date when the review was submitted), product_id (identifier for each product sold by Alibaba), and stars (rating given by the user; the higher the better).
Your output should return the month (mth), the product_id, and the monthly average star rating (rounded to two decimal places) for each product. Reviews without any ratings should be excluded.
mth | product_id | avg_stars |
---|---|---|
6 | 99741 | 4.00 |
6 | 93322 | 5.00 |
7 | 93322 | 3.50 |
7 | 99741 | 3.00 |
In this query, we first extract the month from the submit_date by using PostgreSQL's EXTRACT function. Then, we calculate the average rating for each product using AVG function. This function automatically excludes all NULL values - those reviews without any ratings. We round it to two decimal places for better readability. Grouping by mth and product_id ensures we calculate monthly averages for each product separately. Finally, we order results by product_id and mth for easier interpretation.
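The steps described above might be sketched as follows. This is a minimal version that assumes the table is named reviews and that submit_date is stored as a date or timestamp; the sort here puts the month first so that rows group by month like the sample output.

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id,
  -- AVG skips NULL stars; ROUND keeps two decimal places.
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY mth, product_id;
```

If the data spanned more than one year, you would likely also group by the year so that, for example, June 2022 and June 2023 are not averaged together.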
For more window function practice, solve this Uber SQL problem within DataLemur's online SQL coding environment:
You are trying to find the median number of items bought per order on Alibaba, rounded to 1 decimal place.
However, instead of doing analytics on all Alibaba orders, you have access to a summary table, which describes how many items were in an order, and the number of orders that had that many items.
Column Name | Type |
---|---|
item_count | integer |
order_occurrences | integer |
item_count | order_occurrences |
---|---|
1 | 500 |
2 | 1000 |
3 | 800 |
4 | 1000 |
median |
---|
3.0 |
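One possible approach, sketched below, is to expand the summary back into one row per order and then take the median of those rows. It assumes the summary table is named items_per_order (a placeholder name) and uses PostgreSQL's generate_series and PERCENTILE_CONT.

```sql
-- Assumption: the summary table is called items_per_order (hypothetical name).
WITH expanded AS (
  -- Repeat each item_count once per order that had that many items.
  SELECT item_count
  FROM items_per_order
  CROSS JOIN generate_series(1, order_occurrences)
)
SELECT
  -- PERCENTILE_CONT returns double precision, so cast before rounding.
  ROUND(
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY item_count)::numeric,
    1
  ) AS median
FROM expanded;
```

Expanding the rows is simple to reason about but can be slow for very large order counts; a running total of order_occurrences compared against half the overall total is an alternative that avoids the expansion.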
To solve this question, go to DataLemur's interactive SQL coding environment.
As an inventory manager for Alibaba, it's essential to keep track of the stock information for the various products sold on the platform. Your task is to design a database that helps keep track of product information and inventory levels. You need to create a PostgreSQL query that returns the product's name, category, and its current stock level.
In your database design, you decide to have two tables: a products table and an inventory table. The products table stores information about each product, such as its product_id, product_name, and product_category. The inventory table, on the other hand, stores the current stock level for each product in the stock column.
Assume that the product_id is unique for each product and is used as a foreign key in the inventory table.
product_id | product_name | product_category |
---|---|---|
1 | Basketball | Sports |
2 | Laptop | Electronics |
3 | Winter Jacket | Clothing |
product_id | stock |
---|---|
1 | 100 |
2 | 50 |
3 | 200 |
To fetch the current stock levels for all products, you'd join the products table with the inventory table on the product_id.
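A minimal sketch of such a query is shown here, assuming the two tables are named products and inventory (placeholder names) with the columns shown in the sample data:

```sql
-- Assumption: tables are named products and inventory (hypothetical names).
SELECT
  p.product_name,
  p.product_category,
  i.stock
FROM products AS p
-- LEFT JOIN keeps products that have no inventory row yet (stock is NULL).
LEFT JOIN inventory AS i
  ON i.product_id = p.product_id;
```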
product_name | product_category | stock |
---|---|---|
Basketball | Sports | 100 |
Laptop | Electronics | 50 |
Winter Jacket | Clothing | 200 |
The query returns the name, category, and current stock level for each product listed in the database. Using a LEFT JOIN ensures that even if a product doesn't have a corresponding record in the inventory table (perhaps it's a new product that hasn't had its inventory set yet), it will still be included in the results, and its stock will just show as NULL.
The COALESCE() function returns the first non-NULL value from a list of values. This function is often used to replace a NULL with some default value, so that you can then take a sum or average of some column without NULLs messing things up.
For example, suppose you ran a customer satisfaction survey for Alibaba and had statements like "I'd buy from Alibaba again". In the survey, customers would then answer how strongly they agreed with a statement on a scale of 1 to 5 (strongly disagree, disagree, neutral, agree, strongly agree).
Because attention spans are short, many customers skipped many of the questions, and thus our survey data might be filled with NULLs:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | NULL |
303 | 1 | 5 |
303 | 2 | NULL |
Before doing further analytics on this customer survey data, you could replace the NULLs in the agree_scale column with the value of 3 (because that corresponds to the default 'neutral' answer) using the COALESCE function.
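A small sketch of that replacement, assuming the survey data lives in a table named survey_responses (a placeholder name):

```sql
-- Assumption: the survey table is called survey_responses (hypothetical name).
SELECT
  customer_id,
  question_id,
  -- Use the answer when present; otherwise fall back to 3 ('neutral').
  COALESCE(agree_scale, 3) AS agree_scale
FROM survey_responses;
```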
This would result in the following:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | 3 |
303 | 1 | 5 |
303 | 2 | 3 |
As part of the Alibaba database management team, your task is to write a SQL query that filters customer order data based on the following conditions:
You have two tables at your disposal: an orders table and a customers table.
order_id | customer_id | product_id | submission_date | product_price |
---|---|---|---|---|
7854 | 354 | 5051 | 08/10/2022 | 350 |
5698 | 152 | 8024 | 06/15/2022 | 150 |
8452 | 493 | 7219 | 07/22/2022 | 500 |
7526 | 378 | 6862 | 05/12/2022 | 899 |
8421 | 367 | 5051 | 09/25/2022 | 199 |
customer_id | first_name | last_name | country |
---|---|---|---|
354 | Li | Fang | China |
152 | Mohammed | Al-Salem | Saudi Arabia |
493 | Xin | Ri | China |
378 | Maria | Rodriguez | Spain |
367 | Anne | Taylor | USA |
This query joins the orders and customers tables on the customer_id field. Then, it filters the results based on the conditions specified above. As a result, the output will only include the orders that meet all the specified conditions.
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For example, suppose you had a table of Alibaba orders and Alibaba customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an INNER JOIN between the Orders and Customers tables would retrieve rows where the customer_id in the Orders table matches the customer_id in the Customers table.
FULL OUTER JOIN: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
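To make the difference concrete, here is a short sketch of both joins. The table names orders and customers and the columns order_id, customer_id, first_name, and last_name are assumptions for illustration.

```sql
-- INNER JOIN: only orders that have a matching customer are returned.
SELECT o.order_id, c.first_name, c.last_name
FROM orders AS o
INNER JOIN customers AS c
  ON c.customer_id = o.customer_id;

-- FULL OUTER JOIN: every order and every customer is returned.
-- Orders without a customer show NULL in the customer columns, and
-- customers without any orders show NULL in order_id.
SELECT o.order_id, c.customer_id, c.first_name, c.last_name
FROM orders AS o
FULL OUTER JOIN customers AS c
  ON c.customer_id = o.customer_id;
```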
Given data of user behavior events generated on Alibaba's platform, we wish to calculate the click-through-conversion rate for their digital products. Specifically, we want to know the percentage of users who viewed a product, started to add it to their cart, and successfully added it to their cart in the month of July 2022.
For this question, we have two tables: an events table, which tracks each action on the site, and a products table, which lists product details.
event_id | user_id | event | event_time | product_id |
---|---|---|---|---|
101 | 456 | view_product | 07/12/2022 10:35:00 | 1001 |
102 | 123 | add_to_cart | 07/12/2022 10:36:00 | 1001 |
103 | 123 | add_to_cart_success | 07/12/2022 10:37:00 | 1001 |
104 | 789 | view_product | 07/15/2022 11:40:00 | 1002 |
105 | 789 | add_to_cart | 07/15/2022 11:41:00 | 1002 |
product_id | product_name |
---|---|
1001 | Digital Book |
1002 | Digital Music |
Given the problem, we'll need to calculate the click-through-conversion rate. We can do this with a SQL query:
This query first constructs a temporary table holding the count of each step of the conversion funnel for each product, restricted to July 2022. We then calculate the click-through rates for each product by dividing the count of a given event type by the count of the prior step in the conversion funnel. Lastly, we multiply by 100 to convert the ratios into percentages. A denominator of 0 would cause a division-by-zero error, so NULLIF is used to avoid this by converting the 0 to NULL.
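The approach described above might be sketched as follows. The table names events and products, the output column names, and the assumption that event_time is stored as a timestamp are all placeholders; the event labels come from the sample data.

```sql
-- Assumption: tables are named events and products (hypothetical names).
WITH funnel AS (
  SELECT
    product_id,
    COUNT(*) FILTER (WHERE event = 'view_product')        AS views,
    COUNT(*) FILTER (WHERE event = 'add_to_cart')         AS cart_starts,
    COUNT(*) FILTER (WHERE event = 'add_to_cart_success') AS cart_successes
  FROM events
  -- Half-open range covering all of July 2022.
  WHERE event_time >= '2022-07-01'
    AND event_time <  '2022-08-01'
  GROUP BY product_id
)
SELECT
  p.product_name,
  -- NULLIF turns a zero denominator into NULL, so the rate is NULL
  -- instead of raising a division-by-zero error.
  ROUND(100.0 * f.cart_starts    / NULLIF(f.views, 0), 2)       AS view_to_cart_pct,
  ROUND(100.0 * f.cart_successes / NULLIF(f.cart_starts, 0), 2) AS cart_to_success_pct
FROM funnel AS f
JOIN products AS p
  ON p.product_id = f.product_id;
```

This version counts events rather than distinct users. To measure the percentage of users, as the question asks, each COUNT(*) could be swapped for COUNT(DISTINCT user_id) with the same FILTER clause.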
This would help Alibaba to comprehensively analyze the conversion rates of their digital product funnels and then take necessary actions to improve the rates.
To solve another question about calculating rates, solve this SQL interview question from TikTok within DataLemur's interactive coding environment:
For Alibaba, an e-commerce company, management may want to track their sales performance. Specifically, they may want to find out the average monthly sales for each product.
Please consider the following sales and products tables:
sale_id | sale_date | product_id | quantity_sold |
---|---|---|---|
1 | 2022-06-01 | 200 | 50 |
2 | 2022-06-15 | 300 | 60 |
3 | 2022-06-30 | 200 | 90 |
4 | 2022-07-15 | 400 | 20 |
5 | 2022-07-25 | 300 | 80 |
product_id | product_name |
---|---|
200 | iPhone |
300 | Samsung Galaxy |
400 | Huawei |
The question is to write a SQL query that shows the average monthly sales for each product.
This query first associates each sale with the product name using data from the sales and products tables. It then groups the data by month and product, and calculates the monthly average quantity sold for each product. A date function such as EXTRACT is used to extract the month from the sale date. The result is ordered by month and then by the average sales in descending order to easily identify the best-selling products each month.
For example, for June 2022, we get the average sales for iPhone and Samsung Galaxy separately. Then for July 2022, we get the average sales for Huawei and Samsung Galaxy separately. This information can help Alibaba make informed decisions about inventory and marketing for each product.
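A minimal sketch of such a query, assuming the tables are named sales and products (placeholder names) and using EXTRACT for the month:

```sql
-- Assumption: tables are named sales and products (hypothetical names).
SELECT
  EXTRACT(MONTH FROM s.sale_date) AS mth,
  p.product_name,
  AVG(s.quantity_sold) AS avg_quantity_sold
FROM sales AS s
JOIN products AS p
  ON p.product_id = s.product_id
GROUP BY EXTRACT(MONTH FROM s.sale_date), p.product_name
-- Month first, then best-selling products at the top of each month.
ORDER BY mth, avg_quantity_sold DESC;
```

With the sample rows, the iPhone has two June sales (50 and 90), so its June average is 70, while the Samsung Galaxy's single June sale gives an average of 60.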
The DISTINCT clause is used to remove all duplicate records from a query.
For example, if you had a table of open jobs Alibaba was hiring for, and wanted to see what are all the unique job titles that were currently available at the company, you could write the following query:
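A sketch of that query, assuming a table named job_openings with a job_title column (both names are placeholders):

```sql
-- Assumption: job_openings and job_title are hypothetical names.
-- DISTINCT collapses repeated titles so each one appears only once.
SELECT DISTINCT job_title
FROM job_openings;
```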
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the above Alibaba SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, and Facebook.
Each interview question has hints to guide you, step-by-step solutions and most importantly, there is an interactive SQL code editor so you can instantly run your SQL query answer and have it checked.
To prep for the Alibaba SQL interview, it can also be helpful to solve interview questions from other tech companies like:
However, if your SQL query skills are weak, forget about going right into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL topics like CASE/WHEN statements and handling missing data (NULLs) – both of these pop up routinely during Alibaba interviews.
For the Alibaba Data Science Interview, beyond writing SQL queries, the other types of questions covered are:
The best way to prepare for Alibaba Data Science interviews is by reading Ace the Data Science Interview. The book's got: