At fashion retailer Zalando, SQL is typically used for analyzing customer shopping patterns to optimize product recommendations. Unsurprisingly this is why Zalando frequently asks SQL query questions in interviews for Data Science, Analytics, and & Data Engineering jobs.
Thus, to help you ace the Zalando SQL interview, we've collected 8 Zalando SQL interview questions – able to solve them?
At Zalando, the reviews of the products are crucial to understand customer satisfaction and product quality over time. For the given dataset, write a SQL query to calculate the monthly average rating for each product. You need to return a table with , , and .
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 123 | 2022-06-08 | 001 | 4 |
2 | 265 | 2022-07-08 | 002 | 5 |
3 | 987 | 2022-06-15 | 001 | 3 |
4 | 453 | 2022-07-18 | 001 | 5 |
5 | 234 | 2022-07-20 | 002 | 3 |
mth | product_id | avg_stars |
---|---|---|
6 | 001 | 3.5 |
7 | 001 | 5.0 |
7 | 002 | 4.0 |
To solve this problem, we can benefit from PostgreSQL window functions, particularly the function and function to get the month of the review submission date.
The clause in the query specifies how rows are divided into groups for calculating the average. So by grouping by 'product_id' and the extract month, we get the average stars for each product per month. The order by clause then sorts the output by 'product_id' and 'mth'. Please note that 'stars' has to be a numeric field for function to work.
To solve another window function question on DataLemur's free online SQL coding environment, solve this Google SQL Interview Question:
As a data analyst at Zalando, you are given a task to examine and analyze the inventory management system of the company. You have two database tables:
Your task is to write a SQL query that returns the product id, the number of item units sold, units available, and the units returned for each product.
product_id | product_name | price |
---|---|---|
105 | Leather Jacket | 299.99 |
205 | Ankle Boots | 179.99 |
305 | Trench Coat | 499.99 |
inventory_id | product_id | status | add_date |
---|---|---|---|
1101 | 105 | Sold | 2020-01-25 00:00:00 |
1102 | 105 | Returned | 2020-01-26 00:00:00 |
1103 | 205 | Available | 2021-02-21 00:00:00 |
1104 | 205 | Sold | 2021-03-11 00:00:00 |
1105 | 305 | Available | 2021-04-01 00:00:00 |
In this SQL query, we start by joining the and tables on the column. Next, we use the SQL statement within the aggregate function to count the number of units sold, available and returned by product id.
Here's some strategies that can generally speed up a slow SQL query:
While there's an entire industry dedicated to database performance tuning , for Data Analyst and Data Science interviews at Zalando, knowing the above basics should be more than enough to ace the SQL interview! Data Engineers on the other hand should know more about the command and understand a bit more about a query-execution plan.
Zalando is a popular European e-commerce company, specializing in fashion and lifestyle products. For a data analyst role in this company, you might be asked to find the average number of items included in each order on the platform. Let's consider the following data tables:
order_id | user_id | order_date |
---|---|---|
5432 | 1001 | 07/01/2022 |
2468 | 1050 | 07/02/2022 |
1010 | 1102 | 07/02/2022 |
5050 | 1001 | 07/03/2022 |
3592 | 1005 | 07/04/2022 |
item_id | order_id | product_id | quantity |
---|---|---|---|
501 | 5432 | 10501 | 1 |
502 | 5432 | 30702 | 2 |
503 | 2468 | 20550 | 1 |
504 | 1010 | 30015 | 1 |
505 | 1010 | 10055 | 2 |
506 | 5050 | 20002 | 1 |
507 | 3592 | 15022 | 3 |
508 | 3592 | 10021 | 1 |
With the above two tables available, write a SQL query to find the average number of items per order.
This query first counts the number of items in each order using a nested subquery. The outer query then calculates the average of these counts. The result will provide the average number of items included in each order.
A correlated sub-query is one that relies on the outer query and cannot be run independently. It uses the outer query to filter or transform data by referencing a column from it, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is not linked to the outer query and can be run on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
An example correlated sub-query:
This correlated subquery retrieves the names and salaries of Zalando employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).
An example non-correlated sub-query:
This non-correlated subquery retrieves the names and salaries of Zalando employees who make more than the average salary for the Data Analytics department (which honestly should be very few people since Data Analysts are awesome).
Anways, the subquery is independent of the main query and can be considered a standalone query. Its output (the average salary for the Data Analytics department) is then used in the main query to filter the rows of the table.
As an ecommerce company, Zalando sells various types of products like clothing, shoes, accessories, beauty products etc. The following task requires you to find the product category which has generated the highest sales for each month.
product_id | category | price |
---|---|---|
1001 | Shoes | 75 |
1002 | Clothes | 30 |
1003 | Accessories | 15 |
1004 | Beauty | 20 |
1005 | Clothes | 50 |
sale_id | product_id | quantity | sale_date |
---|---|---|---|
1 | 1001 | 10 | 06/01/2022 00:00:00 |
2 | 1002 | 8 | 06/02/2022 00:00:00 |
3 | 1003 | 15 | 06/03/2022 00:00:00 |
4 | 1004 | 7 | 07/01/2022 00:00:00 |
5 | 1005 | 12 | 07/02/2022 00:00:00 |
month | category | total_sales |
---|---|---|
6 | Shoes | 750 |
6 | Clothes | 240 |
6 | Accessories | 225 |
7 | Beauty | 140 |
7 | Clothes | 600 |
First, we do a join on the 'sales' and 'products' tables using the 'product_id' column, and then we group the records by the month of sale date and product category. The SUM aggregate function is used to calculate the total sales for each category in each month. The order of output is based on the month and total sales in descending order.
Clustered indexes have a special characteristic in that the order of the rows in the database corresponds to the order of the rows in the index. This is why a table can only have one clustered index, but it can have multiple non-clustered indexes.
The main difference between clustered and non-clustered indexes is that the database tries to maintain the order of the data in the database to match the order of the corresponding keys in the clustered index. This can improve query performance as it provides a linear-access path to the data stored in the database.
Zalando needs to analyze their sales by month, to better understand their sales trends over time. Write a query to calculate the total sales made and the number of orders for each product on a monthly basis, from the beginning of 2022.
We have a table with the following schema:
sales_id | product_id | sale_date | sale_price |
---|---|---|---|
101 | ZAL001 | 2022-01-15 | 100.00 |
102 | ZAL002 | 2022-01-30 | 150.00 |
103 | ZAL001 | 2022-02-10 | 100.00 |
104 | ZAL003 | 2022-02-25 | 200.00 |
105 | ZAL002 | 2022-02-28 | 150.00 |
The output will contain the , of the sale (in format), total for the product in the month, and the for the product in the month.
product_id | month | total_sales | order_count |
---|---|---|---|
ZAL001 | 202201 | 100.00 | 1 |
ZAL002 | 202201 | 150.00 | 1 |
ZAL001 | 202202 | 100.00 | 1 |
ZAL002 | 202202 | 150.00 | 1 |
ZAL003 | 202202 | 200.00 | 1 |
Here is the SQL query block:
In this answer, function is used to convert the dates to a string representation in the format . Then and functions are used to calculate total sales and order count respectively for each product per month. We consider only the sales that were made in the year 2022 as signified by the clause. The and clauses are used to group and sort the results product-wise and then month-wise.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Zalando SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like FAANG and tech startups.
Each SQL question has hints to guide you, step-by-step solutions and best of all, there is an interactive coding environment so you can right in the browser run your query and have it executed.
To prep for the Zalando SQL interview it is also useful to solve SQL problems from other tech companies like:
However, if your SQL coding skills are weak, forget about diving straight into solving questions – go learn SQL with this interactive SQL tutorial.
This tutorial covers SQL concepts such as WHERE vs. HAVING and handling date/timestamp data – both of these pop up routinely during SQL interviews at Zalando.
In addition to SQL interview questions, the other topics to practice for the Zalando Data Science Interview are:
To prepare for Zalando Data Science interviews read the book Ace the Data Science Interview because it's got: