At Delivery Hero, SQL is used quite frequently for analyzing customer order patterns and underlies their Data Mesh Platform. That's why Delivery Hero almost always evaluates jobseekers on SQL coding questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you ace the Delivery Hero SQL interview, this blog covers 9 Delivery Hero SQL interview questions – can you answer them all?
Delivery Hero is a leading global online food delivery platform. Assume you have an 'orders' table which tracks every order made by customers. The more orders a customer places, the more valuable that customer is to the business.
Your task here is to write a SQL query that can identify the top 5 customers who made the highest number of orders in the last 30 days.
Consider the following as the full data in the 'orders' table:
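A PostgreSQL query for this might look like the following (assuming the 'orders' table has customer_id and order_date columns):

```sql
SELECT
  customer_id,
  COUNT(*) AS order_count
FROM orders
-- keep only orders placed in the last 30 days
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
-- most orders first, then cap the result at 5 customers
ORDER BY order_count DESC
LIMIT 5;
```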
This PostgreSQL query works as follows: it filters for orders made in the last 30 days using the WHERE clause, then groups those orders by customer_id and counts the number of orders per customer. Finally, it sorts customers in descending order of order_count, so the customers with the highest number of orders come first, and limits the output to the top five.
To work on a similar customer analytics SQL question where you can solve it interactively and have your SQL code instantly graded, try this Walmart SQL Interview Question:
Delivery Hero would obviously be interested in tracking delivery times for their orders. As an interviewee, you might be asked to write a SQL query to analyze the average delivery time on a monthly basis using a window function.
Consider the following hypothetical datasets:
order_id | customer_id | restaurant_id | order_placed | order_delivered |
---|---|---|---|---|
1 | 100 | 420 | 01/01/2022 12:00:00 | 01/01/2022 12:30:00 |
2 | 101 | 142 | 02/01/2022 13:00:00 | 02/01/2022 13:40:00 |
3 | 102 | 642 | 02/01/2022 18:00:00 | 02/01/2022 18:45:00 |
4 | 103 | 420 | 03/01/2022 11:00:00 | 03/01/2022 11:35:00 |
5 | 104 | 420 | 03/01/2022 20:00:00 | 03/01/2022 20:25:00 |
In this dataset, order_placed and order_delivered are both timestamps.
Your task is to calculate the average delivery time for orders each month. The delivery time for an order can be calculated by subtracting the order_placed timestamp from the order_delivered timestamp.
The PostgreSQL query for this might look like the following,
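One version, using a window function to average within each month (column names taken from the sample table above):

```sql
SELECT DISTINCT
  DATE_TRUNC('month', order_placed) AS month,
  -- interval between the two timestamps, in seconds, divided by 60 for minutes,
  -- averaged per month via a window partitioned on the month
  AVG(EXTRACT(EPOCH FROM (order_delivered - order_placed)) / 60)
    OVER (PARTITION BY DATE_TRUNC('month', order_placed)) AS avg_delivery_minutes
FROM orders
ORDER BY month;
```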
In this query, DATE_TRUNC('month', ...) is used to group orders by month. The EXTRACT(EPOCH FROM ...) function calculates the time difference between order_placed and order_delivered in seconds, and then we divide by 60 to get the time in minutes. This is then averaged with the AVG function.
To practice a related window function SQL problem on DataLemur's free interactive coding environment, solve this Google SQL Interview Question:
Constraints are just rules for your DBMS to follow when updating/inserting/deleting data.
Say you had a table of Delivery Hero employees, and their salaries, job titles, and performance review data. Here are some examples of SQL constraints you could implement:
NOT NULL: This constraint could be used to ensure that certain columns in the employee table, such as the employee's first and last name, cannot contain NULL values.
UNIQUE: This constraint could be used to ensure that the employee ID is unique. This would prevent duplicate entries in the employee table.
PRIMARY KEY: This constraint could be used to combine the NOT NULL and UNIQUE constraints to create a primary key for the employee table. The employee ID could serve as the primary key.
FOREIGN KEY: This constraint could be used to establish relationships between the employee table and other tables in the database. For example, you could use a foreign key to link the employee ID to the department ID in a department table to track which department each employee belongs to.
CHECK: This constraint could be used to ensure that certain data meets specific conditions. For example, you could use a CHECK constraint to ensure that salary values are always positive numbers.
DEFAULT: This constraint could be used to specify default values for certain columns. For example, you could use a DEFAULT constraint to set the employee hire date to the current date if no value is provided when a new employee is added to the database.
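Here is how these constraints might look together in a hypothetical employees table (all table, column, and type choices below are illustrative):

```sql
CREATE TABLE employees (
  employee_id   INT PRIMARY KEY,               -- PRIMARY KEY: NOT NULL + UNIQUE combined
  first_name    VARCHAR(50) NOT NULL,          -- NOT NULL: name is required
  last_name     VARCHAR(50) NOT NULL,
  department_id INT REFERENCES departments (department_id),  -- FOREIGN KEY to a departments table
  salary        NUMERIC CHECK (salary > 0),    -- CHECK: salaries must be positive
  hire_date     DATE DEFAULT CURRENT_DATE      -- DEFAULT: today's date if none is provided
);
```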
Delivery Hero is interested in understanding the efficiency and timeliness of their deliveries. Specifically, they would like to know which cities achieve the fastest delivery times on average. As a Database Analyst, you are tasked with designing a relational database and writing a SQL query to find the average delivery time for each city.
Assume the data model contains two tables: a deliveries table and a cities table.
order_id | delivery_id | city_id | order_time | delivery_time |
---|---|---|---|---|
1001 | 2001 | 1 | 2023-02-01 09:30:00 | 2023-02-01 10:00:00 |
1002 | 2002 | 2 | 2023-02-01 09:45:00 | 2023-02-01 10:30:00 |
1003 | 2003 | 1 | 2023-02-01 10:00:00 | 2023-02-01 10:45:00 |
1004 | 2004 | 3 | 2023-02-01 11:00:00 | 2023-02-01 11:40:00 |
1005 | 2005 | 3 | 2023-02-01 11:15:00 | 2023-02-01 11:50:00 |
city_id | city_name |
---|---|
1 | Berlin |
2 | Hamburg |
3 | Munich |
The PostgreSQL query to get the average delivery time for each city is as follows:
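A version of that query, assuming the two tables are named deliveries and cities as shown above:

```sql
SELECT
  c.city_name,
  -- difference between delivery and order timestamps, converted to minutes
  AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.order_time)) / 60) AS avg_delivery_minutes
FROM deliveries AS d
JOIN cities AS c
  ON d.city_id = c.city_id
GROUP BY c.city_name
-- fastest cities first
ORDER BY avg_delivery_minutes ASC;
```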
Explanation:
The query joins the deliveries and cities tables on city_id. For each city, it calculates the time difference between order_time and delivery_time, converts it into minutes (using EXTRACT(EPOCH FROM ...) divided by 60), then calculates the average of these time differences. The result is sorted by the average delivery time in ascending order to show the cities with the fastest average delivery times at the top.
The COALESCE function is used to remove NULLs and replace them with another value.
For example, say you were a Data Analyst at Delivery Hero and were doing a customer analytics project to see who the most marketing-engaged customers were, and had access to the below table.
delivery_hero_customers:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | NULL |
301 | NULL | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | NULL | NULL |
Before you could proceed, you had to remove the NULLs and replace them with the default values (not_active for email, and not_opted_in for SMS). To do this, you'd run the following query:
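A sketch of that query against the table above, using COALESCE to substitute the defaults:

```sql
SELECT
  customer_id,
  -- replace NULL email engagement with the default 'not_active'
  COALESCE(email_engagement, 'not_active') AS email_engagement,
  -- replace NULL SMS engagement with the default 'not_opted_in'
  COALESCE(sms_engagement, 'not_opted_in') AS sms_engagement
FROM delivery_hero_customers;
```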
This would get you the following output:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | not_opted_in |
301 | not_active | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | not_active | not_opted_in |
As a Delivery Hero data analyst, your task is to identify customers that meet a specific set of delivery requirements.
Specifically, you need to filter for customers who have:

- placed more than 3 orders in the last 30 days,
- a location of Berlin, Germany, and
- ordered Pizza.
The data is scattered across three tables: customers, orders, and order_items.
The customers table contains customer information:
customer_id | name | city | country |
---|---|---|---|
1 | John | Berlin | Germany |
2 | Sarah | Paris | France |
3 | Mark | Berlin | Germany |
4 | Lucy | London | UK |
The orders table records customer orders:
order_id | customer_id | order_date |
---|---|---|
101 | 1 | 2022-09-01 |
102 | 2 | 2022-09-02 |
103 | 3 | 2022-09-03 |
104 | 1 | 2022-09-04 |
105 | 1 | 2022-09-05 |
And the order_items table contains the individual items of each order:
order_item_id | order_id | product |
---|---|---|
1001 | 101 | Pizza |
1002 | 102 | Burger |
1003 | 103 | Pizza |
1004 | 104 | Pizza |
1005 | 105 | Pizza |
Here is a sample PostgreSQL query you could write to solve this problem:
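One way to write it, assuming the column names shown in the sample tables:

```sql
SELECT
  c.customer_id,
  c.name
FROM customers AS c
JOIN orders AS o       ON o.customer_id = c.customer_id
JOIN order_items AS oi ON oi.order_id = o.order_id
WHERE c.city = 'Berlin'
  AND c.country = 'Germany'
  -- only orders from the last 30 days, and only Pizza items
  AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
  AND oi.product = 'Pizza'
GROUP BY c.customer_id, c.name
-- keep customers with more than 3 distinct qualifying orders
HAVING COUNT(DISTINCT o.order_id) > 3;
```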
This query first joins the three tables together. It keeps only the records where the city is Berlin, the country is Germany, the order date falls within the last 30 days, and the product is Pizza. It then groups by customer_id and name, and keeps only those customers whose count of distinct order_ids is greater than 3.
This query surfaces the customers that meet Delivery Hero's specific criteria, helping the company make data-driven decisions.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1NF, 2NF, 3NF, etc.).
This is typically done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with its own drawbacks too. By adding redundant columns, you incur more data storage costs, and update and delete operations become more complex because data integrity must be maintained across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
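As a small illustration, suppose analytics queries constantly need a city name alongside each order; a denormalized design copies it onto the orders table so no join is needed (table and column names here are hypothetical):

```sql
-- Normalized: fetching the city name requires a join on every read
SELECT o.order_id, c.city_name
FROM orders AS o
JOIN cities AS c ON c.city_id = o.city_id;

-- Denormalized: city_name is duplicated onto orders, so reads skip the join,
-- but a city rename now has to update every matching order row
ALTER TABLE orders ADD COLUMN city_name VARCHAR(100);

SELECT order_id, city_name
FROM orders;
```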
Delivery Hero is interested in targeting marketing to customers who live in Berlin. From the customers table, find all customers whose city name contains the word 'Berlin'.
customer_id | first_name | last_name | email | address | city |
---|---|---|---|---|---|
9834 | John | Doe | john.doe@email.com | 123 Anonymous St | Berlin-Pankow |
5643 | Jane | Smith | jane.smith@email.com | 456 Anonymous Ave | Berlin-Mitte |
3465 | Frank | White | frank.white@email.com | 789 Anonymous Blv | New York |
7489 | Jake | Green | jake.green@email.com | 234 Anonymous way | London |
customer_id | first_name | last_name | email | address | city |
---|---|---|---|---|---|
9834 | John | Doe | john.doe@email.com | 123 Anonymous St | Berlin-Pankow |
5643 | Jane | Smith | jane.smith@email.com | 456 Anonymous Ave | Berlin-Mitte |
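The query being described might be:

```sql
SELECT *
FROM customers
-- lowercase the city so 'Berlin', 'BERLIN', and 'berlin' all match,
-- and use % wildcards to match 'berlin' at any position in the string
WHERE LOWER(city) LIKE '%berlin%';
```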
This SQL query will retrieve all customer records from the table where the field contains the string 'berlin' (case insensitive). The LIKE keyword in SQL allows us to use wildcards ('%') in our string comparisons, enabling us to match portions of a string. Here, we've enclosed 'berlin' in '%', which will match any string that contains 'berlin', at any position. Using the LOWER function ensures that the match is case-insensitive, so it will correctly match 'Berlin', 'BERLIN', 'berlin', etc.
Consider the sales and products tables of the Delivery Hero database. The sales table contains records of all transactions, and the products table includes the list of all products, their prices, and their IDs. Assume that at some point a product can be sold at a discount.
Write a SQL query that can:

- calculate the total discount given for each product in each month,
- compute the square root of the total sale amount, rounded to the nearest whole number, and
- exclude products and months where no discount was given.
sale_id | sale_date | product_id | quantity | sale_price |
---|---|---|---|---|
1001 | 08/01/2021 00:00:00 | 2000 | 5 | 2600 |
1002 | 08/15/2021 00:00:00 | 2001 | 3 | 1800 |
1003 | 08/30/2021 00:00:00 | 2000 | 2 | 1200 |
1004 | 09/05/2021 00:00:00 | 2001 | 1 | 700 |
1005 | 09/10/2021 00:00:00 | 2000 | 6 | 3300 |
product_id | product_price |
---|---|
2000 | 650 |
2001 | 750 |
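A sketch of such a query, assuming the tables are named sales and products and that sale_price is the total charged for the whole quantity:

```sql
SELECT
  DATE_TRUNC('month', s.sale_date) AS month,
  s.product_id,
  -- discount = list price for the units sold minus what was actually paid
  SUM(p.product_price * s.quantity - s.sale_price) AS total_discount,
  -- square root of the month's total sales, rounded to a whole number
  ROUND(SQRT(SUM(s.sale_price))) AS rounded_sqrt_total_sale
FROM sales AS s
JOIN products AS p
  ON p.product_id = s.product_id
GROUP BY DATE_TRUNC('month', s.sale_date), s.product_id
-- drop product-months where no discount (or a zero discount) was given
HAVING SUM(p.product_price * s.quantity - s.sale_price) > 0;
```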
This solution calculates the total discount for each month and each product by subtracting the sale price from the product price for each unit sold, summed across all units sold. Then, it calculates the square root of the total sale amount and rounds it to the nearest whole number. The resulting rows are filtered to exclude products where the discount was not given or was zero.
To practice a very similar question try this interactive Wayfair Y-on-Y Growth Rate Question which is similar for calculating transaction values or this Amazon Average Review Ratings Question which is similar for monthly calculations.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the above Delivery Hero SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Google, Microsoft and Silicon Valley startups.
Each exercise has hints to guide you, detailed solutions, and best of all, an interactive SQL code editor so you can run your query right in the browser and have it checked.
To prep for the Delivery Hero SQL interview, it can also be useful to solve SQL problems from other tech companies like:
But if your SQL coding skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this SQL tutorial for Data Analytics.
This tutorial covers SQL concepts such as advantages of CTEs vs. subqueries and WHERE vs. HAVING – both of which show up routinely in Delivery Hero SQL assessments.
Beyond writing SQL queries, the other topics covered in the Delivery Hero Data Science Interview are:
The best way to prepare for Delivery Hero Data Science interviews is by reading Ace the Data Science Interview. The book's got: