At grocery technology company Ocado, SQL does the heavy lifting for analyzing online shopping patterns and optimizing warehouse logistics. So, it shouldn't surprise you that Ocado asks SQL coding questions in interviews for Data Science, Analytics, and & Data Engineering jobs.
As such, to help you ace the Ocado SQL interview, we've curated 11 Ocado SQL interview questions – can you solve them?
Ocado is an online supermarket, and one of the ways to identify "power users" or "VIP users" for such a business is to find customers who have made frequent and large amount of transactions. We have two tables, and . The table contains the , , and , while the table contains the , , and . Please write a SQL query to find the top 5 customers who have the highest total transaction amount for the last year.
order_id | customer_id | order_date |
---|---|---|
101 | 123 | 02/15/2021 |
102 | 265 | 03/10/2021 |
103 | 362 | 04/18/2021 |
104 | 192 | 05/26/2021 |
105 | 981 | 06/05/2021 |
order_id | product_id | quantity | price |
---|---|---|---|
101 | 50001 | 5 | 10 |
102 | 69852 | 2 | 15 |
103 | 50001 | 3 | 10 |
104 | 69852 | 1 | 15 |
105 | 69852 | 2 | 15 |
Below is the PostgreSQL query that will solve the problem.
This query first joins the and tables on . It then filters for orders that were made in the last year. The total transaction amount for each customer is calculated as the sum of multiplied by for each of their orders. The result is grouped by and ordered in descending order of the total transaction amount. The at the end of the command ensures we only get the top 5 customers.
To solve a super-customer analysis question on DataLemur's free interactive coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Ocado is interested in understanding product performance from customers' reviews. Could you write a SQL query to calculate the average monthly rating of each product? To do this, you will need to extract the month from the submit_date and group by it along with the product_id. Use the function to compute the average rating.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
month | product | avg_rating |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
This query first extracts the month part from the submit_date using the function. Then it groups the records by month and product_id. Finally, it averages the stars for each group using the function. The results are ordered by month and product for easy reading. Please notice that the Ocado uses the PostgreSQL database, therefore the syntax might not work in other SQL dialects.
To practice a similar window function interview problem which uses RANK() on DataLemur's free interactive coding environment, solve this Amazon SQL Interview Question:
In SQL, a join retrieves rows from multiple tables and combines them into a single result set.
Four JOIN types can be found in SQL. For an example of each one, suppose you had a table of Ocado orders and Ocado customers.
INNER JOIN: Rows from both tables are retrieved when there is a match in the shared key or keys. An between the Orders and Customers tables would return only rows where the in the Orders table matches the in the Customers table.
LEFT JOIN: A retrieves all rows from the left table (in this case, the Orders table) and any matching rows from the right table (the Customers table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A combines all rows from the right table (in this case, the Customers table) and any matching rows from the left table (the Orders table). If there is no match in the left table, NULL values will be displayed for the left table's columns.
FULL OUTER JOIN: A combines all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be displayed for the columns of the non-matching table.
Ocado - an online supermarket - wants to optimize its warehouse stock management. They have several suppliers for different products. They want to know which products are most frequently purchased, and which supplier provides those products. This information will help them manage their stock more effectively.
The company has two tables:
- records each purchase of a product by a customer
purchase_id | customer_id | purchase_date | product_id | quantity |
---|---|---|---|---|
1 | 200 | 2022-07-01 | 1 | 4 |
2 | 300 | 2022-07-02 | 2 | 1 |
3 | 400 | 2022-07-02 | 3 | 2 |
4 | 500 | 2022-07-03 | 1 | 3 |
5 | 600 | 2022-07-04 | 1 | 2 |
- gives detailed information about each product
product_id | product_name | supplier_id |
---|---|---|
1 | Bananas | 1 |
2 | Apples | 1 |
3 | Pears | 2 |
Given these two tables, write a query that will show the products with the highest cumulative quantity ordered, along with the supplier that provides the product.
This query first creates a temporary table (product_purchases) that aggregates the total quantity of each product purchased. It then joins this table with the products table to display the product name and supplier ID. It orders the results by the total quantity in descending order, so the most purchased products (and their supplier) are at the top of the result set.
A cross-join, also referred to as a cartesian join, is a type of JOIN that returns the cross-product of two tables. In a cross-join, each row from the first table is paired with every row from the second table, resulting in a new table with a row for each possible combination of rows from the two input tables.
For example, say you worked on the Marketing Analytics team at Ocado, and were tasked to understand what advertising copy (text) goes best with what advertising creative (the photo that gets used along with the text copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
You could this query to generate all possible combinations of ad copy and ad creative to help you create more effective ads for Ocado. Just be careful: if you had 1,000 rows of ad copy and 1,000 different image creatives, the resulting cross-join would have 1 million rows! As a result, it's important to use cross-joins judiciously, and make sure that you're not generating more data than you need to.
Ocado wants to filter down its customers to those who made a purchase in the 'Vegetables' category last month for a special promotional email. Write a SQL command which can be used to filter down the customer base for this purpose.
customer_id | first_name | last_name | |
---|---|---|---|
1 | Jane | Doe | jane.doe@gmail.com |
2 | John | Smith | john.smith@gmail.com |
3 | Amy | Johnson | amy.johnson@hotmail.com |
4 | Mike | Davis | mike.davis@yahoo.com |
order_id | customer_id | order_date | category | product_name |
---|---|---|---|---|
001 | 1 | 2021-08-12 | Fruits | Apple |
002 | 2 | 2021-07-15 | Vegetables | Carrot |
003 | 1 | 2021-07-20 | Vegetables | Broccoli |
004 | 3 | 2021-07-22 | Dairy | Milk |
005 | 4 | 2021-08-02 | Bakery | Bread |
This SQL query joins the table with the table based on . The WHERE constraint is then applied to filter out orders that were made within the last month, by comparing against the date of 1 month ago, as well as orders that fall into the 'Vegetables' category. This will give a list comprising of customer details who bought a vegetable last month.
The SQL command merges the results of multiple statements and keeps only those rows that are present in all sets.
For example, say you were doing an HR Analytics project for Ocado, and had access to Ocado's employees and contractors data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use operator to find all contractors who also show up in the employees table:
As a data analyst at Ocado - an online supermarket, we are interested in knowing the average cost of orders placed by each customer within a given year for better inventory management and promotional planning. Can you write a query to calculate the average cost of orders placed by each customer in 2020?
Consider the following schema for Table:
order_id | customer_id | order_date | total_price |
---|---|---|---|
7198 | 1172 | 2020-07-10 | 56.75 |
5261 | 1548 | 2020-11-13 | 47.80 |
3419 | 1172 | 2020-09-12 | 30.45 |
6523 | 1548 | 2020-04-05 | 73.20 |
4561 | 1172 | 2020-02-08 | 48.60 |
customer_id | avg_order_price_2020 |
---|---|
1172 | 45.27 |
1548 | 60.50 |
The above query works by first filtering out orders that were placed in the year 2020. Then it groups records by and calculates the average for each group. The result is a list of customers along with the average cost of their orders in 2020.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculation of top items per category or this Walmart Histogram of Users and Purchases Question which is similar for the analysis of customer's purchases.
For a company like Ocado, it's often crucial to filter customer records based on their emails; to detect and separate business accounts from personal ones. Business emails often have a distinct pattern, for instance, they are more likely to contain the string '"ocado.com'" than personal emails.
Given a customer record database, can you write a SQL query to find all records of customers whose email ends with '"ocado.com'"?
Assume we have the following table:
customer_id | first_name | last_name | |
---|---|---|---|
C001 | John | Doe | johndoe@ocado.com |
C002 | Jane | Smith | janesmith@gmail.com |
C003 | Emily | Clark | emilyclark@ocado.com |
C004 | Robert | Brown | robertbrown@yahoo.com |
C005 | Maria | Garcia | mariagarcia@ocado.com |
Can you provide a list of customer_ids and emails of those who have email ending with "ocado.com"?
Here is a SQL query that can be used to solve this problem:
This query will filter the table and return the and of customers whose ends with 'ocado.com'.
customer_id | |
---|---|
C001 | johndoe@ocado.com |
C003 | emilyclark@ocado.com |
C005 | mariagarcia@ocado.com |
With this query result, the company can now identify and separate the business account customers from their database.
In database schema design, a one-to-one relationship between two entities means that each entity is linked to a single instance of the other. For example, the relationship between a car and a license plate is one-to-one because each car has only one license plate, and each license plate belongs to one car.
In contrast, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. An example of this is the relationship between a teacher and their classes - a teacher can teach multiple classes, but each class is only associated with one teacher.
Imagine you're a data analyst at Ocado. Your manager has asked you to analyze customer orders' distribution on the basis of regions and categories. The company has two relevant tables, and . Here is the structure and some sample data from these tables:
customer_id | first_name | last_name | region | |
---|---|---|---|---|
101 | John | Doe | john.doe@gmail.com | North |
102 | Jane | Smith | jane.smith@yahoo.com | South |
103 | Sam | Johnson | sam.johnson@hotmail.com | East |
104 | Emma | Williams | emma.williams@aol.com | West |
105 | Mike | Brown | mike.brown@gmail.com | East |
order_id | customer_id | category | product | order_date | price |
---|---|---|---|---|---|
201 | 101 | Fresh Food | Apples | 02/01/2022 | 7 |
202 | 101 | Bakery | Bread | 02/03/2022 | 3 |
203 | 102 | Drinks | Soda | 02/05/2022 | 2 |
204 | 103 | Fresh Food | Banana | 02/07/2022 | 5 |
205 | 104 | Household | Paper Towels | 02/09/2022 | 10 |
Write a SQL query that will return a list of regions, the total number of orders, and total revenue, sorted by revenue in descending order.
To answer this question, we need to join the table and table using the field, and then aggregate the data within the resulting table based on the regions.
Here is the SQL query for this:
This query starts by performing an inner join on the and tables based on customer id. Once the tables are joined, we use the clause to group the data by region. Then, within each group, we use the function to get the total number of orders and the function to calculate the total revenue. Finally, we use the clause to sort our results by revenue in descending order.
Since joins come up routinely during SQL interviews, try this interactive Snapchat SQL Interview question using JOINS:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Ocado SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the above Ocado SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Facebook, Google and unicorn tech startups.
Each exercise has multiple hints, detailed solutions and best of all, there's an interactive coding environment so you can easily right in the browser your SQL query answer and have it executed.
To prep for the Ocado SQL interview it is also wise to practice interview questions from other tech companies like:
But if your SQL query skills are weak, forget about going right into solving questions – go learn SQL with this interactive SQL tutorial.
This tutorial covers things like HAVING and aggreage functions like MIN()/MAX() – both of these show up frequently in Ocado interviews.
Beyond writing SQL queries, the other types of problems to prepare for the Ocado Data Science Interview are:
To prepare for Ocado Data Science interviews read the book Ace the Data Science Interview because it's got: