At Instacart, SQL is used to generate customer insights into how people use the app to shop for groceries. That's why Instacart LOVES to ask SQL query questions during interviews for Data Science and Data Engineering positions.
To help you practice for the Instacart SQL interview, we've collected 9 Instacart SQL interview questions – able to solve them?
Given the and datasets, write a SQL query to analyze the product sales trend over time. Specifically, for each month and for each product, compute the total number of orders, the total product sales and the growth rate from the previous month.
order_id | user_id | order_date |
---|---|---|
1 | 1 | 2022-01-01 |
2 | 1 | 2022-01-02 |
3 | 2 | 2022-01-02 |
4 | 2 | 2022-02-01 |
5 | 3 | 2022-02-02 |
order_id | product_id | product_price |
---|---|---|
1 | 1 | 5.0 |
1 | 2 | 10.0 |
2 | 1 | 5.0 |
3 | 2 | 10.0 |
4 | 1 | 5.0 |
5 | 2 | 10.0 |
mth | product_id | total_orders | total_sales | growth_rate |
---|---|---|---|---|
1 | 1 | 2 | 10.0 | NaN |
1 | 2 | 2 | 20.0 | NaN |
2 | 1 | 1 | 5.0 | -0.50 |
2 | 2 | 1 | 10.0 | -0.50 |
In this query, we first join and to compute the monthly sales and order count for each product. Then, we use the function in a window function to fetch the sales of the previous month. Finally, we compute the growth rate using the current and previous month's sales.
Pro Tip: Window functions are a frequent SQL interview topic, so practice every window function problem on DataLemur
Instacart is an American company that specializes in delivering groceries from your favorite stores right to your doorstep. As an analyst in the Instacart team, one of your responsibilities is to provide valuable insights from the database which contains several tables with useful information about users and their orders. Given the tables , , , and , your task is to write a SQL query to extract the most frequently bought item from each store.
user_id | name | registration_date |
---|---|---|
1 | John Doe | 2022-01-01 |
2 | Jane Doe | 2022-02-01 |
store_id | name |
---|---|
1 | Store A |
2 | Store B |
item_id | name | category |
---|---|---|
1 | Bananas | Fruit |
2 | Apples | Fruit |
3 | Milk | Dairy |
order_id | user_id | store_id | order_date |
---|---|---|---|
1 | 1 | 1 | 2022-06-01 |
2 | 2 | 1 | 2022-06-02 |
3 | 1 | 2 | 2022-07-01 |
order_id | item_id | quantity |
---|---|---|
1 | 1 | 2 |
1 | 2 | 1 |
2 | 1 | 3 |
3 | 3 | 1 |
In this query, we join the , , , and tables using their common keys. We group the results by store and item name and then count the quantity of each item. The result shows the most frequently bought item from each store and its quantity. The clause arranges the output in a manner that the most frequently bought items per store are listed first.
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For example, suppose you had a table of Instacart orders and Instacart customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an between the Orders and Customers tables would retrieve rows where the in the Orders table matches the in the Customers table.
FULL OUTER JOIN: A retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
Given a table 'orders' with columns 'order_id', 'user_id', 'order_date' and a table 'order_items' with columns 'item_id', 'order_id', 'product', and 'quantity' representing orders placed on the Instacart app, calculate the average number of items () per order.
order_id | user_id | order_date |
---|---|---|
1 | 123 | 06/08/2022 |
2 | 456 | 06/10/2022 |
3 | 123 | 06/12/2022 |
4 | 789 | 06/12/2022 |
item_id | order_id | product | quantity |
---|---|---|---|
1 | 1 | Apple | 4 |
2 | 1 | Banana | 2 |
3 | 2 | Cherry | 10 |
4 | 3 | Donut | 3 |
5 | 4 | Elderberry | 1 |
order_id | avg_items |
---|---|
1 | 3 |
2 | 10 |
3 | 3 |
4 | 1 |
The inner join operation merges the order_items table and the orders table based on the order_id. The clause groups the data according to order_id, and the function calculates the average quantity of items per order.
To practice a very similar question try this interactive Alibaba Compressed Mean Question which is similar for needing to calculate average order quantities or this Walmart Histogram of Users and Purchases Question which is similar for requiring an understanding of user transactions and purchases.
The clause works similarly to the clause, but it is used to filter the groups of rows created by the clause rather than the rows of the table themselves.
For example, say you were analyzing Instacart sales data:
This query retrieves the total sales for all products in each region, and uses the clause to only sales made after January 1, 2023. The rows are then grouped by region and the clause filters the groups to include only those with total sales greater than $500k.
Given two tables, and from a company like Instacart, calculate the Click-Through-Rate (CTR) on a particular product and the conversion rate from viewing a product to adding a product to the cart.
click_id | user_id | click_date | product_id |
---|---|---|---|
1500 | 101 | 06/06/2022 09:30:00 | 100 |
2300 | 102 | 06/07/2022 15:00:00 | 200 |
3200 | 101 | 06/07/2022 21:05:00 | 100 |
4200 | 103 | 06/08/2022 11:00:00 | 200 |
5400 | 102 | 06/08/2022 16:30:00 | 100 |
cart_id | user_id | add_cart_date | product_id |
---|---|---|---|
3500 | 101 | 06/06/2022 18:20:00 | 100 |
4600 | 102 | 06/07/2022 19:45:00 | 200 |
5600 | 103 | 06/08/2022 19:00:00 | 200 |
CTR is calculated with the formula (number of add_to_carts / number of clicks) * 100.
And Conversion Rate is calculated as (number of purchase / number of add_to_carts) * 100.
This query works by joining the table with the and tables on the and fields. It then calculates the total number of clicks, add_to_carts and purchases for each product using the function. Finally, it calculates the CTR and conversion rate by dividing the number of add_to_carts and purchases by the number of clicks and add_to_carts respectively, then multiplying by 100 to get a percentage. Note that it is using to avoid excluding products that have not been added to a cart or purchased.
To solve a similar SQL problem on DataLemur's free interactive coding environment, try this SQL interview question asked by Facebook:
Stored procedures are a lot like functions in programming. They're used to encapsulate and organize business logic into one unit of code, and they can accept multiple input parameters and return multiple output values.
For a concrete example, suppose you were a Data Scientist on the Marketing Analytics team at Instacart. A task you would encounter freqently would be to calculate the conversion rate for Instacart's ads over a certain time period, for which you would write the following stored procedure:
To use this conversion rate stored procedure, you could call it like this:
As an analyst at Instacart, one of your responsibilities is to gain insights into customer behavior through the data you have. For this task, you'll need to retrieve the average total cost of orders placed by each user per month. The total cost of an order is the sum of the cost of all items in that order.
For simplicity, let's assume that the data is present in two tables - and . The table contains details about each order placed, and the table contains details about the items in each order, including the cost of the item in that order.
Please note that the user_id in the table corresponds to the user who placed the order, and the order_id is a unique identifier for the order. This same order_id is used in the table.
order_id | user_id | order_date |
---|---|---|
232 | 354 | 02/13/2023 |
234 | 354 | 02/14/2023 |
255 | 489 | 02/20/2023 |
313 | 354 | 03/05/2023 |
375 | 489 | 03/16/2023 |
order_item_id | order_id | item_cost |
---|---|---|
4321 | 232 | 12.99 |
4322 | 232 | 3.49 |
4323 | 234 | 19.99 |
4324 | 234 | 4.99 |
4325 | 255 | 13.99 |
4326 | 313 | 18.99 |
4327 | 375 | 15.99 |
user_id | month | average_total_cost |
---|---|---|
354 | 2 | 20.23 |
489 | 2 | 13.99 |
354 | 3 | 18.99 |
489 | 3 | 15.99 |
In this query, we first create a subquery that calculates the total cost for each order by summing the item costs in the table. Then, we join the table and the subquery on the order_id column.
Finally, we group by the user_id and month (extracted from order_date) and take the average of the total cost for each user's orders in each month.
Suppose Instacart is running a marketing campaign where they give discounts on items on specific days. The discounts are stored in a separate table along with the date. The amount of discount differs every day. Can you write a PostgreSQL query to calculate the average discount and the total sum of the price paid (price - discount) for each product?
order_id | product_id | order_date | price |
---|---|---|---|
1001 | 50001 | 2022-06-08 | 10.00 |
1002 | 69852 | 2022-06-10 | 15.00 |
1003 | 50001 | 2022-06-18 | 10.00 |
1004 | 69852 | 2022-07-26 | 15.00 |
1005 | 69852 | 2022-07-05 | 15.00 |
discount_id | product_id | discount_date | discount |
---|---|---|---|
901 | 50001 | 2022-06-08 | 1.00 |
902 | 69852 | 2022-06-10 | 0.50 |
903 | 50001 | 2022-06-18 | 0.75 |
904 | 69852 | 2022-07-26 | 1.00 |
905 | 69852 | 2022-07-05 | 0.25 |
This query first joins the orders table and the discounts table on both the order_date and the product_id. It uses a LEFT JOIN to ensure that all the orders are included even if they do not have a corresponding discount. The COALESCE function is used to replace the NULL values (for orders without a discount) to 0. Then, it groups the results by product_id, and calculates the average discount and the total sum of the price paid for each product.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for dealing with product prices and sum calculations or this Stripe Repeated Payments Question which is similar for dealing with a time-specific feature of the transactions.
The key to acing a Instacart SQL interview is to practice, practice, and then practice some more! Besides solving the earlier Instacart SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Google, Microsoft and Silicon Valley startups.
Each SQL question has multiple hints, fully explained answers along with a discussion board to see how others solved it and best of all, there's an interactive coding environment so you can right in the browser run your SQL query and have it graded.
To prep for the Instacart SQL interview you can also be a great idea to practice SQL questions from other tech companies like:
But if your SQL coding skills are weak, forget about jumping right into solving questions – go learn SQL with this free SQL for Data Analytics course.
This tutorial covers things like filtering data with WHERE and aggregate functions – both of these show up routinely in Instacart SQL interviews.
Besides SQL interview questions, the other question categories tested in the Instacart Data Science Interview are:
To prepare for Instacart Data Science interviews read the book Ace the Data Science Interview because it's got: