logo

8 Shopify SQL Interview Questions (Updated 2024)

Updated on

February 6, 2024

At Shopify, SQL is used all the damn time for analyzing e-commerce data for insights and troubleshooting issues with their customers' online stores. They even have their own version of SQL called ShopifyQL used to query merchant data. Beause of this, Shopify almost always asks SQL coding questions in interviews for Data Science, Analytics, and & Data Engineering jobs.

So, to prepare for the Shopify SQL Assessment, we've collected 8 Shopify SQL interview questions to practice – can you solve them?

Shopify SQL Logo

8 Shopify SQL Interview Questions

SQL Question 1: Analyze Average Product Review Ratings Per Month

As part of Shopify's data analysis team, you are tasked to analyze the customer review data to get insights on the product performance over time. Given a table named 'reviews' with columns 'review_id', 'user_id', 'submit_date', 'product_id' and 'stars', write a SQL query to calculate the average star rating given by the users on each product per month.

Example Input:

Example Output:

Answer:


The above query calculates the average reviews per month for each product. We exploit PostgreSQL's window functions for this - specifically, the AVG function. We PARTITION BY the product_id and month to group the average calculations by each product's monthly reviews. The EXTRACT function is used to get the month from the date column. The ORDER BY clause orders the output by month and product id.

Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur

DataLemur SQL Questions

SQL Question 2: Analyze sales data to measure product performance

Shopify is interested in analyzing the sales performance of its products. They would like to measure the total sales, total quantity sold, and average sales price over a specific time period for each product.

They have two main tables, and . Here is some sample data.

Example Input:

product_idproduct_namecategory
1Mobile Phone AElectronics
2Men's Running ShoesSportswear
3Women's Casual HandbagFashion

Example Input:

sales_idproduct_idsale_datequantitysale_price
1101/01/20223300
2101/02/20222280
3201/03/20225150
4301/04/20221200
5201/05/20222140

Shopify would like a PostgreSQL query that returns a table with columns for product_name, category, total sales, total quantity, and average selling price for each product. They want the data grouped by product and sorted in descending order by total sales.

Answer:


This query first joins the and tables on product_id then groups by product name and category. By using the SUM aggregate function, we are able to compute the total sales and total quantity for each product. We then compute the average selling price by dividing the total sales by the total quantity. The results are sorted in descending order by total sales to easily identify the best-selling products.

SQL Question 3: Can you explain the distinction between a correlated and a non-correlated sub-query?

A correlated sub-query is a sub-query that depends on the outer query and cannot be executed separately. It uses the outer query to filter or transform data by referencing a column from it, and the outer query uses the results of the inner query. In contrast, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data needed by the outer query.

Here is an example of a correlated sub-query:


This query selects the and total sales of all Shopify customers in the sales table whose are greater than the average of their own customer group. The sub-query in this case is correlated with the outer query, as it references the column from the outer query ().

Here is an example of a non-correlated sub-query:


The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.

Performance-wise, correlated sub-queries are generally slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.

Shopify SQL Interview Questions

SQL Question 4: Find Customers With Unfulfilled Orders

Shopify wants to analyze their customers' order records to find those customers who have unfulfilled orders older than 15 days. As part of this process, Shopify would like you to create a SQL query to filter and present these customer records based on the unfulfilled orders' details.

Example Input:
order_idcustomer_idorder_dateproduct_namefulfilled
13492022-08-01Laptoptrue
24822022-08-20Desktopfalse
35912022-09-05Mousetrue
43492022-09-01Keyboardfalse
54822022-09-15Laptop Standfalse
Example Output:
customer_idorder_dateproduct_name
4822022-08-20Desktop
3492022-09-01Keyboard

Answer:


This PostgreSQL query selects the customer_id, order_date, and product_name from the orders table where the orders are not fulfilled () and the current date is over 15 days past the order date (). The resulting table lists the customer_id, the date of the order, and the product_name for any unfulfilled orders older than 15 days.

undefined

SQL Question 5: What do the SQL commands / do?

The / operator is used to remove to return all rows from the first SELECT statement that are not returned by the second SELECT statement.

Note that is available in PostgreSQL and SQL Server, while MINUS is available in MySQL and Oracle (but don't worry about knowing which DBMS supports which exact commands since Shopify interviewers aren't trying to trip you up on memorizing SQL syntax).

For a tangible example of in PostgreSQL, suppose you were doing an HR Analytics project for Shopify, and had access to Shopify's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.

You could use operator to find all contractors who never were a employee using this query:


SQL Question 6: Calculate Average Sales per Product per Month

Imagine you are a data analyst working at Shopify and you are asked to calculate the average sales per product per month. The company wants to understand the sales performance to strategize their product marketing accordingly. You are given the table which logs every sale made.

Note: The sales table has the selling price of each product, not their original price. It allows for price variations over time and discounts.

Example Input:
sale_idsale_dateproduct_idprice
280501/15/20213428$20
720902/17/20212351$25
903403/04/20213428$22
373801/26/20212351$30
125402/11/20212351$23

Answer:


This query extracts the month from each sale date and groups the sales by the month and the product_id. Then, it calculates the average price for each group. The result is a list of monthly average sales for each product. The results are ordered by month and product_id for easy reading. undefined

SQL Question 7: Could you provide a list of the join types in SQL and explain what each one does?

Joins in SQL allow you to combine data from different tables based on a shared key or set of keys.

Four JOIN types are available in SQL. For an example of each one, say you had sales data exported from Shopify's Salesforce CRM stored in a PostgreSQL database, and had access to two tables: and .

  • : retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the Sales table and the Customers table would retrieve only the rows where the customer_id in the Sales table matches the customer_id in the table.
  • : retrieves all rows from the left table (in this case, the sales table) and any matching rows from the right table (the table). If there is no match in the right table, NULL values will be returned for the right table's columns.
  • : retrieves all rows from the right table (in this case, the customers table) and any matching rows from the left table (the sales table). If there is no match in the left table, NULL values will be returned for the left table's columns.
  • : retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.

SQL Question 8: Find Customers by Email Domain

As part of Shopify's research, they want to understand the distribution of their customer's email domains (like gmail.com, yahoo.com, etc). You are given a table containing all customer records. Can you write a SQL query that filters this data to find all the customers that are using a Gmail account?

Below is the example of the "customers" table:

Example Input:

Your task is to write a query that filters the customer information for only those customers with a Gmail account.

Answer:

The SQL query you could use in PostgreSQL could look something like this:


This statement begins by specifying the columns we want to select (all). It then determines the table we need - "customers". The WHERE clause uses the keyword to filter the "email" column for those that contains "@gmail.com". The result will be a table that only includes records for customers with a Gmail account. undefined

How To Prepare for the Shopify SQL Interview

Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Shopify SQL interview is to solve as many practice SQL interview questions as you can! Besides solving the above Shopify SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon. DataLemur Questions

Each SQL question has multiple hints, step-by-step solutions and best of all, there is an online SQL coding environment so you can right in the browser run your query and have it executed.

To prep for the Shopify SQL interview it is also wise to solve SQL problems from other tech companies like:

However, if your SQL skills are weak, don't worry about going right into solving questions – refresh your SQL knowledge with this SQL tutorial for Data Analytics.

SQL tutorial for Data Analytics

This tutorial covers topics including HAVING and handling NULLs in SQL – both of which show up frequently during Shopify SQL assessments.

Shopify Data Science Interview Tips

What Do Shopify Data Science Interviews Cover?

For the Shopify Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:

  • Probability & Stats Questions
  • Coding Questions in Python or R
  • Product Analytics Questions
  • ML Modelling Questions
  • Resume-Based Behavioral Questions

Shopify Data Scientist

How To Prepare for Shopify Data Science Interviews?

To prepare for Shopify Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions taken from Microsoft, Amazon & startups
  • a crash course on SQL, Product-Sense & ML
  • over 900+ 5-star reviews on Amazon

Acing Data Science Interview