Data Analytics, Data Science, and Data Engineering employees at Stitch Fix write SQL queries to analyze customer behavior patterns, such as purchase frequency, product preferences, and return rates, as well as to personalize shopping experiences by retrieving data on customer interactions, product recommendations, and inventory levels from multiple databases. That is why Stitch Fix often asks SQL coding interview questions.
Thus, to help you prepare, we've collected 8 Stitch Fix SQL interview questions – how many can you solve?
Stitch Fix is a personal styling service company. Every month, its styling team curates a box of fashion items (clothes, shoes, accessories) for their customers or 'users'. The users only pay for what they keep and return the rest items.
Given a table of user orders, your task is to write a SQL query that identifies the top 10 users who have spent the most (i.e. these could be our 'whale users').
For this task, we will define 'spent the most' as having the highest total order value across all orders, which means you should consider both the quantity of items they kept and the price of each item.
The table has the following structure:
order_id | user_id | order_date | item_id | quantity_kept | price_per_item |
---|---|---|---|---|---|
1 | 452 | 04/22/2022 | 6001 | 2 | 120 |
2 | 789 | 05/10/2022 | 3810 | 1 | 99 |
3 | 123 | 06/18/2022 | 7600 | 3 | 45 |
4 | 452 | 07/20/2022 | 3302 | 1 | 70 |
5 | 123 | 08/05/2022 | 4352 | 2 | 55 |
This query calculates the total amount spent by each user by multiplying the column by the' column, and then summing this for each user. The users are then sorted in descending order based on the total amount spent to identify the top 10 'whale users'.
To practice a related customer analytics question on DataLemur's free interactive coding environment, try this Microsoft SQL Interview problem:
Imagine you had a table of Stitch Fix employee salary data. Write a SQL query to find the top three highest paid employees within each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Solve this problem directly within the browser on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
If the code above is hard to understand, you can find a detailed solution with hints here: Top 3 Department Salaries.
While knowing this answer is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at Stitch Fix should be at least aware of SQL vs. NoSQL databases.
Relational databases and non-relational (NoSQL) databases have some key differences, particularly in terms of how data is stored. Whereas relational databases have tables, made up of rows and columns, NoSQL databases use various data models like:
This added flexibility makes NoSQL databases well-suited for handling non-tabular data or data with a constantly changing format. However, this flexibility comes at the cost of ACID compliance, which is a set of properties (atomic, consistent, isolated, and durable) that ensure the reliability and integrity of data in a database. While most relational databases are ACID-compliant, NoSQL databases may not provide the same level of guarantees.
As a Data Analyst at Stitch Fix, one of your tasks includes analyzing product reviews over time. Your task is to write a SQL query to find out the average rating (stars) for each product per month. The table contains product reviews. Each review includes a review ID, user ID, review submission date, product ID, and a rating ranging from 1 to 5 stars.
Use the following data structure for the table:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
In PostgreSQL, you can use the function to get month from the . Here is the SQL query.
This query first extracts the month from the and then groups by this extracted month and . For each group, it calculates the average of . Finally, it orders the result by month and product. The output will show the average rating for each product per month.
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
Here's some strategies that can generally speed up a slow SQL query:
While there's an entire industry dedicated to database performance tuning , for Data Analyst and Data Science interviews at Stitch Fix, knowing the above basics should be more than enough to ace the SQL interview! Data Engineers on the other hand should know more about the command and understand a bit more about a query-execution plan.
You've been asked to analyze reviews for Stitch Fix, an online personal styling service. Specifically, your task is to calculate the average star rating of each item, per month.
Create a SQL query that will provide the output: the month, the product ID and the average stars it received in that month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2020-06-08 | 50001 | 4 |
7802 | 265 | 2020-06-10 | 69852 | 4 |
5293 | 362 | 2020-06-18 | 50001 | 3 |
6352 | 192 | 2020-07-26 | 69852 | 3 |
4517 | 981 | 2020-07-05 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
This query works by first extracting the month from the column. It then groups by both the extracted month and the , and at last it calculates the average rating of each product per each month. Please note that the function and the aggregate function are both standard SQL functions provided by PostgreSQL.
A join in SQL allows you to retrieve data from multiple tables and combine it into a single set of results.
In SQL, there are four different types of JOINs. To demonstrate each kind, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
: An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.
: A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.
: A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.
: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.
Stitch Fix's inventory management team needs to track the average price change of products across each month for budgeting and future price adjustment considerations. Write a SQL query to calculate the absolute value of the average monthly change of product prices with the difference rounded to the nearest cent. Assume prices are always adjusted at the first day of the month. Product price changes are calculated as (price_this_month - price_last_month).
Example input:
date | product_id | price |
---|---|---|
2022-01-01 | 101 | 100.00 |
2022-02-01 | 101 | 105.00 |
2022-03-01 | 101 | 102.00 |
2022-04-01 | 101 | 104.00 |
2022-01-01 | 102 | 150.00 |
2022-02-01 | 102 | 148.00 |
2022-03-01 | 102 | 152.00 |
2022-04-01 | 102 | 150.00 |
Example output:
month | product_id | avg_price_change |
---|---|---|
2 | 101 | 5.00 |
3 | 101 | 3.00 |
4 | 101 | 2.00 |
2 | 102 | 2.00 |
3 | 102 | 4.00 |
4 | 102 | 2.00 |
The above query makes use of PostgreSQL's window function to compare prices between each month and the previous month for each product. The absolute difference between each current month's price and the previous month's price is found using the function, and then the function is used to find the absolute average monthly price change per product, rounded to the nearest cent using the function. The resulting dataset is then grouped by month and product_id.
To practice a very similar question try this interactive Wayfair Y-on-Y Growth Rate Question which is similar for using time-series and window functions or this Amazon Average Review Ratings Question which is similar for calculating averages and grouping by month.
The key to acing a Stitch Fix SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier Stitch Fix SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Netflix, Google, and Amazon.
Each problem on DataLemur has multiple hints, step-by-step solutions and most importantly, there is an online SQL code editor so you can instantly run your SQL query answer and have it executed.
To prep for the Stitch Fix SQL interview it is also useful to solve SQL questions from other fashion & ecommerce companies like:
Uncover the innovative ways Stitch Fix is using data science to revolutionize the fashion industry!
However, if your SQL coding skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this free SQL tutorial.
This tutorial covers SQL concepts such as finding NULLs and LEAD/LAG – both of these show up often in Stitch Fix SQL assessments.
Besides SQL interview questions, the other question categories covered in the Stitch Fix Data Science Interview are:
I believe the best way to study for Stitch Fix Data Science interviews is to read the book Ace the Data Science Interview.
It has 201 data interview questions sourced from tech companies like Google & Microsoft. The book's also got a refresher on Product Analytics, SQL & ML. And finally it's helped a TON of people, which is how it's gotten to be an Amazon best-seller with over 1000+ 5-star reviews.
While the book is more technical in nature, it's also important to prepare for the Stitch Fix behavioral interview. Start by understanding the company's values and mission.