Data Scientists, Analysts, and Data Engineers at CHS use SQL queries to extract agricultural data for predictive models, helping them forecast trends and make informed decisions. SQL is also essential for managing stock inventory databases, allowing the company to improve supply chain processes and ensure efficient resource allocation. For these reasons, CHS asks jobseekers SQL interview questions.
So, to help you prep, here are 10 CHS SQL interview questions – can you solve them?
CHS is an ecommerce platform that sells various products. They define their "Whale" customers as those who have purchased at least 10 different products, and who have spent more than $2000 in total, within the last year.
Write a SQL query to identify these high-value customers. Assume we have access to the following tables:
orders
Example Input:

order_id | user_id | order_date | total_cost |
---|---|---|---|
101 | 123 | 2021-10-15 | $500 |
102 | 485 | 2022-01-07 | $100 |
103 | 123 | 2022-02-16 | $300 |
104 | 266 | 2022-03-05 | $50 |
105 | 123 | 2022-08-10 | $1200 |
order_items
Example Input:

order_item_id | order_id | product_id |
---|---|---|
1 | 101 | 1001 |
2 | 101 | 1002 |
3 | 102 | 1003 |
4 | 103 | 1002 |
5 | 103 | 1004 |
6 | 105 | 1005 |
7 | 105 | 1006 |
8 | 105 | 1001 |
9 | 105 | 1007 |
10 | 105 | 1008 |
11 | 105 | 1009 |
SELECT o.user_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date > CURRENT_DATE - INTERVAL '1 year'
GROUP BY o.user_id
HAVING COUNT(DISTINCT oi.product_id) >= 10
   AND (SELECT SUM(o2.total_cost)
        FROM orders o2
        WHERE o2.user_id = o.user_id
          AND o2.order_date > CURRENT_DATE - INTERVAL '1 year') > 2000;

This query first filters orders placed within the last year, then joins `orders` with `order_items` to connect purchased products with users. The `GROUP BY` clause groups the data by user id, and the `HAVING` clause filters for the 'whale' users: those who purchased at least 10 different products (`COUNT(DISTINCT oi.product_id) >= 10`) and spent more than $2,000. Note that total spend is computed in a subquery against `orders` alone – summing `o.total_cost` over the joined rows would count each order's cost once per item and inflate the total. The query returns the list of user_ids who qualify as 'whale' customers.
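As a quick sanity check, the same logic can be run in SQLite from Python. The data below is hypothetical (the article's sample input contains no qualifying whale – user 123 has only 8 distinct products and exactly $2,000 in spend), and SQLite's `date()` stands in for PostgreSQL's interval arithmetic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT, total_cost REAL);
CREATE TABLE order_items (order_item_id INT, order_id INT, product_id INT);
""")
# Hypothetical data: user 1 places a recent order with 10 distinct products for $2,500;
# user 2 places a recent order with a single product for $100.
cur.execute("INSERT INTO orders VALUES (1, 1, date('now', '-30 days'), 2500)")
cur.executemany("INSERT INTO order_items VALUES (?, 1, ?)",
                [(i, 1000 + i) for i in range(10)])
cur.execute("INSERT INTO orders VALUES (2, 2, date('now', '-10 days'), 100)")
cur.execute("INSERT INTO order_items VALUES (100, 2, 1001)")

# Spend is summed against orders alone (correlated subquery) so the
# one-row-per-item join does not double count total_cost.
whales = cur.execute("""
    SELECT o.user_id
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.order_date > date('now', '-1 years')
    GROUP BY o.user_id
    HAVING COUNT(DISTINCT oi.product_id) >= 10
       AND (SELECT SUM(o2.total_cost)
            FROM orders o2
            WHERE o2.user_id = o.user_id
              AND o2.order_date > date('now', '-1 years')) > 2000
""").fetchall()
print(whales)  # only user 1 qualifies
```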
To work on another SQL customer analytics question where you can solve it interactively and have your SQL query automatically checked, try this Walmart SQL Interview Question:
Discover CHS' news and stories to learn about their commitment to agriculture and energy solutions that support communities and farmers! Keeping up with CHS's updates can help you appreciate the vital role they play in the agricultural supply chain.
Given a table of CHS employee salary data, write a SQL query to find employees who make more than their own boss.
employees
Example Input:

employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | NULL |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
Example Output:

employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager William Davis, who earns $7,000.
Code your solution to this interview question and run your code right in DataLemur's online SQL environment:
First, we perform a self-join, treating one copy of the `employees` table (`mgr`) as the managers' table and the other copy (`emp`) as the employees' table, joined on `emp.manager_id = mgr.employee_id`. Then we use a `WHERE` clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
SELECT emp.employee_id AS employee_id,
       emp.name AS employee_name
FROM employees AS emp
INNER JOIN employees AS mgr
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary;
If the solution above is tough, you can find a detailed solution with hints here: Employee Salaries Higher Than Their Manager.
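If you'd like to check the self-join logic locally, here's a quick sketch using Python's sqlite3 module loaded with the sample data above (SQLite stands in for PostgreSQL; `NULL` becomes Python's `None`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employee_id INT, name TEXT, salary INT, manager_id INT)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'Emma Thompson', 3800, None),
    (2, 'Daniel Rodriguez', 2230, 10),
    (3, 'Olivia Smith', 8000, 8),
    (4, 'Noah Johnson', 6800, 8),
    (5, 'Sophia Martinez', 1750, 10),
    (8, 'William Davis', 7000, None),
    (10, 'James Anderson', 4000, None),
])

# Self-join: emp rows matched to their manager's row via manager_id
rows = cur.execute("""
    SELECT emp.employee_id, emp.name
    FROM employees AS emp
    INNER JOIN employees AS mgr
      ON emp.manager_id = mgr.employee_id
    WHERE emp.salary > mgr.salary
""").fetchall()
print(rows)  # [(3, 'Olivia Smith')]
```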
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For a concrete example, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a CHS sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
INNER JOIN: retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the `Advertising_Campaigns` table and the `Sales` table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the `Advertising_Campaigns` table matches the keyword in the `Sales` table.

FULL OUTER JOIN: retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. Where there is no match, NULL values are returned for the columns of the non-matching table.
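Here's a small sketch of the difference using SQLite in Python, with hypothetical keyword data. Older SQLite versions lack FULL OUTER JOIN (PostgreSQL supports it natively), so the full outer join is emulated with the classic LEFT JOIN + unmatched-right-rows pattern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE advertising_campaigns (keyword TEXT, bid REAL);
CREATE TABLE sales (keyword TEXT, units_sold INT);
INSERT INTO advertising_campaigns VALUES ('fertilizer', 1.50), ('grain bins', 2.00);
INSERT INTO sales VALUES ('fertilizer', 12), ('propane', 4);
""")

# INNER JOIN: only keywords present in BOTH tables survive
inner = cur.execute("""
    SELECT a.keyword, a.bid, s.units_sold
    FROM advertising_campaigns a
    INNER JOIN sales s ON a.keyword = s.keyword
""").fetchall()
print(inner)  # [('fertilizer', 1.5, 12)]

# FULL OUTER JOIN (emulated): every keyword from both tables,
# with NULL filling in the side that has no match
full = cur.execute("""
    SELECT a.keyword, a.bid, s.units_sold
    FROM advertising_campaigns a
    LEFT JOIN sales s ON a.keyword = s.keyword
    UNION ALL
    SELECT s.keyword, NULL, s.units_sold
    FROM sales s
    WHERE s.keyword NOT IN (SELECT keyword FROM advertising_campaigns)
""").fetchall()
print(full)
```

The inner join returns one row ('fertilizer' is the only shared keyword), while the full outer join also keeps 'grain bins' (no sales) and 'propane' (no campaign), padded with NULLs.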
CHS is a multinational company that sells a wide range of products. The company would like to analyze how their product sales fluctuate on a daily basis, so they wish to calculate the average daily sales for each product over the past month.
You are given a `sales` table with the following schema:

sales
Example Input:

sales_id | product_id | transaction_date | quantity |
---|---|---|---|
1 | 101 | 2022-10-01 | 10 |
2 | 101 | 2022-10-01 | 15 |
3 | 102 | 2022-10-01 | 8 |
4 | 101 | 2022-10-02 | 20 |
5 | 102 | 2022-10-02 | 12 |
6 | 102 | 2022-10-02 | 10 |
7 | 102 | 2022-10-03 | 5 |
The goal is to write a SQL query that returns a table with two columns: `product_id` and `average_daily_sales`, where `average_daily_sales` is the average number of units sold per day for each product over the past month. The results should be sorted by `product_id`.
Example Output:

product_id | average_daily_sales |
---|---|
101 | 22.50 |
102 | 11.67 |
SELECT product_id,
       ROUND(AVG(daily_quantity), 2) AS average_daily_sales
FROM (
  SELECT product_id,
         transaction_date,
         SUM(quantity) AS daily_quantity
  FROM sales
  WHERE transaction_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY product_id, transaction_date
) AS daily_totals
GROUP BY product_id
ORDER BY product_id;
This query first sums `quantity` per product per day in a subquery, so multiple transactions on the same day roll up into a single daily total, and restricts to transactions from the past 30 days. The outer query then averages those daily totals for each product. Note that a plain `AVG(quantity)` over the raw rows would average per transaction rather than per day, and a window frame like `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` would produce a running average on every row instead of one figure per product.
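Running the per-day aggregation against the sample input with SQLite in Python reproduces the expected figures (the 30-day date filter is omitted here so the 2022 sample rows all qualify):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (sales_id INT, product_id INT, transaction_date TEXT, quantity INT)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 101, '2022-10-01', 10), (2, 101, '2022-10-01', 15),
    (3, 102, '2022-10-01', 8),  (4, 101, '2022-10-02', 20),
    (5, 102, '2022-10-02', 12), (6, 102, '2022-10-02', 10),
    (7, 102, '2022-10-03', 5),
])

# Inner query: one row per (product, day); outer query: average of daily totals
rows = cur.execute("""
    SELECT product_id,
           ROUND(AVG(daily_quantity), 2) AS average_daily_sales
    FROM (SELECT product_id, transaction_date, SUM(quantity) AS daily_quantity
          FROM sales
          GROUP BY product_id, transaction_date) AS daily_totals
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()
print(rows)  # product 101: (25 + 20) / 2 = 22.5; product 102: (8 + 22 + 5) / 3 = 11.67
```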
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur.
There are several normal forms that define the rules for normalizing a database:
A database is in first normal form (1NF) if it meets the following criteria: each column holds atomic (indivisible) values, and there are no repeating groups of columns.
A database is in second normal form (2NF) if it meets the following criteria: it is in 1NF, and every non-key column depends on the whole primary key, not just part of it. Said another way, to achieve 2NF, besides following all the rules from 1NF, all the columns in a given table should be dependent only on that table's primary key.
A database is in third normal form (3NF) if it meets the following criteria: it is in 2NF, and there are no transitive dependencies between non-key columns. A transitive dependency means that a piece of data in one column is derived from another column. For example, it wouldn't make sense to keep both a "user's age" column and a "user's birthdate" column, because age can be derived from birthdate.
While there are also fourth and fifth normal forms, they're too pedantic and hence out-of-scope to know for the CHS SQL interview.
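To make the transitive-dependency point concrete, here's a small sketch (hypothetical `users` table): store only the birthdate and derive age at query time, rather than keeping a redundant age column that would drift out of date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# 3NF-friendly design: birthdate is stored, age is never stored
cur.execute("CREATE TABLE users (user_id INT, name TEXT, birthdate TEXT)")
cur.execute("INSERT INTO users VALUES (1, 'Ada', '1990-06-15')")

# Derive age from birthdate at query time (approximate, using days / 365.25)
rows = cur.execute("""
    SELECT name,
           CAST((julianday('now') - julianday(birthdate)) / 365.25 AS INTEGER) AS age
    FROM users
""").fetchall()
print(rows)
```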
As a data analyst at CHS (Company Holding Superstores), a chain of supermarkets, you have been asked to find out the average monthly sales per product. The company wants to understand the average sales to analyze the performance of each product and make decisions for future product inventory.
Products are sold at several different locations, so you have been given two tables, `sales` and `product`.
sales
Example Input:

sale_id | product_id | store_id | qty | sale_date |
---|---|---|---|---|
1001 | 1 | 101 | 15 | 01/01/2022 |
1002 | 1 | 102 | 20 | 01/01/2022 |
1003 | 2 | 101 | 30 | 01/01/2022 |
1004 | 3 | 103 | 25 | 01/02/2022 |
1005 | 3 | 101 | 20 | 01/02/2022 |
product
Example Input:

product_id | product_name |
---|---|
1 | Apples |
2 | Bananas |
3 | Oranges |
The output should display the month, the product name, and the average sales of each product for that month.
SELECT EXTRACT(MONTH FROM s.sale_date) AS month,
       p.product_name,
       AVG(s.qty) AS avg_sales_per_product
FROM sales AS s
JOIN product AS p ON s.product_id = p.product_id
GROUP BY month, p.product_name;
This PostgreSQL query uses the `EXTRACT` function to get the month from `sale_date` and the `AVG` function to calculate the average quantity sold per product. The product name is included by joining the `sales` and `product` tables on `product_id`. The results are grouped by month and product name, giving the average sales per product for each month.
To practice a very similar question try this interactive Wayfair Y-on-Y Growth Rate Question which is similar for requiring analysis of sales data over time or this Amazon Average Review Ratings Question which is similar for requiring calculation of average metrics related to products.
What does COALESCE() do?

The COALESCE() function returns the first non-NULL value from a list of values. This function is often used to replace a NULL with some default value, so that you can then take a SUM() or AVG() of a column without NULLs messing things up.
For example, suppose you ran a customer satisfaction survey for CHS and had statements like "I'd buy from CHS again". In the survey, customers would then answer how strongly they agreed with a statement on a scale of 1 to 5 (strongly disagree, disagree, neutral, agree, strongly agree).
Because attention spans are short, many customers skipped many of the questions, and thus our survey data might be filled with NULLs:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | NULL |
303 | 1 | 5 |
303 | 2 | NULL |
Before doing further analytics on this customer survey data, you could replace the NULLs in the `agree_scale` column with the value 3 (because that corresponds to the default 'neutral' answer) using the `COALESCE` function:
SELECT customer_id,
       question_id,
       COALESCE(agree_scale, 3) AS agree_scale
FROM chs_customer_survey;
This would result in the following:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | 3 |
303 | 1 | 5 |
303 | 2 | 3 |
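You can reproduce this with SQLite in Python (NULL is represented as `None` on the Python side):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE chs_customer_survey (customer_id INT, question_id INT, agree_scale INT)")
cur.executemany("INSERT INTO chs_customer_survey VALUES (?, ?, ?)", [
    (101, 1, 4), (101, 2, 5),
    (202, 1, 4), (202, 2, None),
    (303, 1, 5), (303, 2, None),
])

# COALESCE picks the first non-NULL argument: the stored score, else the default 3
rows = cur.execute("""
    SELECT customer_id,
           question_id,
           COALESCE(agree_scale, 3) AS agree_scale
    FROM chs_customer_survey
""").fetchall()
print(rows)  # the two NULL answers come back as 3
```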
For a company, CHS, you are given a table of sales transactions 'sales'. Each entry in the table records the transaction id, the date of the transaction, the product id of the product sold, and the quantity sold. The question is: for each month, which product has the highest total quantity sold?
sales table
Example Input:

transaction_id | transaction_date | product_id | quantity |
---|---|---|---|
101 | 2022-01-05 | 50001 | 3 |
102 | 2022-01-15 | 50001 | 5 |
103 | 2022-01-25 | 50002 | 6 |
104 | 2022-02-05 | 50002 | 7 |
105 | 2022-02-15 | 50001 | 3 |
106 | 2022-03-05 | 50002 | 2 |
output table:

month_yr | top_product | total_quantity |
---|---|---|
2022-01 | 50001 | 8 |
2022-02 | 50002 | 7 |
2022-03 | 50002 | 2 |
The PostgreSQL query to solve this problem is as follows:
WITH monthly_sales AS (
  SELECT to_char(transaction_date, 'YYYY-MM') AS month_yr,
         product_id,
         SUM(quantity) AS total_quantity
  FROM sales
  GROUP BY to_char(transaction_date, 'YYYY-MM'), product_id
),
ranked_sales AS (
  SELECT month_yr,
         product_id,
         total_quantity,
         RANK() OVER (PARTITION BY month_yr ORDER BY total_quantity DESC) AS rank
  FROM monthly_sales
)
SELECT month_yr,
       product_id AS top_product,
       total_quantity
FROM ranked_sales
WHERE rank = 1;
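As a quick check, the same CTE logic can be run in SQLite from Python (`strftime('%Y-%m', ...)` stands in for `to_char(..., 'YYYY-MM')`). Note that in January product 50001 sells 3 + 5 = 8 units, beating 50002's 6:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (transaction_id INT, transaction_date TEXT, product_id INT, quantity INT)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (101, '2022-01-05', 50001, 3), (102, '2022-01-15', 50001, 5),
    (103, '2022-01-25', 50002, 6), (104, '2022-02-05', 50002, 7),
    (105, '2022-02-15', 50001, 3), (106, '2022-03-05', 50002, 2),
])

rows = cur.execute("""
    WITH monthly_sales AS (
        SELECT strftime('%Y-%m', transaction_date) AS month_yr,
               product_id,
               SUM(quantity) AS total_quantity
        FROM sales
        GROUP BY month_yr, product_id
    ),
    ranked_sales AS (
        SELECT month_yr, product_id, total_quantity,
               RANK() OVER (PARTITION BY month_yr ORDER BY total_quantity DESC) AS rnk
        FROM monthly_sales
    )
    SELECT month_yr, product_id AS top_product, total_quantity
    FROM ranked_sales
    WHERE rnk = 1
    ORDER BY month_yr
""").fetchall()
print(rows)  # [('2022-01', 50001, 8), ('2022-02', 50002, 7), ('2022-03', 50002, 2)]
```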
This query does the following: the `monthly_sales` CTE groups rows by month (the `transaction_date` converted to year-month format) and `product_id`, computing each product's `total_quantity` with the `SUM()` function. The `ranked_sales` CTE then ranks products within each month by `total_quantity` using the `RANK()` window function, so the highest-selling product(s) in each month receive a rank of 1, and the final SELECT keeps only those rows.

Assume that CHS is an e-commerce company and you are a data analyst there. The company tracks every purchase made by its customers and stores this data in a PostgreSQL database.
The company has two tables: `customers` and `orders`. The `customers` table contains information about the company's customers, and the `orders` table contains information about all the orders made by the customers.

Both tables have a `customer_id` field, which is the primary key in the `customers` table and a foreign key in the `orders` table. This key is used to link the two tables.

Given these tables, write a SQL query that calculates the average total monthly sales of each product the company sells. Note that each row in the `orders` table represents a single order of a product by a customer.
customers table:

customer_id | first_name | last_name |
---|---|---|
1 | Alice | Thompson |
2 | Bob | Reynolds |
3 | Charlie | Johnson |
4 | David | Williams |
orders table:

order_id | customer_id | order_date | product | total_sale |
---|---|---|---|---|
1 | 1 | 2022-06-01 | Book | 25.00 |
2 | 1 | 2022-06-15 | Headphones | 100.00 |
3 | 2 | 2022-07-01 | Book | 25.00 |
4 | 3 | 2022-07-10 | Laptop | 500.00 |
5 | 4 | 2022-07-20 | Headphones | 150.00 |
6 | 2 | 2022-08-01 | Laptop | 450.00 |
SELECT EXTRACT(MONTH FROM orders.order_date) AS month,
       orders.product,
       AVG(orders.total_sale) AS average_sale
FROM orders
GROUP BY month, orders.product
ORDER BY month, average_sale DESC;
This SQL query first extracts the month from `order_date`. It then groups the data by `month` and `product` and calculates the average total sale for each combination. Finally, it sorts the results by `month` (ascending) and `average_sale` (descending).
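Here's the query checked against the sample orders with SQLite in Python (`strftime('%m', ...)` standing in for `EXTRACT(MONTH FROM ...)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, product TEXT, total_sale REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, 1, '2022-06-01', 'Book', 25.00),
    (2, 1, '2022-06-15', 'Headphones', 100.00),
    (3, 2, '2022-07-01', 'Book', 25.00),
    (4, 3, '2022-07-10', 'Laptop', 500.00),
    (5, 4, '2022-07-20', 'Headphones', 150.00),
    (6, 2, '2022-08-01', 'Laptop', 450.00),
])

rows = cur.execute("""
    SELECT strftime('%m', order_date) AS month,
           product,
           AVG(total_sale) AS average_sale
    FROM orders
    GROUP BY month, product
    ORDER BY month, average_sale DESC
""").fetchall()
print(rows)  # months ascending; within each month, highest average first
```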
Because join questions come up routinely during SQL interviews, try this interactive Snapchat SQL Interview question using JOINS:
Database normalization is the process of breaking down a table into smaller and more specific tables and defining relationships between them via foreign keys. This minimizes redundancy, and creates a database that's more flexible, scalable, and easier to maintain. It also helps to ensure the integrity of the data by minimizing the risk of data inconsistencies and anomalies.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the CHS SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the above CHS SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Google, Facebook, Microsoft and Amazon.
Each DataLemur SQL question has multiple hints and step-by-step solutions, and most importantly, there's an online SQL coding environment so you can write your query right in the browser and have it checked.
To prep for the CHS SQL interview it is also wise to solve SQL questions from other food and facilities companies like:
In case your SQL query skills are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.
This tutorial covers topics including different types of joins and using LIKE – both of which pop up often during SQL job interviews at CHS.
Beyond writing SQL queries, the other types of questions tested in the CHS Data Science Interview are:
To prepare for the CHS Data Science interview, have a deep understanding of the company's cultural values – this will be clutch for acing the behavioral interview. For technical interviews, prepare by reading Ace the Data Science Interview. The book's got: