Data Scientists, Analysts, and Data Engineers at CHS use SQL queries to extract agricultural data for predictive models, helping them forecast trends and make informed decisions. SQL is also essential for managing stock inventory databases, allowing them to improve supply chain processes and ensure efficient resource allocation. For these reasons, CHS asks jobseekers SQL interview questions.
So, to help you prep, here are 10 CHS SQL interview questions – can you solve them?
CHS is an ecommerce platform that sells various products. They define their "Whale" customers as those who have purchased at least 10 different products and spent more than $2,000 in total within the last year.
Write a SQL query to identify these high-value customers. Assume we have access to the following two tables, `orders` and `order_items`:
`orders`:

order_id | user_id | order_date | total_cost |
---|---|---|---|
101 | 123 | 2021-10-15 | $500 |
102 | 485 | 2022-01-07 | $100 |
103 | 123 | 2022-02-16 | $300 |
104 | 266 | 2022-03-05 | $50 |
105 | 123 | 2022-08-10 | $1200 |
`order_items`:

order_item_id | order_id | product_id |
---|---|---|
1 | 101 | 1001 |
2 | 101 | 1002 |
3 | 102 | 1003 |
4 | 103 | 1002 |
5 | 103 | 1004 |
6 | 105 | 1005 |
7 | 105 | 1006 |
8 | 105 | 1001 |
9 | 105 | 1007 |
10 | 105 | 1008 |
11 | 105 | 1009 |
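Here's one way to write it (a sketch in PostgreSQL, using the `orders` and `order_items` tables above):

```sql
WITH recent_orders AS (
  -- only orders placed within the last year
  SELECT order_id, user_id, total_cost
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '1 year'
)
SELECT ro.user_id
FROM recent_orders AS ro
JOIN order_items AS oi
  ON oi.order_id = ro.order_id
GROUP BY ro.user_id
HAVING COUNT(DISTINCT oi.product_id) >= 10
   -- spend is summed over orders (not joined rows) so the item-level
   -- join doesn't double-count each order's total_cost
   AND (SELECT SUM(total_cost)
        FROM recent_orders
        WHERE user_id = ro.user_id) > 2000;
```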
This query first filters for orders placed within the last year. We then join with `order_items` to connect purchased products with users. The `GROUP BY` clause groups the data by user id. The `HAVING` clause then filters for the 'whale' users, i.e., users who have purchased at least 10 different products (as indicated by `COUNT(DISTINCT product_id) >= 10`) and have spent more than $2,000 (as indicated by `SUM(total_cost) > 2000`). The query returns the list of user_ids that qualify as 'whale' customers.
To work on another SQL customer analytics question, one you can solve interactively and have your SQL query automatically checked, try this Walmart SQL Interview Question:
Given a table of CHS employee salary data, write a SQL query to find employees who make more than their own boss.
`employee`:

employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | NULL |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
Example Output:

employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager William Davis, who earns $7,000.
Code your solution to this interview question and run your code right in DataLemur's online SQL environment:
First, we perform a SELF-JOIN, treating the first copy of the table as the managers' table and the second copy as the employees' table. Then we use a `WHERE` clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
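A sketch of that self-join (using the `employee` table above):

```sql
SELECT emp.employee_id,
       emp.name AS employee_name
FROM employee AS mgr            -- first copy: the managers
JOIN employee AS emp            -- second copy: the employees
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary;  -- keep employees out-earning their boss
```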
If the solution above is tough, you can find a detailed solution with hints here: Employee Salaries Higher Than Their Manager.
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For a concrete example, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a CHS sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
`INNER JOIN`: retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the advertising campaigns table and the sales table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the advertising campaigns table matches the keyword in the sales table.
`FULL OUTER JOIN`: retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, `NULL` values are returned for the columns of the non-matching table.
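To make the difference concrete, here's a sketch with hypothetical `ad_campaigns` and `sales` tables sharing a `keyword` column (the `bid` and `units_sold` columns are illustrative):

```sql
-- INNER JOIN: only keywords that appear in BOTH tables
SELECT a.keyword, a.bid, s.units_sold
FROM ad_campaigns AS a
INNER JOIN sales AS s
  ON s.keyword = a.keyword;

-- FULL OUTER JOIN: every keyword from EITHER table; columns from
-- the side with no match come back as NULL
SELECT a.keyword, a.bid, s.units_sold
FROM ad_campaigns AS a
FULL OUTER JOIN sales AS s
  ON s.keyword = a.keyword;
```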
CHS is a multinational company that sells a wide range of products. The company would like to analyze how their product sales fluctuate on a daily basis, so they wish to calculate the average daily sales for each product over the past month.
You are given a `sales` table with the following schema:
sales_id | product_id | transaction_date | quantity |
---|---|---|---|
1 | 101 | 2022-10-01 | 10 |
2 | 101 | 2022-10-01 | 15 |
3 | 102 | 2022-10-01 | 8 |
4 | 101 | 2022-10-02 | 20 |
5 | 102 | 2022-10-02 | 12 |
6 | 102 | 2022-10-02 | 10 |
7 | 102 | 2022-10-03 | 5 |
The goal is to write a SQL query that returns a table with two columns, `product_id` and `average_daily_sales`, where `average_daily_sales` is the average number of daily sales of each product over the past month. The results should be sorted by `product_id`.
product_id | average_daily_sales |
---|---|
101 | 22.50 |
102 | 11.67 |
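One way to get there (a sketch: daily totals are computed first, then averaged per product with a window function):

```sql
WITH daily_totals AS (
  SELECT product_id,
         transaction_date,
         SUM(quantity) AS daily_qty   -- total units sold per product per day
  FROM sales
  WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 month'  -- past month only
  GROUP BY product_id, transaction_date
)
SELECT DISTINCT
       product_id,
       ROUND(AVG(daily_qty) OVER (PARTITION BY product_id), 2) AS average_daily_sales
FROM daily_totals
ORDER BY product_id;
```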
This query first totals the quantity sold per product per day, restricted to the past month, then uses the `AVG()` window function, partitioned by `product_id`, to average those daily totals for each product. `DISTINCT` collapses the per-row window results down to one row per product.
P.S. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur.
There are several normal forms that define the rules for normalizing a database:
A database is in first normal form (1NF) if it meets the following criteria:

- Each column contains only atomic (indivisible) values – no lists or repeating groups within a single cell.
- Each row is unique, typically enforced with a primary key.

A database is in second normal form (2NF) if it meets the following criteria:

- It is already in 1NF.
- Every non-key column depends on the entire primary key, not just part of a composite key.

Said another way, to achieve 2NF, besides following all the rules from 1NF, all the columns in a given table should depend only on that table's primary key.
A database is in third normal form (3NF) if it meets the following criteria:

- It is already in 2NF.
- It has no transitive dependencies: non-key columns depend on the primary key directly, not on other non-key columns.

A transitive dependency means that a piece of data in one column is derived from another column. For example, it wouldn't make sense to keep both a "user's age" column and a "user's birthdate" column, because age can be derived from birthdate.
While there are also 4th and 5th normal forms, they're too pedantic and hence out-of-scope to know for the CHS SQL interview.
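To make the transitive-dependency example concrete, here's a minimal sketch (a hypothetical `users` table):

```sql
-- storing age would violate 3NF, since age is derived from birthdate
CREATE TABLE users (
  user_id   INT PRIMARY KEY,
  birthdate DATE NOT NULL
  -- no "age" column: derive it at query time instead
);

SELECT user_id,
       DATE_PART('year', AGE(birthdate)) AS age
FROM users;
```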
As a data analyst at CHS (Company Holding Superstores), a chain of supermarkets, you have been asked to find out the average monthly sales per product. The company wants to understand the average sales to analyze the performance of each product and make decisions for future product inventory.
Products are sold at several different locations, so you have been given two tables, `sales` and `products`.
`sales`:

sale_id | product_id | store_id | qty | sale_date |
---|---|---|---|---|
1001 | 1 | 101 | 15 | 01/01/2022 |
1002 | 1 | 102 | 20 | 01/01/2022 |
1003 | 2 | 101 | 30 | 01/01/2022 |
1004 | 3 | 103 | 25 | 01/02/2022 |
1005 | 3 | 101 | 20 | 01/02/2022 |
`products`:

product_id | product_name |
---|---|
1 | Apples |
2 | Bananas |
3 | Oranges |
The output should display the month, the product name, and the average sales of each product for that month.
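One way to write this (a sketch, using the `sales` and `products` tables above):

```sql
SELECT EXTRACT(MONTH FROM s.sale_date) AS month,
       p.product_name,
       AVG(s.qty) AS avg_sales
FROM sales AS s
JOIN products AS p
  ON p.product_id = s.product_id
GROUP BY month, p.product_name   -- PostgreSQL allows grouping by the output alias
ORDER BY month, p.product_name;
```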
This PostgreSQL query uses the `EXTRACT` function to get the month from the `sale_date` and the `AVG` function to calculate the average quantity sold per product. The information is displayed alongside the product name by joining the `sales` and `products` tables on `product_id`. The results are grouped by month and product name, providing average sales per product for each month.
To practice a very similar question, try this interactive Wayfair Y-on-Y Growth Rate Question, which is similar in requiring analysis of sales data over time, or this Amazon Average Review Ratings Question, which is similar in requiring calculation of average metrics related to products.
The `COALESCE` function returns the first non-NULL value from a list of values. This function is often used to replace a NULL with some default value, so that you can then take an `AVG` or `SUM` of some column without NULLs messing things up.
For example, suppose you ran a customer satisfaction survey for CHS and had statements like "I'd buy from CHS again". In the survey, customers would then answer how strongly they agreed with a statement on a scale of 1 to 5 (strongly disagree, disagree, neutral, agree, strongly agree).
Because attention spans are short, many customers skipped many of the questions, and thus our survey data might be filled with NULLs:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | NULL |
303 | 1 | 5 |
303 | 2 | NULL |
Before doing further analytics on this customer survey data, you could replace the NULLs in the `agree_scale` column with the value of `3` (because that corresponds to the default 'neutral' answer) using the `COALESCE` function:
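A sketch of that query (with `survey_responses` as a stand-in table name):

```sql
SELECT customer_id,
       question_id,
       COALESCE(agree_scale, 3) AS agree_scale  -- NULL becomes the neutral 3
FROM survey_responses;
```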
This would result in the following:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | 3 |
303 | 1 | 5 |
303 | 2 | 3 |
For a company, CHS, you are given a table of sales transactions, `sales`. Each entry in the table records the transaction id, the date of the transaction, the product id of the product sold, and the quantity sold. The question is: for each month, which product has the highest total quantity sold?
transaction_id | transaction_date | product_id | quantity |
---|---|---|---|
101 | 2022-01-05 | 50001 | 3 |
102 | 2022-01-15 | 50001 | 5 |
103 | 2022-01-25 | 50002 | 6 |
104 | 2022-02-05 | 50002 | 7 |
105 | 2022-02-15 | 50001 | 3 |
106 | 2022-03-05 | 50002 | 2 |
Example Output:

month_yr | top_product | total_quantity |
---|---|---|
2022-01 | 50001 | 8 |
2022-02 | 50002 | 7 |
2022-03 | 50002 | 2 |
The PostgreSQL query to solve this problem is as follows:
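Here's one sketch (the CTE names `monthly_totals` and `ranked` are illustrative):

```sql
WITH monthly_totals AS (
  SELECT TO_CHAR(transaction_date, 'YYYY-MM') AS month_yr,
         product_id,
         SUM(quantity) AS total_quantity
  FROM sales
  GROUP BY month_yr, product_id
),
ranked AS (
  SELECT *,
         RANK() OVER (PARTITION BY month_yr
                      ORDER BY total_quantity DESC) AS rnk
  FROM monthly_totals
)
SELECT month_yr,
       product_id AS top_product,
       total_quantity
FROM ranked
WHERE rnk = 1
ORDER BY month_yr;
```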
This query does the following: it first totals the quantity sold per product per month (`monthly_totals`), then ranks products within each month by that total (`ranked`), and finally keeps only the top-ranked product for each month. Note that `RANK()` returns multiple rows for a month if two products tie for the top spot.
Assume that CHS is an e-commerce company and you are a data analyst there. The company tracks every purchase made by its customers and stores this data in a PostgreSQL database.
The company has two tables: `customers` and `orders`. The `customers` table contains information about the company's customers, and the `orders` table contains information about all the orders made by the customers.
Both tables have a `customer_id` field, which is the primary key in the `customers` table and a foreign key in the `orders` table. This key is used to link the two tables.
Given these tables, write a SQL query that calculates the average total monthly sales of each product the company sells. Note that each row in the `orders` table represents a single order of a product by a customer.
`customers`:

customer_id | first_name | last_name |
---|---|---|
1 | Alice | Thompson |
2 | Bob | Reynolds |
3 | Charlie | Johnson |
4 | David | Williams |
`orders`:

order_id | customer_id | order_date | product | total_sale |
---|---|---|---|---|
1 | 1 | 2022-06-01 | Book | 25.00 |
2 | 1 | 2022-06-15 | Headphones | 100.00 |
3 | 2 | 2022-07-01 | Book | 25.00 |
4 | 3 | 2022-07-10 | Laptop | 500.00 |
5 | 4 | 2022-07-20 | Headphones | 150.00 |
6 | 2 | 2022-08-01 | Laptop | 450.00 |
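A sketch of such a query (using the `orders` table above):

```sql
SELECT EXTRACT(MONTH FROM order_date) AS month,
       product,
       AVG(total_sale) AS avg_total_sale
FROM orders
GROUP BY month, product
ORDER BY month ASC, avg_total_sale DESC;
```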
This SQL query first extracts the month from the `order_date`. It then groups the data by `month` and `product` and calculates the average total sale for each combination. Finally, it sorts the results by `month` (ascending) and average total sale (descending).
Because join questions come up routinely during SQL interviews, try this interactive Snapchat SQL Interview question using JOINS:
Database normalization is the process of breaking down a table into smaller and more specific tables and defining relationships between them via foreign keys. This minimizes redundancy, and creates a database that's more flexible, scalable, and easier to maintain. It also helps to ensure the integrity of the data by minimizing the risk of data inconsistencies and anomalies.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the CHS SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the above CHS SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Google, Facebook, Microsoft and Amazon.
Each DataLemur SQL question has multiple hints and step-by-step solutions, and most importantly, there is an online SQL coding environment so you can write your query right in the browser and have it checked.
To prep for the CHS SQL interview it is also wise to solve SQL questions from other food and facilities companies like:
In case your SQL query skills are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.
This tutorial covers topics including different types of joins and using LIKE – both of which pop up often during SQL job interviews at CHS.
Beyond writing SQL queries, the other types of questions tested in the CHS Data Science Interview are:
To prepare for the CHS Data Science interview, have a deep understanding of the company's cultural values – this will be clutch for acing the behavioral interview. For technical interviews, prepare by reading Ace the Data Science Interview. The book's got: