3M employees use SQL to analyze manufacturing data, allowing them to pinpoint areas for improving production efficiency and enhancing product quality. They also use it for checking customer databases to predict sales trends, allowing them to make informed decisions about product development and marketing strategies, which is why 3M asks SQL coding questions during interviews for Data Science, Analytics, and Data Engineering jobs.
Thus, to help you prep, here's 9 3M SQL interview questions – can you answer each one?
3M is a multinational conglomerate company that produces over 60,000 products. In our imaginary scenario, let's say they started an online platform where users can purchase their products directly. You are asked to identify the power users of their platform. Power users are determined by two factors: Their purchase frequency and the amount spent. A power user is categorised as someone who has made more than 50 purchases in the last month and has spent more than a total of $1000.
users
Example Input:user_id | username |
---|---|
1001 | user_A |
1002 | user_B |
1003 | user_C |
1004 | user_D |
purchases
Example Input:purchase_id | user_id | purchase_date | product_id | amount_spent |
---|---|---|---|---|
151 | 1001 | 06/08/2022 00:00:00 | 20001 | 30 |
152 | 1002 | 06/10/2022 00:00:00 | 20001 | 40 |
153 | 1001 | 06/18/2022 00:00:00 | 20001 | 40 |
154 | 1002 | 07/26/2022 00:00:00 | 20001 | 40 |
155 | 1003 | 07/05/2022 00:00:00 | 20001 | 20 |
SELECT u.username, COUNT(p.purchase_id) AS purchase_frequency, SUM(p.amount_spent) AS total_spent FROM users u JOIN purchases p ON u.user_id = p.user_id WHERE p.purchase_date BETWEEN (CURRENT_DATE - INTERVAL '1 month') AND CURRENT_DATE GROUP BY u.username HAVING COUNT(p.purchase_id) > 50 AND SUM(p.amount_spent) > 1000;
This SQL query will join the users table with the purchases table on user_id and then filter for purchases made in the last month. It then groups by username and selects users who have made more than 50 purchases and spent more than $1000 in the last month, thus identifying the power users at 3M.
To work on a similar customer analytics SQL question where you can code right in the browser and have your SQL query automatically checked, try this Walmart SQL Interview Question:
Dive into 3M's insights on machine learning and artificial intelligence, showcasing innovative applications that are shaping the future! Learning about 3M's advancements can help you appreciate the transformative impact of technology on improving processes and creating new opportunities.
Given a table of 3M employee salary information, write a SQL query to find the top 3 highest earning employees within each department.
employees
Example Input:employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department
Example Input:department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Write a SQL query for this problem directly within the browser on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
WITH ranked_salary AS ( SELECT name, salary, department_id, DENSE_RANK() OVER ( PARTITION BY department_id ORDER BY salary DESC) AS ranking FROM employee ) SELECT d.department_name, rs.name, rs.salary FROM ranked_salary AS rs INNER JOIN department AS d ON rs.department_id = d.department_id WHERE rs.ranking <= 3 ORDER BY d.department_id, rs.salary DESC, rs.name ASC;
If the solution above is tough, you can find a detailed solution here: Top 3 Department Salaries.
FOREIGN KEY
constraint?A FOREIGN KEY
is a field in a table that references the PRIMARY KEY
of another table. It creates a link between the two tables and ensures that the data in the FOREIGN KEY
field is valid.
For example, if you have a table of 3M customers and an orders table, the customer_id column in the orders table could be a FOREIGN KEY
that references the id column (which is the primary key) in the 3M customers table.
The FOREIGN KEY
constraint helps maintain the integrity of the data in the database by preventing the insertion of rows in the FOREIGN KEY
table that do not have corresponding entries in the PRIMARY KEY
table. It also enforces the relationship between the two tables and prevents data from being deleted from the PRIMARY KEY
table if it is still being referenced in the FOREIGN KEY
table.
As a data analyst for 3M, your task is to analyze the monthly sales of different products. Here are the details:
sales
table has the following columns: sale_id
(int), product_id
(int), sale_date
(date), quantity
(int).product_id
and then by month
.sales
Sample Input:sale_id | product_id | sale_date | quantity |
---|---|---|---|
101 | 1001 | 05/08/2022 | 50 |
102 | 1002 | 06/08/2022 | 20 |
103 | 1001 | 05/10/2022 | 30 |
104 | 1002 | 06/12/2022 | 40 |
105 | 1003 | 07/08/2022 | 50 |
106 | 1003 | 07/10/2022 | 60 |
107 | 1001 | 05/18/2022 | 40 |
108 | 1002 | 06/22/2022 | 30 |
109 | 1003 | 07/28/2022 | 70 |
month | product_id | total_quantity |
---|---|---|
5 | 1001 | 120 |
6 | 1002 | 90 |
7 | 1003 | 180 |
SELECT EXTRACT(MONTH FROM sale_date) as month, product_id, SUM(quantity) OVER (PARTITION BY product_id, EXTRACT(MONTH FROM sale_date)) as total_quantity FROM sales ORDER BY product_id, month;
This SQL query works by partitioning our sales
table by both product_id
and month of sale_date
. Then, by using the SUM
window function on quantity
, it calculates the total monthly sales quantity of each product. Finally, the resultset is ordered by product_id
and month.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at 3M should vaguely refresh these concepts:
As part of the product development team at 3M, you are tasked with assessing customer satisfaction for your products based on the star ratings provided in the reviews submitted by the users.
For each product, you are required to filter and record average star ratings on a monthly basis. Only consider reviews where the submit date is in the year 2022 and the user_id is even. Additionally, only include products that have received more than three reviews in that month.
reviews
Example Input:review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 124 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 266 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 2 |
4517 | 982 | 07/05/2022 00:00:00 | 69852 | 1 |
4092 | 334 | 07/15/2022 00:00:00 | 50001 | 3 |
mth | product_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 50001 | 3.00 |
SELECT EXTRACT(MONTH FROM submit_date) as mth, product_id, AVG(stars) as avg_stars FROM reviews WHERE EXTRACT(YEAR FROM submit_date) = 2022 AND (user_id % 2) = 0 GROUP BY mth, product_id HAVING COUNT(review_id) > 3 ORDER BY mth, product_id;
This query filters the reviews from the year 2022 and from users with an even user_id
. It calculates the average star rating per product per month, while only including products that received more than three reviews in a given month. The GROUP BY
clause groups the reviews by month and then by product. The HAVING
clause filters out products that have less than four reviews in a given month.
A join in SQL combines rows from two or more tables based on a shared column or set of columns.
Four types of JOINs exist in SQL. To demonstrate each one, say you had a table of 3M orders and 3M customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an INNER JOIN
between the Orders
and Customers
tables would retrieve rows where the customer_id
in the Orders
table matches the customer_id
in the Customers
table.
LEFT JOIN: A LEFT JOIN
retrieves all rows from the left table (in this case, the Orders
table) and any matching rows from the right table (the Customers
table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A RIGHT JOIN
retrieves all rows from the right table (in this case, the Customers
table) and any matching rows from the left table (the Orders
table). If there is no match in the left table, NULL values will be returned for the left table's columns.
FULL OUTER JOIN: A FULL OUTER JOIN
retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
As a data analyst at 3M, a multinational conglomerate corporation known for its research and development of industrial and consumer products, you are tasked with finding the average sales per product category across different regions.
Data is organized across two tables: sales_data
and product_categories
.
sales_data
Example Input:sales_id | product_id | region_id | sale_date | units_sold |
---|---|---|---|---|
1115 | 5 | 710 | 08/03/2022 | 27 |
1012 | 2 | 605 | 10/02/2022 | 13 |
2013 | 5 | 206 | 19/04/2022 | 32 |
1910 | 3 | 907 | 13/05/2022 | 18 |
2312 | 4 | 312 | 20/06/2022 | 24 |
product_categories
Example Input:category_id | product_id | category_name |
---|---|---|
522 | 5 | Healthcare |
414 | 2 | Consumer Electronics |
608 | 3 | Safety & Industrial |
905 | 4 | Transportation |
We want to obtain the average units sold per product category, and the output should include the category name and the average units sold, in descending order.
category | average_units_sold |
---|---|
Safety & Industrial | 18.00 |
Transportation | 24.00 |
Consumer Electronics | 13.00 |
Healthcare | 29.50 |
SELECT pc.category_name AS category, AVG(sd.units_sold) AS average_units_sold FROM sales_data sd JOIN product_categories pc ON sd.product_id = pc.product_id GROUP BY pc.category_name ORDER BY average_units_sold DESC;
In this query, we first join the two tables using the product_id
field that is common to both. Then we group the data by category_name
(which represents the product categories) and calculate the average units_sold
for each group. Finally, we order by average_units_sold
in descending order to get the categories with the highest average sales at the top.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for identification of high performing products or this Wayfair Y-on-Y Growth Rate Question which is similar for sales-related analysis.
3M Corporation launched several digital marketing campaigns. The marketing team needs a report to understand the efficacy of these campaigns, particularly the click-through-rate (CTR). The CTR is defined as the number of unique users who clicked an ad divided by the number of unique users to whom the ad was shown. Note that users might see the same ad multiple times in the campaign.
You are provided with two tables. The ads_displayed
table logs every instance in which an ad was served to a user. The ads_clicked
table logs every instance in which a user clicked on an ad.
ads_displayed
Example Input:ad_id | user_id | display_date |
---|---|---|
907 | 543 | 07/01/2022 |
832 | 112 | 07/02/2022 |
872 | 306 | 07/02/2022 |
832 | 552 | 07/10/2022 |
907 | 552 | 07/15/2022 |
ads_clicked
Example Input:ad_id | user_id | click_date |
---|---|---|
907 | 543 | 07/02/2022 |
832 | 112 | 07/06/2022 |
872 | 112 | 07/11/2022 |
907 | 552 | 07/18/2022 |
Write a PostgreSQL query to compute the CTR for each ad in July 2022.
WITH displayed AS ( SELECT ad_id, COUNT(DISTINCT user_id) AS num_displays FROM ads_displayed WHERE DATE_PART('month', display_date) = 7 AND DATE_PART('year', display_date) = 2022 GROUP BY ad_id ), clicked AS ( SELECT ad_id, COUNT(DISTINCT user_id) AS num_clicks FROM ads_clicked WHERE DATE_PART('month', click_date) = 7 AND DATE_PART('year', click_date) = 2022 GROUP BY ad_id ) SELECT displayed.ad_id, displayed.num_displays, COALESCE(clicked.num_clicks, 0) AS num_clicks, (COALESCE(clicked.num_clicks, 0)::decimal / displayed.num_displays) AS ctr FROM displayed LEFT JOIN clicked ON clicked.ad_id = displayed.ad_id;
This query first creates two subqueries: displayed
and clicked
, aggregating the count of distinct users for each ad from ads_displayed
and ads_clicked
tables, respectively, for the month of July in 2022.
The main query then left joins clicked
onto displayed
on ad_id
, ensuring all ads that were displayed (even if not clicked) are included. It calculates the CTR by dividing the number of clicks by the number of displays, using the COALESCE
function to treat null values (i.e., if an ad was displayed but not clicked) as zeroes in computing the CTR.
To solve a similar SQL problem on DataLemur's free interactive SQL code editor, try this Facebook SQL Interview question:
The best way to prepare for a 3M SQL interview is to practice, practice, practice. In addition to solving the earlier 3M SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like tech companies and industrial chemical companies like 3M.
Each problem on DataLemur has multiple hints, step-by-step solutions and most importantly, there's an interactive coding environment so you can instantly run your query and have it graded.
To prep for the 3M SQL interview you can also be wise to solve SQL problems from other industrial chemical companies like:
However, if your SQL foundations are weak, don't worry about jumping right into solving questions – go learn SQL with this free SQL tutorial.
This tutorial covers SQL topics like removing NULLs and joining a table to itself – both of these show up routinely in SQL interviews at 3M.
Besides SQL interview questions, the other question categories to practice for the 3M Data Science Interview include:
To prepare for 3M Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't forget about the behavioral interview – prepare for it using this guide on behavioral interview questions.