3M employees use SQL to analyze manufacturing data, allowing them to pinpoint areas for improving production efficiency and enhancing product quality. They also use it for checking customer databases to predict sales trends, allowing them to make informed decisions about product development and marketing strategies, which is why 3M asks SQL coding questions during interviews for Data Science, Analytics, and Data Engineering jobs.
Thus, to help you prep, here's 9 3M SQL interview questions – can you answer each one?
3M is a multinational conglomerate company that produces over 60,000 products. In our imaginary scenario, let's say they started an online platform where users can purchase their products directly. You are asked to identify the power users of their platform. Power users are determined by two factors: Their purchase frequency and the amount spent. A power user is categorised as someone who has made more than 50 purchases in the last month and has spent more than a total of $1000.
user_id | username |
---|---|
1001 | user_A |
1002 | user_B |
1003 | user_C |
1004 | user_D |
purchase_id | user_id | purchase_date | product_id | amount_spent |
---|---|---|---|---|
151 | 1001 | 06/08/2022 00:00:00 | 20001 | 30 |
152 | 1002 | 06/10/2022 00:00:00 | 20001 | 40 |
153 | 1001 | 06/18/2022 00:00:00 | 20001 | 40 |
154 | 1002 | 07/26/2022 00:00:00 | 20001 | 40 |
155 | 1003 | 07/05/2022 00:00:00 | 20001 | 20 |
This SQL query will join the users table with the purchases table on user_id and then filter for purchases made in the last month. It then groups by username and selects users who have made more than 50 purchases and spent more than $1000 in the last month, thus identifying the power users at 3M.
To work on a similar customer analytics SQL question where you can code right in the browser and have your SQL query automatically checked, try this Walmart SQL Interview Question:
Dive into 3M's insights on machine learning and artificial intelligence, showcasing innovative applications that are shaping the future! Learning about 3M's advancements can help you appreciate the transformative impact of technology on improving processes and creating new opportunities.
Given a table of 3M employee salary information, write a SQL query to find the top 3 highest earning employees within each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Write a SQL query for this problem directly within the browser on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
If the solution above is tough, you can find a detailed solution here: Top 3 Department Salaries.
A is a field in a table that references the of another table. It creates a link between the two tables and ensures that the data in the field is valid.
For example, if you have a table of 3M customers and an orders table, the customer_id column in the orders table could be a that references the id column (which is the primary key) in the 3M customers table.
The constraint helps maintain the integrity of the data in the database by preventing the insertion of rows in the table that do not have corresponding entries in the table. It also enforces the relationship between the two tables and prevents data from being deleted from the table if it is still being referenced in the table.
As a data analyst for 3M, your task is to analyze the monthly sales of different products. Here are the details:
sale_id | product_id | sale_date | quantity |
---|---|---|---|
101 | 1001 | 05/08/2022 | 50 |
102 | 1002 | 06/08/2022 | 20 |
103 | 1001 | 05/10/2022 | 30 |
104 | 1002 | 06/12/2022 | 40 |
105 | 1003 | 07/08/2022 | 50 |
106 | 1003 | 07/10/2022 | 60 |
107 | 1001 | 05/18/2022 | 40 |
108 | 1002 | 06/22/2022 | 30 |
109 | 1003 | 07/28/2022 | 70 |
month | product_id | total_quantity |
---|---|---|
5 | 1001 | 120 |
6 | 1002 | 90 |
7 | 1003 | 180 |
This SQL query works by partitioning our table by both and month of . Then, by using the window function on , it calculates the total monthly sales quantity of each product. Finally, the resultset is ordered by and month.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at 3M should vaguely refresh these concepts:
As part of the product development team at 3M, you are tasked with assessing customer satisfaction for your products based on the star ratings provided in the reviews submitted by the users.
For each product, you are required to filter and record average star ratings on a monthly basis. Only consider reviews where the submit date is in the year 2022 and the user_id is even. Additionally, only include products that have received more than three reviews in that month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 124 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 266 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 2 |
4517 | 982 | 07/05/2022 00:00:00 | 69852 | 1 |
4092 | 334 | 07/15/2022 00:00:00 | 50001 | 3 |
mth | product_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 50001 | 3.00 |
This query filters the reviews from the year 2022 and from users with an even . It calculates the average star rating per product per month, while only including products that received more than three reviews in a given month. The clause groups the reviews by month and then by product. The clause filters out products that have less than four reviews in a given month.
A join in SQL combines rows from two or more tables based on a shared column or set of columns.
Four types of JOINs exist in SQL. To demonstrate each one, say you had a table of 3M orders and 3M customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an between the and tables would retrieve rows where the in the table matches the in the table.
LEFT JOIN: A retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A retrieves all rows from the right table (in this case, the table) and any matching rows from the left table (the table). If there is no match in the left table, NULL values will be returned for the left table's columns.
FULL OUTER JOIN: A retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
As a data analyst at 3M, a multinational conglomerate corporation known for its research and development of industrial and consumer products, you are tasked with finding the average sales per product category across different regions.
Data is organized across two tables: and .
sales_id | product_id | region_id | sale_date | units_sold |
---|---|---|---|---|
1115 | 5 | 710 | 08/03/2022 | 27 |
1012 | 2 | 605 | 10/02/2022 | 13 |
2013 | 5 | 206 | 19/04/2022 | 32 |
1910 | 3 | 907 | 13/05/2022 | 18 |
2312 | 4 | 312 | 20/06/2022 | 24 |
category_id | product_id | category_name |
---|---|---|
522 | 5 | Healthcare |
414 | 2 | Consumer Electronics |
608 | 3 | Safety & Industrial |
905 | 4 | Transportation |
We want to obtain the average units sold per product category, and the output should include the category name and the average units sold, in descending order.
category | average_units_sold |
---|---|
Safety & Industrial | 18.00 |
Transportation | 24.00 |
Consumer Electronics | 13.00 |
Healthcare | 29.50 |
In this query, we first join the two tables using the field that is common to both. Then we group the data by (which represents the product categories) and calculate the average for each group. Finally, we order by in descending order to get the categories with the highest average sales at the top.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for identification of high performing products or this Wayfair Y-on-Y Growth Rate Question which is similar for sales-related analysis.
3M Corporation launched several digital marketing campaigns. The marketing team needs a report to understand the efficacy of these campaigns, particularly the click-through-rate (CTR). The CTR is defined as the number of unique users who clicked an ad divided by the number of unique users to whom the ad was shown. Note that users might see the same ad multiple times in the campaign.
You are provided with two tables. The table logs every instance in which an ad was served to a user. The table logs every instance in which a user clicked on an ad.
ad_id | user_id | display_date |
---|---|---|
907 | 543 | 07/01/2022 |
832 | 112 | 07/02/2022 |
872 | 306 | 07/02/2022 |
832 | 552 | 07/10/2022 |
907 | 552 | 07/15/2022 |
ad_id | user_id | click_date |
---|---|---|
907 | 543 | 07/02/2022 |
832 | 112 | 07/06/2022 |
872 | 112 | 07/11/2022 |
907 | 552 | 07/18/2022 |
Write a PostgreSQL query to compute the CTR for each ad in July 2022.
This query first creates two subqueries: and , aggregating the count of distinct users for each ad from and tables, respectively, for the month of July in 2022.
The main query then left joins onto on , ensuring all ads that were displayed (even if not clicked) are included. It calculates the CTR by dividing the number of clicks by the number of displays, using the function to treat null values (i.e., if an ad was displayed but not clicked) as zeroes in computing the CTR.
To solve a similar SQL problem on DataLemur's free interactive SQL code editor, try this Facebook SQL Interview question:
The best way to prepare for a 3M SQL interview is to practice, practice, practice. In addition to solving the earlier 3M SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like tech companies and industrial chemical companies like 3M.
Each problem on DataLemur has multiple hints, step-by-step solutions and most importantly, there's an interactive coding environment so you can instantly run your query and have it graded.
To prep for the 3M SQL interview you can also be wise to solve SQL problems from other industrial chemical companies like:
However, if your SQL foundations are weak, don't worry about jumping right into solving questions – go learn SQL with this free SQL tutorial.
This tutorial covers SQL topics like removing NULLs and joining a table to itself – both of these show up routinely in SQL interviews at 3M.
Besides SQL interview questions, the other question categories to practice for the 3M Data Science Interview include:
To prepare for 3M Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't forget about the behavioral interview – prepare for it using this guide on behavioral interview questions.