At M3, Inc, SQL is used all the damn time for extracting critical patient data from various databases, and analyzing medical research data from the m3 platform. Because of this, M3 LOVES to ask SQL questions in interviews for Data Analyst, Data Science, and BI jobs.
As such, to help you study for the M3 SQL interview, we've curated 9 M3, Inc SQL interview questions – able to answer them all?
Company M3 has a database of users who frequently perform purchases and leave reviews. M3 categorizes its power users as those who have purchased and reviewed at least 5 distinct products in the last month. For the purpose of this question, please identify these power users.
purchase_id | user_id | purchase_date | product_id |
---|---|---|---|
1123 | 234 | 06/08/2022 00:00:00 | 50001 |
2890 | 234 | 06/10/2022 00:00:00 | 69852 |
3542 | 234 | 06/18/2022 00:00:00 | 50002 |
4551 | 234 | 06/20/2022 00:00:00 | 59686 |
5712 | 234 | 06/22/2022 00:00:00 | 51142 |
6212 | 456 | 06/11/2022 00:00:00 | 53890 |
6542 | 456 | 06/12/2022 00:00:00 | 50104 |
7952 | 789 | 06/30/2022 00:00:00 | 51235 |
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 234 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 234 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 234 | 06/18/2022 00:00:00 | 50002 | 3 |
4517 | 456 | 07/05/2022 00:00:00 | 69852 | 2 |
7564 | 234 | 06/20/2022 00:00:00 | 59686 | 4 |
8345 | 234 | 06/22/2022 00:00:00 | 51142 | 3 |
This query works by joining the purchases and reviews tables on the user_id and product_id fields to ensure we only consider products that have been both purchased and reviewed by the same user. It then filters for purchases made in the last month using the date_trunc function to get the first of the last month and comparing it to the purchase_date. The GROUP BY and HAVING clauses are used to find users who have purchased and reviewed at least 5 distinct products.
To practice a related customer analytics question on DataLemur's free online SQL coding environment, try this Microsoft SQL Interview problem:
For a subscription-based company, M3, we have a dataset of all transactions, including both purchases and refunds. The schema of the table named is as follows:
transaction_id | product_id | transaction_date | user_id | amount_paid |
---|---|---|---|---|
1234 | 101 | 01/05/2022 | 1515 | 10.00 |
1235 | 102 | 01/15/2022 | 1516 | -5.00 |
1236 | 101 | 02/01/2022 | 1517 | 10.00 |
1237 | 102 | 02/10/2022 | 1518 | 15.00 |
1238 | 101 | 03/01/2022 | 1519 | -10.00 |
Write a SQL query to find the average amount of revenue generated per product per month. The revenue from each product is the sum of the field for that product, taking into account both positive values (revenue from purchases) and negative values (refunds).
The output should contain the following columns:
Arrange the output by and in ascending order.
product_id | month | average_monthly_revenue |
---|---|---|
101 | 1 | 10.00 |
101 | 2 | 10.00 |
101 | 3 | -10.00 |
102 | 1 | -5.00 |
102 | 2 | 15.00 |
This query uses a window function, specifically , to calculate the average monthly income for each product. The clause is used to form separate groups for each combination of product and month. The function then operates on each of these partitions separately, computing the average within each one. The clause sorts the results first by , then by , both in ascending order.
For more window function practice, try this Uber SQL problem within DataLemur's interactive SQL code editor:
A primary key is a column (or set of columns) in a table that uniquely identifies each row in the table. It cannot contain null values and must be unique across all rows in the table.
A foreign key is a column (or set of columns) in a table that references the primary key of another table. It is used to establish a relationship between the two tables. A foreign key can contain null values, and multiple rows in the referencing table can reference the same row in the referenced table.
For example, consider a database with two tables: and . The M3 customers table might have a primary key column called , while the M3 orders table might have a foreign key column called that references the column in the table. This establishes a relationship between the two tables, such that each row in the orders table corresponds to a specific M3 customer.
You are a data analyst at M3, an e-commerce company. The company is interested in promoting its repeat customers, users who have made more than one purchase in a specific month. The marketing department has a budget for a promotional campaign for customers who purchased a product more than once in the month of August 2022.
Your task is to generate a report that shows customers who qualify for the promotion. The report should contain customer_id, product_id, and the number of purchases for the products purchased more than once by the customer in August.
Here is the table for your reference.
purchase_id | customer_id | purchase_date | product_id |
---|---|---|---|
1234 | 6789 | 08/08/2022 00:00:00 | 50001 |
9876 | 6789 | 08/15/2022 00:00:00 | 50001 |
4567 | 6543 | 08/18/2022 00:00:00 | 69852 |
7645 | 8765 | 08/26/2022 00:00:00 | 69852 |
1726 | 6789 | 08/28/2022 00:00:00 | 69852 |
Your task is to write a PostgreSQL query that generates the desired output.
This query filters the table for records in August 2022, groups them by customer and product, and includes only those groups where the count (the number of purchases) is more than 1. It then returns the , , and (the number of times each product was purchased by each customer in August 2022). The results are ordered by and to make it more readable.
customer_id | product_id | number_of_purchases |
---|---|---|
6789 | 50001 | 2 |
6789 | 69852 | 1 |
Both window functions are used to find rows at a given offset from the current row. However, will give you the rows AFTER the current row you. On the other hand, will give you the rows BEFORE the current row.
M3 is a company with various departments and multiple employees, and they want to determine how they distribute their salaries. They are interested to find out the average salary per department.
Here is how you can model this problem:
employee_id | first_name | last_name | department_id | salary |
---|---|---|---|---|
1 | John | Doe | 101 | 75000 |
2 | Jane | Doe | 102 | 80000 |
3 | Jim | Smith | 103 | 90000 |
4 | Jill | Johnson | 101 | 85000 |
5 | Jake | Williams | 102 | 95000 |
department_id | department_name |
---|---|
101 | IT |
102 | Marketing |
103 | Sales |
We would like to find out the average salary per department. For each department, please provide its name and the average salary of its employees.
department_name | avg_salary |
---|---|
IT | 80000 |
Marketing | 87500 |
Sales | 90000 |
This query joins the employees and departments tables on the department_id field. It then groups the result by department_name and calculates the average (AVG) salary in each group. The result is a list of departments and their corresponding average salaries.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for requiring grouping and aggregate calculations or this Wayfair Y-on-Y Growth Rate Question which is similar for requiring calculations based on different groups.
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Facebook ad campaign data that M3 ran:
The column uniquely identifies each row in the table, and the PRIMARY KEY constraint ensures that no two rows have the same . This helps to maintain the integrity of the data in the table by preventing duplicate rows.
The primary key is also an important part of the table because it allows you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table containing data on the results of the campaigns.
Suppose you work at M3, a company that sells various types of mobile phones. Your manager wants to understand more about the sales details of these devices and asks you to answer the following question: "For each individual mobile device sold by M3, can you calculate two figures - the total sales quantity and the average sale price? Please provide this analysis in a ordered result table, with the highest quantity sold at the top."
Here's some mock data for the table:
sales_id | mobile_id | sales_date | price |
---|---|---|---|
1 | 101 | 06/01/2021 | 250 |
2 | 102 | 06/20/2021 | 300 |
3 | 101 | 06/30/2021 | 250 |
4 | 103 | 07/05/2021 | 400 |
5 | 102 | 07/10/2021 | 300 |
6 | 101 | 07/30/2021 | 250 |
We want to transform this data into the following output:
mobile_id | total_sales_qty | avg_price |
---|---|---|
101 | 3 | 250.00 |
102 | 2 | 300.00 |
103 | 1 | 400.00 |
Please answer this query using PostgreSQL.
This SQL script will count the total sales quantity for each mobile phone and calculate the average price. The result is returned in order of descending total sales quantity, which means the mobile ID with the most sales quantity will be at the top.
Description: You are working as a data analyst at the company, M3. The company has a database table called which contains customer records. The columns in the table represent the customer ID, first name, last name, email and the registered date. In the following question, find all the customer records where the customer's first name starts with the letter 'M' and was registered after '2020-01-01'.
customer_id | first_name | last_name | registered_date | |
---|---|---|---|---|
11 | Martin | Perez | martin.perez@domain.com | 2021-01-15 |
21 | Mabel | Lopez | mabel.lopez@domain.com | 2020-02-10 |
35 | John | Smith | john.smith@domain.com | 2021-06-20 |
42 | Mario | Rossi | mario.rossi@domain.com | 2019-11-01 |
56 | Harold | Finch | harold.finch@domain.com | 2020-09-10 |
customer_id | first_name | last_name | registered_date | |
---|---|---|---|---|
11 | Martin | Perez | martin.perez@domain.com | 2021-01-15 |
21 | Mabel | Lopez | mabel.lopez@domain.com | 2020-02-10 |
This query uses the LIKE keyword in SQL to filter out customers whose first name starts with 'M'. The 'M%' indicates that we are looking for any records where the first name starts with 'M' and is followed by any sequence of characters. The AND keyword is used to also include the condition that the registered date must be after '2020-01-01'. This query will return all columns from the table for records that meet these criteria.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the M3 SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the earlier M3 SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like FAANG and tech startups.
Each exercise has multiple hints, step-by-step solutions and best of all, there's an online SQL coding environment so you can easily right in the browser your query and have it executed.
To prep for the M3 SQL interview you can also be helpful to solve interview questions from other tech companies like:
But if your SQL skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this free SQL tutorial.
This tutorial covers SQL concepts such as ordering data and CASE/WHEN/ELSE statements – both of these come up frequently in SQL job interviews at M3.
Beyond writing SQL queries, the other topics covered in the M3 Data Science Interview are:
The best way to prepare for M3 Data Science interviews is by reading Ace the Data Science Interview. The book's got: