At McKinsey, SQL is used to extract and manipulate client data on sales performance and market trends for analytics projects, as well as to build data models that inform business strategy decisions by identifying areas for improvement and growth opportunities. This is why McKinsey includes SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
As such, to help you practice for the McKinsey SQL interview, we'll cover 11 McKinsey SQL interview questions in this article.
You’re a consultant for a major pizza chain that will be running a promotion where all 3-topping pizzas will be sold for a fixed price, and are trying to understand the costs involved.
Given a list of pizza toppings, consider all the possible 3-topping pizzas, and print out the total cost of those 3 toppings. Sort the results with the highest total cost at the top.
Break ties by listing the ingredients in alphabetical order, starting from the first ingredient, followed by the second and third.
Column Name | Type |
---|---|
topping_name | varchar(255) |
ingredient_cost | decimal(10,2) |
topping_name | ingredient_cost |
---|---|
Pepperoni | 0.50 |
Sausage | 0.70 |
Chicken | 0.55 |
Extra Cheese | 0.40 |
pizza | total_cost |
---|---|
Chicken,Pepperoni,Sausage | 1.75 |
Chicken,Extra Cheese,Sausage | 1.65 |
Extra Cheese,Pepperoni,Sausage | 1.60 |
Chicken,Extra Cheese,Pepperoni | 1.45 |
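One common approach is a self-join that generates each 3-topping combination exactly once. Here's a sketch, assuming the toppings table is named pizza_toppings (the table name isn't shown above):

```sql
SELECT
  -- build the comma-separated pizza name from the three toppings
  CONCAT(t1.topping_name, ',', t2.topping_name, ',', t3.topping_name) AS pizza,
  t1.ingredient_cost + t2.ingredient_cost + t3.ingredient_cost AS total_cost
FROM pizza_toppings AS t1
-- the < conditions generate each combination exactly once,
-- with the toppings already in alphabetical order
JOIN pizza_toppings AS t2
  ON t1.topping_name < t2.topping_name
JOIN pizza_toppings AS t3
  ON t2.topping_name < t3.topping_name
ORDER BY total_cost DESC, pizza;
```

Because the join conditions use `<` on the topping names, each 3-topping combination appears once with its ingredients in alphabetical order, which also handles the tie-breaking requirement.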
To see not just one but TWO additional solutions, try out this McKinsey SQL Interview Question on DataLemur!
Given a table of McKinsey employee salary information, write a SQL query to find the 2nd highest salary at the company.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Code your solution to this question directly within the browser on DataLemur:
You can find a step-by-step solution here: 2nd Highest Salary
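For reference, one common approach is a sub-query that excludes the top salary; this sketch assumes the table is named employee (only its columns are shown above):

```sql
-- the inner sub-query finds the highest salary;
-- the outer MAX() then finds the highest salary below it
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);
```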
A correlated sub-query is a sub-query that depends on the outer query and cannot be executed separately. It uses the outer query to filter or transform data by referencing a column from it, and the outer query uses the results of the inner query. In contrast, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data needed by the outer query.
Here is an example of a correlated sub-query:
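This sketch uses a hypothetical sales table with customer_id, customer_group, and total_sales columns:

```sql
-- hypothetical table: sales(customer_id, customer_group, total_sales)
SELECT s1.customer_id, s1.total_sales
FROM sales AS s1
WHERE s1.total_sales > (
  -- correlated: this sub-query references s1.customer_group
  -- from the outer query, so it runs once per outer row
  SELECT AVG(s2.total_sales)
  FROM sales AS s2
  WHERE s2.customer_group = s1.customer_group
);
```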
This query selects the ID and total sales of all McKinsey customers in the sales table whose total sales are greater than the average for their own customer group. The sub-query is correlated with the outer query because it references the customer group column from the outer query's current row.
Here is an example of a non-correlated sub-query:
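Using the same hypothetical sales table with customer_id and total_sales columns:

```sql
SELECT customer_id, total_sales
FROM sales
WHERE total_sales > (
  -- non-correlated: this sub-query references no outer columns,
  -- so it can run once on its own
  SELECT AVG(total_sales) FROM sales
);
```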
The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.
Performance-wise, correlated sub-queries are generally slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
As an analyst at McKinsey, you are asked to analyze McKinsey's hypothetical online platform where different products are reviewed by users. Your task is to write a SQL query using window functions that returns the average user rating of each product for each month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
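One way to write this, assuming the table is named reviews (only its columns are shown above):

```sql
SELECT
  -- truncate each submission date to the first day of its month
  DATE_TRUNC('month', submit_date) AS mth,
  product_id AS product,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY DATE_TRUNC('month', submit_date), product_id
ORDER BY mth, product;
```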
This query first groups the data by product and month with GROUP BY. Then, it uses the AVG() function to calculate the average number of stars for each product each month. The DATE_TRUNC() function truncates the submission date to the start of its month (making all days and lower units the same within each month), which makes it easy to group by month. The ORDER BY clause orders the result by month and product for better readability.
mth | product | avg_stars |
---|---|---|
2022-06-01 00:00:00 | 50001 | 3.50 |
2022-06-01 00:00:00 | 69852 | 4.00 |
2022-07-01 00:00:00 | 69852 | 2.50 |
p.s. Window functions show up pretty frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
One creative way is to use the window function ROW_NUMBER() and partition by whichever columns you are looking for duplicates in. If any row has a row_number() greater than 1, then it's a duplicate!
You could also compare COUNT(DISTINCT col1) vs. COUNT(*) and see if they return different numbers, indicating the presence of duplicates!
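As a sketch of the ROW_NUMBER() approach, assuming a hypothetical users table where we check for duplicate (first_name, last_name) pairs:

```sql
-- hypothetical table: users(user_id, first_name, last_name)
SELECT *
FROM (
  SELECT
    *,
    -- number each row within its (first_name, last_name) group
    ROW_NUMBER() OVER (
      PARTITION BY first_name, last_name
      ORDER BY user_id
    ) AS row_num
  FROM users
) AS ranked
-- row_num > 1 means another row with the same name pair came first
WHERE row_num > 1;
```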
McKinsey, being a global management consulting firm, conducts multiple consulting projects in various industries for many clients worldwide. The firm wishes to analyze its project data to guide decision-making processes. It aims to answer questions such as - What is the total revenue earned from different industries? Which clients have been with us for the longest duration and what is their total spending?
Assume that the McKinsey database has two tables: one with project details and one with client details.
Here is the sample data for our problem:
project_id | consultant_id | client_id | startDate | endDate | revenue |
---|---|---|---|---|---|
1001 | 135 | 651 | 2019-01-15 | 2019-04-15 | 50000 |
1002 | 246 | 193 | 2020-05-01 | 2020-08-30 | 40000 |
1003 | 135 | 651 | 2020-09-15 | 2021-01-15 | 55000 |
1004 | 246 | 193 | 2021-02-01 | 2021-07-01 | 42000 |
1005 | 135 | 910 | 2021-07-15 | 2022-01-15 | 48000 |
client_id | name | industry |
---|---|---|
651 | Company A | Finance |
193 | Company B | Technology |
910 | Company C | Healthcare |
Write a SQL query to determine the total revenue from each industry.
The PostgreSQL query to solve this problem would be:
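This sketch assumes the two tables are named projects and clients, matching the sample data above:

```sql
SELECT
  c.industry,
  SUM(p.revenue) AS total_revenue
FROM projects AS p
-- match each project to its client to get the industry
JOIN clients AS c
  ON p.client_id = c.client_id
GROUP BY c.industry;
```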
This query first joins the projects and clients tables on client_id. Then it groups the result by industry and computes the sum of revenue for each industry. The total revenue from each industry is returned by this query.
A transaction is one or more SQL commands that are executed as a single unit: either every command takes effect (the transaction commits) or none of them do (the transaction aborts).
For transactions, a DBMS is supposed to enforce the following ACID properties: Atomicity, Consistency, Isolation, & Durability.
Here's what each one means:
- **Atomicity**: a transaction's commands either all execute or none do; a partial failure rolls everything back.
- **Consistency**: a transaction takes the database from one valid state to another, preserving all constraints.
- **Isolation**: concurrent transactions don't see each other's intermediate states, as if they ran one at a time.
- **Durability**: once a transaction commits, its changes survive crashes and power failures.
As you can see, it's pretty important for the multiple databases where McKinsey stores its data to be ACID-compliant!
To optimize the targeting of marketing campaigns, you're interested in the customers who have spent over $10,000 in total and are located either in the 'New York' or 'Los Angeles' area. Write a SQL query to find these customers from the customers and orders tables.
The customers table has these columns: customer_id, customer_name, city, state, and zipcode. The orders table has these columns: order_id, customer_id, product_id, quantity, unit_price, and order_date.
customer_id | customer_name | city | state | zipcode |
---|---|---|---|---|
1 | John Doe | New York | NY | 10001 |
2 | Jane Smith | Los Angeles | CA | 90001 |
order_id | customer_id | product_id | quantity | unit_price | order_date |
---|---|---|---|---|---|
1 | 1 | 001 | 5 | 2000 | 06/08/2022 00:00:00 |
2 | 1 | 002 | 2 | 3000 | 06/10/2022 00:00:00 |
3 | 2 | 001 | 1 | 5000 | 06/18/2022 00:00:00 |
4 | 2 | 002 | 3 | 1500 | 07/26/2022 00:00:00 |
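One way to write this, assuming the tables are named customers and orders as described above:

```sql
SELECT
  c.customer_id,
  c.customer_name
FROM customers AS c
JOIN orders AS o
  ON c.customer_id = o.customer_id
WHERE c.city IN ('New York', 'Los Angeles')
GROUP BY c.customer_id, c.customer_name
-- keep only customers whose total spend exceeds $10,000
HAVING SUM(o.quantity * o.unit_price) > 10000;
```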
This SQL query first joins the customers and orders tables on the customer_id field. Then it filters the results to include only those customers located in 'New York' or 'Los Angeles'. Finally, it aggregates the total amount spent by each customer and keeps only those who have spent over $10,000.
As an analytics data analyst at McKinsey, we are given a database with consultation details where each consultation has a rating (on a scale of 1 to 5) by the client. Each consultancy project is associated with a specific sector of service offered by McKinsey like 'Finance', 'Healthcare', 'Government', 'Retail' etc. Write a SQL query to find the average rating of the consultancies by sector.
consultation_id | project_id | consultant_id | client_id | end_date | rating |
---|---|---|---|---|---|
1001 | 15001 | 2001 | 3001 | 06/06/2022 | 5 |
1002 | 15002 | 2002 | 3002 | 06/06/2022 | 4 |
1003 | 15001 | 2001 | 3003 | 06/08/2022 | 4 |
1004 | 15003 | 2003 | 3004 | 06/10/2022 | 5 |
1005 | 15002 | 2002 | 3005 | 07/08/2022 | 3 |
project_id | sector |
---|---|
15001 | Finance |
15002 | Healthcare |
15003 | Government |
sector | average_rating |
---|---|
Finance | 4.50 |
Healthcare | 3.50 |
Government | 5.00 |
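One way to write this, assuming the tables are named consultations and projects, matching the sample data above:

```sql
SELECT
  p.sector,
  -- round to two decimals to match the expected output format
  ROUND(AVG(c.rating), 2) AS average_rating
FROM consultations AS c
JOIN projects AS p
  ON c.project_id = p.project_id
GROUP BY p.sector;
```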
This query first joins the consultations and projects tables on project_id. It then groups the rows by sector and calculates the average rating for each sector.
The COALESCE() function returns the first non-NULL value from its list of arguments, and is commonly used to replace NULLs with a default value.
For example, say you were a Data Analyst at McKinsey and were doing a customer analytics project to see who the most marketing-engaged customers were, and had access to the below table.
mckinsey_customers:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | NULL |
301 | NULL | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | NULL | NULL |
Before you could proceed, you had to remove the NULLs and replace them with the default values (not_active for email, and not_opted_in for SMS). To do this, you'd run the following query:
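A query using COALESCE() to apply those defaults would look like this:

```sql
SELECT
  customer_id,
  -- replace NULL email engagement with the default 'not_active'
  COALESCE(email_engagement, 'not_active') AS email_engagement,
  -- replace NULL SMS engagement with the default 'not_opted_in'
  COALESCE(sms_engagement, 'not_opted_in') AS sms_engagement
FROM mckinsey_customers;
```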
This would get you the following output:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | not_opted_in |
301 | not_active | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | not_active | not_opted_in |
As a business analyst at McKinsey, you are often required to filter down the client records database to find records that match a specific pattern or string. For instance, you might be tasked with finding all clients who are from the 'Finance' industry. Write an SQL query which extracts all client records where the industry column contains the string 'Finance'.
client_id | client_name | industry | region | account_manager |
---|---|---|---|---|
3721 | Goldman Sachs | Finance | North America | Alex |
7865 | Amazon | E-commerce | North America | Maria |
6373 | Chase Bank | Finance and Banking | North America | Alex |
5412 | Alibaba Group | E-commerce | Asia | Sophie |
7853 | Citigroup | Finance | North America | Alex |
client_id | client_name | industry | region | account_manager |
---|---|---|---|---|
3721 | Goldman Sachs | Finance | North America | Alex |
6373 | Chase Bank | Finance and Banking | North America | Alex |
7853 | Citigroup | Finance | North America | Alex |
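Assuming the table is named clients, the query could look like:

```sql
SELECT *
FROM clients
-- the % wildcards match anything before and after 'Finance'
WHERE industry LIKE '%Finance%';
```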
This SQL query uses the LIKE keyword in combination with the wildcard character (%) to filter for all client records where the industry column contains the string 'Finance'. Because the % symbol appears both before and after 'Finance', the query returns any value that contains 'Finance', irrespective of what comes before or after it.
The key to acing a McKinsey SQL interview is to practice, practice, and then practice some more! Beyond just solving the above McKinsey SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each interview question has multiple hints, detailed solutions, and, best of all, an interactive coding environment, so you can run your query right in the browser and have it graded.
Learn how McKinsey's Digital & Analytics is helping governments innovate, transform, and thrive in a rapidly changing world!
To prep for the McKinsey SQL interview, it can also be wise to solve interview questions from other management consulting companies like:
However, if your SQL skills are weak, forget about going right into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers topics like math functions (ROUND(), CEIL()) and subqueries, both of which show up frequently in SQL job interviews at McKinsey.
In addition to SQL query questions, the other types of problems to practice for the McKinsey Data Science Interview are:
I believe the optimal way to study for McKinsey Data Science interviews is to read the book I wrote: Ace the Data Science Interview.
It has 201 data interview questions sourced from companies like Google, Tesla, & Goldman Sachs. The book's also got a crash course on Product Analytics, SQL & ML. And finally it's helped thousands of people land their dream job in data, which is why it's earned 1000+ reviews on Amazon and a 4.5-star rating.
Also to learn more about how data science and consulting tie together read about these 7 real world examples of data science consulting!
While the book is more technical in nature, it's also important to prepare for the McKinsey behavioral interview. Start by reading the company's values and mission.