Data Analysts & Data Scientists at Deloitte use SQL to analyze financial datasets, such as income statements and balance sheets, to identify trends and forecast revenue growth. It is also used to manage client databases to track customer interactions and preferences for strategic consulting projects, which is why Deloitte includes SQL interview questions in its hiring process to assess candidates' skills.
So, to help you study for the Deloitte SQL interview, here are 9 Deloitte SQL interview questions covered in this article.
Deloitte makes use of a log system to track the completion of various critical tasks by its employees. This system logs which task was completed, the completion time, and the employee who completed the task. Deloitte needs to identify 'VIP' users for its system - these are employees who complete more than 20 critical tasks per week consistently over the period of a month.
Create a SQL query that identifies the VIP users for each month using the activity log table. Mark employees who hit this threshold with 1, and everyone else with 0.
The table has the following columns:
log_id | user_id | activity_type | timestamp |
---|---|---|---|
1201 | 32 | critical | 06/02/2022 09:00:00 |
1302 | 45 | non-critical | 06/03/2022 10:30:00 |
1403 | 32 | critical | 06/04/2022 14:44:00 |
1504 | 67 | critical | 06/05/2022 15:22:00 |
1605 | 32 | critical | 06/06/2022 10:54:00 |
1706 | 32 | non-critical | 06/07/2022 11:21:00 |
Your output table should be:
Month | Year | userID | isVIP |
---|---|---|---|
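A sketch of one possible PostgreSQL solution is below. The log table's name isn't given in the prompt, so activity_log is assumed here:

```sql
SELECT
  EXTRACT(MONTH FROM "timestamp") AS "Month",
  EXTRACT(YEAR FROM "timestamp")  AS "Year",
  user_id                         AS "userID",
  CASE
    WHEN COUNT(*) FILTER (WHERE activity_type = 'critical') > 80 THEN 1
    ELSE 0
  END AS "isVIP"
FROM activity_log  -- table name assumed; not given in the prompt
GROUP BY 1, 2, 3;  -- group by month, year, and user_id
```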
This query works by grouping data by user_id, month, and year. For each group, it checks if the count of 'critical' activities is more than 80 (which is 20 times 4 weeks). If it's more than 80, it marks the user as a VIP (1); otherwise, it marks them as not a VIP (0).
To solve a super-customer analysis question on DataLemur's free online SQL code editor, try this Microsoft SQL Interview problem:
Suppose there was a table of Deloitte employee salary data. Write a SQL query to find the employees who earn more than their direct manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis, who earns $7,000.
Write a SQL query for this question directly within the browser on DataLemur:
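Here's a minimal self-join sketch, assuming the table is named employee (the table name isn't shown above):

```sql
SELECT emp.employee_id, emp.name AS employee_name
FROM employee AS mgr                 -- managers' copy (table name assumed)
INNER JOIN employee AS emp           -- employees' copy
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary;
```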
First, we perform a SELF-JOIN where we treat the first copy of the table (mgr) as the managers' table and the second copy (emp) as the employees' table. Then we use a WHERE clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the code above is hard to understand, you can find a step-by-step solution with hints here: Employees Earning More Than Managers.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
Here are three reasons to denormalize a database at Deloitte:
Improved performance: Denormalization can often improve performance by reducing the number of expensive join operations required to retrieve data. This is particularly useful when the database is being used for online analytical processing (OLAP) as frequent joins across multiple tables can be slow and costly.
Scalability: Denormalization can increase the scalability of a database by requiring less data to be read and processed when executing a query. This can be beneficial when the database is expected to handle a high volume of queries.
Simplification: One way to simplify the design of a database is by using denormalization to reduce the number of tables and relationships that need to be managed. This can make the database easier to understand and maintain.
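To make the join-reduction idea concrete, here's a hypothetical before/after sketch (all table and column names below are illustrative only):

```sql
-- Normalized design: fetching an order's customer name requires a join
SELECT o.order_id, c.customer_name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id;

-- Denormalized design: customer_name is copied onto each order row,
-- so the same read avoids the join, at the cost of redundant storage
SELECT order_id, customer_name
FROM orders_denormalized;
```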
Given a database of product reviews, can you write a SQL query in PostgreSQL to calculate the average monthly stars (ratings) given to products? For example, if product "50001" received ratings of "4" and "3" in June, its average rating for that month would be "3.5". You should structure the output in a way that allows a user to easily see the average product ratings for each month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
month | product_id | average_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
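One way to write this in PostgreSQL (the reviews table's name isn't given in the prompt, so reviews is assumed):

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS month,
  product_id,
  ROUND(AVG(stars), 2) AS average_stars
FROM reviews  -- assumed table name
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY month, product_id;
```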
Time zones could affect the calculation; in this example we use PostgreSQL's EXTRACT function to bucket reviews at the month level without considering time zones.
This query groups the product reviews by month and product_id, then calculates the average stars (ratings) for each group. The results are then ordered first by month and then product_id for ease of reading.
To practice another window function question on DataLemur's free interactive coding environment, solve this Amazon BI Engineer interview question:
When using INTERSECT, only rows that appear in both result sets will be returned.
For a concrete example, say you were a Data Analyst supporting the Sales Analytics team at Deloitte, and data on potential sales leads lived in both Salesforce and Hubspot CRMs. To write a query to analyze leads created before the start of 2023 that show up in BOTH CRMs, you would use the INTERSECT command:
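A sketch of what that could look like (the table and column names below are hypothetical, since none are given here):

```sql
SELECT lead_id, lead_name
FROM salesforce_leads        -- hypothetical table of Salesforce leads
WHERE created_date < '2023-01-01'
INTERSECT
SELECT lead_id, lead_name
FROM hubspot_leads           -- hypothetical table of Hubspot leads
WHERE created_date < '2023-01-01';
```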
As an analyst at Deloitte, you are given a database that consists of customer records and the recorded transactions of these customers. Your task is to write a SQL query that allows you to filter the customers who are based in New York and who have made at least one purchase over $1000 in the category of "Electronics" since 2020.
This information is housed in two tables: one holding customer records and one holding their orders. The structure of these tables is as follows:
customer_id | first_name | last_name | city | state |
---|---|---|---|---|
1 | John | Doe | New York | NY |
2 | Jane | Smith | Los Angeles | CA |
3 | Alice | Johnson | New York | NY |
4 | Bob | Williams | Chicago | IL |
5 | Charlie | Brown | New York | NY |
order_id | customer_id | order_date | category | order_value |
---|---|---|---|---|
1 | 1 | 01/05/2020 | Electronics | 1500 |
2 | 2 | 02/10/2021 | Furniture | 800 |
3 | 1 | 03/15/2022 | Electronics | 700 |
4 | 3 | 04/20/2020 | Electronics | 2000 |
5 | 5 | 05/25/2022 | Furniture | 1500 |
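A possible PostgreSQL solution, assuming the two tables are named customers and orders (their names aren't shown above):

```sql
SELECT c.customer_id, c.first_name, c.last_name
FROM customers AS c                  -- assumed table name
INNER JOIN orders AS o               -- assumed table name
  ON o.customer_id = c.customer_id
WHERE c.city = 'New York'
  AND o.category = 'Electronics'
  AND o.order_value > 1000
  AND o.order_date >= DATE '2020-01-01'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) >= 1;       -- keep customers with at least one qualifying order
```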
This query will return the customer_id, first_name, and last_name from the customer records joined with the order records, where the customer is located in New York, the order's category is Electronics, the order value is more than $1,000, and the order date is on or after January 1st, 2020. Finally, it keeps only customers who have at least one such order (enforced by the HAVING clause).
The HAVING clause is used to filter the groups created by the GROUP BY clause. It's similar to the WHERE clause, but it specifies conditions on the groups created by GROUP BY, rather than on the individual rows of the table.
For example, say you were analyzing salaries for analytics employees at Deloitte:
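What that query could look like, with assumed table and column names (none are given here):

```sql
SELECT department, SUM(salary) AS total_salary
FROM employee_salaries               -- assumed table name
WHERE department LIKE '%Analytics%'  -- keep only the analytics teams
GROUP BY department
HAVING SUM(salary) > 1000000;        -- only departments with total salary over $1M
```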
This query retrieves the total salary for each Analytics department at Deloitte and groups the rows by the specific department (i.e. "Marketing Analytics", "Business Analytics", "Sales Analytics" teams).
The HAVING clause then filters the groups to include only Deloitte departments where the total salary is greater than $1 million.
Given a mock dataset, calculate the click-through conversion rate for advertisements run by Deloitte. For each advertisement, calculate the percentage of times a viewer clicked on the advertisement and then went on to add a product to their cart.
Assume we are given two tables: an ads table, which contains information about the advertisements and their IDs, and a user-actions table, which tracks every action taken by users -- from ad clicks to adding a product to the cart.
ad_id | campaign_id | product_id |
---|---|---|
1 | A | 100 |
2 | A | 101 |
3 | B | 102 |
4 | B | 103 |
user_action_id | user_id | action | ad_id | product_id | action_date |
---|---|---|---|---|---|
1 | 123 | "Clicked ad" | 1 | 100 | "01/01/2022 00:00:00" |
2 | 123 | "Added product to cart" | null | 100 | "01/01/2022 00:05:00" |
3 | 456 | "Clicked ad" | 2 | 101 | "01/02/2022 00:00:00" |
4 | 456 | "Added product to cart" | null | 101 | "01/03/2022 00:00:00" |
5 | 789 | "Clicked ad" | 3 | 102 | "01/04/2022 00:00:00" |
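Here's a sketch matching the explanation below; the ads and user_actions table names are assumed, since they aren't shown above:

```sql
WITH click_actions AS (
  SELECT ad_id, product_id, COUNT(*) AS total_clicks
  FROM user_actions                  -- assumed table name
  WHERE action = 'Clicked ad'
  GROUP BY ad_id, product_id
),
add_to_cart_actions AS (
  SELECT product_id, COUNT(*) AS total_add_to_cart
  FROM user_actions
  WHERE action = 'Added product to cart'
  GROUP BY product_id
)
SELECT
  a.ad_id,
  a.campaign_id,
  ROUND(100.0 * COALESCE(atc.total_add_to_cart, 0) / ca.total_clicks, 2) AS conversion_rate
FROM ads AS a                        -- assumed table name
JOIN click_actions AS ca ON ca.ad_id = a.ad_id
LEFT JOIN add_to_cart_actions AS atc ON atc.product_id = a.product_id;
```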
The PostgreSQL query above first separates the ad-click actions from the add-to-cart actions and counts them by ad_id and product_id respectively, using Common Table Expressions (CTEs). Then a JOIN is performed across click_actions, add_to_cart_actions, and ads on their common IDs, and the final conversion rate is calculated with the formula (total_add_to_cart / total_clicks) * 100. The result is the ad_id, campaign_id, and the calculated conversion rate for each ad.
To practice a similar problem on DataLemur's free interactive SQL code editor, try this Facebook SQL Interview question:
Deloitte is analyzing the performance of various products based on customer ratings. Each product has multiple sizes, each with its own rating. The larger the size, the heavier the impact on the final rating. We need to calculate the weighted average rating for each product.
Here are the tables for product ratings and product sizes:
product_id | size_id | rating |
---|---|---|
1 | 1 | 4 |
1 | 2 | 5 |
1 | 3 | 3 |
2 | 1 | 5 |
2 | 2 | 2 |
2 | 3 | 4 |
size_id | weight_factor |
---|---|
1 | 0.1 |
2 | 0.3 |
3 | 0.6 |
product_id | weighted_avg_rating |
---|---|
1 | 3.7 |
2 | 3.5 |
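One way to compute this, assuming the tables are named product_ratings and product_sizes (their names aren't shown above):

```sql
SELECT
  r.product_id,
  -- cast to numeric so ROUND(..., 1) works even if weight_factor is double precision
  ROUND((SUM(r.rating * s.weight_factor) / SUM(s.weight_factor))::numeric, 1) AS weighted_avg_rating
FROM product_ratings AS r            -- assumed table name
JOIN product_sizes AS s              -- assumed table name
  ON s.size_id = r.size_id
GROUP BY r.product_id
ORDER BY r.product_id;
```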
In this query, we first join the product ratings table and the product sizes table on size_id. Then, for each product (product_id), we calculate the weighted sum of ratings, where the weights are the weight_factor values in the sizes table. We then divide that sum by the total of the weights (SUM(weight_factor)) to yield the weighted average. The ROUND function is used to round the average to one decimal point. Finally, the GROUP BY clause ensures this calculation is done separately for each product.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for its focus on evaluating product ratings or this Amazon Highest-Grossing Items Question which is similar for its process of analyzing product data.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Deloitte SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the earlier Deloitte SQL interview questions, you should also solve the 200+ tricky SQL questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each SQL question has hints to guide you, fully explained answers, and a discussion board to see how others solved it. Most importantly, there's an online SQL code editor so you can run your SQL query right in the browser and have it graded.
To prep for the Deloitte SQL interview it is also a great idea to solve interview questions from other consulting and professional service companies like:
In case your SQL foundations are weak, don't jump right into solving questions – first strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like grouping by multiple columns and using wildcards with LIKE – both of these pop up routinely in Deloitte SQL interviews.
In addition to SQL interview questions, the other topics to practice for the Deloitte Data Science Interview include:
To prepare for Deloitte Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't ignore the behavioral interview – prep for it using this list of behavioral interview questions for Data Scientists.