At KPMG, SQL is used to extract and process client financial data, such as revenue growth and expense patterns, as well as to analyze nuanced data trends specific to the auditing and advisory industry, like identifying anomalies in financial statements. That is why KPMG often asks SQL questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
To help you ace the KPMG SQL interview, here's 9 KPMG SQL interview questions – can you solve them?
As a Data Analyst in KPMG, you're given a database of transactions that customers have made. Each transaction record contains the customer_id, date of transaction, and revenue generated from the transaction.
Your task is to calculate the Customer Lifetime Value (CLV) for each customer. CLV is the total revenue a company can realistically expect from a single customer account. It considers a customer's revenue value and compares that number to the company's predicted customer lifespan. For this task, you'll not consider churn rate.
Assume each customer has at least 1 transaction. If the customer has more than 1 transaction calculate the CLV as the cumulative sum of the revenue for each customer over time. List the top 10 customers with the highest CLV.
transaction_id | customer_id | transaction_date | revenue |
---|---|---|---|
1 | 3223 | 2019-08-22 | 250.00 |
2 | 4563 | 2019-11-18 | 300.00 |
3 | 3223 | 2020-02-14 | 200.00 |
4 | 6712 | 2020-05-20 | 150.00 |
5 | 4563 | 2020-08-11 | 250.00 |
6 | 6712 | 2020-10-22 | 300.00 |
7 | 3223 | 2021-03-14 | 500.00 |
8 | 6712 | 2021-06-10 | 350.00 |
9 | 4563 | 2021-09-02 | 400.00 |
10 | 6712 | 2021-12-14 | 250.00 |
customer_id | CLV |
---|---|
3223 | 950.00 |
4563 | 950.00 |
6712 | 1050.00 |
This PostgreSQL query calculates the cumulative sum of the revenue for each customer in the transactions table, ordered by the transaction_date. It uses a window function to create a running total of revenue for each customer_id. The query then orders the results by the CLV in descending order and limits the output to the top 10 customers with the highest CLV.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
Given a table of KPMG employee salary data, write a SQL query to find the top 3 highest paid employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Code your solution to this question interactively on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
If the code above is tough, you can find a step-by-step solution here: Top 3 Department Salaries.
Think of SQL constraints like the rules of a game. Just like a game needs rules to keep things fair and fun, a database needs constraints to keep things organized and accurate.
There are several types of SQL constraints like:
NOT NULL: This constraint is like a bouncer at a nightclub - it won't let anything NULL through the door.
UNIQUE: This constraint is like a VIP list - only special, one-of-a-kind values get in.
PRIMARY KEY: This constraint is like an elected official - it's made up of NOT NULL and UNIQUE values and helps identify each row in the table.
FOREIGN KEY: This constraint is like a diplomatic ambassador - it helps establish relationships between tables.
CHECK: This constraint is like a referee - it makes sure everything follows the rules.
DEFAULT: This constraint is like a backup plan - it provides a default value if no other value is specified.
So, whether you're playing a game or organizing a database, constraints are an important part of the process!
KPMG as a consulting firm executes various projects for different clients. Each project often consists of multiple tasks and each task is assigned to one or more employees, where each employee can log the hours they've worked. For this question, we're interested in the average number of hours billed per project. Write a SQL query which calculates the average hours billed to each project.
project_id | client_name | project_name |
---|---|---|
100 | Apple | Market Analysis |
101 | Competitive Research | |
102 | Microsoft | Business Strategy |
task_id | project_id | employee_id | hours_billed |
---|---|---|---|
500 | 100 | 123 | 10 |
501 | 101 | 265 | 8 |
502 | 101 | 362 | 6 |
503 | 102 | 192 | 9 |
504 | 102 | 981 | 11 |
project_id | project_name | avg_hoursBilled |
---|---|---|
100 | Market Analysis | 10.00 |
101 | Competitive Research | 7.00 |
102 | Business Strategy | 10.00 |
In this query, we first join the projects and tasks tables on project_id. We then group by project_id and project_name to split our data into sections for each project. Within these sections, we use the AVG function to calculate the average hours billed to each project.
To practice a very similar question try this interactive Microsoft Teams Power Users Question which is similar for calculating usage metrics or this Amazon Server Utilization Time Question which is similar for summarizing time usage.
A is a field in a table that references the of another table. It creates a link between the two tables and ensures that the data in the field is valid.
For example, if you have a table of KPMG customers and an orders table, the customer_id column in the orders table could be a that references the id column (which is the primary key) in the KPMG customers table.
The constraint helps maintain the integrity of the data in the database by preventing the insertion of rows in the table that do not have corresponding entries in the table. It also enforces the relationship between the two tables and prevents data from being deleted from the table if it is still being referenced in the table.
KPMG, a global network of professional firms providing Audit, Tax and Advisory services, has recently increased its online marketing efforts. As an analyst for KPMG, your task is to analyze the click-through rates for their ongoing digital marketing campaigns. This involves calculating the rate at which users who see digital ads on various platforms (Google, Facebook, LinkedIn, etc.) end up landing on KPMG's website.
After clicking on the ad, some users might further opt for a service enquiry.
Given two tables - and , write a SQL query that calculates the proportion of total users who clicked an ad () and subsequently made a service enquiry () for each platform.
ad_id | user_id | click_date | platform |
---|---|---|---|
111 | 2450 | 08/15/2022 00:00:00 | |
892 | 2658 | 08/18/2022 00:00:00 | |
437 | 4800 | 08/20/2022 00:00:00 | |
598 | 3200 | 07/15/2022 00:00:00 | |
320 | 1700 | 07/14/2022 00:00:00 |
enquiry_id | user_id | enquiry_date |
---|---|---|
71 | 2450 | 08/16/2022 00:00:00 |
802 | 3200 | 07/16/2022 00:00:00 |
493 | 1550 | 06/30/2022 00:00:00 |
12 | 6700 | 07/02/2022 00:00:00 |
117 | 9800 | 05/25/2022 00:00:00 |
This query joins the table with the table on , considering only the cases where the is after or on the .
Then it aggregates the data by , and computes the rate of enquiries per clicks for each platform by dividing the COUNT of DISTINCT s in the by the COUNT of DISTINCT s in the , multiplied by 100 to get the percentage rate. This gives us the 'click-to-enquiry' rate for each platform.
To solve a related SQL problem on DataLemur's free online SQL code editor, try this SQL interview question asked by Facebook:
Views are a lot like virtual tables, where you can take a base table and customize it (such as by hiding some data from non-admin users, or removing some random columns/rows based on business requirements).
Here's the PostgreSQL syntax for creating a view based on data in the table:
Given a table , where each row represents an audit conducted by KPMG, find the maximum audit risk for each sector for audits that were completed during the year 2021.
The table has the following schema:
audit_id | sector | company_id | audit_date | audit_risk |
---|
audit_id | sector | company_id | audit_date | audit_risk |
---|---|---|---|---|
641 | Finance | 100 | 2021-02-06 | 0.76 |
702 | IT | 101 | 2021-04-11 | 0.89 |
529 | Finance | 102 | 2021-06-18 | 0.85 |
635 | Manufacturing | 103 | 2021-08-03 | 0.91 |
417 | IT | 104 | 2021-05-20 | 0.72 |
sector | max_audit_risk |
---|---|
Finance | 0.85 |
IT | 0.89 |
Manufacturing | 0.91 |
Here is the PostgreSQL query which calculates the maximum audit risk by sector for the year of 2021:
The statement is used to group the audits by sector. The clause is used to filter out audits that were not performed in the specified year (2021 in this case). And finally, is used to retrieve the highest audit risk from each sector. As you might already know, is a PostgreSQL function that returns the year part of a date as an integer.
KPMG's Marketing Department would like to run an email campaign targeting their customers whose email providers are Gmail. They've noticed that a significant portion of their customer base is using Gmail, and they would like to build a specific marketing strategy around this. Therefore, the task is to find all the customers whose email IDs are registered with a '@gmail.com' domain.
customer_id | first_name | last_name | |
---|---|---|---|
6171 | John | Doe | john.doe@gmail.com |
7802 | Jane | Smith | jane.smith@yahoo.com |
5293 | Richard | Roe | richard.roe@hotmail.com |
6352 | James | Bond | james.bond@gmail.com |
4517 | Alice | Cooper | alice.cooper@gmail.com |
customer_id | first_name | last_name | |
---|---|---|---|
6171 | John | Doe | john.doe@gmail.com |
6352 | James | Bond | james.bond@gmail.com |
4517 | Alice | Cooper | alice.cooper@gmail.com |
This query uses the SQL keyword to filter rows from the customers database table. It matches customer records where the email column ends with '@gmail.com'. Therefore, it returns all customer records that are associated with a Gmail account. The percent symbol (%) is used as a wildcard character that can match any sequence of characters, and the at sign (@) and the text 'gmail.com' form the specific pattern to be matched.
The key to acing a KPMG SQL interview is to practice, practice, and then practice some more! Beyond just solving the above KPMG SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, Amazon, and tech startups.
Each DataLemur SQL question has hints to guide you, full answers and best of all, there's an interactive coding environment so you can right online code up your query and have it checked.
To prep for the KPMG SQL interview it is also helpful to practice SQL questions from other accounting & consulting companies like:
Find out how KPMG is using Artificial Intelligence to drive innovation, improve efficiency, and reduce costs!
However, if your SQL query skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers SQL topics like AND/OR/NOT and CASE/WHEN/ELSE statements – both of which pop up often in SQL job interviews at KPMG.
Beyond writing SQL queries, the other types of questions covered in the KPMG Data Science Interview include:
To prepare for KPMG Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't ignore the behavioral interview – prep for that with this guide on acing behavioral interviews.