Data Scientists, Analysts, and Data Engineers at Kainos write SQL queries all the time as part of their job. They use SQL for analyzing customer insights and financial data for their healthcare and public sector digital services, and managing large-scale data migration within IT transformation projects. Because of this, Kainos frequently asks jobseekers SQL coding interview questions.
So, to help you practice, here are 8 Kainos Group SQL interview questions – can you solve them?
For this question, imagine Kainos runs an online store and wants to identify its 'power users' – the customers who made the most purchases within the past year. Write a SQL query to find these power users by analyzing the purchases data.
Sample purchases data:

purchase_id | customer_id | purchase_date | product_id | product_cost |
---|---|---|---|---|
3037 | 456 | 06/01/2022 00:00:00 | 1001 | 50 |
4025 | 789 | 06/03/2022 00:00:00 | 1002 | 100 |
5362 | 123 | 06/05/2022 00:00:00 | 1001 | 50 |
1089 | 123 | 06/08/2022 00:00:00 | 1001 | 50 |
2108 | 456 | 06/10/2022 00:00:00 | 1002 | 100 |
For this problem, a simple way to identify power users is to count the number of purchases made by each customer_id within the past year. The SQL query could look something like this:
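```sql
-- One possible approach; the table name 'purchases' follows the explanation below
SELECT
  customer_id,
  COUNT(*) AS purchase_count
FROM purchases
WHERE purchase_date >= NOW() - INTERVAL '1 YEAR'
GROUP BY customer_id
ORDER BY purchase_count DESC
LIMIT 10;
```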
In this PostgreSQL query, we look at the purchases table, keeping only rows where purchase_date falls within the last year. We group by customer_id and order by the purchase count in descending order, so the output is the top 10 customers by purchase frequency over the past year. The NOW() function returns the current date and time, and subtracting INTERVAL '1 YEAR' from it gives the cutoff date one year ago.
This query will give the list of the top 10 'power users', along with the number of purchases they have made in the past year. The customers with the highest purchase counts can be considered Kainos's most important users.
To practice a similar customer analytics SQL question where you can code right in the browser and have your SQL solution instantly executed, try this Walmart SQL Interview Question:
Given a table of Kainos employee salary data, write a SQL query to find the top 3 highest earning employees within each department.
Sample employee data:

employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
Sample department data:

department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
Expected output:

department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Check your SQL query for this problem interactively on DataLemur:
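One possible solution, assuming the tables are named employee and department (matching the sample data above):

```sql
WITH ranked_salaries AS (
  SELECT
    d.department_name,
    e.name,
    e.salary,
    DENSE_RANK() OVER (
      PARTITION BY e.department_id
      ORDER BY e.salary DESC
    ) AS salary_rank
  FROM employee AS e
  JOIN department AS d
    ON e.department_id = d.department_id
)
SELECT
  department_name,
  name,
  salary
FROM ranked_salaries
WHERE salary_rank <= 3
ORDER BY department_name, salary DESC;
```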
We use the DENSE_RANK() window function to rank each employee's salary within their department, with the highest salaries receiving the lowest rank numbers. We then wrap this in a CTE and keep only the employees with a rank of 3 or lower.
If the code above is hard to understand, you can find a detailed solution here: Top 3 Department Salaries.
Database views are created to provide customized, read-only versions of your data that you can query just like a regular table. So why even use one if they're just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
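As a rough sketch, a view that hides the salary column from the employee table used in the previous question might look like this (the view name is made up for illustration):

```sql
-- Create a view that exposes everything except the sensitive salary column
CREATE VIEW employee_directory AS
SELECT
  employee_id,
  name,
  department_id
FROM employee;

-- The view can then be queried just like a regular table
SELECT name
FROM employee_directory
WHERE department_id = 1;
```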
Assume you are working as a data analyst at Kainos, a multinational digital solutions company. One of your roles is to monitor and evaluate customer reviews for the company's digital products on an ongoing basis.
The company receives numerous reviews across its range of products every day and stores this information within a dataset. The dataset reflects the structure and sample data outlined below:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
The company wants to drill down into these reviews on a monthly basis. As such, you are tasked with writing a SQL query that computes the average star rating for each product in each month (the month is denoted as mth in the output).
Your output should reflect the structure and sample data outlined below:
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
Here is a PostgreSQL query that solves the problem:
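```sql
-- Assumes the review data is stored in a table named reviews
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id AS product,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY mth, product;
```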
This query uses the EXTRACT function to get the month from submit_date. It then groups by that month and product_id to compute the average stars for each product in each month, with the AVG function calculating the average rating. Note that EXTRACT may return a double precision value depending on your PostgreSQL version, so you can cast the month to an integer if you need a whole number, and ROUND keeps the average to two decimal places as in the expected output. The final result is ordered by mth and product for easier examination.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
Constraints are used to specify rules for the data in a table. They can be applied to one or more fields in an SQL table when the table is created, or afterwards using the ALTER TABLE command.
Say you were storing sales analytics data from Kainos's CRM inside a database. Here are some example constraints you could use (combined into a single sketch after this list):
PRIMARY KEY constraint: You might use a PRIMARY KEY constraint to ensure that each record in the database has a unique identifier. For example, you could use the "opportunity_id" field as the primary key in the "opportunities" table.
FOREIGN KEY constraint: You might use a FOREIGN KEY constraint to link the data in one table to the data in another table. For example, you could use a foreign key field in the "opportunities" table to reference the "account_id" field in the "accounts" table.
NOT NULL constraint: You might use a NOT NULL constraint to ensure that a field cannot contain a NULL value. For example, you could use a NOT NULL constraint on the "opportunity_name" field in the "opportunities" table to ensure that each opportunity has a name.
UNIQUE constraint: You might use a UNIQUE constraint to ensure that the data in a field is unique across the entire table. For example, you could use a UNIQUE constraint on the "email" field in the "contacts" table to ensure that each contact has a unique email address.
CHECK constraint: You might use a CHECK constraint to ensure that the data in a field meets certain conditions. For example, you could use a CHECK constraint to ensure that the "deal_probability" field in the "opportunities" table is a value between 0 and 100.
DEFAULT constraint: You might use a DEFAULT constraint to specify a default value for a field. For example, you could use a DEFAULT constraint on the "stage" field in the "opportunities" table to set the default value to "prospecting".
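Putting several of these together, here is a rough sketch of what the CRM tables might look like (the column types and the minimal accounts table are assumptions made for illustration):

```sql
CREATE TABLE accounts (
  account_id INTEGER PRIMARY KEY
);

CREATE TABLE opportunities (
  opportunity_id   INTEGER PRIMARY KEY,                                -- PRIMARY KEY: unique identifier for each record
  account_id       INTEGER REFERENCES accounts (account_id),           -- FOREIGN KEY: links to the accounts table
  opportunity_name TEXT NOT NULL,                                      -- NOT NULL: every opportunity must have a name
  deal_probability INTEGER CHECK (deal_probability BETWEEN 0 AND 100), -- CHECK: value must be between 0 and 100
  stage            TEXT DEFAULT 'prospecting'                          -- DEFAULT: stage starts as "prospecting"
);

CREATE TABLE contacts (
  contact_id INTEGER PRIMARY KEY,
  email      TEXT UNIQUE                                               -- UNIQUE: no duplicate email addresses
);
```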
You are given two tables, customers and transactions. The customers table has information about each customer, such as customer_id, name, email, location, and registration_date. The transactions table has details about all transactions made by customers, such as transaction_id, customer_id, product_id, purchase_date, quantity, and total_price.
Write a SQL query to find the total quantity of products each customer bought and the total amount they spent, sorted by total quantity in descending order.
The sample data for the tables is shown below:
Sample customers data:

customer_id | name | email | location | registration_date |
---|---|---|---|---|
001 | David | david@example.com | London | 09/12/2020 |
002 | Sophia | sophia@example.com | New York | 11/08/2019 |
003 | Liam | liam@example.com | Paris | 05/15/2021 |
004 | Emma | emma@example.com | Berlin | 02/01/2020 |
005 | Oliver | oliver@example.com | Sydney | 06/18/2020 |
Sample transactions data:

transaction_id | customer_id | product_id | purchase_date | quantity | total_price |
---|---|---|---|---|---|
100001 | 001 | 5678 | 08/10/2022 | 5 | 50 |
100002 | 002 | 9012 | 08/15/2022 | 3 | 30 |
100003 | 001 | 3456 | 08/18/2022 | 2 | 20 |
100004 | 003 | 7890 | 09/15/2022 | 1 | 10 |
100005 | 004 | 1234 | 09/16/2022 | 4 | 40 |
The answer can be written in SQL like this:
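```sql
-- Assumes the tables are named customers and transactions, matching the sample data above
SELECT
  c.customer_id,
  c.name,
  SUM(t.quantity) AS total_quantity,
  SUM(t.total_price) AS total_spent
FROM customers AS c
JOIN transactions AS t
  ON c.customer_id = t.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_quantity DESC;
```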
In this answer, we join the customers table with the transactions table on customer_id. Then, we group the results by customer_id and name from the customers table, and calculate the total quantity of products each customer bought (total_quantity) and the total amount each customer spent (total_spent). Finally, we order the result set by total_quantity in descending order. This gives us the total quantity of products each customer bought and the total amount they spent, with the customers who bought the most items appearing first.
Because join questions come up so often during SQL interviews, take a stab at this Spotify JOIN SQL question:
Stored procedures are a lot like functions in programming. They're used to encapsulate and organize business logic into one unit of code, and they can accept multiple input parameters and return multiple output values.
For example, if you were a Data Analyst at Kainos working on an HR analytics project, you might create a stored procedure to calculate the average salary for a given department:
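A minimal sketch, written as a PostgreSQL function (PostgreSQL typically implements this kind of reusable logic with CREATE FUNCTION) and reusing the employee and department tables from the earlier question:

```sql
CREATE OR REPLACE FUNCTION avg_salary_by_department(dept_name TEXT)
RETURNS NUMERIC AS $$
  -- Average salary across all employees in the named department
  SELECT AVG(e.salary)
  FROM employee AS e
  JOIN department AS d
    ON e.department_id = d.department_id
  WHERE d.department_name = dept_name;
$$ LANGUAGE sql;
```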
To call this stored procedure and find the average salary for the Data Analytics department you'd write the following query:
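```sql
-- Calls the hypothetical function defined above
SELECT avg_salary_by_department('Data Analytics') AS avg_salary;
```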
Kainos sells multiple products. Your task is to write a SQL query that extracts the Product ID and calculates the following for each product:

- the total quantity sold (total_quantity_sold)
- the average sale price (Avg_Sale_Price)
- the total quantity sold raised to the 4th power (4thPowerSumQty)
- the square root of the Product ID (Sqrt_Product_ID)

Use the ABS function to handle any negative values of total quantity sold.
Sample products data:

product_id | product_name | product_price |
---|---|---|
1254 | Phone | 500 |
2378 | Computer | 1500 |
2759 | Tablet | 1000 |
4961 | Monitor | 800 |
Sample sales data:

sale_id | product_id | quantity |
---|---|---|
1 | 1254 | 5 |
2 | 1254 | 7 |
3 | 2378 | 8 |
4 | 2759 | -11 |
5 | 4961 | 3 |
6 | 4961 | 12 |
Expected output:

product_id | total_quantity_sold | Avg_Sale_Price | 4thPowerSumQty | Sqrt_Product_ID |
---|---|---|---|---|
1254 | 12 | 500 | 20736 | 35.39 |
2378 | 8 | 1500 | 4096 | 48.74 |
2759 | 11 | 1000 | 14641 | 52.50 |
4961 | 15 | 800 | 50625 | 70.43 |
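One possible solution, assuming the tables are named products and sales as in the explanation below:

```sql
SELECT
  p.product_id,
  ABS(s.total_qty) AS total_quantity_sold,
  p.product_price AS avg_sale_price,                 -- matches the Avg_Sale_Price column in the sample output
  POWER(ABS(s.total_qty), 4) AS fourth_power_sum_qty,
  ROUND(SQRT(p.product_id::NUMERIC), 2) AS sqrt_product_id
FROM products AS p
JOIN (
  -- Total quantity sold per product (may be negative before ABS is applied)
  SELECT
    product_id,
    SUM(quantity) AS total_qty
  FROM sales
  GROUP BY product_id
) AS s
  ON p.product_id = s.product_id
ORDER BY p.product_id;
```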
This SQL query first uses a subquery to get the total quantity sold for each product from the "sales" table. It then joins this result with the "products" table to get the price, and uses ABS, POWER, and SQRT to compute the remaining values required by the question.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating product metrics or this Wayfair Y-on-Y Growth Rate Question which is similar for analyzing sales data.
The key to acing a Kainos SQL interview is to practice, practice, and then practice some more! Besides solving the above Kainos SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like Microsoft, Google, and Facebook.
Each interview question has hints to guide you, fully explained answers, and a discussion board to see how others solved it. Crucially, there's an interactive coding environment so you can write your SQL query right in the browser and have it instantly checked.
To prep for the Kainos SQL interview it is also useful to solve interview questions from other tech companies like:
If your SQL coding skills are weak, don't dive straight into solving questions – first improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL concepts such as window functions and SQL joins with practice exercises – both of which come up frequently during Kainos interviews.
Besides SQL interview questions, the other types of problems to practice for the Kainos Data Science Interview include:
I'm sort of biased, but I think the optimal way to prep for Kainos Data Science interviews is to read the book Ace the Data Science Interview.
The book solves 201 data interview questions taken from FAANG & startups. It also has a refresher on Python, SQL & ML. And finally, it's vouched for by the data community, which is how it became an Amazon best-seller with more than 1,000 5-star reviews.