At Kaspi.kz, SQL is often used to extract and analyze customer transaction data and to manage the company's financial database for more efficient decision-making. Unsurprisingly, this is why Kaspi kz asks SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
To help you ace the Kaspi kz SQL interview, this blog covers 9 Kaspi kz SQL interview questions – can you solve them?
As a data analyst at Kaspi kz, you're tasked with identifying key users of our platform. We define "VIP Users" as those who've made more than 400 transactions in the last month. Write an SQL query to extract a list of VIP Users.
We've provided the following table for your analysis:
transaction_id | user_id | transaction_date | amount |
---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 5000.00 |
7802 | 265 | 06/10/2022 00:00:00 | 6985.00 |
5293 | 123 | 06/18/2022 00:00:00 | 5000.00 |
6352 | 192 | 07/26/2022 00:00:00 | 6985.00 |
4517 | 123 | 07/05/2022 00:00:00 | 6985.00 |
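One possible solution, written as a sketch in PostgreSQL and assuming the table is named `transactions` (the name isn't given in the prompt):

```sql
-- Find users with more than 400 transactions in the last month
SELECT user_id
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY user_id
HAVING COUNT(*) > 400;
```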
This SQL query identifies VIP users by checking the total number of transactions each user has made in the past month. If the count exceeds 400, the user's id is returned as a VIP user. The HAVING clause specifies this condition after the GROUP BY clause groups the transactions by user_id.
To solve a related customer analytics question on DataLemur's free online SQL code editor, try this Microsoft SQL Interview problem:
Kaspi.kz is a large e-commerce platform in Kazakhstan. We are interested in analyzing our customer's behavior to understand the structure of our revenue. Write a SQL query that would return the average amount of purchase per customer for each month.
sales_id | customer_id | date | price_kz |
---|---|---|---|
1 | 123 | 01/01/2022 | 1000 |
2 | 123 | 01/15/2022 | 2000 |
3 | 265 | 01/20/2022 | 4500 |
4 | 362 | 02/05/2022 | 7000 |
5 | 192 | 03/01/2022 | 3000 |
6 | 981 | 02/28/2022 | 1000 |
7 | 192 | 03/15/2022 | 5000 |
8 | 362 | 02/25/2022 | 3000 |
month | customer_id | avg_purchase_kz |
---|---|---|
01/2022 | 123 | 1500.00 |
01/2022 | 265 | 4500.00 |
02/2022 | 362 | 5000.00 |
02/2022 | 981 | 1000.00 |
03/2022 | 192 | 4000.00 |
Here is a SQL query that uses the AVG() window function to find the average purchase per customer for each month:
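One way to write it (a PostgreSQL sketch, assuming the table is named `sales` since the prompt doesn't name it):

```sql
-- Average purchase per customer per month, using a window function
SELECT DISTINCT
       TO_CHAR(date, 'MM/YYYY') AS month,
       customer_id,
       AVG(price_kz) OVER (
           PARTITION BY customer_id, TO_CHAR(date, 'MM/YYYY')
       ) AS avg_purchase_kz
FROM sales
ORDER BY month, customer_id;
```

The DISTINCT is needed because a window function, unlike GROUP BY, returns one row per input row; an equivalent GROUP BY query would also work here.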
This query returns the month, customer_id, and the average purchase amount for each customer in each month. The PARTITION BY clause inside the OVER clause of the AVG() window function divides the data into partitions, and the function is applied to each partition separately, reinitialized for each one.
This query will help us understand how average spending changes for each customer each month and if there are any trends in our revenue over time.
To practice a related window function SQL problem on DataLemur's free interactive SQL code editor, try this Amazon BI Engineer interview question:
Normalization can reduce redundancy and improve performance and flexibility in a database.
By dividing larger tables into smaller, more specific ones linked by foreign keys, it is often possible to improve query speed. Note that query speed is not always improved, due to the costly nature of joins. Still, normalization is generally a good idea because it also increases flexibility by making it easier to alter individual tables without affecting others, which is useful for adapting Kaspi kz's database to ever-changing business needs.
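As a small illustration of the idea, a denormalized table that repeats customer details on every order row could be split into two tables linked by a foreign key (table and column names here are hypothetical):

```sql
-- Customer details live in one place...
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

-- ...and each order references the customer instead of repeating their details
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers (customer_id),
    order_date  DATE NOT NULL
);
```

Updating a customer's name now touches one row in `customers` rather than every order they've ever placed.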
Kaspi kz is a large retail organization and they are interested in understanding the shopping habits of their customers more closely. They want to track their customers' purchases and they are interested in finding the top 5 customers who made the most purchases in the past month, the total amount of money spent, and the number of unique products they purchased. How would you model the customer, product and orders tables to solve this problem?
Here's the sample tables:
customer_id | name | email |
---|---|---|
101 | John Doe | John.Doe@gmail.com |
102 | Jane Smith | Jane.Smith@gmail.com |
103 | David Johnson | David.Johnson@gmail.com |
104 | Susan Williams | Susan.Williams@gmail.com |
105 | Michael Brown | Michael.Brown@gmail.com |
product_id | product_name | price |
---|---|---|
501 | iPhone 12 | $699 |
502 | Macbook Pro | $1299 |
503 | iWatch | $399 |
504 | iPod | $199 |
505 | iPad | $799 |
order_id | customer_id | product_id | quantity | order_date |
---|---|---|---|---|
20101 | 102 | 501 | 1 | 2022-11-01 |
20102 | 105 | 502 | 1 | 2022-11-01 |
20103 | 101 | 503 | 2 | 2022-11-02 |
20104 | 104 | 504 | 3 | 2022-11-03 |
20105 | 103 | 505 | 1 | 2022-11-03 |
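A query along these lines would answer the question (a PostgreSQL sketch, assuming the tables are named `customers`, `products`, and `orders`, and that `price` is stored as a numeric type):

```sql
-- Top 5 customers by total spend in the past month,
-- with the number of unique products they purchased
SELECT c.customer_id,
       c.name,
       SUM(o.quantity * p.price)    AS total_spent,
       COUNT(DISTINCT o.product_id) AS unique_products
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC
LIMIT 5;
```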
The SQL query retrieves each customer's ID and name, counts the number of unique products they've ordered in the last month, and sums the total they've spent over that period. The customers are then sorted by their total spend in descending order, so the customers who spent the most appear at the top. Finally, we limit the output to the top 5 to meet the company's requirements.
A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
For instance, suppose you had website visitor data for Kaspi kz, exported from the company's Google Analytics account. To assist the web-dev team, you needed to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to generate all pairs of URLs, but needed to exclude pairs where both URLs were the same, since that is not a valid pair.
You could use the following self-join:
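A sketch of such a self-join, assuming a hypothetical `page_visits` table with columns `page_id`, `url`, and `referrer_id`:

```sql
-- Pair each page with the page that referred visitors to it,
-- excluding pages that referred to themselves
SELECT v1.url AS current_url,
       v2.url AS referring_url
FROM page_visits AS v1
JOIN page_visits AS v2
  ON v1.referrer_id = v2.page_id
WHERE v1.page_id <> v2.page_id;
```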
This query returns the URL of each page along with the URL of the page that referred to it. The self-join is performed using the field that stores the id of the referring page, and the WHERE clause excludes any pages that referred to themselves (which would be data anomalies).
Here is an example of a Kaspi.kz SQL interview question focusing on click-through rates:
As an analyst at Kaspi.kz, a leading fintech company in Kazakhstan, you're tasked with understanding the click-through rates(CTR) of ads for your digital products. Your digital product team wants to understand the rate of success for users to move from viewing a product's details to adding the product to their cart.
Given two tables, 'product_views' and 'product_add_to_cart', can you calculate the click-through rate (CTR), the percentage of product views that lead to the product being added to a cart, for each product?
view_id | user_id | product_id | view_date |
---|---|---|---|
311 | 123 | 1 | 06/08/2022 00:00:00 |
785 | 734 | 2 | 06/10/2022 00:00:00 |
679 | 123 | 1 | 06/18/2022 00:00:00 |
1013 | 490 | 3 | 07/26/2022 00:00:00 |
670 | 314 | 1 | 07/05/2022 00:00:00 |
add_id | user_id | product_id | add_date |
---|---|---|---|
2910 | 123 | 1 | 06/08/2022 00:00:00 |
6809 | 734 | 2 | 06/11/2022 00:00:00 |
6421 | 314 | 1 | 07/06/2022 00:00:00 |
product_id | CTR |
---|---|
1 | 66.67% |
2 | 100% |
3 | 0% |
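One way to produce this output (a PostgreSQL sketch; note that per the expected output, the CTR here compares counts of view events and add-to-cart events per product):

```sql
-- Aggregate views and add-to-cart events per product, then divide
SELECT v.product_id,
       ROUND(COALESCE(a.adds, 0) * 100.0 / v.views, 2) AS ctr
FROM (
    SELECT product_id, COUNT(*) AS views
    FROM product_views
    GROUP BY product_id
) v
LEFT JOIN (
    SELECT product_id, COUNT(*) AS adds
    FROM product_add_to_cart
    GROUP BY product_id
) a ON a.product_id = v.product_id
ORDER BY v.product_id;
```

The LEFT JOIN keeps products that were viewed but never added to a cart (like product 3), which would otherwise be dropped.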
This query counts the view events and add-to-cart events for each product, then divides the adds by the views to calculate the CTR for each product. The ROUND function limits the CTR to two decimal places so it can be displayed as a percentage.
To solve a related SQL interview question on DataLemur's free online SQL code editor, attempt this Meta SQL interview question:
Constraints are used to specify rules for the data in a table. They can be applied to one or more columns, either when the table is created or afterwards using the ALTER TABLE command.
Say you were storing sales analytics data from Kaspi kz's CRM inside a database. Here are some example constraints you could use:
PRIMARY KEY constraint: You might use a PRIMARY KEY constraint to ensure that each record in the database has a unique identifier. For example, you could use the "opportunity_id" field as the primary key in the "opportunities" table.
FOREIGN KEY constraint: You might use a FOREIGN KEY constraint to link the data in one table to the data in another table. For example, you could use a foreign key field in the "opportunities" table to reference the "account_id" field in the "accounts" table.
NOT NULL constraint: You might use a NOT NULL constraint to ensure that a field cannot contain a NULL value. For example, you could use a NOT NULL constraint on the "opportunity_name" field in the "opportunities" table to ensure that each opportunity has a name.
UNIQUE constraint: You might use a UNIQUE constraint to ensure that the data in a field is unique across the entire table. For example, you could use a UNIQUE constraint on the "email" field in the "contacts" table to ensure that each contact has a unique email address.
CHECK constraint: You might use a CHECK constraint to ensure that the data in a field meets certain conditions. For example, you could use a CHECK constraint to ensure that the "deal_probability" field in the "opportunities" table is a value between 0 and 100.
DEFAULT constraint: You might use a DEFAULT constraint to specify a default value for a field. For example, you could use a DEFAULT constraint on the "stage" field in the "opportunities" table to set the default value to "prospecting"
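Putting these together, the CRM tables sketched above might be created like this (a hypothetical schema, assuming an `accounts` table already exists):

```sql
-- Illustrates PRIMARY KEY, FOREIGN KEY, NOT NULL, CHECK, and DEFAULT
CREATE TABLE opportunities (
    opportunity_id   INT PRIMARY KEY,
    account_id       INT NOT NULL REFERENCES accounts (account_id),
    opportunity_name VARCHAR(255) NOT NULL,
    deal_probability INT CHECK (deal_probability BETWEEN 0 AND 100),
    stage            VARCHAR(50) DEFAULT 'prospecting'
);

-- Illustrates UNIQUE
CREATE TABLE contacts (
    contact_id INT PRIMARY KEY,
    email      VARCHAR(255) UNIQUE
);
```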
Kaspi kz has several customers who register with their email addresses. They want to identify customers who have used an email address with a specific domain for their registration, for example, only '@yahoo.com'.
Write a SQL query to filter customers with email addresses ending with a specified domain.
customer_id | first_name | last_name | email |
---|---|---|---|
001 | John | Doe | john.doe@gmail.com |
002 | Jane | Smith | jane.smith@yahoo.com |
003 | David | Lee | david.lee@yahoo.com |
004 | Emma | Johnson | emma.johnson@outlook.com |
005 | Robert | Brown | robert.brown@gmail.com |
customer_id | first_name | last_name | email |
---|---|---|---|
002 | Jane | Smith | jane.smith@yahoo.com |
003 | David | Lee | david.lee@yahoo.com |
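A solution sketch, assuming the table is named `customers`:

```sql
-- Match email addresses ending with the @yahoo.com domain
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE email LIKE '%@yahoo.com';
```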
The LIKE keyword in SQL is used in the WHERE clause to search for a specific pattern in a column. '%' is a wildcard that may represent zero, one, or multiple characters. The SQL statement above selects all columns from the table where the email column ends with '@yahoo.com', thus returning customers who registered with a yahoo.com email address.
Kaspi.kz is a large digital ecosystem with a leading position in e-commerce and fintech in Kazakhstan. As a data analyst at Kaspi, you are tasked with determining the average transaction amount for each of their mobile wallet transactions for each month in the year 2020.
You are given access to a table with the following structure:
transaction_id | user_id | transaction_date | transaction_amount |
---|---|---|---|
1231 | 4321 | 01/15/2020 08:20:00 | 3450 |
1232 | 5235 | 02/16/2020 12:45:00 | 1250 |
1233 | 3654 | 03/21/2020 16:22:00 | 2310 |
1234 | 9584 | 04/29/2020 10:00:00 | 9340 |
1235 | 3265 | 05/19/2020 17:45:00 | 2250 |
1236 | 8523 | 06/07/2020 20:30:00 | 7100 |
1237 | 4593 | 07/24/2020 22:05:00 | 4350 |
1238 | 5473 | 08/11/2020 07:00:00 | 3675 |
1239 | 6938 | 09/01/2020 10:16:00 | 5200 |
1240 | 2845 | 10/12/2020 14:20:00 | 4500 |
1241 | 3601 | 11/03/2020 19:33:00 | 6800 |
1242 | 5293 | 12/22/2020 21:40:00 | 3500 |
You are expected to output a table with two columns, mth and avg_transaction_amount.
mth | avg_transaction_amount |
---|---|
1 | 3450 |
2 | 1250 |
3 | 2310 |
4 | 9340 |
5 | 2250 |
6 | 7100 |
7 | 4350 |
8 | 3675 |
9 | 5200 |
10 | 4500 |
11 | 6800 |
12 | 3500 |
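A possible query (a PostgreSQL sketch, assuming the table is named `wallet_transactions` since the prompt doesn't name it):

```sql
-- Average mobile wallet transaction amount per month in 2020
SELECT EXTRACT(MONTH FROM transaction_date) AS mth,
       AVG(transaction_amount) AS avg_transaction_amount
FROM wallet_transactions
WHERE EXTRACT(YEAR FROM transaction_date) = 2020
GROUP BY EXTRACT(MONTH FROM transaction_date)
ORDER BY mth;
```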
This query first filters the table for transactions that occurred in the year 2020. It then groups the transactions by month, extracted from the transaction date via the EXTRACT function. Finally, it calculates the average transaction amount for each month and orders the output table by month.
The best way to prepare for a Kaspi kz SQL interview is to practice, practice, practice. In addition to solving the earlier Kaspi kz SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each exercise has multiple hints and full answers, and best of all, there's an online SQL coding environment so you can instantly run your SQL query and check your work.
To prep for the Kaspi kz SQL interview it is also wise to solve SQL problems from other tech companies like:
However, if your SQL skills are weak, forget about diving straight into solving questions – strengthen your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL topics like filtering on multiple conditions using AND/OR/NOT and SUM/AVG window functions – both of these show up routinely during SQL job interviews at Kaspi kz.
Beyond writing SQL queries, the other types of questions to prepare for the Kaspi kz Data Science Interview are:
The best way to prepare for Kaspi kz Data Science interviews is by reading Ace the Data Science Interview. The book's got: