
10 Kakao SQL Interview Questions (Updated 2024)

Updated on

March 1, 2024

At Kakao, SQL is often used to analyze user behavior data for targeted marketing and to dig into KakaoTalk product analytics. That's why Kakao frequently asks SQL query questions during interviews for Data Science and Data Engineering positions.

To help you prepare for the Kakao SQL interview, this blog covers 10 Kakao SQL interview questions – how many can you solve?

10 Kakao SQL Interview Questions

SQL Question 1: Analyze Top User Engagement With Kakao Services

As a representative for Kakao, you have been supplied with a dataset providing detailed information about the usage habits of your users over the past month. The dataset is contained in a PostgreSQL table named 'user_activity'. Given below is a sample of the data:

Example Input:
activity_id | user_id | activity_date | service_id | duration_seconds
7241 | 1001 | 2022-08-01 08:30:00 | K01 | 1800
8347 | 1002 | 2022-08-01 09:00:00 | K02 | 2400
6025 | 1001 | 2022-08-01 15:30:00 | K01 | 1800
7218 | 1002 | 2022-08-02 08:00:00 | K02 | 3600
8360 | 1003 | 2022-08-02 10:00:00 | K01 | 1200

Your task is to write a SQL query that finds, for each service on each day, the top 3 users with the longest total usage duration. You will need to make use of SQL window functions to solve this problem.

Answer:
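
One way to write this in PostgreSQL (a sketch: it aggregates each user's daily usage first, then ranks with a window function):

```sql
WITH daily_usage AS (
  SELECT
    user_id,
    activity_date::DATE AS activity_day,
    service_id,
    SUM(duration_seconds) AS total_duration
  FROM user_activity
  GROUP BY user_id, activity_date::DATE, service_id
)
SELECT activity_day, service_id, user_id, total_duration
FROM (
  SELECT
    du.*,
    ROW_NUMBER() OVER (
      PARTITION BY activity_day, service_id
      ORDER BY total_duration DESC
    ) AS usage_rank
  FROM daily_usage AS du
) AS ranked
WHERE usage_rank <= 3;
```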


In the answer, we first use the SUM function with GROUP BY to compute the total duration of use for each user per day per service.

Then the ROW_NUMBER window function assigns a unique rank to each row within a partition. Each window is defined by the PARTITION BY clause, which separates the rows into partitions sharing the same activity_day and service_id, ordered by the total duration in descending order.

Finally, we filter rows with a rank no higher than 3 (top 3). The result will be the top 3 users with the longest total duration of usage for each service each day.

To practice a related window function SQL problem on DataLemur's free interactive coding environment, try this Amazon BI Engineer interview question: Amazon Business Intelligence SQL Question

SQL Question 2: Analyzing User Interaction with Kakao's Various Services

Kakao Corporation provides many services like messaging (KakaoTalk), music streaming (KakaoMusic), ride-hailing (KakaoT), and a digital wallet (KakaoPay). These services can affect each other's engagement. For instance, users who chat about a newly released album on KakaoTalk might decide to listen to it later on KakaoMusic.

You've been given two tables, interactions and services, with the following schema:

Example Input (interactions):

interaction_id | user_id | service_id | interaction_date | interaction_duration (mins)
001 | 123 | A | 06/08/2022 | 10
002 | 265 | B | 06/10/2022 | 5
003 | 362 | A | 06/18/2022 | 15
004 | 192 | C | 07/26/2022 | 20
005 | 981 | B | 07/05/2022 | 5

Example Input (services):

service_id | service_name
A | KakaoTalk
B | KakaoMusic
C | KakaoT
D | KakaoPay

Assume all interactions are unique. Write a SQL query to find the average interaction duration for each service.

Answer:
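
A possible PostgreSQL query, assuming the two tables are named interactions and services as above:

```sql
SELECT
  s.service_name,
  AVG(i.interaction_duration) AS avg_duration_mins
FROM interactions AS i
JOIN services AS s
  ON i.service_id = s.service_id
GROUP BY s.service_name;
```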


This query first joins interactions with services on the service_id field. The query then groups the data by service_name. For each service group, it calculates the average duration of all interactions related to that service, providing useful insight into which services are more engaging to users.

SQL Question 3: What's a primary key?

The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.

For example, say you had stored some Facebook ad campaign data that Kakao ran:
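
A minimal sketch, using hypothetical table and column names for the ad campaign data:

```sql
CREATE TABLE facebook_ad_campaigns (
  campaign_id INTEGER PRIMARY KEY,  -- unique identifier for each campaign
  campaign_name VARCHAR(100),
  start_date DATE,
  end_date DATE
);
```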


The campaign_id column uniquely identifies each row in the table, and the PRIMARY KEY constraint ensures that no two rows have the same campaign_id. This helps to maintain the integrity of the data in the table by preventing duplicate rows.

The primary key is also an important part of the table because it allows you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table containing data on the results of the campaigns.


SQL Question 4: Finding Active Users

Kakao is interested in finding out which of their users are the most active on the platform. They would like a list of users who have sent more than 500 messages in the last 3 months and are part of at least 10 different chat groups.

Please use the following tables for this analysis:

Example Input (messages):

message_id | user_id | send_date | group_id | content
6181 | 541 | 12/15/2022 | 7032 | Hello everyone
9842 | 681 | 12/16/2022 | 4321 | What's the update on the project?
5393 | 541 | 12/18/2022 | 7032 | We have a meeting at 5pm today
8666 | 982 | 12/20/2022 | 8901 | Done with the task
5758 | 541 | 12/22/2022 | 4321 | Here is the presentation for tomorrow

Example Input (group_members):

user_id | group_id
541 | 7032
681 | 4321
982 | 8901
541 | 4321
541 | 8796

Write a SQL query to find all users who have sent more than 500 messages in the last 3 months and are part of at least 10 different chat groups.

Answer:
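
One possible PostgreSQL query, assuming the two tables are named messages and group_members (the names aren't shown in the prompt):

```sql
SELECT m.user_id
FROM messages AS m
JOIN group_members AS g
  ON m.user_id = g.user_id
WHERE m.send_date >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY m.user_id
HAVING COUNT(DISTINCT m.message_id) > 500
   AND COUNT(DISTINCT g.group_id) >= 10;
```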


This query joins the messages and group_members tables on the user_id field. It then keeps only the messages sent in the last 3 months. The GROUP BY clause groups the data by user_id, and the HAVING clause filters to users who have sent more than 500 messages and are part of at least 10 different groups.

SQL Question 5: Why might you denormalize a database?

Database denormalization is when you add redundancy to a database, and break typical normalization rules (specified by 1st, 2nd, 3rd normal forms). There's a few reasons to denormalize a database:

  • Improved performance: Denormalization often reduces the number of costly join operations that are needed to retrieve data. This is helpful when the database is being used for OLAP (Online Analytical Processing) use cases, as joins can be expensive and slow.

  • Simplification: Denormalization can also be used to simplify the design of a database by reducing the number of tables and relationships that need to be managed. This can make it easier to understand and maintain the database.

  • Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.

While denormalization can be a useful tool for improving performance and scalability, it's important to keep in mind that it can make update and delete operations more complex. This is because denormalization can create data duplicates, which can make it harder to maintain data integrity. So, to avoid any headaches, it's a good idea to start with a well-normalized database design and then consider denormalization only if it's absolutely necessary for your specific performance and scalability needs. In other words, denormalization can be a bit of a wild card, so it's best to handle it with care!
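
As a small illustration (a hypothetical schema, not from any question above), here's a denormalized orders table that copies the customer's name onto each order row so read queries can skip a join:

```sql
CREATE TABLE orders_denormalized (
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  customer_name VARCHAR(100),  -- redundant copy of customers.name; must be kept in sync
  order_total DECIMAL(10, 2)
);
```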

SQL Question 6: Average Revenue Generated by Each Kakao Product

Kakao sells various digital products, and each product has multiple transactions every day. Can you write a SQL query to find the average monthly revenue of each product based on the data in the "transactions" table?

Here is the sample data:

Example Input (transactions):

transaction_id | product_id | transaction_date | revenue
6171 | 50001 | 06/08/2022 00:00:00 | 500
7802 | 69852 | 06/10/2022 00:00:00 | 700
5293 | 50001 | 06/18/2022 00:00:00 | 550
6352 | 69852 | 07/26/2022 00:00:00 | 710
4517 | 69852 | 07/05/2022 00:00:00 | 450

Example Output:

product | month | avg_revenue
50001 | 6 | 525.00
69852 | 6 | 700.00
69852 | 7 | 580.00

Answer:
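
One way to write this in PostgreSQL, assuming transaction_date is a timestamp column:

```sql
SELECT
  product_id AS product,
  EXTRACT(MONTH FROM transaction_date) AS month,
  AVG(revenue) AS avg_revenue
FROM transactions
GROUP BY product_id, EXTRACT(MONTH FROM transaction_date)
ORDER BY product, month;
```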


In this query, we use GROUP BY along with the AVG function, as the question asks. The AVG function computes the average revenue for each product_id in each month, and the EXTRACT function pulls the month out of transaction_date. The result is ordered by product and month.

SQL Question 7: Why should you normalize your database?

Database normalization is helpful because it improves the performance, flexibility, and scalability of a database. Some specific benefits of normalization include the following (a short schema sketch follows the list):

  • Reducing Redundancy: Normalization helps to minimize redundancy by breaking down a larger general table into smaller, more specific tables. This can often reduce the amount of data that needs to be accessed for particular queries.

  • Improving Data Integrity: Normalization helps to ensure the integrity of the data by minimizing the risk of data inconsistencies and anomalies. It does this by establishing clear relationships between the tables via primary and foreign keys.

  • Increasing Flexibility: Normalization makes it easier to modify the structure of the database, as there is less redundancy, so it allows you to make changes to one table without affecting others. This makes it easier to adapt the database to changing business needs (and change is basically constant at Kakao!)

  • Improving Performance: Normalization can improve the performance of the database by reducing the amount of data that needs to be stored and accessed in a single table. This can result in faster query times and better overall performance.

  • Easier Data Modeling: Normalization can make it easier to model the data in a logical and intuitive way, since less random data is commingled together, which improves the overall database design.
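
As a quick illustration (a hypothetical schema), normalization splits one wide table that repeats user details on every order into two smaller tables linked by a foreign key:

```sql
CREATE TABLE users (
  user_id INTEGER PRIMARY KEY,
  name VARCHAR(100),
  city VARCHAR(50)
);

CREATE TABLE orders (
  order_id INTEGER PRIMARY KEY,
  user_id INTEGER REFERENCES users(user_id),  -- foreign key back to users
  price DECIMAL(10, 2)
);
```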

SQL Question 8: Filtering User Records with LIKE Keyword

You're working with Kakao's customer records and you're asked to find all the users who live in Seoul and have an '@kakao.com' email address. For this problem, consider the following table schema for the 'users' table.

Example Input (users):

user_id | email | location
123 | john@kakao.com | Seoul
265 | mary@gmail.com | Busan
362 | paul@kakao.com | Seoul
192 | linda@kakao.com | Daegu
981 | kim@kakao.com | Seoul

Write a SQL query that will filter down the 'users' table and return all records with a location of 'Seoul' and an email address that ends with '@kakao.com'.

Hint: Use the SQL keyword LIKE in your where clause to match the specific string pattern for the email address.

Answer:

Here is the PostgreSQL query that can be used to solve this problem:
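
```sql
SELECT *
FROM users
WHERE location = 'Seoul'
  AND email LIKE '%@kakao.com';
```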


This query will return all user records where the location is 'Seoul' and the email address ends with '@kakao.com'. The '%' symbol in the LIKE clause acts as a wildcard that matches any sequence of characters. Thus, '%@kakao.com' will match any string that ends with '@kakao.com'.

SQL Question 9: Find the total revenue per city from Kakao users

Kakao wants to analyze customer purchases and total revenue generated per city to create a growth strategy. Currently, all user data and purchase activity is stored in two separate tables: users and orders. The user information is stored in the users table, which includes a unique user_id and the residing city of each user. The orders table stores all purchase data, with each order having a unique order_id, the user_id of the customer who made the purchase, the purchase_date, and the price of each purchase.

Write a SQL query to find the total revenue per city in June 2022.

Example Input (users):

user_id | city
123 | Seoul
265 | Busan
362 | Daegu
192 | Incheon
981 | Seoul

Example Input (orders):

order_id | user_id | purchase_date | price
5041 | 123 | 06/08/2022 00:00:00 | 50000
6712 | 265 | 06/10/2022 00:00:00 | 69800
3923 | 362 | 06/18/2022 00:00:00 | 85000
7352 | 192 | 07/26/2022 00:00:00 | 64800
8157 | 981 | 07/05/2022 00:00:00 | 68000

Answer:
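
One possible PostgreSQL query, using the users and orders tables above:

```sql
SELECT
  u.city,
  SUM(o.price) AS total_revenue
FROM users AS u
JOIN orders AS o
  ON u.user_id = o.user_id
WHERE EXTRACT(MONTH FROM o.purchase_date) = 6
  AND EXTRACT(YEAR FROM o.purchase_date) = 2022
GROUP BY u.city;
```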


This query first joins the users and orders tables using the user_id. It then filters the data to only include orders made in June 2022 using the EXTRACT function. Finally, it groups by city and calculates the total revenue for each city using the SUM function.

Because joins come up so often during SQL interviews, try this interactive Snapchat JOIN SQL interview question: Snapchat SQL Interview question using JOINS

SQL Question 10: Can you explain the distinction between a unique and a non-unique index?

While both types of indexes improve the performance of SQL queries by providing a faster way to lookup rows of data, a unique index enforces the uniqueness of the indexed columns while a non-unique index allows duplicate values in the indexed columns.

Suppose you had a table of Kakao employees. Here's an example of a unique index on the email column (a sketch, using a hypothetical employees table):
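
```sql
CREATE UNIQUE INDEX idx_employees_email
ON employees (email);  -- employees/email are hypothetical names
```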


This index would ensure that no two Kakao employees have the same email address, which could be used as a unique identifier for each employee.

Here's a non-unique index example on the job_title column:
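
```sql
CREATE INDEX idx_employees_job_title
ON employees (job_title);  -- non-unique: duplicate job titles are allowed
```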


This index would not enforce uniqueness, but it could be used to improve the performance of queries that filter or sort the data based on the job_title column. For example, if you want to quickly retrieve all Data Scientists, the database can use the index to efficiently locate and retrieve the desired records without having to do a full table scan of all Kakao employees.

How To Prepare for the Kakao SQL Interview

The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the above Kakao SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Microsoft, Google, Amazon, and tech startups. DataLemur Question Bank

Each interview question has hints to guide you, fully explained answers, and a discussion board to see how others solved it. Best of all, there's an interactive SQL code editor, so you can write your SQL query right in the browser and have it checked.

To prep for the Kakao SQL interview, it would also be wise to solve SQL problems from other tech companies.

But if your SQL skills are weak, don't worry about going right into solving questions – improve your SQL foundations with this SQL interview tutorial.

SQL tutorial for Data Scientists & Analysts

This tutorial covers SQL concepts such as WHERE with AND/OR/NOT and INTERSECT/EXCEPT – both of these pop up routinely in SQL job interviews at Kakao.

Kakao Data Science Interview Tips

What Do Kakao Data Science Interviews Cover?

In addition to SQL interview questions, the Kakao Data Science Interview covers several other question categories.


How To Prepare for Kakao Data Science Interviews?

The best way to prepare for Kakao Data Science interviews is by reading Ace the Data Science Interview. The book's got:

  • 201 Interview Questions from companies like Microsoft, Google & Amazon
  • A Crash Course on Python, SQL & ML
  • Great Reviews (900+ reviews, 4.5-star rating)

Ace the Data Science Interview by Nick Singh and Kevin Huo