logo

10 Kuaishou SQL Interview Questions (Updated 2024)

Updated on

February 14, 2024

At Kuaishou Technology, SQL does the heavy lifting for querying and manipulating user engagement data for personalized in-app recommendations, and managing advertisement campaign data to enhance monetization strategies. That's why Kuaishou asks SQL problems in interviews for Data Analytics, Data Science, and Data Engineering jobs.

As such, to help you practice for the Kuaishou SQL interview, here’s 10 Kuaishou Technology SQL interview questions – can you solve them?

10 Kuaishou Technology SQL Interview Questions

SQL Question 1: Identify Kuaishou's VIP Users

Kuaishou is a popular short video sharing and live streaming platform. An important indicator of a VIP user (also referred to as power users or whale users) in such a platform could be based on their level of engagement i.e., frequency of video submissions and viewership. Write a SQL query to identify the users who have posted a large number of videos and have garnered huge viewership in the last six months.

Assuming two tables: and .

Example Input:
user_idusernameregistration_date
1010user101/05/2019
2022user229/07/2020
3033user317/03/2018
4044user401/12/2019
5055user527/02/2022
Example Input:
video_iduser_idpublish_dateview_count
1001101001/08/202250001
2002202205/08/202269852
3003303306/07/202250002
4004404407/06/202269853
5005505508/08/202250003

Answer:


This query joins users and videos tables on and filters the videos that were published within the last six months. It then groups the results by and , and filters for those users have posted more than 100 videos and have garnered total viewership of more than 100,000 views in the last six months. The users are then ordered by total viewership and video count in descending order. The resulting list of users are likely the VIP users on the platform.

To work on another SQL customer analytics question where you can solve it right in the browser and have your SQL query instantly executed, try this Walmart SQL Interview Question: Walmart Labs SQL Interview Question

SQL Question 2: User Activity Analysis

In an effort to improve user engagement on Kuaishou, you have been tasked with analyzing user activity. Specifically, the company wants to understand the number of consecutive days each user has been active on Kuaishou.

For each user_id, you are required to calculate the maximum number of consecutive days they have been "active". A user is considered "active" on a day if they have at least one video view on that day.

Data has been pulled from the databases logging all video views for the last year. Each view has been logged in the table 'user_activity' with corresponding user_id and the date of the view.

The 'user_activity' table structure is illustrated below:

Example Input:
event_iduser_idevent_date
10150012022-01-01
10250012022-01-02
10350012022-01-04
10450012022-01-05
10550012022-01-06
10650022022-01-01
10750022022-01-03
10850022022-01-04
Example Output:
user_idmax_consecutive_days
50013
50022

Answer:

In order to solve this question, we can use PostgreSQL's window functions to calculate differences between current and previous row - representing consecutive active days.


In the query above, we first create a CTE 'activity' to calculate the difference between the consecutive activity days for each user. Then in the 'consecutive_activity' CTE we label them as consecutive or not. Then in the 'cumulative' CTE, we find the cumulative sum over the rows for the same user. Finally, we find the maximum from all these cumulative sums group by 'user_id'. This will give us the maximum consecutive active days for each user.

To practice a related window function SQL problem on DataLemur's free interactive SQL code editor, try this Amazon SQL Interview Question: Amazon Business Intelligence SQL Question

SQL Question 3: What is denormalization, and in what situations might it be a useful?

Database denormalization is when you add redundancy to a database, and break typical normalization rules (specified by 1st, 2nd, 3rd normal forms). There's a few reasons to denormalize a database:

  • Improved performance: Denormalization often reduces the the number of costly join operations that are needed to retrieve data. This is helpful when the database is being used for OLAP (Online Analytical Processing) use cases, as joins can be expensive and slow.

  • Simplification: Denormalization can also be used to simplify the design of a database by reducing the number of tables and relationships that need to be managed. This can make it easier to understand and maintain the database.

  • Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.

While denormalization can be a useful tool for improving performance and scalability, it's important to keep in mind that it can make update and delete operations more complex. This is because denormalization can create data duplicates, which can make it harder to maintain data integrity. So, to avoid any headaches, it's a good idea to start with a well-normalized database design and then consider denormalization only if it's absolutely necessary for your specific performance and scalability needs. In other words, denormalization can be a bit of a wild card, so it's best to handle it with care!

Kuaishou Technology SQL Interview Questions

SQL Question 4: Filter Users Based on Activity and Join Date

As a data analyst at Kuaishou, your task is to filter users who joined in the past year and have more than five uploads every month. Furthermore, information about users who did not upload any videos is also requested.

Example Input:
user_idusernamejoin_date
1user12021-07-10
2user22022-01-15
3user32020-02-23
4user42021-08-27
Example Input:
upload_iduser_idupload_date
20112022-09-20
20212022-09-23
20322022-09-26
20432022-09-20
20532022-09-23
20612022-08-20
20722022-08-22
20822022-08-23
20922022-08-27
21022022-08-30

Your result should list the users, their join date, and the total number of uploads for users who joined in the past year and have more than five uploads every month. Pivot this info with null records for users who did not upload any videos.

Answer:


This command first joins the and tables on the field. It then filters for users who joined in the last year using the clause. The clause counts the number of uploads for each user and only keeps users with more than five uploads in a month. Finally, it returns the user's ID, username, join date, and total number of uploads, ordering from most to least uploads.

SQL Question 5: In the context of databases, what does atomicity, consistency, isolation, and durability mean?

A DBMS (database management system), in order to ensure transactions are relaible and correct, tries to mantain the following ACID properties: Atomicity, Consistency, Isolation, and Durability

Here is what each of the ACID properties stands for:

Atomicity: ensures that a transaction is either completed in its entirety, or not completed at all. If a transaction fails halfway, the database does a rollback on the commit.

Consistency: ensures that a transaction will only be completed if adheres to the constraints defined in the database of the DB.

Isolation: ensures that concurrent transactions are isolated from each one another, so that the changes made by one transaction cannot be seen by a 2nd transaction until the 1st transaction is done.

**Durability: ** ensures that once a transaction has been committed, the database permanently stores the results in the DB.

As you can see, it's pretty important for Kuaishou's data systems to be ACID compliant, else they'll be a big problem for their customers!

SQL Question 6: Average Daily Views of Videos

Kuaishou is a popular social video-sharing app. For this SQL interview question, consider that you are an analyst for Kuaishou and tasked to monitor the performance of videos. You are asked to find the average daily views for each video within a specific month.

Example Input:
video_iduser_idview_dateviews
10178906/01/2022350
10256706/02/2022500
10334506/03/2022700
10178906/02/2022450
10256706/03/2022600
10178906/03/2022700
10334506/02/2022800
10256706/01/2022500
10178906/02/2022600
10334506/03/2022900
Example Output:
mthvideoaverage_daily_views
6101600
6102550
6103800

Answer:


This query works by first extracting the month from the column and then grouping by both the month and the . The function is then applied to the column to find the average number of views per day for each video. The clause is used to filter for the specific month we are interested in.

To practice a very similar question try this interactive Facebook App Click-through Rate (CTR) Question which is similar for querying specific metrics over a time period or this New York Times Laptop vs. Mobile Viewership Question which is similar for tracking specific content views.

SQL Question 7: What is a SQL constraint?

The UNIQUE constraint makes sure that all values in a column are distinct. It is often paired with other constraints, like NOT NULL, to ensure that the data follows certain rules.

For example, say you were an analyst on the marketing team at Kuaishou, and had access to a database on marketing campaigns:


In this example, the UNIQUE constraint is applied to the "campaign_name" field to ensure that each campaign has a unique name. This helps to ensure the integrity of the data in the table and prevents errors that could occur if two campaigns had the same name.

SQL Question 8: Calculate Average Video Likes per Month

Kuaishou is a social video-sharing app, comparable to TikTok, where users can create, like, and share videos. For a given month, suppose Kuaishou wants to find out the average number of likes each video received.

You will have a table that records all the videos created, and a table that records all the likes on videos. Group videos by their creation month and find the average likes per video for each group.

Example Input:
video_iduser_idcreate_date
10019812022-01-15
20021232022-01-18
30032652022-02-01
40043622022-02-03
50051922022-03-18
Example Input:
like_iduser_idvideo_idlike_date
810012310012022-01-16
820026510012022-01-17
830036220022022-01-19
840019240042022-02-04
850098150052022-03-19
860012350052022-03-20

Answer:


This query first groups likes by to count the number of likes per video. The outer query then joins this result with the table, allowing us to group by the month the video was created. We then compute the average number of likes for videos in each month.

SQL Question 9: Filtering Kuaishou User Data

Your task as a data analyst for the company Kuaishou, is to find all users whose email addresses end with . Assume the table contains information on all users of Kuaishou's digital services.

The table is structured as follows:

Example Input:
user_idfirst_namelast_nameemail
123JohnDoejohndoe@kuaishou.com
265JaneSmithjanesmith@yahoo.com
362PaulBrownpaulbrown@gmail.com
192EmilyJohnsonemilyjohnson@kuaishou.com
981RobertDavisrobertdavis@kuaishou.com

You need to deliver a table with all user details for users with a email domain.

The expected output will look as follows:

Example Output:
user_idfirst_namelast_nameemail
123JohnDoejohndoe@kuaishou.com
192EmilyJohnsonemilyjohnson@kuaishou.com
981RobertDavisrobertdavis@kuaishou.com

Answer:

Your task now is to write a SQL query that would select only those users from the table whose email ends with . Here is how you would do that:


This SQL command leverages the keyword to filter data in the column of the table. The character is a wildcard that matches any number of characters. So will match any email that ends with regardless of what comes before it.

SQL Question 10: What's the difference between the and window function?

Both the and window functions are used to access a row at a specific offset from the current row.

However, the function retrieves a value from a row that follows the current row, whereas the function retrieves a value from a row that precedes the current row.

Often, the offset for both functions is 1, which gives access to the immediately following/preceding row. Here's a SQL query example:


Kuaishou SQL Interview Tips

Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Kuaishou SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier Kuaishou SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Microsoft, Google, Amazon, and tech startups. DataLemur SQL Interview Questions

Each exercise has multiple hints, step-by-step solutions and most importantly, there's an interactive SQL code editor so you can right in the browser run your query and have it graded.

To prep for the Kuaishou SQL interview it is also wise to solve interview questions from other tech companies like:

But if your SQL coding skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this DataLemur SQL Tutorial.

DataLemur SQL Course

This tutorial covers topics including filtering groups with HAVING and filtering data with boolean operators – both of these show up routinely during Kuaishou SQL interviews.

Kuaishou Technology Data Science Interview Tips

What Do Kuaishou Data Science Interviews Cover?

Besides SQL interview questions, the other types of questions to prepare for the Kuaishou Data Science Interview are:

  • Statistics and Probability Questions
  • Python or R Coding Questions
  • Data Case Study Questions
  • Machine Learning Questions
  • Behavioral Interview Questions

Kuaishou Data Scientist

How To Prepare for Kuaishou Data Science Interviews?

To prepare for Kuaishou Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions sourced from tech companies like Google & Microsoft
  • a refresher covering Python, SQL & ML
  • over 900+ 5-star reviews on Amazon

Acing Data Science Interview