At TikTok (owned by parent company ByteDance), SQL is used all the damn time for analyzing user engagement data to make the app more addictive, and for studying the efficacy of it's video content recommendations algorithm. Unsurprisingly, this is why TikTok almost always evaluates jobseekers on SQL query questions during interviews for Data Science and Data Engineering positions.
So, if you're studying for a SQL Interview, we've collected 10 TikTok SQL interview questions to practice, which are similar to recently asked questions at TikTok – can you solve them?
Assume new TikTok users sign up with their emails. They confirmed their signup by replying to the text confirmation to activate their accounts. Users may receive multiple text messages for account confirmation until they have confirmed their new account.
A senior analyst is interested to know the activation rate of specified users in the table. Write a SQL query to find the activation rate percentage, and round your answer to 2 decimal places.
Definitions:
Assumptions:
To explore the given dataset, and write code online to solve the problem and have your solution graded, click here:
Column Name | Type |
---|---|
email_id | integer |
user_id | integer |
signup_date | datetime |
email_id | user_id | signup_date |
---|---|---|
125 | 7771 | 06/14/2022 00:00:00 |
236 | 6950 | 07/01/2022 00:00:00 |
433 | 1052 | 07/09/2022 00:00:00 |
Column Name | Type |
---|---|
text_id | integer |
email_id | integer |
signup_action | varchar |
text_id | email_id | signup_action |
---|---|---|
6878 | 125 | Confirmed |
6920 | 236 | Not Confirmed |
6994 | 236 | Confirmed |
'Confirmed' in means the user has activated their account and successfully completed the signup process.
confirm_rate |
---|
0.67 |
Example Output Explanation: 67% of users have successfully completed their signup and activated their accounts. The remaining 33% have not yet replied to the text to confirm their signup.
As a Data Analyst for TikTok, you've been asked to identify the users who are the most active on the platform. "Activity" in this context is defined by the number of videos a user uploads.
A "power user" is someone who has uploaded more than 1000 videos. Write a SQL query to list all of the power users, sorted by the number of videos they have posted in descending order.
The company has two tables, and :
user_id | username | signup_date |
---|---|---|
1 | user1 | 01/01/2020 |
2 | user2 | 02/02/2020 |
3 | user3 | 05/05/2020 |
4 | user4 | 12/12/2020 |
video_id | user_id | upload_date |
---|---|---|
1001 | 1 | 01/02/2020 |
1002 | 1 | 01/03/2020 |
1003 | 2 | 02/03/2020 |
1004 | 3 | 03/03/2020 |
1005 | 4 | 04/04/2020 |
1006 | 4 | 05/04/2020 |
1007 | 4 | 05/04/2020 |
1008 | 4 | 05/04/2020 |
1009 | 3 | 06/04/2020 |
1010 | 2 | 07/07/2020 |
This SQL query first performs an inner join on and tables using as the joining column. It then groups by to get the total count of (which represents the number of videos posted by each user). The clause filters out any users who have uploaded less than 1000 videos. Finally, it orders the remaining users by in decreasing order, giving the 'power users' at TikTok who have uploaded the most videos.
In database schema design, a one-to-one relationship is when each entity is associated with only one instance of the other. For instance, a US citizen's relationship with their social-security number (SSN) is one-to-one because each citizen can only have one SSN, and each SSN belongs to one person.
A one-to-many relationship, on the other hand, is when one entity can be associated with multiple instances of the other entity. An example of this is the relationship between a person and their email addresses - one person can have multiple email addresses, but each email address only belongs to one person.
New users on TikTok sign up using their email addresses, and upon sign-up, each user receives a text message confirmation to activate their account. Write a SQL query to output the user IDs of those who did not confirm their sign-up on the first day, but confirmed on the second day.
To practice this problem interactively, and explore the input data yourself, visit the 2nd Day Confirmation Problem on DataLemur:
Column Name | Type |
---|---|
email_id | integer |
user_id | integer |
signup_date | datetime |
email_id | user_id | signup_date |
---|---|---|
125 | 7771 | 06/14/2022 00:00:00 |
433 | 1052 | 07/09/2022 00:00:00 |
Column Name | Type |
---|---|
text_id | integer |
email_id | integer |
signup_action | string ('Confirmed', 'Not confirmed') |
action_date | datetime |
Hint: refers to the date when users activated their accounts and confirmed their sign-up through text messages.
text_id | email_id | signup_action | action_date |
---|---|---|---|
6878 | 125 | Confirmed | 06/14/2022 00:00:00 |
6997 | 433 | Not Confirmed | 07/09/2022 00:00:00 |
7000 | 433 | Confirmed | 07/10/2022 00:00:00 |
user_id |
---|
1052 |
Only User 1052 confirmed their sign-up on the second day.
For a step-by-step solution to this problem, go here.
A database view is a virtual table that is created based on the results of a SELECT statement, and provides you a customized, read-only version of your data that you can query just like a regular table.
You'd want to use a view for a few reasons:
On TikTok, each user can post several videos. For business decisions, it's often necessary to know the average duration of these videos to better understand the user engagement. For instance, if the average video duration is short, it could indicate that users on the platform prefer shorter, more concise content. Alternatively, longer average video lengths could infer that users enjoy or are more engaged with longer-form content. Calculate the average video duration for each TikTok user using the provided database tables.
Provided below is a snapshot of your 'users' table and 'videos' table:
user_id | username | signup_date |
---|---|---|
101 | user1 | 06/01/2020 |
102 | user2 | 06/03/2020 |
103 | user3 | 06/05/2020 |
video_id | user_id | upload_date | video_length_seconds |
---|---|---|---|
201 | 101 | 06/08/2022 | 60 |
202 | 101 | 06/10/2022 | 120 |
203 | 102 | 06/18/2022 | 90 |
204 | 103 | 07/26/2022 | 100 |
205 | 103 | 07/05/2022 | 120 |
This script first joins the 'videos' table and 'users' table on 'user_id', then calculates the average video length for each user. The AVG() function is used to calculate the average video length. GROUP BY is used to group the results by each unique user.
To practice a very similar question try this interactive Twitter Tweets' Rolling Averages Question which is similar for calculating averages based on user content or this TikTok Signup Activation Rate Question which is similar for deriving metrics from TikTok data.
NULLs are NOT the same as zero or blank spaces in SQL. NULLs are used to represent a missing value or the abscence of a value, whereas zero and blank space are legitimate values.
It's important to handle NULLs carefully, because they can mess up your analysis very easily. For example, if you compare a NULL value using the = operator, the result will always be NULL (because just like Drake, nothing be dared compared to NULL). That's why many data analysis in SQL start with removing NULLs using the function.
You have been given access to the TikTok database and you are tasked with the following:
TikTok has two primary tables -- and . Each row in the table represents a unique user on the platform, while each row in the table represents a unique video that has been uploaded on the platform. A video can be uploaded by a user, and the same user can 'like' or 'share' other videos, including their own.
Write a SQL query that shows the top 5 Users who have uploaded the videos that have received the most 'likes'. The output should display the User ID, the total number of videos they have uploaded, and the total number of 'likes' their videos have collectively received.
The two tables are structured as follows:
user_id | username | country | join_date |
---|---|---|---|
1 | 'user1' | 'USA' | '2021-01-01' |
2 | 'user2' | 'Canada' | '2021-02-01' |
3 | 'user3' | 'UK' | '2021-01-31' |
4 | 'user4' | 'USA' | '2021-01-30' |
5 | 'user5' | 'Canada' | '2021-01-15' |
video_id | upload_date | user_id | video_likes |
---|---|---|---|
101 | '2021-01-01' | 1 | 500 |
102 | '2021-02-01' | 2 | 1000 |
103 | '2021-02-01' | 1 | 1500 |
104 | '2021-03-01' | 3 | 2000 |
105 | '2021-03-01' | 4 | 250 |
106 | '2021-04-01' | 5 | 5000 |
This SQL query first joins the 'Users' and 'Videos' tables on the field so that information about which user uploaded each video (from the 'Users' table) can be combined with information about each video (from the 'Videos' table).
The clause is then used to group the combined table by , so that the subsequent aggregation functions (COUNT and SUM) can be applied for each user. The COUNT function counts the total number of videos uploaded by the user, while the SUM function sums the total number of likes received by all of their videos.
Finally, the clause sorts the table based on the total number of likes, in descending order, and the clause displays only the top 5 users.
Since joins come up frequently during SQL interviews, try an interactive Spotify JOIN SQL question:
Using the data from the video and user interactions on TikTok, calculate the average, maximum, and minimum durations of videos watched by users, rounded to the nearest whole number. Additionally, calculate the square root of the total number of likes given by users and present it as totalLikesSQRT. Assume we only have data for a single day.
Here are the data tables:
video_id | duration_secs |
---|---|
001 | 60 |
002 | 45 |
003 | 75 |
004 | 120 |
005 | 30 |
user_id | video_id | watched_duration_secs |
---|---|---|
123 | 001 | 60 |
265 | 002 | 30 |
362 | 003 | 55 |
192 | 004 | 120 |
981 | 005 | 25 |
user_id | video_id | liked |
---|---|---|
123 | 001 | TRUE |
265 | 002 | FALSE |
362 | 003 | TRUE |
192 | 004 | TRUE |
981 | 005 | TRUE |
avg_watched_duration | max_watched_duration | min_watched_duration | totalLikesSQRT |
---|---|---|---|
58 | 120 | 25 | 2 |
The above SQL code first calculates the average, max, and min watched durations per video rounding the average to the nearest whole number. We then join this with the user_likes table on both user_id and video_id to get the liked videos for each user. The likes are then calculated as a square root of the total likes in the database. Note, CASE statement is used inside the SUM to count 'TRUE' values in the 'liked' column.
In case this problem was difficult, strengthen your SQL foundations with this free SQL tutorial which has 30+ lessons including one on SQL math functions.
You are working as a Data Analyst for TikTok. Some videos go viral suddenly after a period of time. Your task is to find for each User_Id, the video (Video_Id) with the Maximun number of likes (Likes) per day (Date). Note that some users might have multiple videos in a day, and the result need to show only the first uploaded video in case of tie on likes count.
User_Id | Video_Id | Date | Likes |
---|---|---|---|
101 | VV567 | 2022-10-01 | 150 |
101 | VV234 | 2022-10-01 | 80 |
101 | VV890 | 2022-10-01 | 150 |
102 | VV101 | 2022-10-01 | 300 |
102 | VV111 | 2022-10-01 | 200 |
101 | VV123 | 2022-10-02 | 100 |
101 | VV456 | 2022-10-02 | 120 |
102 | VV789 | 2022-10-02 | 500 |
This SQL query first assigns a unique row number within each group of User_Id and Date based on descending number of likes and ascending order of Video Id (to account for the same Like counts within a day for a user). This is achieved by using the SQL Window Function 'ROW_NUMBER()'.
Then, we select only those rows where the row number is 1, i.e. we only pick the first video in case of tie on likes count. Hence, we get a row for each User_Id, Date representing the video with highest Likes count uploaded first for the day.
To solve a similar window function question on DataLemur's free interactive coding environment, try this Google SQL Interview Question:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the TikTok SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the earlier TikTok SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon.
Each DataLemur SQL question has multiple hints, fully explained answers along with a discussion board to see how others solved it and most importantly, there is an online SQL code editor so you can easily right in the browser your SQL query and have it checked.
To prep for the TikTok SQL interview it is also helpful to solve SQL problems from other like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – refresh your SQL knowledge with this DataLemur SQL Tutorial.
This tutorial covers things like creating summary stats with GROUP BY and manipulating date/time data – both of these come up routinely during SQL job interviews at TikTok.
For the TikTok Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
The best way to prepare for TikTok Data Science interviews is by reading Ace the Data Science Interview. The book's got:
You should also read the blog "5 TikTok Data Science Interview Questions & Interview Prep Guide".
Another way to help prepare to to stay up to date with current data science trends. Stay up to date with things happening in the industry with these 11 Data Science Newsletters.