At Bilibili, SQL does the heavy lifting for analyzing user behavior and improving the video-sharing app's user experience. That's why Bilibili loves to ask SQL questions during interviews for Data Science and Data Engineering positions.

So, to help you ace the Bilibili SQL interview, here are 8 Bilibili SQL interview questions – can you answer them all?
Assume we have a table called `videos` that stores data about every video uploaded to Bilibili. Besides the unique `video_id`, we also know who uploaded it (`uploader_id`), when it was uploaded (`upload_date`), the total views it got (`total_views`), and the total likes it received (`total_likes`).
Here is your challenge: Write a SQL query to calculate the likes/views ratio of each video and return videos that have this ratio greater than the average ratio of videos uploaded on the same day.
`videos` Example Input:

| video_id | uploader_id | upload_date | total_views | total_likes |
|---|---|---|---|---|
| 101 | 1 | 07/01/2021 | 5000 | 1000 |
| 202 | 2 | 07/01/2021 | 6000 | 800 |
| 303 | 3 | 07/01/2021 | 4000 | 2000 |
| 404 | 4 | 07/02/2021 | 2000 | 500 |
| 505 | 5 | 07/02/2021 | 7000 | 1400 |
```sql
WITH avg_ratio AS (
  SELECT DISTINCT
    upload_date,
    AVG(total_likes::float / total_views) OVER (PARTITION BY upload_date) AS avg_daily_ratio
  FROM videos
)
SELECT
  v.*,
  v.total_likes::float / v.total_views AS video_like_view_ratio
FROM videos v
JOIN avg_ratio ar
  ON v.upload_date = ar.upload_date
WHERE v.total_likes::float / v.total_views > ar.avg_daily_ratio;
```
This solution first calculates the average likes/views ratio for each day (`avg_daily_ratio`) using a window function in the CTE named `avg_ratio`; deduplicating the CTE to one row per day prevents the join from multiplying rows. It then joins the original `videos` table with `avg_ratio` on `upload_date` and selects the videos whose likes/views ratio is greater than the `avg_daily_ratio` for the same day. The `total_likes::float / total_views` expression calculates each video's like/view ratio; casting to `float` avoids integer division.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
Bilibili is a popular video-sharing platform where users can upload, view, and comment on videos. Users can also like or dislike the videos. Design a database for recording such interactions and answer the following question:
"Can you find the top 5 most popular videos for the month of July 2022 (popularity determined by the number of views)?"
Assume two tables that capture user interaction on the platform: `Videos` and `VideoViews`.
`Videos` Table:

| video_id | Title | Uploader | Upload_Date |
|---|---|---|---|
| 1 | Learning SQL | JohnDoe | 2020-06-04 |
| 2 | PostgreSQL Tutorial | JaneDoe | 2020-07-12 |
| 3 | Advanced SQL Queries | JohnDoe | 2021-05-12 |
`VideoViews` Table:

| view_id | video_id | user_id | View_Date |
|---|---|---|---|
| 1000 | 1 | 101 | 2022-06-08 |
| 1001 | 1 | 102 | 2022-07-08 |
| 1002 | 2 | 102 | 2022-07-12 |
| 1003 | 3 | 103 | 2022-07-15 |
| 1004 | 3 | 104 | 2022-07-28 |
```sql
SELECT
  V.Title AS "Video Title",
  COUNT(VV.view_id) AS "View count"
FROM Videos V
JOIN VideoViews VV
  ON V.video_id = VV.video_id
WHERE VV.View_Date >= '2022-07-01'
  AND VV.View_Date < '2022-08-01'
GROUP BY V.Title
ORDER BY COUNT(VV.view_id) DESC
LIMIT 5;
```
This query joins the `Videos` and `VideoViews` tables on `video_id` (the primary key of `Videos` and a foreign key in `VideoViews`). It then filters the records to only the views from July 2022. The `GROUP BY` clause counts the views for each video, and the `ORDER BY` clause sorts the counts in descending order so that `LIMIT 5` returns the 5 most-viewed videos.
How do `HAVING` and `WHERE` differ? The `HAVING` clause serves as a filter for the groups created by the `GROUP BY` clause, similar to how the `WHERE` clause filters rows. However, `HAVING` is applied to groups rather than individual rows.
For example, say you were a data analyst at Bilibili trying to understand how sales differed by region:
```sql
SELECT
  region,
  SUM(sales)
FROM bilibili_sales
WHERE date > '2023-01-01'
GROUP BY region
HAVING SUM(sales) > 400000;
```
This query retrieves the total sales in each region, and uses the `WHERE` clause to only include sales made after January 1, 2023. The rows are then grouped by region, and the `HAVING` clause filters the groups to include only those with total sales greater than $400,000.
Bilibili is an online video platform with a vibrant community. One of the features on the platform is the ability for users to subscribe to different categories like "Animation", "Music", "Gaming", etc. Bilibili likes to track its user subscriptions, especially active users who have logged in within the last week.
You are given two tables: `users` and `subscriptions`. The `users` table contains each user's last login date. The `subscriptions` table contains the categories each user is subscribed to.
Write a query to filter active users who are subscribed to the "Animation" category.
`users` Example Input:

| user_id | last_login_date |
|---|---|
| 1 | 2022-08-20 |
| 2 | 2022-05-15 |
| 3 | 2022-08-22 |
| 4 | 2022-06-10 |
| 5 | 2022-08-18 |
`subscriptions` Example Input:

| subscription_id | user_id | subscription_category |
|---|---|---|
| 101 | 1 | "Animation" |
| 102 | 2 | "Music" |
| 103 | 3 | "Animation" |
| 104 | 4 | "Gaming" |
| 105 | 5 | "Animation" |
```sql
SELECT
  u.user_id,
  s.subscription_category
FROM users u
JOIN subscriptions s
  ON u.user_id = s.user_id
WHERE s.subscription_category = 'Animation'
  AND u.last_login_date > CURRENT_DATE - INTERVAL '7 days';
```
In this query, we join the `users` and `subscriptions` tables on the `user_id` column. We then filter the records to only include users who are subscribed to the 'Animation' category and have logged in within the last week. The `last_login_date > CURRENT_DATE - INTERVAL '7 days'` condition is what restricts the results to active users.
When designing a database schema, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a car and a license plate - each car has one license plate, and each license plate belongs to one car.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. For example, a person can have multiple email addresses, but each email address only relates back to one person.
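The two relationships can be sketched as PostgreSQL DDL. This is a minimal illustration using the car/license-plate and person/email examples above; the table and column names are hypothetical:

```sql
-- One-to-one: each car has exactly one license plate.
-- The UNIQUE constraint on car_id guarantees at most one plate per car.
CREATE TABLE cars (
  car_id SERIAL PRIMARY KEY,
  model  TEXT NOT NULL
);

CREATE TABLE license_plates (
  plate_id     SERIAL PRIMARY KEY,
  car_id       INT NOT NULL UNIQUE REFERENCES cars (car_id),
  plate_number TEXT NOT NULL
);

-- One-to-many: a person can have many email addresses,
-- but each email address belongs to exactly one person.
CREATE TABLE people (
  person_id SERIAL PRIMARY KEY,
  name      TEXT NOT NULL
);

CREATE TABLE email_addresses (
  email_id  SERIAL PRIMARY KEY,
  person_id INT NOT NULL REFERENCES people (person_id), -- no UNIQUE: many emails per person
  email     TEXT NOT NULL
);
```

The only structural difference is the `UNIQUE` constraint on the foreign key: with it, the relationship is one-to-one; without it, one-to-many.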
As a data analyst for Bilibili, you often have to track the performance of various marketing efforts. For one such effort, your team is running several digital ads which lead users to video content on the site. You've been asked to calculate the click-through rate (CTR), which is the percentage of ad impressions that led to a click, for each ad for the month of August, 2022.
Assume you have access to two tables: `AdImpressions` and `AdClicks`.
`AdImpressions` Example Input:

| impression_id | ad_id | impression_date | user_id |
|---|---|---|---|
| 1 | 101 | 08/01/2022 00:00:00 | 123 |
| 2 | 102 | 08/01/2022 00:00:00 | 456 |
| 3 | 101 | 08/02/2022 00:00:00 | 789 |
| 4 | 101 | 08/02/2022 00:00:00 | 321 |
| 5 | 102 | 08/03/2022 00:00:00 | 654 |
`AdClicks` Example Input:

| click_id | ad_id | click_date | user_id |
|---|---|---|---|
| 1 | 101 | 08/01/2022 00:00:00 | 123 |
| 2 | 101 | 08/02/2022 00:00:00 | 789 |
| 3 | 102 | 08/03/2022 00:00:00 | 654 |
```sql
WITH impression_summary AS (
  SELECT
    ad_id,
    COUNT(impression_id) AS total_impressions
  FROM AdImpressions
  WHERE DATE_TRUNC('month', impression_date) = DATE '2022-08-01'
  GROUP BY ad_id
),
click_summary AS (
  SELECT
    ad_id,
    COUNT(click_id) AS total_clicks
  FROM AdClicks
  WHERE DATE_TRUNC('month', click_date) = DATE '2022-08-01'
  GROUP BY ad_id
)
SELECT
  i.ad_id,
  total_impressions,
  total_clicks,
  total_clicks::decimal / total_impressions AS ctr
FROM impression_summary i
JOIN click_summary c
  ON i.ad_id = c.ad_id;
```
With the above query, we first create two summary CTEs: one for impressions and one for clicks. We use the `DATE_TRUNC` function to ensure that we are only considering data from August 2022. Then we join the two summaries and calculate the CTR for each `ad_id`. Note that the inner join drops ads with impressions but no clicks; a `LEFT JOIN` with `COALESCE(total_clicks, 0)` would report those ads with a CTR of 0.
To solve a similar problem on DataLemur's free interactive SQL code editor, attempt this Facebook SQL Interview question:
A foreign key is a field in a database table that serves as a reference to the primary key of another table, allowing for the creation of a relationship between the two tables.
For a concrete example, let's inspect employee data from Bilibili's HR database, `bilibili_employees`:
```
+-------------+------------+-----------+------------+
| employee_id | first_name | last_name | manager_id |
+-------------+------------+-----------+------------+
| 1           | Aubrey     | Graham    | 3          |
| 2           | Marshal    | Mathers   | 3          |
| 3           | Dwayne     | Carter    | 4          |
| 4           | Shawn      | Carter    |            |
+-------------+------------+-----------+------------+
```
In this table, `employee_id` is the primary key, and is used to uniquely identify each row. `manager_id` could be a foreign key: it references the `employee_id` of each employee's manager, establishing a relationship between employees and their managers. This foreign key allows you to easily query the table to find out who a specific employee's manager is, or which employees report to a specific manager.
It is possible for a table to have multiple foreign keys that reference primary keys in different tables. For instance, the `bilibili_employees` table could have additional foreign keys for the `department_id` of the department where an employee works and the `location_id` of the employee's location.
Bilibili is trying to identify the average amount its customers spend on each content category it offers, including Anime, Gaming, Music, and Tech. Here are the two tables you have at your disposal.
`customers` Example Input:

| user_id | first_name | last_name | registration_date |
|---|---|---|---|
| 123 | John | Smith | 01/01/2020 |
| 265 | Alice | Clark | 09/15/2019 |
| 362 | Bob | Harris | 05/20/2020 |
| 192 | Eve | Nelson | 12/31/2019 |
| 981 | Oliver | Green | 03/01/2020 |
`purchases` Example Input:

| transaction_id | user_id | purchase_date | category | amount_spent |
|---|---|---|---|---|
| 1001 | 123 | 06/09/2021 | Anime | 25.50 |
| 1050 | 123 | 06/10/2021 | Gaming | 30.00 |
| 2080 | 265 | 06/18/2021 | Tech | 45.20 |
| 3015 | 192 | 07/26/2021 | Music | 22.10 |
| 2950 | 981 | 07/05/2021 | Anime | 25.00 |
We want the output to show each user's name with the breakdown of their average spending in each category.
| first_name | last_name | category | avg_spending |
|---|---|---|---|
| John | Smith | Anime | 25.50 |
| John | Smith | Gaming | 30.00 |
| Alice | Clark | Tech | 45.20 |
| Eve | Nelson | Music | 22.10 |
| Oliver | Green | Anime | 25.00 |
```sql
SELECT
  c.first_name,
  c.last_name,
  p.category,
  AVG(p.amount_spent) AS avg_spending
FROM customers c
JOIN purchases p
  ON c.user_id = p.user_id
GROUP BY c.first_name, c.last_name, p.category;
```
This query joins the `customers` and `purchases` tables on the `user_id` field (which exists in both tables) and then calculates the average spending by category for each user. The `GROUP BY` clause groups the records by the user's first and last names as well as the purchase category, and the average spending is calculated within each group.
Because joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question:
The best way to prepare for a Bilibili SQL interview is to practice, practice, practice.
Besides solving the above Bilibili SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each interview question has hints to guide you, detailed solutions, and crucially, an interactive SQL code editor so you can run your query right in the browser and have it checked.
To prep for the Bilibili SQL interview it is also useful to practice SQL questions from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers topics including filtering data with WHERE and creating summary stats with GROUP BY – both of these come up often during SQL interviews at Bilibili.
In addition to SQL query questions, the other topics covered in the Bilibili Data Science Interview are:
The best way to prepare for Bilibili Data Science interviews is by reading Ace the Data Science Interview. The book's got: