At Bilibili, SQL does the heavy lifting for analyzing user behavior the TV-watching app's user-experience. That's why Bilibili LOVES to ask SQL questions during interviews for Data Science and Data Engineering positions.
So, to help you ace the Bilibili SQL interview, here’s 8 Bilibili SQL interview questions – able to answer them all?
Assume we have a table called that stores data about every video uploaded to Bilibili. Besides the unique video id, we also know who uploaded it (), when it was uploaded (), total views it got () and total likes it received ().
Here is your challenge: Write a SQL query to calculate the likes/views ratio of each video and return videos that have this ratio greater than the average ratio of videos uploaded on the same day.
video_id | uploader_id | upload_date | total_views | total_likes |
---|---|---|---|---|
101 | 1 | 07/01/2021 | 5000 | 1000 |
202 | 2 | 07/01/2021 | 6000 | 800 |
303 | 3 | 07/01/2021 | 4000 | 2000 |
404 | 4 | 07/02/2021 | 2000 | 500 |
505 | 5 | 07/02/2021 | 7000 | 1400 |
This solution first calculates the average likes/views ratio for each day (avg_daily_ratio) using a window function in the CTE named . Then it joins the original table with on and selects the videos where the likes/views ratio is greater than avg_daily_ratio of the same day. The expression is used to calculate the video's like/view ratio.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
Bilibili is a popular video-sharing platform where users can submit, view and comment on user-submitted videos. Users can also like or dislike the videos. Design a database for recording such interactions and answer the following question:
"Can you find the top 5 most popular videos for the month of July 2022 (popularity determined by the number of views)?"
Assume two tables to illustrate the user interaction on the platform - 'Videos' and 'VideoViews'.
video_id | Title | Uploader | Upload_Date |
---|---|---|---|
1 | Learning SQL | JohnDoe | 2020-06-04 |
2 | PostgreSQL Tutorial | JaneDoe | 2020-07-12 |
3 | Advanced SQL Queries | JohnDoe | 2021-05-12 |
view_id | video_id | user_id | View_Date |
---|---|---|---|
1000 | 1 | 101 | 2022-06-08 |
1001 | 1 | 102 | 2022-07-08 |
1002 | 2 | 102 | 2022-07-12 |
1003 | 3 | 103 | 2022-07-15 |
1004 | 3 | 104 | 2022-07-28 |
This query joins the Videos and VideoViews tables on video_id (the primary key for Videos and a foreign key for VideoViews). Then, it filters the records to contain only the views from the month of July 2022. The 'GROUP BY' clause groups the number of views for each video, and the 'ORDER BY' clause orders this data in descending order to fetch the 5 most viewed videos.
The clause serves as a filter for the groups created by the clause, similar to how the clause filters rows. However, is applied to groups rather than individual rows.
For example, say you were a data analyst at Bilibili trying to understand how sales differed by region:
This query retrieves the total sales for all products in each region, and uses the clause to only sales made after January 1, 2023. The rows are then grouped by region and the clause filters the groups to include only those with total sales greater than $400,000.
Bilibili is an online video platform with a vibrant community. One of the features on the platform is the ability for users to subscribe to different categories like "Animation", "Music", "Gaming", etc. Bilibili likes to track its user subscriptions, especially active users who have logged in within the last week.
You are given two tables: and . The table contains information about the user's last login date. The table contains information about the categories each user is subscribed to.
Write a query to filter active users who are subscribed to the "Animation" category.
user_id | last_login_date |
---|---|
1 | 2022-08-20 |
2 | 2022-05-15 |
3 | 2022-08-22 |
4 | 2022-06-10 |
5 | 2022-08-18 |
subscription_id | user_id | subscription_category |
---|---|---|
101 | 1 | "Animation" |
102 | 2 | "Music" |
103 | 3 | "Animation" |
104 | 4 | "Gaming" |
105 | 5 | "Animation" |
In this query, we are joining the and tables on the column. We then filter the records to only include those users who are subscribed to the 'Animation' category and have logged in within the last week. The condition checks for active users in the last week.
When designing a database schema, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a car and a license plate - each car has one license plate, and each license plate belongs to one car.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. For example, a person can have multiple email addresses, but each email address only relates back to one person.
As a data analyst for Bilibili, you often have to track the performance of various marketing efforts. For one such effort, your team is running several digital ads which lead users to video content on the site. You've been asked to calculate the click-through rate (CTR), which is the percentage of ad impressions that led to a click, for each ad for the month of August, 2022.
Assume you have access to two tables: and .
impression_id | ad_id | impression_date | user_id |
---|---|---|---|
1 | 101 | 08/01/2022 00:00:00 | 123 |
2 | 102 | 08/01/2022 00:00:00 | 456 |
3 | 101 | 08/02/2022 00:00:00 | 789 |
4 | 101 | 08/02/2022 00:00:00 | 321 |
5 | 102 | 08/03/2022 00:00:00 | 654 |
click_id | ad_id | click_date | user_id |
---|---|---|---|
1 | 101 | 08/01/2022 00:00:00 | 123 |
2 | 101 | 08/02/2022 00:00:00 | 789 |
3 | 102 | 08/03/2022 00:00:00 | 654 |
With the above query, we first create two summary tables; one for impressions and one for clicks. We use the date_trunc function to ensure that we are only considering data from August. Then, we join these two tables together and calculate the CTR for each ad_id.
To solve a similar problem on DataLemur's free interactive SQL code editor, attempt this Facebook SQL Interview question:
A foreign key is a field in a database table that serves as a reference to the primary key of another table, allowing for the creation of a relationship between the two tables.
For a concrete example, let's inspect employee data from Bilibili's HR database:
:
+------------+------------+------------+------------+ | employee_id| first_name | last_name | manager_id | +------------+------------+------------+------------+ | 1 | Aubrey | Graham | 3 | | 2 | Marshal | Mathers | 3 | | 3 | Dwayne | Carter | 4 | | 4 | Shawn | Carter | | +------------+------------+------------+------------+
In this table, is the primary key, and is used to uniquely identify each row.
could be a foreign key. It references the of the manager of each employee in the table, establishing a relationship between the employees and their managers. This foreign key allows you to easily query the table to find out who a specific employee's manager is, or to find out which employees report to a specific manager.
It is possible for a table to have multiple foreign keys that reference primary keys in different tables. For instance, the table could have additional foreign keys for the of the department where an employee works and the of the employee's location.
Bilibili is trying to identify the average amount its customers spend on each content category it offers including Anime, Gaming, Music and Tech. Here are the two tables you have at your disposal.
user_id | first_name | last_name | registration_date |
---|---|---|---|
123 | John | Smith | 01/01/2020 |
265 | Alice | Clark | 09/15/2019 |
362 | Bob | Harris | 05/20/2020 |
192 | Eve | Nelson | 12/31/2019 |
981 | Oliver | Green | 03/01/2020 |
transaction_id | user_id | purchase_date | category | amount_spent |
---|---|---|---|---|
1001 | 123 | 06/09/2021 | Anime | 25.50 |
1050 | 123 | 06/10/2021 | Gaming | 30.00 |
2080 | 265 | 06/18/2021 | Tech | 45.20 |
3015 | 192 | 07/26/2021 | Music | 22.10 |
2950 | 981 | 07/05/2021 | Anime | 25.00 |
We want the output to show each user's name with the breakdown of their average spending in each category.
first_name | last_name | category | avg_spending |
---|---|---|---|
John | Smith | Anime | 25.50 |
John | Smith | Gaming | 30.00 |
Alice | Clark | Tech | 45.20 |
Eve | Nelson | Music | 22.10 |
Oliver | Green | Anime | 25.00 |
This query joins the and tables on the field (which exists in both tables) and then calculates the average spending by category for each user. The clause groups the records by users' first and last names as well as the purchase category, upon which the average spending is then calculated.
Because joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question:
The best way to prepare for a Bilibili SQL interview is to practice, practice, practice. Besides solving the above Bilibili SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each interview question has hints to guide you, detailed solutions and crucially, there's an interactive SQL code editor so you can right in the browser run your query and have it checked.
To prep for the Bilibili SQL interview it is also useful to practice SQL questions from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers topics including filtering data with WHERE and creating summary stats with GROUP BY – both of these come up often during SQL interviews at Bilibili.
In addition to SQL query questions, the other topics covered in the Bilibili Data Science Interview are:
The best way to prepare for Bilibili Data Science interviews is by reading Ace the Data Science Interview. The book's got: