At iQIYI, SQL is used frequently for querying and manipulating vast video streaming data, and generating insights for personalized content recommendations. So, it shouldn't surprise you that iQIYI often tests SQL questions in interviews for Data Science and Data Engineering positions.
As such, to help you ace the iQIYI SQL interview, we'll cover 11 iQIYI SQL interview questions – can you solve them?
iQIYI is a popular streaming platform in China and hosts a wide variety of TV shows and Movies. As a valued member of the iQIYI data team, you've been tasked with identifying our 'Whale' users. These are the users who watch an extreme number of videos, making them highly valuable to the business.
Your task is to write a SQL query that identifies the top 10 users based on the number of videos watched in the last 30 days.
For this task, let's consider we have a table where each row corresponds to a single video view by a user.
activity_id | user_id | view_date | video_id |
---|---|---|---|
1281 | 101 | 06/25/2022 00:00:00 | 50098 |
5961 | 201 | 06/26/2022 00:00:00 | 50254 |
3527 | 101 | 06/30/2022 00:00:00 | 50098 |
8731 | 302 | 07/01/2022 00:00:00 | 50012 |
4721 | 101 | 07/02/2022 00:00:00 | 50254 |
5971 | 201 | 07/02/2022 00:00:00 | 50075 |
9831 | 302 | 07/05/2022 00:00:00 | 50098 |
2921 | 201 | 07/08/2022 00:00:00 | 50254 |
7922 | 101 | 07/10/2022 00:00:00 | 50075 |
6181 | 302 | 07/12/2022 00:00:00 | 50098 |
This SQL query groups the table by , and counts the number of video views for each user. The query also filters the data to only consider views from the past 30 days. The result is then sorted in descending order to list users with the highest video views first and limited to top 10 users.
To practice a similar VIP customer analysis question on DataLemur's free online SQL coding environment, try this Microsoft Azure Cloud SQL Interview Question:
iQIYI, often referred to as the Netflix of China, is a platform that provides an extensive library of TV shows and movies. As a data analyst at iQIYI, you are given a task to analyze viewer behavior.
You are given access to the and tables:
user_id | show_id | timestamp |
---|---|---|
101 | 1 | 2022-07-25 19:30:50 |
102 | 2 | 2022-07-26 20:40:10 |
101 | 3 | 2022-07-26 21:45:15 |
103 | 1 | 2022-07-26 22:00:00 |
101 | 2 | 2022-07-27 23:10:25 |
102 | 3 | 2022-07-28 00:20:35 |
show_id | title |
---|---|
1 | The Big Bang Theory |
2 | Game of Thrones |
3 | Friends |
Write a SQL query to find out which show has the highest number of active users on a given day. Define an 'active user' as someone who has viewed at least once on that day. Provide the date, show_id and show title.
The first part of the query calculates the number of active users for each show and each day. The second part finds out which show has the most active users for each day using the RANK window function. Each row in the result represents the show with the most active users for a specific day.
To practice a related window function SQL problem on DataLemur's free online SQL coding environment, solve this Amazon SQL Interview Question:
Constraints are just rules your DBMS has to follow when updating/inserting/deleting data.
Say you had a table of iQIYI products and a table of iQIYI customers. Here's some example SQL constraints you'd use:
NOT NULL: This constraint could be used to ensure that certain columns in the product and customer tables, such as the product name and customer email address, cannot contain NULL values.
UNIQUE: This constraint could be used to ensure that the product IDs and customer IDs are unique. This would prevent duplicate entries in the respective tables.
PRIMARY KEY: This constraint could be used to combine the NOT NULL and UNIQUE constraints to create a primary key for each table. The product ID or customer ID could serve as the primary key.
FOREIGN KEY: This constraint could be used to establish relationships between the iQIYI product and customer tables. For example, you could use a foreign key to link the customer ID in the customer table to the customer ID in the product table to track which products each customer has purchased.
CHECK: This constraint could be used to ensure that certain data meets specific conditions. For example, you could use a CHECK constraint to ensure that iQIYI product prices are always positive numbers.
DEFAULT: This constraint could be used to specify default values for certain columns. For example, you could use a DEFAULT constraint to set the customer registration date to the current date if no value is provided when a new customer is added to the database.
iQIYI, a leading online video platform in China, is highly interested in understanding its users' video viewing patterns. The company wants to analyze the most-watched genres, peak viewing times, and the average duration viewers watch videos.
You have access to two tables: and
table records every viewing event by users:
viewing_id | user_id | video_id | start_time | viewing_duration |
---|---|---|---|---|
21831 | 2345 | 9876 | 06/21/2022 18:00:00 | 20 |
24561 | 1452 | 5234 | 06/22/2022 20:00:00 | 30 |
32651 | 2873 | 9876 | 06/23/2022 21:00:00 | 45 |
22851 | 5698 | 1762 | 06/24/2022 22:00:00 | 15 |
12576 | 7859 | 5234 | 06/26/2022 18:00:00 | 35 |
table holds video information:
video_id | title | genre | duration |
---|---|---|---|
9876 | Inception | Sci-Fi | 120 |
5234 | Friends | Comedy | 30 |
1762 | Frozen II | Animation | 90 |
In this exercise:
Note: All time references are in the 'YYYY/MM/DD HH24:MI:SS' string format.
This problem can be solved with the following PostgreSQL queries.
Each of these queries targets a specific aspect of user viewing patterns - most watched genre, peak viewing times, and average viewing duration. By using appropriate joins, aggregations, and order conditions, we can extract meaningful insights from the data for iQIYI.
The clause is used to filter rows from the result set of a , , or statement. It allows you to specify a condition that must be met for a row to be included in the result set.
The clause is used to filter groups created by the clause. It is similar to the clause, but it is used to specify conditions on the groups created by the clause, rather than on the individual rows of the table.
Say you were working on a social media analytics project for iQIYI.
Here is an example of a SQL query that you might write which uses both the WHERE and HAVING clauses:
This query retrieves the total impressions and average conversions for each platform in the table, the date of the campaign is in January 2023. The rows are grouped by platform and the clause filters the groups to include only those with more than 5000 impressions and an average conversion rate above 0.2.
iQIYI is a popular streaming platform in China, often thought of as the Netflix of China. Let's assume you're working with their database of user activity, which contains one record per view of a show by a user.
Your task is to write a SQL query to obtain a list of users who have watched at least 3 different shows over the past month, and whose last watched show was a specified show "Endless Love". The columns in the database are , and .
user_id | show_id | view_date |
---|---|---|
1 | 800 | 2022-11-01 |
1 | 810 | 2022-11-05 |
1 | 850 | 2022-11-07 |
2 | 800 | 2022-11-04 |
2 | 850 | 2022-11-06 |
3 | 800 | 2022-11-08 |
3 | 810 | 2022-11-10 |
3 | 850 | 2022-11-15 |
3 | 800 | 2022-11-18 |
user_id |
---|
1 |
3 |
Here's one way to write that query using PostgreSQL syntax:
In this query, the subquery creates a table where each row is numbered within each in reverse chronological order. The desired output is then selected from this subquery by filtering for the rows where , , and meet the specified conditions.
In detail, the conditions are that for each user, the last watched show () must be "Endless Love" (), and the user must have watched at least 3 different shows over the past month (). You can replace 800 with the show_id of "Endless Love" if it's different.
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Facebook ad campaign data that iQIYI ran:
The CampaignID column is used to uniquely identify each row in the table, and the constraint ensures that there are no duplicate CampaignID values. This helps to maintain the accuracy of the data by preventing duplicate rows. The primary key is also an important part of the table because it enables you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table with data on the results of the campaigns.
As a part of the marketing team at iQIYI, a leading online entertainment service company, you're tasked to monitor and improve the effectiveness of the company's online ads and product listings. From the 'clicks' and 'cart' tables provided, calculate the click-through rate (CTR) for each ad, and the conversion rate of each product listing. In addition, include in your results only those ads and listings that have received at least 100 clicks/views.
click_id | user_id | timestamp | ad_id |
---|---|---|---|
1001 | 123 | 06/08/2022 00:00:00 | 50001 |
1002 | 265 | 06/09/2022 00:05:00 | 50001 |
1003 | 362 | 06/08/2022 00:00:00 | 69852 |
1004 | 192 | 07/26/2022 00:08:00 | 69852 |
1005 | 981 | 07/05/2022 00:10:00 | 69852 |
entry_id | user_id | timestamp | product_id |
---|---|---|---|
9001 | 123 | 06/08/2022 00:01:00 | 50001 |
9002 | 265 | 06/10/2022 00:06:00 | 50001 |
9003 | 362 | 06/18/2022 00:01:00 | 69852 |
9004 | 192 | 07/26/2022 00:30:00 | 69852 |
9005 | 981 | 07/05/2022 00:35:00 | 69852 |
This query starts by creating two subqueries - 'ad_clicks' and 'add_to_cart' - to count the total number of clicks per ad and total number of items added to the cart per product id respectively. Then, it joins these subqueries on the ad_id (from 'ad_clicks') and product_id (from 'add_to_cart'). Finally, it calculates the click-through rate (ctr) and the conversion rate, including in the output only those ads and listings that have received at least 100 clicks/views.
To practice a related SQL problem on DataLemur's free online SQL code editor, attempt this SQL interview question asked by Facebook:
You are given two tables, and . The table contains logs for each time iQIYI's customers interact with different shows and the contains the customer's subscription status including their subscription tier and subscription date.
You are tasked to write a SQL query to get the list of customers who have a premium subscription and have viewed the show 'Go Go Squid!' data. Include in your query the customer's id, the show id, the date viewed, the date of subscription to the premium tier, and the subscription tier.
customer_id | show_id | date_viewed |
---|---|---|
1001 | 200 | 2019-07-22 |
1002 | 300 | 2019-07-23 |
1003 | 400 | 2019-07-24 |
1001 | 200 | 2019-07-25 |
1002 | 300 | 2019-07-26 |
customer_id | subscription_tier | date_subscribed |
---|---|---|
1001 | Premium | 2019-07-20 |
1002 | Standard | 2019-07-21 |
1003 | Premium | 2019-07-22 |
1004 | Standard | 2019-07-23 |
1005 | Premium | 2019-07-24 |
The SQL query does a join on the and tables based on the customer_id. A filter ( clause) is then applied to extract only premium customers who viewed 'Go Go Squid!'. The statement outlines the columns to be returned.
Because join questions come up frequently during SQL interviews, try this interactive Snapchat SQL Interview question using JOINS:
A correlated sub-query is one that is linked to the outer query and cannot be executed on its own. It uses the outer query to filter or transform data by referencing a column from the outer query, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
Correlated sub-queries are slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
iQIYI is a massive video platform with thousands of users engaging with content on a daily basis. Suppose you have the following database table, 'user_engagement', which contains activity data which includes "user_id", "session_id", "time_in_seconds" (time spent on each session), and "actions" (total actions taken in each session, such as likes, shares, comments, etc).
User engagement score is calculated as the sum of time_in_seconds and the square root of the actions for each session, all divided by the total number of sessions. Additionally, a user is categorized based on their avg_score, using the floor function for rounding: If the avg_score >=9, they are categorized as 'High'; avg_score between 5 to 8, categorized as 'Medium' else 'Low'.
Your task is to write a query that calculates the average engagement score for each user, round off to 2 decimal points and categorizes them based on the given rules.
user_id | session_id | time_in_seconds | actions |
---|---|---|---|
1001 | S01 | 120 | 25 |
1001 | S02 | 450 | 30 |
1001 | S03 | 240 | 15 |
2002 | S04 | 360 | 18 |
2002 | S05 | 300 | 24 |
2002 | S06 | 180 | 16 |
3003 | S07 | 150 | 8 |
user_id | avg_score | user_category |
---|---|---|
1001 | 10.17 | High |
2002 | 8.33 | Medium |
3003 | 5.00 | Medium |
This query first calculates the score for each session for each user in the subquery, then it calculates the average score for each user and classifies them into categories based on the average score. It uses the PostgreSQL math functions such as SQRT(), FLOOR(), and the arithmetic operations '/', '+', and AVG(), alongside GROUP BY to group results by user_id.
To practice a very similar question try this interactive Snapchat Sending vs. Opening Snaps Question which is similar for analyzing time spent on platform or this Twitter Histogram of Tweets Question which is similar for understanding user activity.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the above iQIYI SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Microsoft, Google, Amazon, and tech startups.
Each interview question has multiple hints, full answers and crucially, there is an interactive SQL code editor so you can right in the browser run your SQL query answer and have it executed.
To prep for the iQIYI SQL interview you can also be a great idea to practice interview questions from other tech companies like:
In case your SQL coding skills are weak, don't worry about jumping right into solving questions – strengthen your SQL foundations with this SQL tutorial for Data Analytics.
This tutorial covers things like transforming strings with CONCAT()/LOWER()/TRIM() and aggregate window functions – both of which come up frequently during iQIYI SQL interviews.
In addition to SQL interview questions, the other types of questions to prepare for the iQIYI Data Science Interview are:
The best way to prepare for iQIYI Data Science interviews is by reading Ace the Data Science Interview. The book's got: