At Tencent, SQL does the heavy lifting for analyzing datasets related to paid user acquisition for Tencents gaming portfolio and for supporting TencentDB. Unsurprisingly this is why Tencent LOVES to ask SQL problems during interviews for Data Analyst, Data Science, and BI jobs.
To help you prepare for the SQL Interview, we've collected 9 Tencent SQL interview questions to practice, which are similar to commonly asked questions at Tencent – can you solve them?
Tencent has a large portfolio of video games that they own. As a Data Analyst, assume you are conducting analysis on how our game ratings fluctuate over time.
We have collected data from our users in the table with columns such as , , , , and . The column contains the star ratings (1-5) given by users for our games.
Write a SQL query to calculate the monthly average ratings per game. Order the result by month and game_id.
review_id | user_id | submit_date | game_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
month | game_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
This query first extracts the month from the column. Then we group our data by extracted month and by . We calculate the average of for each group and order the final output by and . The AVG function is accurate to 2 decimal places.
To solve a similar window function interview problem which uses RANK() on DataLemur's free interactive SQL code editor, solve this Amazon SQL question asked in a BI Engineer interview:
Tencent, being a major player in the gaming industry, continuously needs to track user interactions and performance across different gaming titles. The games department is launching a new game and would like to maintain an eye on active user counts, the time spent by gamers and their accomplishments (scores).
The design should consider at least three tables: , and .
The table will define the attributes of each game. The table identifies the information for each player. The table records each gameplay session uniquely, connecting players to games they've played, the scores they've achieved, and the timestamps those games were played.
game_id | game_title |
---|---|
101 | "SuperTencentLand" |
102 | "TencentRacers" |
player_id | player_name |
---|---|
201 | "TencentMaster1" |
202 | "TencentGuru2" |
game_id | player_id | score | played_at |
---|---|---|---|
101 | 201 | 3500 | 2022-07-01 09:00:00 |
102 | 202 | 4500 | 2022-07-01 09:15:00 |
101 | 202 | 3000 | 2022-07-01 09:20:00 |
102 | 201 | 2900 | 2022-07-01 09:30:00 |
In this SQL solution, we are joining three tables and using window functions to determine the average playtime. Please note, the provided SQL assumes that the table column represents the end time of each playing session. Since the actual session duration or start time isn't provided, this approximation of game playing time was used for demonstration. Adjustments would be needed for the actual session duration data. Also, keep in mind to properly deal with NULL return of LAG function when calculating time spent playing.
To normalize a database, tables are divided into smaller, more specialized ones and relationships between them are defined via primary and foreign keys. This minimizes redundancy, making the database more flexible, scalable, and easier to maintain. Normalization also helps to ensure the accuracy of the data by reducing the likelihood of inconsistencies and errors.
Given a table called that records the number of active users each day for different Tencent games, find the average daily active users for each game for the month of August, 2022. The table has the following schema:
date | game_id | active_users |
---|---|---|
08/01/2022 | 123 | 50000 |
08/02/2022 | 123 | 51000 |
08/01/2022 | 456 | 60000 |
08/02/2022 | 456 | 62000 |
08/03/2022 | 123 | 52000 |
08/03/2022 | 456 | 61000 |
We're looking for output that looks like:
game_id | avg_active_users |
---|---|
123 | 51000 |
456 | 61000 |
This query calculates the average number of daily active users for each game in the month of August, 2022. The WHERE clause restricts the rows selected to those between August 1, 2022 and August 31, 2022. The GROUP BY statement divides the data into groups, each corresponding to a different game. The AVG function is then applied to each group, giving the average number of active users over the specified period for each game.
To practice a very similar question try this interactive Facebook Active User Retention Question which is similar for analyzing user activity over a specific month or this Facebook App Click-through Rate (CTR) Question which is similar for the analysis of user interaction metrics.
A primary key is a column or set of columns in a table that uniquely identifies each row in the table. The primary key is used to enforce the uniqueness and non-nullability of the rows in the table.
In a SQL database, a primary key is defined using the constraint. For example, say you had a table of :
In this example, the column is the primary key of the Tencent employees table. It is defined as an integer and is marked as the primary key using the constraint.
A table can have only one primary key, but the primary key can consist of multiple columns. For example, say you had a table of Tencent customer transactions:
In the above example, the primary key of the Orders table consists of two columns: TransactionID and ProductID. This means that the combination of OrderID and ProductID must be unique for every row in the table.
Tencent is a global entity with various services in different sectors. One of their key products is online gaming. For this question, consider the Tencent gaming division. There is a table 'purchases' with columns 'purchase_id', 'user_id', 'purchase_date', 'game_id', and 'amount'.
The task is to write a SQL query to find out the average monthly spending per game for the year 2022.
Sample data:
purchase_id | user_id | purchase_date | game_id | amount |
---|---|---|---|---|
8791 | 345 | 01/08/2022 | 10001 | 25 |
4502 | 211 | 02/19/2022 | 20002 | 15 |
1943 | 563 | 02/28/2022 | 10001 | 30 |
3625 | 211 | 03/01/2022 | 20002 | 20 |
9876 | 345 | 03/20/2022 | 10001 | 30 |
The output should include the month, game_id and average spending for that game in the month.
Month | game_id | average_spending_per_game |
---|---|---|
1 | 10001 | 25.00 |
2 | 20002 | 15.00 |
2 | 10001 | 30.00 |
3 | 20002 | 20.00 |
3 | 10001 | 30.00 |
This PostgreSQL query extracts the month from the purchase_date first and then, groups the purchases by its month and game_id. On this grouped data, it calculates the average spending amount. It finally sorts the resultset by Month and game_id. The WHERE clause filters the data for the year 2022.
When designing a database schema, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a car and a license plate - each car has one license plate, and each license plate belongs to one car.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. For example, a person can have multiple email addresses, but each email address only relates back to one person.
Tencent, being a global company, has a large customer base with diverse names. Assume that for a marketing campaign, they want to filter out all customer records where the last name starts with a specific letter, say 'L'. Write an SQL query that can fetch such records from the customer database.
The customer database is represented by the table . The table schema and data are represented as follows:
customer_id | first_name | last_name | country | joined_date |
---|---|---|---|---|
001 | John | Lee | China | 11/06/2019 |
002 | Pete | Li | Hong Kong | 12/14/2020 |
003 | Mark | Lo | Taiwan | 09/01/2018 |
004 | Jenny | Wu | China | 01/05/2021 |
005 | Sam | Lin | Japan | 02/20/2019 |
Here is the SQL block to accomplish the task:
This query in PostgreSQL will return all the rows in the table where the field starts with the letter 'L'. The '%' character is a wildcard that matches any sequence of characters.
customer_id | first_name | last_name | country | joined_date |
---|---|---|---|---|
001 | John | Lee | China | 11/06/2019 |
002 | Pete | Li | Hong Kong | 12/14/2020 |
003 | Mark | Lo | Taiwan | 09/01/2018 |
005 | Sam | Lin | Japan | 02/20/2019 |
This output includes only those customers whose last name starts with the letter 'L'.
Suppose you work for Tencent on their gaming division. They have a vast amount of data on each user's in-game purchases and game player stats.
You need to generate a report providing the following:
Please compute the above metrics from the and table. The holds data about every in-app purchase made by the users. The holds data about users' gameplay time and the unique games they played.
Sample data for :
purchase_id | user_id | game_id | purchase_date | amount |
---|---|---|---|---|
1 | 1 | 100 | 06/08/2022 | 200 |
2 | 1 | 200 | 06/10/2022 | 300 |
3 | 2 | 100 | 06/18/2022 | 150 |
4 | 2 | 300 | 07/26/2022 | 250 |
5 | 3 | 200 | 07/05/2022 | 500 |
Sample data for :
user_id | country | game_hours | unique_games |
---|---|---|---|
1 | USA | 50 | 2 |
2 | UK | 30 | 3 |
3 | China | 70 | 1 |
In the query, we join the and tables on . We compute the average spend per user with AVG(), count the number of unique games played per user, and compute the square root of total game hours, rounding to the nearest whole number, as part of the SELECT statement. The GROUP BY clause enables us to generate these metrics for each and combination.
The two most similar questions based on the details given are 'Histogram of Users and Purchases' by Walmart and 'Average Review Ratings' by Amazon.
'Histogram of Users and Purchases' correlates with the problem as it also deals with user transactions and the number of items bought by users, which appears similar to the number of unique games a user played. 'Average Review Ratings' deals with calculating average values grouped per product, similar to the need for calculating average spend per user.
To practice a very similar question try this interactive Walmart Histogram of Users and Purchases Question which is similar for focusing on user transactions and their quantity or this Amazon Average Review Ratings Question which is similar for calculating average values with a specific grouping, like the AVG_Spend per user in your problem.
The key to acing a Tencent SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier Tencent SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each interview question has multiple hints, fully explained answers along with a discussion board to see how others solved it and most importantly, there's an online SQL coding environment so you can right online code up your SQL query answer and have it executed.
To prep for the Tencent SQL interview you can also be wise to practice interview questions from other tech companies like:
But if your SQL foundations are weak, don't worry about diving straight into solving questions – strengthen your SQL foundations with this free SQL tutorial.
This tutorial covers SQL concepts such as filtering on multiple conditions using AND/OR/NOT and sorting data with ORDER BY – both of which show up frequently during SQL interviews at Tencent.
For the Tencent Data Science Interview, beyond writing SQL queries, the other types of questions to prepare for are:
The best way to prepare for Tencent Data Science interviews is to learn more about Tencent's AI Lab. You should also read Ace the Data Science Interview because it's got: