At NetEase, analysts write SQL queries to analyze user behavior data from NetEases's 250+ games. They also transitioned from MySQL to TiDB to store player and company data. Because of this, NetEase often tests SQL coding questions in interviews for Data Science and Data Engineering positions.
In case you're stressed about an upcoming SQL Assessment, we've collected 9 NetEase SQL interview questions to practice, which are similar to commonly asked questions at NetEase – able to answer them all?
NetEase, being a major provider of online PC and mobile games, collects user reviews for its various games. For review analysis, one typical requirement might be to compute the average user ratings for each game by month.
Here is your task: You are given a table where each row is a review record of a particular game () submitted by a user (). The table has the following schema:
review_id | user_id | submit_date | game_id | stars |
---|---|---|---|---|
6171 | 123 | 06/01/2022 00:00:00 | GameA | 4 |
7802 | 456 | 06/02/2022 00:00:00 | GameB | 5 |
5293 | 789 | 07/01/2022 00:00:00 | GameA | 3 |
6352 | 123 | 07/02/2022 00:00:00 | GameB | 4 |
4517 | 456 | 07/03/2022 00:00:00 | GameA | 2 |
Write a SQL query that computes the average rating () for each game () by month (). Assume that is a timestamp, and we only care about the year and month (not the day). The output should be a table with columns for the month, game, and average rating, and should be sorted by month and then game.
The above SQL query groups the reviews by both the month of and . Within each group, it calculates the average to represent the average rating of that game in that month.
mth | game | avg_stars |
---|---|---|
06/01/2022 00:00:00 | GameA | 4.00 |
06/01/2022 00:00:00 | GameB | 5.00 |
07/01/2022 00:00:00 | GameA | 2.50 |
07/01/2022 00:00:00 | GameB | 4.00 |
For more window function practice, solve this Uber SQL Interview Question within DataLemur's online SQL code editor:
NetEase, a leading provider of online services including games, would want to understand the performance of its games based on user reviews. Could you design a database model and SQL query to find out the average user review ratings of each game for each month?
Assume that we have following tables:
game_id | game_name | launch_date |
---|---|---|
101 | Fantasy World | 01/01/2021 00:00:00 |
102 | Warrior's Quest | 06/01/2022 00:00:00 |
103 | Modern Strategist | 06/01/2022 00:00:00 |
review_id | user_id | submit_date | game_id | stars |
---|---|---|---|---|
201 | 123 | 06/08/2022 00:00:00 | 101 | 4 |
202 | 265 | 06/10/2022 00:00:00 | 102 | 4 |
203 | 362 | 06/18/2022 00:00:00 | 101 | 3 |
204 | 192 | 07/26/2022 00:00:00 | 102 | 3 |
205 | 981 | 07/05/2022 00:00:00 | 103 | 5 |
We need to provide the average review ratings of each game for each month.
The SQL block above first joins the table with table based on . Then it groups the records based on the submission month of review and game name, and calculates the average number of stars for each group. The result is ordered by the month in a descending order.
While both and are used to rank rows, the key difference is in how they deal with ties.
RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the 2nd row in the tie, and a rank of 4 to the the 3rd tie.
DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
Suppose we had data on how many deals different salespeople at NetEase:
To rank these salespeople, we could execute the following query:
The result of this query would be:
name | deals_closed | rank | dense_rank |
---|---|---|---|
Akash | 50 | 1 | 1 |
Brittany | 50 | 2 | 1 |
Carlos | 40 | 3 | 2 |
Dave | 40 | 4 | 3 |
Eve | 30 | 5 | 3 |
Farhad | 10 | 6 | 4 |
As you can see, the function assigns a rank of 1 to the first row (Akash), a rank of 2 to the second row (Brittany), and so on. In contrast, the , gives both Akash and Brittany a rank of 1, since they are tied, then moves on to rank 2 for Carlos.
NetEase, a major internet company based in China, manages a customer records database. They want to filter down this database to select specific customers based on their purchase behavior and account status.
They have two tables, and . The table contains details about the customer including their current account status. The table records the individual transactions made by customers.
customer_id | account_status | name | |
---|---|---|---|
1001 | active | John Doe | john.doe@example.com |
1002 | inactive | Jane Smith | jane.smith@example.com |
1003 | active | Mary Johnson | mary.johnson@example.com |
1004 | block | Charles Williams | charles.williams@example.com |
transaction_id | customer_id | product_id | amount | purchased_date |
---|---|---|---|---|
1456 | 1001 | 301 | 200 | 01/01/2023 |
2387 | 1002 | 402 | 300 | 01/01/2023 |
3829 | 1001 | 301 | 200 | 01/02/2023 |
4129 | 1003 | 303 | 150 | 01/02/2023 |
Question: Write a SQL query to select the and of customers who have an 'active' status in the table and have spent more than $200 on product 301 in the table.
This query first generates a subquery that calculates the total amount spent by each customer on product 301. It then joins this subquery with the table on the and filters the results based on the account status and the total amount spent. This gives us the customers with an active account status who have spent more than $200 on the specified product.
A correlated sub-query is one that relies on the outer query and cannot be run independently. It uses the outer query to filter or transform data by referencing a column from it, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is not linked to the outer query and can be run on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
An example correlated sub-query:
This correlated subquery retrieves the names and salaries of NetEase employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).
An example non-correlated sub-query:
This non-correlated subquery retrieves the names and salaries of NetEase employees who make more than the average salary for the Data Analytics department (which honestly should be very few people since Data Analysts are awesome).
Anways, the subquery is independent of the main query and can be considered a standalone query. Its output (the average salary for the Data Analytics department) is then used in the main query to filter the rows of the table.
Given a database with information on the ad impressions, click-throughs, and conversions of NetEase digital products, calculate and present the average click-through rates (CTR) and conversion rates per product.
We have two tables, and . The table has the , , and for each impression. In the table, we document each click's , , the it occurred, whether it to a sale (1 indicating yes, 0 indicating no), and the .
impression_id | time | product_id |
---|---|---|
1 | 2022/07/29 09:20:00 | 001 |
2 | 2022/07/30 11:42:00 | 002 |
3 | 2022/07/31 13:17:00 | 003 |
4 | 2022/08/01 15:15:00 | 001 |
5 | 2022/08/02 08:12:00 | 002 |
click_id | impression_id | time | converted | product_id |
---|---|---|---|---|
1 | 1 | 2022/07/29 09:22:00 | 1 | 001 |
2 | 2 | 2022/07/30 11:50:00 | 0 | 002 |
3 | 3 | 2022/07/31 13:20:00 | 1 | 003 |
4 | 4 | 2022/08/01 15:18:00 | 0 | 001 |
5 | 5 | 2022/08/02 08:15:00 | 1 | 002 |
This query first computes the CTR by dividing the number of distinct clicks by the number of distinct impressions for each product. Then it calculates the conversion rates by dividing successful conversions by the number of distinct clicks. The final step is joining the CTR and conversion rate data together based on .
To practice a related SQL problem on DataLemur's free interactive coding environment, attempt this Meta SQL interview question:
A clustered index is an index where the order of the rows in the database corresponds to the order of the rows in the index. Because of this, a table can only have one clustered index, but it can have multiple non-clustered indexes.
The main difference between the two is that the database tries to keep the data in the same order as the corresponding keys in the clustered index. This can improve the performance of most query operations, as it provides a linear-access path to the data stored in the database.
NetEase is an international technology company and runs a large e-commerce website. Each product they sell has both a sale price and a cost price, and they can sell multiple products in a day. NetEase would like you to write a SQL query to calculate the total revenue, the average sale price and the average profit margin for each product per month. The profit margin should be calculated as , rounded to four decimal places.
The sale_price and cost_price columns are stored in a two decimal number format. The dates are stored in the format "DD/MM/YYYY".
sales_id | product_id | sale_date | sale_price | cost_price |
---|---|---|---|---|
621 | 50001 | 06/08/2022 | 100.00 | 75.00 |
422 | 69852 | 06/10/2022 | 200.00 | 150.00 |
953 | 50001 | 06/18/2022 | 105.00 | 74.00 |
364 | 69852 | 07/26/2022 | 195.00 | 145.00 |
517 | 69852 | 07/05/2022 | 198.00 | 152.00 |
mth | product | revenue | avg_sale_price | avg_profit_margin |
---|---|---|---|---|
6 | 50001 | 205.00 | 102.50 | 0.2667 |
6 | 69852 | 200.00 | 200.00 | 0.2500 |
7 | 69852 | 393.00 | 196.50 | 0.2598 |
This SQL query uses aggregate functions and to calculate total revenue and average sale price. The function is used to limit the decimal places for average sale price and profit margin. The function extracts the month from each sale_date. The result is grouped by both month and product_id.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating sales related metrics, or this Wayfair Y-on-Y Growth Rate Question which is similar for requiring use of window functions and time-based grouping.
As part of an analysis on user engagement, NetEase, a leading game developing company, wants to calculate the monthly average time users spend playing each game. You have been provided with a 'gaming_sessions' table where each row represents a distinct gaming session, which includes the user_id, game_id, session_start and session_end dates/times.
To assist NetEase, can you write a SQL query which gives the game_id along with the monthly average time spent (in minutes) playing each game?
session_id | user_id | game_id | session_start | session_end |
---|---|---|---|---|
101 | 123 | 3001 | 01/08/2022 10:00:00 | 01/08/2022 11:30:00 |
102 | 380 | 5002 | 02/01/2022 20:00:00 | 02/01/2022 20:45:00 |
103 | 477 | 3001 | 03/02/2022 16:30:00 | 03/02/2022 18:00:00 |
104 | 290 | 3001 | 29/01/2022 09:00:00 | 29/01/2022 11:00:00 |
105 | 123 | 5002 | 30/03/2022 14:00:00 | 30/03/2022 15:10:00 |
This query calculates the time difference between session_end and session_start in minutes by using AGE() function and EXTRACT(EPOCH FROM..). Then it groups by game_id and month number (extracted from session_start) to get the average session length.
The best way to prepare for a NetEase SQL interview is to practice, practice, practice. In addition to solving the above NetEase SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Google, Uber, and Microsoft.
Each problem on DataLemur has hints to guide you, step-by-step solutions and crucially, there's an online SQL code editor so you can right online code up your SQL query answer and have it checked.
To prep for the NetEase SQL interview you can also be a great idea to practice SQL questions from other tech companies like:
However, if your SQL skills are weak, don't worry about diving straight into solving questions – refresh your SQL knowledge with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as using LIKE and handling timestamps – both of these pop up often in NetEase SQL assessments.
For the NetEase Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
To prepare for NetEase Data Science interviews read the book Ace the Data Science Interview because it's got: