9 NetEase SQL Interview Questions - How Many Can You Solve?

Updated on

February 6, 2024

At NetEase, analysts write SQL queries to analyze user behavior data from NetEases's 250+ games. They also transitioned from MySQL to TiDB to store player and company data. Because of this, NetEase often tests SQL coding questions in interviews for Data Science and Data Engineering positions.

In case you're stressed about an upcoming SQL Assessment, we've collected 9 NetEase SQL interview questions to practice, which are similar to commonly asked questions at NetEase – able to answer them all?

NetEase Games

9 NetEase SQL Interview Questions

SQL Question 1: Compute Average User Rating for Each Game by Month

NetEase, being a major provider of online PC and mobile games, collects user reviews for its various games. For review analysis, one typical requirement might be to compute the average user ratings for each game by month.

Here is your task: You are given a table where each row is a review record of a particular game () submitted by a user (). The table has the following schema:

Example Input

review_iduser_idsubmit_dategame_idstars
617112306/01/2022 00:00:00GameA4
780245606/02/2022 00:00:00GameB5
529378907/01/2022 00:00:00GameA3
635212307/02/2022 00:00:00GameB4
451745607/03/2022 00:00:00GameA2

Write a SQL query that computes the average rating () for each game () by month (). Assume that is a timestamp, and we only care about the year and month (not the day). The output should be a table with columns for the month, game, and average rating, and should be sorted by month and then game.

Answer:


The above SQL query groups the reviews by both the month of and . Within each group, it calculates the average to represent the average rating of that game in that month.

Example Output

mthgameavg_stars
06/01/2022 00:00:00GameA4.00
06/01/2022 00:00:00GameB5.00
07/01/2022 00:00:00GameA2.50
07/01/2022 00:00:00GameB4.00

For more window function practice, solve this Uber SQL Interview Question within DataLemur's online SQL code editor:

Uber SQL problem

SQL Question 2: Analyzing Game Performance Based on User Reviews

NetEase, a leading provider of online services including games, would want to understand the performance of its games based on user reviews. Could you design a database model and SQL query to find out the average user review ratings of each game for each month?

Assume that we have following tables:

Example Input:
game_idgame_namelaunch_date
101Fantasy World01/01/2021 00:00:00
102Warrior's Quest06/01/2022 00:00:00
103Modern Strategist06/01/2022 00:00:00
Example Input:
review_iduser_idsubmit_dategame_idstars
20112306/08/2022 00:00:001014
20226506/10/2022 00:00:001024
20336206/18/2022 00:00:001013
20419207/26/2022 00:00:001023
20598107/05/2022 00:00:001035

We need to provide the average review ratings of each game for each month.

Answer:


The SQL block above first joins the table with table based on . Then it groups the records based on the submission month of review and game name, and calculates the average number of stars for each group. The result is ordered by the month in a descending order.

SQL Question 3: How do and differ when it comes to ranking rows in a result set?

While both and are used to rank rows, the key difference is in how they deal with ties.

RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the 2nd row in the tie, and a rank of 4 to the the 3rd tie.

DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.

Suppose we had data on how many deals different salespeople at NetEase:


To rank these salespeople, we could execute the following query:


The result of this query would be:

namedeals_closedrankdense_rank
Akash5011
Brittany5021
Carlos4032
Dave4043
Eve3053
Farhad1064

As you can see, the function assigns a rank of 1 to the first row (Akash), a rank of 2 to the second row (Brittany), and so on. In contrast, the , gives both Akash and Brittany a rank of 1, since they are tied, then moves on to rank 2 for Carlos.

NetEase SQL Interview Questions

SQL Question 4: Filter Customer Records by Purchase Behavior and Account Status

NetEase, a major internet company based in China, manages a customer records database. They want to filter down this database to select specific customers based on their purchase behavior and account status.

They have two tables, and . The table contains details about the customer including their current account status. The table records the individual transactions made by customers.

Example Input:
customer_idaccount_statusnameemail
1001activeJohn Doejohn.doe@example.com
1002inactiveJane Smithjane.smith@example.com
1003activeMary Johnsonmary.johnson@example.com
1004blockCharles Williamscharles.williams@example.com
Example Input:
transaction_idcustomer_idproduct_idamountpurchased_date
1456100130120001/01/2023
2387100240230001/01/2023
3829100130120001/02/2023
4129100330315001/02/2023

Question: Write a SQL query to select the and of customers who have an 'active' status in the table and have spent more than $200 on product 301 in the table.

Answer:


This query first generates a subquery that calculates the total amount spent by each customer on product 301. It then joins this subquery with the table on the and filters the results based on the account status and the total amount spent. This gives us the customers with an active account status who have spent more than $200 on the specified product.

SQL Question 5: Can you describe the difference between a correlated and a non-correlated sub-query?

A correlated sub-query is one that relies on the outer query and cannot be run independently. It uses the outer query to filter or transform data by referencing a column from it, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is not linked to the outer query and can be run on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.

An example correlated sub-query:


This correlated subquery retrieves the names and salaries of NetEase employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).

An example non-correlated sub-query:


This non-correlated subquery retrieves the names and salaries of NetEase employees who make more than the average salary for the Data Analytics department (which honestly should be very few people since Data Analysts are awesome).

Anways, the subquery is independent of the main query and can be considered a standalone query. Its output (the average salary for the Data Analytics department) is then used in the main query to filter the rows of the table.

SQL Question 6: NetEase Click-Through-Rate Analysis

Given a database with information on the ad impressions, click-throughs, and conversions of NetEase digital products, calculate and present the average click-through rates (CTR) and conversion rates per product.

We have two tables, and . The table has the , , and for each impression. In the table, we document each click's , , the it occurred, whether it to a sale (1 indicating yes, 0 indicating no), and the .

Example Input:
impression_idtimeproduct_id
12022/07/29 09:20:00001
22022/07/30 11:42:00002
32022/07/31 13:17:00003
42022/08/01 15:15:00001
52022/08/02 08:12:00002
Example Input:
click_idimpression_idtimeconvertedproduct_id
112022/07/29 09:22:001001
222022/07/30 11:50:000002
332022/07/31 13:20:001003
442022/08/01 15:18:000001
552022/08/02 08:15:001002

Answer:


This query first computes the CTR by dividing the number of distinct clicks by the number of distinct impressions for each product. Then it calculates the conversion rates by dividing successful conversions by the number of distinct clicks. The final step is joining the CTR and conversion rate data together based on .

To practice a related SQL problem on DataLemur's free interactive coding environment, attempt this Meta SQL interview question: SQL interview question asked by Facebook

SQL Question 7: What are the similarities and differences between a clustered index and non-clustered index?

A clustered index is an index where the order of the rows in the database corresponds to the order of the rows in the index. Because of this, a table can only have one clustered index, but it can have multiple non-clustered indexes.

The main difference between the two is that the database tries to keep the data in the same order as the corresponding keys in the clustered index. This can improve the performance of most query operations, as it provides a linear-access path to the data stored in the database.

SQL Question 8: Calculate Monthly Product Revenue and Rounded Average Sales Price

NetEase is an international technology company and runs a large e-commerce website. Each product they sell has both a sale price and a cost price, and they can sell multiple products in a day. NetEase would like you to write a SQL query to calculate the total revenue, the average sale price and the average profit margin for each product per month. The profit margin should be calculated as , rounded to four decimal places.

The sale_price and cost_price columns are stored in a two decimal number format. The dates are stored in the format "DD/MM/YYYY".

Example Input:
sales_idproduct_idsale_datesale_pricecost_price
6215000106/08/2022100.0075.00
4226985206/10/2022200.00150.00
9535000106/18/2022105.0074.00
3646985207/26/2022195.00145.00
5176985207/05/2022198.00152.00
Example Output:
mthproductrevenueavg_sale_priceavg_profit_margin
650001205.00102.500.2667
669852200.00200.000.2500
769852393.00196.500.2598

Answer:


This SQL query uses aggregate functions and to calculate total revenue and average sale price. The function is used to limit the decimal places for average sale price and profit margin. The function extracts the month from each sale_date. The result is grouped by both month and product_id.

To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating sales related metrics, or this Wayfair Y-on-Y Growth Rate Question which is similar for requiring use of window functions and time-based grouping.

SQL Question 9: User Engagement Levels Over Time

As part of an analysis on user engagement, NetEase, a leading game developing company, wants to calculate the monthly average time users spend playing each game. You have been provided with a 'gaming_sessions' table where each row represents a distinct gaming session, which includes the user_id, game_id, session_start and session_end dates/times.

To assist NetEase, can you write a SQL query which gives the game_id along with the monthly average time spent (in minutes) playing each game?

Example Input:
session_iduser_idgame_idsession_startsession_end
101123300101/08/2022 10:00:0001/08/2022 11:30:00
102380500202/01/2022 20:00:0002/01/2022 20:45:00
103477300103/02/2022 16:30:0003/02/2022 18:00:00
104290300129/01/2022 09:00:0029/01/2022 11:00:00
105123500230/03/2022 14:00:0030/03/2022 15:10:00

Answer:


This query calculates the time difference between session_end and session_start in minutes by using AGE() function and EXTRACT(EPOCH FROM..). Then it groups by game_id and month number (extracted from session_start) to get the average session length.

NetEase SQL Interview Tips

The best way to prepare for a NetEase SQL interview is to practice, practice, practice. In addition to solving the above NetEase SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Google, Uber, and Microsoft. DataLemur Question Bank

Each problem on DataLemur has hints to guide you, step-by-step solutions and crucially, there's an online SQL code editor so you can right online code up your SQL query answer and have it checked.

To prep for the NetEase SQL interview you can also be a great idea to practice SQL questions from other tech companies like:

However, if your SQL skills are weak, don't worry about diving straight into solving questions – refresh your SQL knowledge with this free SQL for Data Analytics course.

SQL interview tutorial

This tutorial covers SQL concepts such as using LIKE and handling timestamps – both of these pop up often in NetEase SQL assessments.

NetEase Data Science Interview Tips

What Do NetEase Data Science Interviews Cover?

For the NetEase Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:

  • Statistics and Probability Questions
  • Python or R Coding Questions
  • Data Case Study Questions
  • Machine Learning Questions
  • Behavioral Based Interview Questions

NetEase Data Scientist

How To Prepare for NetEase Data Science Interviews?

To prepare for NetEase Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions sourced from tech companies like Google & Microsoft
  • a crash course on SQL, Product-Sense & ML
  • over 900+ 5-star reviews on Amazon

Ace the Data Science Interview Book on Amazon

© 2024 DataLemur, Inc

Career Resources

Free 9-Day Data Interview Crash CourseFree SQL Tutorial for Data AnalyticsUltimate SQL Interview GuideAce the Data Job Hunt Video CourseAce the Data Science InterviewBest Books for Data Analysts