At NCSoft, SQL is used across the company for analyzing player behavior patterns in online games and optimizing in-game economies based on transactional data. Unsurprisingly, this is why NCSoft frequently asks SQL questions in interviews for Data Science and Data Engineering positions.
To help you practice for the NCSoft SQL interview, this blog covers 8 NCSoft SQL interview questions – can you solve them?
NCSoft is an online gaming company. For them, a whale user is a player who makes frequent in-game purchases. Your task is to write a SQL query to analyze the in-game purchases data table and identify the top five whale users for each game in terms of the total amount spent. Assume that each game has a separate purchase data table.
Note: The purchase amount is in USD.
Let's consider the table representing the in-game purchases for one of the games.
purchase_id | user_id | purchase_date | amount |
---|---|---|---|
1001 | 453 | 06/01/2022 | 100 |
1002 | 453 | 06/05/2022 | 150 |
1003 | 654 | 06/05/2022 | 50 |
1004 | 231 | 06/07/2022 | 200 |
1005 | 231 | 06/08/2022 | 150 |
The SELECT statement should give the following output:
user_id | total_amount |
---|---|
231 | 350 |
453 | 250 |
654 | 50 |
In PostgreSQL syntax:
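A possible solution, assuming the game's purchase data lives in a table named `purchases` (the table name is illustrative):

```sql
-- Total spend per user, highest spenders first;
-- keep only the top 5 (the whale users).
SELECT
  user_id,
  SUM(amount) AS total_amount
FROM purchases
GROUP BY user_id
ORDER BY total_amount DESC
LIMIT 5;
```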
In this query, we sum the amount spent by each user and arrange the results in decreasing order of total spend. We then select the top 5 users with the highest total_amount – these are the whale users.
To solve a similar power-user data analysis problem on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question:
You are working as a data analyst for NCSoft, a gaming company. NCSoft has launched a new game and they are keeping track of their in-game purchases by their users.
Your task is to write a SQL query that provides a monthly analysis of the total revenue generated by each game. You need to calculate revenue per month, per game, and also, for each row, provide the previous month’s revenue for that game (or null if there isn't a previous month).
Consider the following sample data:
purchase_id | user_id | game_id | purchase_date | amount ($) |
---|---|---|---|---|
1 | 101 | 1 | 2022-01-01 | 10 |
2 | 102 | 2 | 2022-01-02 | 5 |
3 | 103 | 1 | 2022-01-05 | 15 |
4 | 104 | 1 | 2022-02-01 | 25 |
5 | 105 | 2 | 2022-02-03 | 20 |
6 | 106 | 1 | 2022-02-05 | 10 |
7 | 107 | 2 | 2022-03-01 | 15 |
8 | 108 | 2 | 2022-03-04 | 10 |
The expected output:
game_id | purchase_month | total_revenue | prev_month_revenue |
---|---|---|---|
1 | 2022-01 | 25 | null |
1 | 2022-02 | 35 | 25 |
2 | 2022-01 | 5 | null |
2 | 2022-02 | 20 | 5 |
2 | 2022-03 | 25 | 20 |
Based on the problem given, we can use a window function to calculate the previous month's revenue. Here is one such solution using a PostgreSQL query:
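One possible query – the table name `purchases` and the CTE name `monthly_revenue` are illustrative:

```sql
-- Aggregate revenue per game per month in a CTE,
-- then use LAG() to pull the prior month's figure.
WITH monthly_revenue AS (
  SELECT
    game_id,
    TO_CHAR(purchase_date, 'YYYY-MM') AS purchase_month,
    SUM(amount) AS total_revenue
  FROM purchases
  GROUP BY game_id, TO_CHAR(purchase_date, 'YYYY-MM')
)
SELECT
  game_id,
  purchase_month,
  total_revenue,
  LAG(total_revenue) OVER (
    PARTITION BY game_id
    ORDER BY purchase_month
  ) AS prev_month_revenue
FROM monthly_revenue
ORDER BY game_id, purchase_month;
```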
In the above query, we first create a CTE (Common Table Expression) that aggregates the total_revenue per game_id and purchase_month. We then select from this CTE and use LAG() (a window function) to get the previous row's value within each partition of game_id, ordered by purchase_month. This way, the prev_month_revenue column contains the previous month's revenue for each game. Please note that since LAG() only pulls data from a previous row, if there is no previous row (e.g., it's the first month for a given game_id), it will output null.
To solve another window function question on DataLemur's free interactive coding environment, try this Google SQL Interview Question:
Database normalization is helpful because it improves the performance, flexibility, and scalability of a database. Some specific benefits of normalization include:
Reducing Redundancy: Normalization helps to minimize redundancy by breaking down a larger general table into smaller, more specific tables. This can often reduce the amount of data that needs to be accessed for particular queries.
Improving Data Integrity: Normalization helps to ensure the integrity of the data by minimizing the risk of data inconsistencies and anomalies. It does this by establishing clear relationships between the tables via primary and foreign keys.
Increasing Flexibility: Normalization makes it easier to modify the structure of the database, as there is less redundancy, so it allows you to make changes to one table without affecting others. This makes it easier to adapt the database to changing business needs (and change is basically constant at NCSoft!)
Improving Performance: Normalization can improve the performance of the database by reducing the amount of data that needs to be stored and accessed in a single table. This can result in faster query times and better overall performance.
Easier Data Modeling: Normalization can make it easier to model the data in a logical and intuitive way, since less random data is commingled together, which improves the overall database design.
As a data analyst at NCSoft, you are tasked with getting insights from the subscription database. Write a SQL query to return the names of customers who live in the United States and have an active subscription to the "Blade & Soul" game.
For ease of understanding, here's your database:
customer_id | first_name | last_name | country |
---|---|---|---|
1001 | John | Doe | USA |
1002 | Jane | Smith | UK |
1003 | Bob | Taylor | USA |
1004 | Charlie | Brown | Canada |
1005 | Alice | White | USA |
customer_id | game_name | is_active |
---|---|---|
1001 | Blade & Soul | true |
1002 | Blade & Soul | true |
1003 | Aion | true |
1004 | Blade & Soul | false |
1005 | Blade & Soul | true |
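A sketch of such a query, assuming the two tables are named `customers` and `subscriptions` (names are illustrative):

```sql
-- Customers in the USA with an active Blade & Soul subscription.
SELECT
  c.first_name,
  c.last_name
FROM customers c
JOIN subscriptions s
  ON c.customer_id = s.customer_id
WHERE c.country = 'USA'
  AND s.game_name = 'Blade & Soul'
  AND s.is_active = true;
```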
This query first joins the customer and subscription tables on their common key, customer_id. Then it applies three filter conditions in the WHERE clause – the customer must be from the USA, the game must be "Blade & Soul", and the subscription must be active.
A cross-join, also known as a cartesian join, is a type of join that returns the cross-product of two tables. In a cross-join, each row from the first table is paired with every row from the second table, resulting in a new table that has a row for each possible combination of rows from the two input tables.
For example, say you worked on the Marketing Analytics team at NCSoft, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
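A minimal sketch, assuming two tables named `ad_copy` and `ad_creative`:

```sql
-- Pair every row of ad_copy with every row of ad_creative;
-- the result has (rows in ad_copy) x (rows in ad_creative) rows.
SELECT *
FROM ad_copy
CROSS JOIN ad_creative;
```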
You could use this query to generate all possible combinations of ad copy and ad creative to help you create more effective ads for NCSoft. Just be careful: if you had 1,000 rows of ad copy and 1,000 different image creatives, the resulting cross-join would have 1 million rows!
You are a data analyst at NCSoft, a company that produces online games. Your supervisor wants to understand more about the players' behavior across different games. Specifically, he wants to know the average play duration per game on a monthly basis. Assume that all the games have the same unit for play duration (minutes).
You are given a table with the following structure:
activity_id | player_id | game_id | play_date | play_duration (min) |
---|---|---|---|---|
1001 | 9503 | 1204 | 04/03/2023 00:00:00 | 60 |
1002 | 6085 | 1001 | 05/12/2023 00:00:00 | 180 |
1003 | 4201 | 1001 | 04/01/2023 00:00:00 | 120 |
1004 | 9202 | 1204 | 04/20/2023 00:00:00 | 90 |
1005 | 3580 | 1001 | 03/28/2023 00:00:00 | 60 |
Your task is to write a SQL query to return a table that contains the game_id, the month of the play date, and the average play duration for that game_id during that month. The average play duration should be rounded to two decimal places.
Here is the PostgreSQL query to solve the problem:
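A sketch of the query, assuming the activity table is named `player_activity` and the duration column is `play_duration` (names are illustrative):

```sql
-- Average play duration per game per month,
-- rounded to two decimal places.
SELECT
  game_id,
  EXTRACT(MONTH FROM play_date) AS month,
  ROUND(AVG(play_duration), 2) AS avg_play_duration
FROM player_activity
GROUP BY game_id, EXTRACT(MONTH FROM play_date)
ORDER BY game_id, month;
```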
By grouping the data by both game_id and the month part of play_date, we can calculate the average play duration per game on a monthly basis. We also round the average to two decimal places using the ROUND function for better readability.
Assuming the table has the structure and data shown above, the output should look like this:
game_id | month | avg_play_duration |
---|---|---|
1001 | 3 | 60.00 |
1001 | 4 | 120.00 |
1204 | 4 | 75.00 |
The most similar questions to this SQL problem are "Average Review Ratings" from Amazon and "Teams Power Users" from Microsoft.
To practice a very similar question, try this interactive Amazon Average Review Ratings Question, which is similar for calculating averages on a monthly basis, or this Microsoft Teams Power Users Question, which is similar for summarizing user engagement data.
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Facebook ad campaign data that NCSoft ran:
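An illustrative schema for such a table – the table and column names here (other than CampaignID) are assumptions:

```sql
-- CampaignID is the primary key: unique, non-null,
-- one row per ad campaign.
CREATE TABLE facebook_ad_campaigns (
  CampaignID INTEGER PRIMARY KEY,
  campaign_name VARCHAR(255),
  start_date DATE,
  budget DECIMAL(10, 2)
);
```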
The CampaignID column uniquely identifies each row in the table, and the PRIMARY KEY constraint ensures that there are no duplicate CampaignID values. This helps maintain the accuracy of the data by preventing duplicate rows. The primary key is also an important part of the table because it enables you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table with data on the results of the campaigns.
As a Data Analyst at NCSoft, you are tasked with analyzing the average monthly revenue generated per game. Given the 'sales' and 'games' tables, write a SQL query that will calculate the average monthly revenue for each game.
Example Input:
sale_id | game_id | user_id | sale_date | price |
---|---|---|---|---|
3600 | 1 | 342 | 02/15/2022 00:00:00 | 59.99 |
3601 | 2 | 984 | 02/22/2022 00:00:00 | 49.99 |
3602 | 1 | 241 | 03/01/2022 00:00:00 | 59.99 |
3603 | 2 | 468 | 03/02/2022 00:00:00 | 49.99 |
3604 | 3 | 764 | 03/08/2022 00:00:00 | 39.99 |
Example Input:
game_id | game_name |
---|---|
1 | Lineage |
2 | Guild Wars 2 |
3 | Blade & Soul |
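One way to write this query, assuming the tables are named `sales` and `games` as described:

```sql
-- Truncate sale dates to the month, join on game_id,
-- and average the sale price per game per month.
SELECT
  DATE_TRUNC('month', s.sale_date) AS month,
  g.game_name,
  AVG(s.price) AS avg_monthly_revenue
FROM sales s
JOIN games g
  ON s.game_id = g.game_id
GROUP BY DATE_TRUNC('month', s.sale_date), g.game_name
ORDER BY month, avg_monthly_revenue;
```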
This query truncates sale_date down to the month using the DATE_TRUNC function and joins the 'sales' and 'games' tables on game_id. By grouping the records by month and game_name, it calculates the average price (revenue) for each group, thereby providing the average monthly revenue per game. It lists the results ordered by month and average monthly revenue.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the NCSoft SQL interview is to solve as many practice SQL interview questions as you can!
Beyond just solving the earlier NCSoft SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Facebook, Google and unicorn tech startups.
Each SQL question has multiple hints, full answers and most importantly, there is an interactive coding environment so you can instantly run your SQL query answer and have it graded.
To prep for the NCSoft SQL interview it is also a great idea to practice SQL questions from other tech companies like:
In case your SQL query skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL concepts such as using wildcards with LIKE and RANK() vs. DENSE_RANK() – both of these pop up often in NCSoft SQL assessments.
In addition to SQL query questions, the other types of questions tested in the NCSoft Data Science Interview are:
The best way to prepare for NCSoft Data Science interviews is by reading Ace the Data Science Interview. The book's got: