At CD Projekt, SQL is used day-to-day for analyzing gaming user data to enhance gameplay experience, and managing large databases for effective game development. Because of this, CD Projekt typically asks SQL coding questions in interviews for Data Science, Analytics, and & Data Engineering jobs.
As such, to help you ace the CD Projekt SQL interview, this blog covers 8 CD Projekt SQL interview questions – able to answer them all?
Given a database of user review data for various games produced by CD Projekt, can you write a SQL query that calculates the average rating of each game for each month? The average rating should be rounded to two decimal places.
The table contains the following columns:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 123 | 2022-01-10 | 1001 | 4 |
2 | 456 | 2022-01-15 | 1002 | 3 |
3 | 789 | 2022-01-20 | 1001 | 5 |
4 | 123 | 2022-02-10 | 1001 | 4 |
5 | 456 | 2022-02-15 | 1001 | 2 |
6 | 789 | 2022-02-20 | 1002 | 4 |
This query uses a window function to calculate the average stars per product per month. The EXTRACT function gets the month from the submit_date, and it groups the data first by the month then by the product_id. The AVG function calculates the average stars. The results are ordered by month then by product_id for easier readability.
month | product_id | avg_rating |
---|---|---|
1 | 1001 | 4.50 |
1 | 1002 | 3.00 |
2 | 1001 | 3.00 |
2 | 1002 | 4.00 |
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
CD Projekt, a renowned video game publisher, is conducting analysis on its customers' usage patterns. They would like to identify their active customers and figure out which games are their favorites. An "active customer" is defined by CD Projekt as a customer who has purchased more than three games in the last six months. A customer’s favorite game is the game that they've played the most, based on the total number of play hours.
Write a SQL query that retrieves all active customers and their favorite games. Include the total play hours for the favorite game.
customer_id | name |
---|---|
101 | John Doe |
102 | Jane Doe |
103 | Jim Brown |
104 | Jill White |
purchase_id | customer_id | game_id | purchase_date |
---|---|---|---|
201 | 101 | 301 | 2022-01-01 |
202 | 101 | 302 | 2022-02-01 |
203 | 101 | 303 | 2022-03-01 |
204 | 102 | 301 | 2022-01-01 |
205 | 103 | 304 | 2022-10-01 |
206 | 104 | 305 | 2022-05-01 |
207 | 101 | 306 | 2022-04-01 |
gameplay_id | purchase_id | hours_played |
---|---|---|
501 | 201 | 50 |
502 | 202 | 30 |
503 | 203 | 70 |
504 | 204 | 10 |
505 | 205 | 90 |
506 | 206 | 40 |
507 | 207 | 20 |
This query first joins the customers, purchases, and gameplay tables. It filters out the purchases that were made in the last six months. It then groups the results by customer name and filters customers that made more than three purchases (thus, active customers). Finally, it extracts the maximum hours played for each customer, which corresponds to their favorite game, and orders the results by play hours in descending order. Note that if a customer has multiple games with the same highest play hours, this query will only return one of them. To return all such games, we would need a more complex query.
In SQL, the operator combines the result of two or more SELECT statements into a single result set. Note that each statement within the UNION must have the same number of columns and the columns must have similar data types. The operator removes duplicates from the final result set. If you want to include duplicates in the final result set, you can use the operator instead.
Here's an example of using the UNION operator to combine the results of two SELECT statements that retrieve data from tables of CD Projekt's Facebook ads and their Google ads:
This statement would return a combined result set of both Facebook and Google ads that have a spend greater than $500.
As a part of the data team at CD Projekt, you are tasked with a feature that recommends similar game titles to users based on their input. The input will be a string provided by the user. You need to write an SQL query to find all game titles in the database that are similar to the user's input. Assume your database is in PostgreSQL and uses the keyword for pattern matching.
Please note that the similar titles should:
game_id | title | release_date | price |
---|---|---|---|
101 | The Witcher 3: Wild Hunt | 05/19/2015 | 39.99 |
202 | Cyberpunk 2077 | 12/10/2020 | 59.99 |
303 | Thronebreaker: The Witcher Tales | 10/23/2018 | 19.99 |
404 | The Witcher 2: Assassins of Kings | 05/17/2011 | 19.99 |
505 | The Witcher: Enhanced Edition | 10/26/2007 | 9.99 |
User input: "The Wandering Knight"
game_id | title |
---|---|
101 | The Witcher 3: Wild Hunt |
303 | Thronebreaker: The Witcher Tales |
404 | The Witcher 2: Assassins of Kings |
505 | The Witcher: Enhanced Edition |
The SQL command above selects and from the table where begins with 'T' and also has at least one word in the title starting with 'W'. This would fetch all games that have similar titles to the user's input.
The / operator is used to remove to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that is available in PostgreSQL and SQL Server, while MINUS is available in MySQL and Oracle (but don't stress about knowing which DBMS supports what exact commands since the interviewers at CD Projekt should be lenient!).
Here's a PostgreSQL example of using EXCEPT to find all of CD Projekt's Facebook video ads with more than 10k views that aren't also being run on YouTube:
If you want to retain duplicates, you can use the EXCEPT ALL operator instead of EXCEPT. The EXCEPT ALL operator will return all rows, including duplicates.
In CD Projekt, game analysts want to evaluate player engagement for each game in order to understand the impact of different features. The user engagement is taken from a ratings system, where users rate their interaction with each feature on a scale of 1 to 5 (5 is the best), and these ratings are incorporated into an engagement score. The function is used to amplify the impact of higher ratings, function is used to understand the distribution of ratings, is used to measure distance from the highest rating, and is to format the final result. Calculate the weighted engagement score with the following formula:
Using the table, create a view organising information by , and , round the to 2 decimal points.
feature_id | game_name | rating | rating_count |
---|---|---|---|
1001 | Cyberpunk 2077 | 5 | 12000 |
2002 | Cyberpunk 2077 | 4 | 8000 |
3003 | The Witcher 3 | 3 | 7000 |
4004 | The Witcher 3 | 2 | 3500 |
5005 | Cyberpunk 2077 | 1 | 2000 |
game_name | feature_id | engagement_score |
---|---|---|
Cyberpunk 2077 | 1001 | 5.00 |
Cyberpunk 2077 | 2002 | 2.50 |
The Witcher 3 | 3003 | 1.43 |
The Witcher 3 | 4004 | 1.14 |
Cyberpunk 2077 | 5005 | 1.00 |
In this query, each of the key math functions (POWER, MOD, ABS) are used in the calculation of the engagement score, which is the central part of the question. The helps to normalize the engagement considering the users' opinions. This engagement score is then rounded to 2 decimal points for easier reading. The results are presented in a new view ordered by and .
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for dealing with ratings and averages or this Alibaba Compressed Mean Question which is similar for its use of mean calculations.
In SQL, both and are used to rank rows within a result set. The key difference between the two functions is how deal with two or more rows having the same value in the ranked column (aka how the break ties).
RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the second row, and a rank of 4 to the third row.
DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
CD Projekt is known in the gaming world for its highly successful game "The Witcher 3: Wild Hunt". Suppose you were given a sales dataset with date information. Your task is to write a SQL query that retrieves the total game sales per month for "The Witcher 3: Wild Hunt".
sale_id | game_id | sale_date | units_sold |
---|---|---|---|
1 | 3 | 01/10/2020 | 200 |
2 | 2 | 01/15/2020 | 300 |
3 | 3 | 02/12/2020 | 250 |
4 | 1 | 02/20/2020 | 300 |
5 | 3 | 03/05/2020 | 150 |
6 | 3 | 03/22/2020 | 200 |
game_id | game_title |
---|---|
1 | Cyberpunk 2077 |
2 | The Witcher 2: Assassins of Kings |
3 | The Witcher 3: Wild Hunt |
mth | year | game_title | total_units_sold |
---|---|---|---|
1 | 2020 | The Witcher 3: Wild Hunt | 200 |
2 | 2020 | The Witcher 3: Wild Hunt | 250 |
3 | 2020 | The Witcher 3: Wild Hunt | 350 |
This SQL query extracts the MONTH and YEAR from the and applies a group by clause on them along with the game title, thus giving us a total sales count for each month and year. It joins the sale table with the game table to fetch the title and filters the records for the game "The Witcher 3: Wild Hunt". The results are ordered by year and month.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the CD Projekt SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier CD Projekt SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Airbnb, and Amazon.
Each exercise has hints to guide you, fully explained answers along with a discussion board to see how others solved it and best of all, there is an online SQL coding environment so you can right online code up your SQL query answer and have it graded.
To prep for the CD Projekt SQL interview you can also be useful to practice SQL questions from other tech companies like:
But if your SQL skills are weak, don't worry about going right into solving questions – go learn SQL with this SQL interview tutorial.
This tutorial covers things like aggregate window functions and aggregate functions – both of which show up routinely during SQL interviews at CD Projekt.
In addition to SQL interview questions, the other types of problems to practice for the CD Projekt Data Science Interview are:
To prepare for CD Projekt Data Science interviews read the book Ace the Data Science Interview because it's got: