At Electronic Arts (EA), SQL is used day-to-day for analyzing gamer behavior patterns and optimizing in-game purchase recommendations. That's why EA almost always asks SQL questions in interviews for Data Science, Analytics, and Data Engineering jobs.
As such, to help you study for the EA SQL interview, this blog covers 11 Electronic Arts SQL interview questions – can you answer them all?
As a business analyst for Electronic Arts (EA), your task is to identify the power users: those who play EA games most frequently, meaning the users with the highest total playing hours over the last three months.
In this scenario, suppose we have two tables 'users' and 'gameplay_logs'. The 'users' table has a row for each user with user details, and the 'gameplay_logs' table has a row for each gaming session.
user_id | first_name | last_name | email |
---|---|---|---|
1 | John | Doe | john.doe@example.com |
2 | Jane | Doe | jane.doe@example.com |
3 | Tom | Smith | tom.smith@example.com |
4 | Mark | Brown | mark.brown@example.com |
log_id | user_id | game_id | start_time | end_time |
---|---|---|---|---|
1 | 1 | 100 | 2022-07-01 10:00:00 | 2022-07-01 14:00:00 |
2 | 2 | 200 | 2022-07-01 10:00:00 | 2022-07-01 12:00:00 |
3 | 2 | 100 | 2022-07-01 12:00:00 | 2022-07-01 15:00:00 |
Now, write a SQL query to find the user_id, first_name, last_name, and the total hours played by each user in the last three months. Display the top 5 power users.
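Here's one way to write it in PostgreSQL (a sketch matching the interpretation below; one assumption is that the three-month window is measured relative to the current date via NOW()):

```sql
SELECT u.user_id,
       u.first_name,
       u.last_name,
       SUM(EXTRACT(EPOCH FROM (g.end_time - g.start_time)) / 3600) AS total_hours
FROM users u
JOIN gameplay_logs g
  ON u.user_id = g.user_id
WHERE g.start_time >= NOW() - INTERVAL '3 months'  -- assumes "last 3 months" means relative to today
GROUP BY u.user_id, u.first_name, u.last_name
ORDER BY total_hours DESC
LIMIT 5;
```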
Below is an interpretation of the SQL query:
The query displays the user details along with the total hours played in the last 3 months, using the PostgreSQL EXTRACT(EPOCH FROM ...) function to convert each session's duration into hours before summing per user. The gameplay logs of the last 3 months are selected by applying the WHERE condition on start_time. JOIN is used to combine rows from the two tables based on a related column between them, which is user_id in this case. The GROUP BY statement groups the rows that share the same user_id. The ORDER BY statement sorts the users in descending order by the total hours played. Finally, LIMIT 5 is used to restrict the output to the top 5 users.
To practice a similar power-user data analysis problem on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question:
As an analyst for EA, we want to analyze the review data for our games (products). To accomplish this, write a SQL query to calculate the average star rating for each product for every month. The table is named reviews and contains the columns: review_id, user_id, submit_date, product_id, and stars.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 123 | 2022-04-01 00:00:00 | 50001 | 5 |
2 | 265 | 2022-04-04 00:00:00 | 69852 | 4 |
3 | 362 | 2022-04-21 00:00:00 | 50001 | 3 |
4 | 192 | 2022-05-05 00:00:00 | 69852 | 2 |
5 | 981 | 2022-05-15 00:00:00 | 69852 | 5 |
6 | 782 | 2022-05-28 00:00:00 | 50001 | 4 |
month | product_id | avg_stars |
---|---|---|
4 | 50001 | 4.00 |
4 | 69852 | 4.00 |
5 | 69852 | 3.50 |
5 | 50001 | 4.00 |
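Here's a PostgreSQL query that produces this output (a sketch using the AVG window function, as the explanation below describes; a plain GROUP BY would give the same result here):

```sql
SELECT DISTINCT
       EXTRACT(MONTH FROM submit_date) AS month,
       product_id,
       ROUND(AVG(stars) OVER (
           PARTITION BY EXTRACT(MONTH FROM submit_date), product_id
       ), 2) AS avg_stars
FROM reviews
ORDER BY month, product_id;
```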
The SQL query above calculates and displays the average number of stars assigned to each product every month. The EXTRACT function retrieves the month from the submit_date of each review, and the AVG window function averages the review stars for each product within each month. The ORDER BY clause then sorts the records by month and product_id.
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
A self-join is an operation in which a single table is joined to itself. To perform a self-join, you must specify the table name twice in the FROM clause, giving each instance a different alias. You can then join the two instances of the table using a JOIN clause, and specify the relationship between the rows in an ON clause.
Think of using a self-join whenever your data analysis involves analyzing pairs of the same things, like comparing the salaries of employees within the same department, or identifying pairs of products that are frequently purchased together (which you can do in this real SQL question from a Walmart interview).
For another self-join example, suppose you were conducting an HR analytics project and needed to examine how frequently employees within the same department at EA interact with one another. You could use a self-join query like the following to retrieve all pairs of EA employees who work in the same department:
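A sketch, assuming a hypothetical employees table with employee_id and department columns:

```sql
SELECT e1.employee_id AS employee_1,
       e2.employee_id AS employee_2,
       e1.department
FROM employees e1                           -- employees is an illustrative table name
JOIN employees e2
  ON e1.department = e2.department          -- same department
 AND e1.employee_id <> e2.employee_id;      -- exclude self-pairs
```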
This query returns all pairs of EA employees who work in the same department, and excludes pairs where the employee's id is the same (since this would represent the same EA employee being paired with themselves).
Electronic Arts (known as EA) is a leading company in the gaming industry. Assume you are an engineer working at EA and you've been provided with separate datasets that capture user demographics, their game purchasing history, and their activity in playing the games.
The 'user' dataset includes the user_id, user's age, gender, and location. The 'purchase' dataset contains the purchase_id, user_id, game_id, and purchase_date. The 'activity' dataset includes the activity_id, game_id, user_id, and the hours_played.
Design the tables for these datasets, define their relationships, and provide a PostgreSQL query that returns the total hours played per game for each user.
user_id | age | gender | location |
---|---|---|---|
1 | 24 | M | USA |
2 | 32 | F | UK |
3 | 28 | M | Canada |
purchase_id | user_id | game_id | purchase_date |
---|---|---|---|
101 | 1 | 501 | 2022-01-05 |
102 | 2 | 502 | 2022-03-09 |
103 | 1 | 503 | 2022-05-23 |
activity_id | game_id | user_id | hours_played |
---|---|---|---|
1001 | 501 | 1 | 10.5 |
1002 | 502 | 2 | 15.0 |
1003 | 503 | 1 | 8.0 |
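Below is a sketch of the table definitions and the query (the column types are assumptions inferred from the sample data):

```sql
-- Table definitions; types are assumed from the sample rows
CREATE TABLE users (
    user_id  INT PRIMARY KEY,
    age      INT,
    gender   CHAR(1),
    location VARCHAR(50)
);

CREATE TABLE purchase (
    purchase_id   INT PRIMARY KEY,
    user_id       INT REFERENCES users(user_id),
    game_id       INT,
    purchase_date DATE
);

CREATE TABLE activity (
    activity_id  INT PRIMARY KEY,
    game_id      INT,
    user_id      INT REFERENCES users(user_id),
    hours_played NUMERIC(5, 1)
);

-- Total hours played per game for each user
SELECT user_id,
       game_id,
       SUM(hours_played) AS total_hours_played
FROM activity
GROUP BY user_id, game_id
ORDER BY user_id, game_id;
```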
The above PostgreSQL script first creates three tables: 'users', 'purchase', and 'activity'. The 'users' table is referenced by the 'purchase' and 'activity' tables via the user_id foreign key, while the game_id column logically links each purchase to the corresponding play activity. This establishes the relationships between the tables.
The provided query then uses the GROUP BY clause to group records with the same user_id and game_id, and the SUM function to calculate the total hours played by each user for each game.
A correlated sub-query is a sub-query that depends on the outer query and cannot be executed separately. It uses the outer query to filter or transform data by referencing a column from it, and the outer query uses the results of the inner query. In contrast, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data needed by the outer query.
Here is an example of a correlated sub-query:
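A sketch, assuming a hypothetical ea_sales table with customer_id, customer_group, and total_sales columns:

```sql
SELECT c.customer_id, c.total_sales
FROM ea_sales c                                  -- ea_sales is an illustrative table name
WHERE c.total_sales > (
    SELECT AVG(c2.total_sales)
    FROM ea_sales c2
    WHERE c2.customer_group = c.customer_group   -- references the outer query's row
);
```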
This query selects the customer_id and total sales of all EA customers in the sales table whose total sales are greater than the average of their own customer group. The sub-query in this case is correlated with the outer query, as it references the customer_group column from the outer query (c.customer_group).
Here is an example of a non-correlated sub-query:
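Using the same hypothetical ea_sales table:

```sql
SELECT customer_id, total_sales
FROM ea_sales
WHERE total_sales > (
    SELECT AVG(total_sales)
    FROM ea_sales    -- independent of the outer query; runs once
);
```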
The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.
Performance-wise, correlated sub-queries are generally slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
As a data analyst at EA (Electronic Arts), you are tasked with understanding the average rating of each game per year based on user reviews. The aim is to identify how well each game is perceived by the users in order to furnish evidence-based insights for future game development. For this purpose, you have a table reviews, where each row represents a user review.
The reviews table has the following columns: review_id, game_id, user_id, review_date, and rating.
Find the average rating for each game per year.
review_id | game_id | user_id | review_date | rating |
---|---|---|---|---|
691 | 10 | 2071 | 2018-11-23 | 4 |
940 | 12 | 3342 | 2018-12-09 | 3 |
392 | 10 | 2546 | 2019-01-25 | 5 |
718 | 13 | 1783 | 2019-03-04 | 2 |
202 | 12 | 1045 | 2020-06-15 | 3 |
year | game_id | avg_rating |
---|---|---|
2018 | 10 | 4.00 |
2018 | 12 | 3.00 |
2019 | 10 | 5.00 |
2019 | 13 | 2.00 |
2020 | 12 | 3.00 |
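Here's a PostgreSQL query that produces this output (a sketch matching the explanation below):

```sql
SELECT EXTRACT(YEAR FROM review_date) AS year,
       game_id,
       ROUND(AVG(rating), 2) AS avg_rating
FROM reviews
GROUP BY EXTRACT(YEAR FROM review_date), game_id
ORDER BY year, game_id;
```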
This query uses the EXTRACT function to get the year from the review_date, then groups the data by year and game_id. It uses the AVG function to calculate the average rating for each group. The query also orders the result by year and game_id.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating average ratings or this Facebook Average Post Hiatus (Part 1) Question which is similar for aggregating data over time.
Similar to the UNION and EXCEPT/MINUS operators, the PostgreSQL INTERSECT operator combines the result sets of two or more SELECT statements into a single result set. However, INTERSECT only returns the rows that appear in BOTH select statements.
For a concrete example, say you were on the Sales Analytics team at EA, and had data on sales leads exported from both HubSpot and Salesforce CRMs in two different tables. To write a query to analyze leads created after 2023 started that show up in both CRMs, you would use the INTERSECT command:
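A sketch, assuming hypothetical hubspot_leads and salesforce_leads tables that share lead_id, email, and created_at columns (INTERSECT requires both SELECTs to return the same columns):

```sql
SELECT lead_id, email, created_at
FROM hubspot_leads                  -- illustrative table name
WHERE created_at >= '2023-01-01'

INTERSECT

SELECT lead_id, email, created_at
FROM salesforce_leads               -- illustrative table name
WHERE created_at >= '2023-01-01';
```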
As an analyst at EA, you are given the task of evaluating customer feedback. You have data about customer reviews for EA games. Each game has a unique product_id. The review details are stored in the reviews table. Your task is to provide a month-to-month average star rating for each game.
Assume that reviews are collected over a year and a review once submitted cannot be modified.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 09/01/2022 | 50001 | 4 |
7802 | 265 | 09/05/2022 | 69852 | 4 |
5293 | 362 | 10/18/2022 | 50001 | 3 |
6352 | 192 | 11/05/2022 | 69852 | 3 |
4517 | 981 | 12/20/2022 | 50001 | 5 |
month | game | avg_stars |
---|---|---|
9 | 50001 | 4.00 |
9 | 69852 | 4.00 |
10 | 50001 | 3.00 |
11 | 69852 | 3.00 |
12 | 50001 | 5.00 |
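Here's a PostgreSQL query that produces this output (a sketch matching the explanation below):

```sql
SELECT EXTRACT(MONTH FROM submit_date) AS month,
       product_id AS game,
       ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY month, game;
```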
This query takes the submit_date, extracts the month from it, and groups by that month along with the product_id (game). It then averages the star ratings for each group (month and game combination). This gives us the average star rating for each game on a month-to-month basis. Finally, it orders the results by month and game for easier interpretation.
Write a SQL query to find the total amount of in-game purchases executed by each player, and display it along with the player's basic information.
To analyze these statistics, you'd need to join the users and transactions tables in the EA database.
Sample data for your problem is as follows:
user_id | username | signup_date | country |
---|---|---|---|
1 | GameMaster | 01/02/2020 | USA |
2 | ShadowLord | 02/03/2021 | UK |
3 | MagicMiss | 10/11/2020 | France |
transaction_id | user_id | game_id | purchase_date | amount |
---|---|---|---|---|
1001 | 1 | 101 | 01/10/2020 | 10.99 |
1002 | 1 | 102 | 01/12/2020 | 29.99 |
1003 | 2 | 101 | 03/03/2021 | 10.99 |
1004 | 3 | 102 | 11/11/2020 | 29.99 |
The PostgreSQL query to solve this problem would be as follows:
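A sketch, assuming the tables are named users and transactions to match the sample data above:

```sql
SELECT u.user_id,
       u.username,
       SUM(t.amount) AS total_amount
FROM users u                        -- table names assumed from the sample data
JOIN transactions t
  ON u.user_id = t.user_id
GROUP BY u.user_id, u.username
ORDER BY total_amount DESC;
```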
This query works by joining the users and transactions tables on the user_id column. It then groups the results by user_id and username. The SUM operation calculates the total amount of purchases for each user. The result is a table displaying each user's id, their username, and the total amount they've spent on in-game purchases.
Because joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question:
The UNION operator merges the output of two or more SELECT statements into a single result set. It removes duplicated rows, making sure each row in the result set is unique.
For a concrete example, say you were a Data Analyst at EA working on a Marketing Analytics project. If you needed to get the combined result set of both EA's Google and Facebook ads you could execute this SQL query:
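A sketch, assuming hypothetical google_ads and facebook_ads tables with matching columns:

```sql
SELECT ad_name, ad_spend, clicks
FROM google_ads                     -- illustrative table and column names

UNION

SELECT ad_name, ad_spend, clicks
FROM facebook_ads;
```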
The UNION ALL operator works in a similar way to combine data from multiple SELECT statements, but it differs from the UNION operator when it comes to handling duplicate rows. Whereas UNION filters out duplicates (so if the same ad_name is run on both Facebook and Google, it only shows up once), UNION ALL outputs duplicate rows.
The company EA (Electronic Arts) wishes to know the overall average rating for each of their video games. The ratings are given out of 5 stars and the average needs to be calculated for each month. Because EA prefers simplified figures for reporting, the average rating has to be rounded to the nearest whole number.
The reviews table keeps track of each user's review for different games each month.
Now, EA wants you to write a PostgreSQL query that will return the month of the review, the product_id of the game, and the rounded average rating (stars) for that game for that month.
The SQL query that will solve this problem can be written as:
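A sketch, assuming the reviews table has submit_date, product_id, and stars columns like the earlier questions (the original snapshot isn't shown here):

```sql
SELECT EXTRACT(MONTH FROM submit_date) AS month,   -- submit_date is an assumed column name
       product_id,
       ROUND(AVG(stars)) AS avg_stars              -- ROUND with no scale rounds to the nearest whole number
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY month;
```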
The above SQL command first extracts the month from the submit_date and then groups the table based on the extracted month and product_id. It then calculates the average of stars for these grouped entries and rounds the values to the nearest whole number. The order of the returned rows is set by the month of the review.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating monthly average ratings or this Alibaba Compressed Mean Question which is similar for round-off to nearest whole number.
The key to acing an EA SQL interview is to practice, practice, and then practice some more! In addition to solving the above EA SQL interview questions, you should also solve the 200+ tricky SQL questions on DataLemur, which come from companies like Microsoft, Google, Amazon, and tech startups.
Each exercise has hints to guide you and full answers, and best of all, there's an online SQL coding environment so you can run your SQL query right in the browser and have it executed.
To prep for the EA SQL interview, it can also be useful to practice interview questions from other tech companies like:
If your SQL coding skills are weak, don't dive straight into solving questions; instead, strengthen your SQL foundations with this free SQL tutorial.
This tutorial covers topics including filtering data with WHERE and joining a table to itself – both of which show up frequently during EA SQL interviews.
In addition to SQL interview questions, the other types of problems to prepare for the EA Data Science Interview are:
The best way to prepare for EA Data Science interviews is by reading Ace the Data Science Interview. The book's got: