At CyberAgent, SQL is used day-to-day for extracting and analyzing ad performance data and for managing user information for personalized ad targeting, both essential tasks in the digital advertising industry. So, it shouldn't surprise you that CyberAgent typically asks SQL coding questions during interviews for Data Science, Data Engineering, and Data Analytics jobs.
To help you study for the CyberAgent SQL interview, we've curated 11 CyberAgent SQL interview questions – can you solve them?
CyberAgent is a leading global online advertising agency, where activity analysis is critical. Suppose you work there as a data analyst. CyberAgent has a database table "user_activity" that logs all user activities, and you are specifically interested in a special group of users: those who have logged in more than 25 times in the past month. Your task is to write a SQL query to identify these "power users".
activity_id | user_id | activity_type | activity_date |
---|---|---|---|
1 | 101 | "login" | 2022-09-01 |
2 | 102 | "login" | 2022-09-01 |
3 | 101 | "login" | 2022-09-02 |
4 | 103 | "view_ad" | 2022-09-02 |
5 | 102 | "login" | 2022-09-03 |
6 | 104 | "login" | 2022-09-04 |
7 | 101 | "login" | 2022-09-04 |
... | ... | ... | ... |
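A possible PostgreSQL solution is sketched below; it assumes the table is named user_activity and that activity_date is a DATE column:

```sql
SELECT
  user_id,
  COUNT(*) AS login_count
FROM user_activity
WHERE activity_type = 'login'
  AND activity_date >= CURRENT_DATE - INTERVAL '1 month'  -- only the past month
GROUP BY user_id
HAVING COUNT(*) > 25;  -- keep only the power users
```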
This SQL query selects the user_id and counts the number of login activities for each user in the past month. It then keeps only those users who have logged in more than 25 times: the 'power users' of the CyberAgent network.
To solve a similar VIP customer analysis question on DataLemur's free interactive coding environment, try this Microsoft SQL Interview problem:
CyberAgent would like to better understand their product performance by analyzing the monthly average ratings for each product.
Could you please write a SQL query to calculate the average rating for each product on a monthly basis? The month should be derived from the submit_date column and displayed in the format YYYYMM.
The table is formatted as below:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
The expected output report is shown below:
month | product | avg_stars |
---|---|---|
202206 | 50001 | 3.50 |
202206 | 69852 | 4.00 |
202207 | 69852 | 2.50 |
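One way to produce this report in PostgreSQL, assuming the table is named reviews, is:

```sql
SELECT
  TO_CHAR(submit_date, 'YYYYMM') AS month,  -- derive YYYYMM from the timestamp
  product_id AS product,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY month, product_id
ORDER BY month, product;
```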
In the above SQL query, we first convert submit_date into the YYYYMM format using PostgreSQL's TO_CHAR function. Then we group the data by month and product_id and calculate the average stars for each group. Finally, we order the result by month and product to get the required output.
Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1NF, 2NF, 3NF, etc.).
This is typically done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with its own drawbacks. By adding redundant columns, you incur higher data storage costs, and update and delete operations become more complex in order to maintain data integrity across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
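As a hypothetical illustration, a reporting table for an OLAP workload might redundantly copy a customer's name into each order row so that reads don't need a join:

```sql
-- Hypothetical denormalized table: customer_name duplicates customers.name,
-- trading extra storage and more complex updates for join-free reporting reads.
CREATE TABLE orders_report (
  order_id      INT PRIMARY KEY,
  customer_id   INT NOT NULL,
  customer_name TEXT NOT NULL,  -- redundant copy, breaks 3NF on purpose
  order_total   NUMERIC(12, 2)
);
```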
Given the clients and transactions SQL tables, you're required to write a query that returns the clients whose transactions sum to over 10,000 in the year 2022 AND whose primary language is English. Here's the clients table:
client_id | name | primary_language |
---|---|---|
101 | John Smith | English |
102 | Marie Gomez | Spanish |
103 | Tom Richardson | English |
104 | Ana Perez | English |
105 | Hiroshi Tanaka | Japanese |
And the transactions table:

trans_id | client_id | transaction_date | amount |
---|---|---|---|
2001 | 101 | 01/15/2022 | 5000 |
2002 | 102 | 02/17/2022 | 2000 |
2003 | 101 | 03/30/2022 | 6000 |
2004 | 104 | 04/25/2022 | 4000 |
2005 | 103 | 05/01/2022 | 7000 |
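Here's one way to write this in PostgreSQL, assuming transaction_date is a DATE column:

```sql
SELECT
  c.client_id,
  c.name,
  SUM(t.amount) AS total_amount
FROM clients c
JOIN transactions t
  ON c.client_id = t.client_id
WHERE EXTRACT(YEAR FROM t.transaction_date) = 2022  -- transactions in 2022 only
  AND c.primary_language = 'English'
GROUP BY c.client_id, c.name
HAVING SUM(t.amount) > 10000;  -- keep clients with totals over 10,000
```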
In this query, we join the clients and transactions tables on client_id. We only include records where transaction_date falls in the year 2022 and the client's primary_language is English. Using GROUP BY, we sum up the transaction amounts per client, and the HAVING clause then keeps the clients for whom this sum exceeds 10,000.
The UNION operator merges the output of two or more SELECT statements into a single result set. It removes duplicate rows, making sure each row in the result set is unique.
For a concrete example, say you were a Data Analyst at CyberAgent working on a Marketing Analytics project. If you needed to get the combined result set of both CyberAgent's Google and Facebook ads you could execute this SQL query:
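A minimal sketch, assuming hypothetical google_ads and facebook_ads tables that each contain an ad_name column:

```sql
-- UNION removes duplicates, so an ad run on both platforms appears only once
SELECT ad_name FROM google_ads
UNION
SELECT ad_name FROM facebook_ads;
```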
The UNION ALL operator works in a similar way to combine data from multiple SELECT statements, but it differs from UNION when it comes to handling duplicate rows. Whereas UNION filters out duplicates (so if the same ad_name is run on both Facebook and Google, it only shows up once), UNION ALL outputs duplicate rows.
Suppose at CyberAgent your task is to find the average duration in days of ad campaigns for each client. Make sure both the start_date and the end_date are included in the duration calculation. The company maintains a campaign table with the following structure and some example data:
campaign_id | client_id | start_date | end_date |
---|---|---|---|
101 | 2001 | 07/01/2022 | 07/07/2022 |
102 | 2002 | 07/02/2022 | 07/08/2022 |
103 | 2001 | 07/10/2022 | 07/20/2022 |
104 | 2001 | 07/22/2022 | 07/29/2022 |
105 | 2002 | 07/15/2022 | 07/18/2022 |
We want output in this form:
client_id | avg_duration |
---|---|
2001 | 8.667 |
2002 | 5.500 |
The average duration should be rounded to three decimal places.
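A possible PostgreSQL query, assuming the campaign table is named ad_campaigns (a hypothetical name) and the dates are DATE columns:

```sql
SELECT
  client_id,
  -- subtracting DATEs yields days; +1 makes the range inclusive of end_date
  ROUND(AVG(end_date - start_date + 1), 3) AS avg_duration
FROM ad_campaigns
GROUP BY client_id;
```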
In this SQL query, we used the GROUP BY clause to group our data by clients (client_id). For each group, we calculated the difference between end_date and start_date for every campaign and added 1 to include the end date. The average of those durations was then calculated using the AVG aggregate function. To make it more readable, we rounded the result to three decimal places using the ROUND function.
A cross-join, also known as a cartesian join, is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. This results in a table with a row for every possible combination of rows from the two input tables.
An example of when this might be useful is if you wanted to first make a dataset containing all possible pairs of customers and products data, in order to later build a Machine Learning model to predict the probability of a customer purchasing a particular product.
However, it is important to note that cross-joins can create very large tables, especially if the input tables have a large number of rows. For example, if you had 10,000 customers and 5,000 different product SKUs, the resulting cross-join would have 50 million rows.
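A minimal sketch of such a cross-join, assuming hypothetical customers and products tables:

```sql
-- pairs every customer with every product: 10,000 x 5,000 = 50,000,000 rows
SELECT c.customer_id, p.product_id
FROM customers c
CROSS JOIN products p;
```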
As a data analyst at CyberAgent, your task is to write a SQL query that finds all customers whose names begin with the string 'Cyb'. Given the table 'customers' below, how would you structure your query?
Example Input:
customer_id | name | sign_up_date | product_purchased | amount_spent |
---|---|---|---|---|
1001 | CyberGuru | 06/08/2022 | GameX | 500 |
1002 | DigitalAgent | 06/10/2022 | AppY | 200 |
1003 | CyberNinja | 06/18/2022 | GameX | 700 |
1004 | WebMaster | 07/26/2022 | AppY | 300 |
1005 | CodeWarrior | 07/05/2022 | GameX | 600 |
Example Output:
customer_id | name | sign_up_date | product_purchased | amount_spent |
---|---|---|---|---|
1001 | CyberGuru | 06/08/2022 | GameX | 500 |
1003 | CyberNinja | 06/18/2022 | GameX | 700 |
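One way to write this, using the LIKE operator with the % wildcard:

```sql
SELECT *
FROM customers
WHERE name LIKE 'Cyb%';  -- matches any name beginning with 'Cyb'
```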
The % character is used as a wildcard in SQL and matches any sequence of characters, so this query returns all customers whose names start with the string 'Cyb'.
We have two tables: purchases and sessions. The purchases table records the purchase history of customers with the fields user_id, product_id, purchase_date, and purchase_amount. The sessions table captures the login activity of users with the fields user_id, session_id, login_date, and logout_time.
Write a SQL query that fetches, for each customer, the customer ID, the total number of products purchased, the total amount spent, the most recent purchase date, and the number of login sessions.
Given the following sample data for the purchases table:
user_id | product_id | purchase_date | purchase_amount |
---|---|---|---|
100 | 300 | 01/05/2022 | 150 |
100 | 200 | 03/07/2022 | 250 |
101 | 300 | 04/09/2022 | 100 |
101 | 400 | 02/06/2022 | 200 |
102 | 500 | 01/07/2022 | 400 |
And for the sessions table:

user_id | session_id | login_date | logout_time |
---|---|---|---|
100 | 6001 | 28/03/2022 | 30/03/2022 |
100 | 6002 | 07/04/2022 | 10/04/2022 |
101 | 6003 | 13/02/2022 | 15/02/2022 |
102 | 6004 | 09/01/2022 | 11/01/2022 |
102 | 6005 | 06/02/2022 | 08/02/2022 |
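A possible PostgreSQL solution, assuming the tables are named purchases and sessions. Each table is aggregated per user before the join so that multiple session rows don't inflate the purchase totals:

```sql
SELECT
  p.user_id,
  p.total_products,
  p.total_spent,
  p.recent_purchase_date,
  COALESCE(s.total_sessions, 0) AS total_sessions  -- 0 for users with no logins
FROM (
  SELECT
    user_id,
    COUNT(product_id)    AS total_products,
    SUM(purchase_amount) AS total_spent,
    MAX(purchase_date)   AS recent_purchase_date
  FROM purchases
  GROUP BY user_id
) p
LEFT JOIN (
  SELECT user_id, COUNT(session_id) AS total_sessions
  FROM sessions
  GROUP BY user_id
) s ON p.user_id = s.user_id;
```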
This PostgreSQL query aggregates the purchases and sessions tables per user_id and then joins the two results on user_id. The purchases aggregate supplies the product count, the total amount spent, and the most recent purchase date, while the sessions aggregate supplies the number of login sessions. The LEFT JOIN ensures that we don't miss any customers from the purchases table even if they have no matching entries in the sessions table.
Because joins come up so often during SQL interviews, practice this interactive Snapchat Join SQL question:
If you have two tables and want to retrieve only the rows that are present in both tables, just use the INTERSECT operator!
For example, let's use INTERSECT to find all of CyberAgent's Facebook video ads that are also being run on YouTube:
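A sketch, assuming hypothetical facebook_ads and youtube_ads tables that share an ad_name column and tag video ads with an ad_type of 'video':

```sql
SELECT ad_name FROM facebook_ads WHERE ad_type = 'video'
INTERSECT  -- keeps only ad names that appear in both result sets
SELECT ad_name FROM youtube_ads WHERE ad_type = 'video';
```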
A game development department at CyberAgent regularly collects scores/ratings from users who play their games. The team wants to calculate a weighted score to make the scoring more accurate, where the weight of each score is the square root of its frequency.
Use the table below, which contains users' game scores:
user_id | game_id | score |
---|---|---|
1 | 100 | 70 |
1 | 101 | 80 |
2 | 100 | 90 |
2 | 100 | 90 |
2 | 101 | 100 |
3 | 100 | 80 |
3 | 101 | 70 |
4 | 100 | 60 |
4 | 101 | 50 |
Assume that each game can be played multiple times by different users and users can play different games. The task is to write a SQL query that calculates the weighted average score for each game.
The weighted score is calculated as follows: weighted_score = SUM(score × weight) / SUM(weight), where the weight of a score is the square root of the number of times that score appears for the game. The expected output is:
game_id | weighted_score |
---|---|
100 | 74.37 |
101 | 75.00 |
Here is a PostgreSQL query that would solve this task, assuming the scores live in a table named game_scores:
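```sql
WITH score_frequencies AS (
  -- how many times each distinct score appears per game
  SELECT game_id, score, COUNT(*) AS frequency
  FROM game_scores
  GROUP BY game_id, score
),
weighted AS (
  -- weight each distinct score by the square root of its frequency
  SELECT
    game_id,
    score * SQRT(frequency) AS weighted_score,
    SQRT(frequency)         AS weight
  FROM score_frequencies
)
SELECT
  game_id,
  ROUND((SUM(weighted_score) / SUM(weight))::NUMERIC, 2) AS weighted_score
FROM weighted
GROUP BY game_id
ORDER BY game_id;
```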
This query starts with a Common Table Expression (CTE) that groups the table by game_id and score and computes the frequency of each score for each game.
The next CTE calculates the weighted score, which is the product of each score and the square root of its frequency, and also the weight, which is the square root of that frequency, for each game_id and score.
Finally, we group by game_id, calculate the weighted score using the formula SUM(weighted_score) / SUM(weight), and round it to two decimal places. The ORDER BY clause is used to sort the result by game_id.
To practice a very similar question try this interactive Alibaba Compressed Mean Question which is similar for computation based on frequency or this Alibaba Compressed Mode Question which is similar for analysis based on frequency.
The key to acing a CyberAgent SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier CyberAgent SQL interview questions, you should also solve the 200+ tricky SQL questions on DataLemur, which come from companies like Netflix, Google, and Amazon.
Each problem on DataLemur has multiple hints, step-by-step solutions, and best of all, there's an online SQL coding environment, so you can write your SQL query right in the browser and have it graded.
To prep for the CyberAgent SQL interview it is also useful to practice SQL questions from other tech companies like:
But if your SQL query skills are weak, don't worry about jumping right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.
This tutorial covers SQL concepts such as filtering strings based on patterns and Subqueries – both of which pop up often in CyberAgent SQL assessments.
In addition to SQL query questions, the other types of problems tested in the CyberAgent Data Science Interview are:
The best way to prepare for CyberAgent Data Science interviews is by reading Ace the Data Science Interview. The book's got: