French ad-tech company Criteo uses SQL all the damn time for extracting and analyzing advertising data. That's why Criteo almost always evaluates jobseekers on SQL questions during interviews for Data Analyst, Data Science, and BI jobs.
To help you study for the Criteo SQL interview, here's 11 Criteo SQL interview questions – scroll down to start solving them!
Assume we have a table named with server-side log records for each time an ad was displayed to a user and whether it was clicked on or not. Columns are (the ID of the ad), (the ID of the user to whom the ad was displayed), (the timestamp the ad was displayed), and (a boolean value whether the user clicked on the ad or not).
Your task is to write a SQL query that calculates the running average click rate for each ad by day. The click rate for an ad is defined as the number of clicks divided by the number of times the ad was displayed. The running average click rate for an ad on a specific day is the average click rate for that ad from the start up to that day.
ad_id | user_id | display_time | clicked |
---|---|---|---|
101 | 123 | 06/01/2022 00:00:00 | True |
101 | 265 | 06/01/2022 00:00:00 | False |
102 | 362 | 06/01/2022 00:00:00 | True |
101 | 192 | 06/02/2022 00:00:00 | True |
102 | 981 | 06/02/2022 00:00:00 | False |
101 | 123 | 06/03/2022 00:00:00 | True |
102 | 265 | 06/03/2022 00:00:00 | True |
date | ad_id | running_avg_click_rate |
---|---|---|
06/01/2022 | 101 | 0.50 |
06/01/2022 | 102 | 1.00 |
06/02/2022 | 101 | 0.67 |
06/02/2022 | 102 | 0.50 |
06/03/2022 | 101 | 0.75 |
06/03/2022 | 102 | 0.67 |
In the above query, we first type cast the boolean column to an integer to make it suitable for averaging (with =1 and =0). Then we use the window function with to calculate the running average per . inside the window function clause makes sure the running average is calculated for each day from the start up to that day. Lastly, we order the result by and to get the desired output.
To solve another window function question on DataLemur's free online SQL coding environment, try this Google SQL Interview Question:
Given a table of Criteo employee salary information, write a SQL query to find the 2nd highest salary among all employees.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Test your SQL query for this problem and run your code right in DataLemur's online SQL environment:
You can find a detailed solution here: 2nd Highest Salary.
A primary key is a column (or set of columns) in a table that uniquely identifies each row in the table. It cannot contain null values and must be unique across all rows in the table.
A foreign key is a column (or set of columns) in a table that references the primary key of another table. It is used to establish a relationship between the two tables. A foreign key can contain null values, and multiple rows in the referencing table can reference the same row in the referenced table.
For example, consider a database with two tables: and . The Criteo customers table might have a primary key column called , while the Criteo orders table might have a foreign key column called that references the column in the table. This establishes a relationship between the two tables, such that each row in the orders table corresponds to a specific Criteo customer.
Consider you are a data analyst at Criteo, a company that provides an advertising platform for e-commerce businesses. The business stakeholders want to know how well the ads are performing. They are interested in the click-through rate (CTR) per advertiser. To do this, we need to compute the ratio of total clicks to total views per advertiser.
You have the following two tables:
ad_id | advertiser_id | duration | ad_cost |
---|---|---|---|
1 | 101 | 30 | 100 |
2 | 102 | 15 | 50 |
3 | 101 | 60 | 200 |
4 | 103 | 30 | 150 |
5 | 102 | 45 | 75 |
ad_id | views | clicks |
---|---|---|
1 | 500 | 50 |
2 | 750 | 100 |
3 | 300 | 30 |
4 | 400 | 40 |
5 | 600 | 60 |
Compute the click-through rate (CTR) per advertiser, and order the result by CTR in descending order.
advertiser_id | CTR |
---|---|
101 | 0.114 |
102 | 0.16 |
103 | 0.1 |
You can answer this question by joining the two tables on the column. Afterwards, you group by the and compute the CTR as the sum of divided by the sum of .
This query provides an aggregated ratio of clicks to views, which is the definition of CTR, per advertiser. Ordering the result in descending order gives us the advertisers with the highest CTR at the top.
A primary key is a special type of column or set of columns in a database table that is used to identify each row in the table. It helps to ensure that all the rows are distinct and prevents null values. Additionally, primary keys play a crucial role in maintaining the accuracy of the data by making sure that there are no duplicate rows. They are also used to establish relationships between tables.
To create a primary key in a SQL database, you can use the constraint. For example, in the table , the column is the primary key.
Assume you are an analyst at Criteo, an internet advertising company. You have been given a task to identify all the customers who clicked ads for product category 'Electronics' from the 'USA' but did not make a purchase in the month of 'June 2022'. The Customer behavior is logged in two different tables named 'Customer_Clicks' and 'Customer_Purchases'.
Here is the example data from 'Customer_Clicks' and 'Customer_Purchases':
Example Input:
click_id | customer_id | click_date | product_category | country |
---|---|---|---|---|
101 | 123 | 06/01/2022 00:00:00 | Electronics | USA |
102 | 456 | 06/02/2022 00:00:00 | Furniture | USA |
103 | 789 | 06/18/2022 00:00:00 | Electronics | USA |
104 | 123 | 07/26/2022 00:00:00 | Electronics | UK |
105 | 456 | 07/05/2022 00:00:00 | Electronics | USA |
Example Input:
purchase_id | customer_id | purchase_date | product_category |
---|---|---|---|
501 | 123 | 06/08/2022 00:00:00 | Electronics |
502 | 456 | 06/10/2022 00:00:00 | Furniture |
503 | 789 | 06/18/2022 00:00:00 | Toys |
504 | 123 | 07/26/2022 00:00:00 | Electronics |
505 | 456 | 07/05/2022 00:00:00 | Furniture |
In above PostgresSQL query, the date is cast into date type to extract the 'month' part. This query performs a LEFT JOIN on 'Customer_Clicks' and 'Customer_Purchases' on 'customer_id' to get the complete activity of a customer who clicked 'Electronics' ads from 'USA'. The WHERE clause filters out customers who clicked in 'June 2022', AND either didn't purchase in 'June 2022' or didn't purchase 'Electronics' or didn't make any purchase.
There are several normal forms that define the rules for normalizing a database:
A database is in first normal form (1NF) if it meets the following criteria:
A database is in second normal form (2NF) if it meets the following criteria:
Said another way, to achieve 2NF, besides following all the rules from 1NF all the columns in a given table should be dependent only on that table's primary key.
A database is in third normal form (3NF) if it meets the following criteria:
A transitive dependency means that a piece of data in one column is derived from another column. For example, it wouldn't make sense to keep a column called "user's age" and "user's birthdate" (because age can be derived from birthdate.
While there's also a 4th and 5th normal form, it's too pedantic and hence out-of-scope to know for the Criteo SQL interview.
Criteo is a technology company that enables brands to launch advertisement campaigns based on the product interests of users on the internet. Their job involves managing many advertisement campaigns each day and understanding how each performs is crucial.
Let's say we have a table that contains each click made by users on the different ads. Each ad can be identified by a . The question is: What is the average clicks per campaign?
The table:
click_id | user_id | click_time | campaign_id |
---|---|---|---|
1001 | 989 | 2022-05-11 14:00:00 | 201 |
1002 | 745 | 2022-05-11 14:01:00 | 201 |
1003 | 362 | 2022-05-11 15:00:00 | 201 |
1004 | 621 | 2022-05-12 10:00:00 | 202 |
1005 | 521 | 2022-05-12 11:00:00 | 202 |
1006 | 254 | 2022-05-12 13:00:00 | 203 |
1007 | 122 | 2022-05-13 11:00:00 | 203 |
The PostgreSQL query to find the average clicks per campaign:
This query uses the window function AVG() to calculate the average per . The PARTITION BY clause is used to divide the result set into partitions (groups) that will be processed by the AVG function. This results in separate average calculated for each , thus giving us the average number of clicks per campaign.
To practice a very similar question try this interactive Facebook App Click-through Rate (CTR) Question which is similar for requiring SQL for click-based analytics or this Stripe Repeated Payments Question which is similar for analysis of repeated actions.
Criteo, a digital advertising company, is interested in understanding the click-through-conversion rates for its advertisements. They track when users view the advertisement, when they click on the advertisement, and whether they add a product to their cart after clicking.
Given the below tables containing recent user interaction data, write a SQL query that calculates the click-through rate (CTR) and the click-to-add-to-cart conversion rate.
CTR is defined as the number of advertisement clicks divided by the number of advertisement views, both within a specified time period. The click-to-add-to-cart conversion rate is defined as the number of times a product is added to the cart after a click divided by the total number of clicks, both within a specified time period.
view_id | user_id | view_date | ad_id |
---|---|---|---|
1 | 123 | 2022-08-12 00:00:00 | 001 |
2 | 265 | 2022-08-12 01:20:00 | 002 |
3 | 265 | 2022-08-12 01:25:00 | 001 |
4 | 362 | 2022-08-12 02:45:00 | 002 |
5 | 192 | 2022-08-12 02:55:00 | 001 |
click_id | user_id | click_date | ad_id |
---|---|---|---|
101 | 123 | 2022-08-12 00:05:00 | 001 |
102 | 192 | 2022-08-12 03:00:00 | 001 |
103 | 265 | 2022-08-12 1:30:00 | 002 |
add_id | user_id | add_date | ad_id |
---|---|---|---|
301 | 123 | 2022-08-12 00:10:00 | 001 |
302 | 192 | 2022-08-12 03:05:00 | 001 |
This SQL query calculates the CTR as the count of unique clicks divided by the count of unique views, and the conversion rate as the count of unique add-to-cart events divided by the count of unique clicks. All events are grouped by advertisement id.
To practice another question about calculating rates, solve this TikTok SQL question on DataLemur's online SQL coding environment:
Cross join and natural join are like two sides of a coin in the world of SQL.
Cross joins is like the wild and reckless cousin who creates a giant new table by combining every row from table A with every row from table B, no questions asked, no common key needed.
Natural joins are like the more refined and selective cousin who only combines rows from multiple tables if they have something in common (i.e., common columns/keys).
While cross join doesn't discriminate and will create a massive table if given the chance, natural join is more selective and only returns a table with the number of rows equal to the number of matching rows in the input tables. So, choose your JOIN wisely!
As a data analyst of Criteo, you are required to look after customer records. Due to the increasing number of fraudulent activities recorded recently, the company has asked you to filter out the customer records where the email address ends with any of the following: , or . The customer data is available in the table.
customer_id | registration_date | country | |
---|---|---|---|
101 | john.doe@test.com | 10/05/2020 | USA |
102 | jane.doe@gmail.com | 12/12/2019 | Canada |
103 | sam.fisher@fake.com | 01/06/2021 | UK |
104 | alex.mason@fraud.com | 20/02/2021 | Australia |
105 | adam.jensen@gmail.com | 15/05/2022 | USA |
This can be done by using the keyword in SQL to match a specific pattern or string in the field.
This SQL query will return all the records from the table except those records where the field ends with , , or . The percentage sign denotes wildcard in SQL that matches any sequence of characters.
Make sure to sysnthesise the results and communicate your findings effectively to the relevant stakeholders.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier Criteo SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Google, Microsoft and Silicon Valley startups.
Each problem on DataLemur has hints to guide you, detailed solutions and crucially, there is an online SQL code editor so you can instantly run your query and have it checked.
To prep for the Criteo SQL interview it is also wise to practice SQL problems from other tech companies like:
But if your SQL skills are weak, forget about jumping right into solving questions – improve your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers things like turning a subquery into a CTE and SUM/AVG window functions – both of these pop up often during Criteo interviews.
Beyond writing SQL queries, the other types of problems to prepare for the Criteo Data Science Interview include:
The best way to prep for Criteo Data Science interviews is to read a Criteo employee's account of what they do day-to-day so you have a sense of the job. The next best thing to read is the book Ace the Data Science Interview.
The book covers 201 interview questions sourced from tech companies like Google & Microsoft. It also has a refresher on Stats, ML, & Data Case Studies. And finally it's helped a TON of people, which is how it's gotten to be an Amazon best-seller with over 1000+ 5-star reviews.