At DoubleVerify, SQL does the heavy lifting for extracting and analyzing ad verification data and for running data quality checks on ad fraud detection and viewability measurements. So it shouldn't surprise you that DoubleVerify often asks SQL coding questions during interviews for Data Analyst, Data Science, and BI roles.
To help you ace the DoubleVerify SQL interview, here are 11 DoubleVerify SQL interview questions – can you answer them all?
DoubleVerify verifies the quality and effectiveness of digital media for brands. Assume you're given two tables: a campaigns table and an impressions table.
The campaigns table provides information about distinct advertising campaigns, their start and end dates, and the advertiser. The impressions table logs each time an advertisement is shown (an impression).
Each row in the campaigns table has the following 4 columns:
campaign_id | start_date | end_date | advertiser |
---|---|---|---|
1 | 01/01/2020 | 05/01/2020 | Brand A |
2 | 05/01/2020 | 09/01/2020 | Brand A |
3 | 03/01/2020 | 07/01/2020 | Brand B |
4 | 04/01/2020 | 08/01/2020 | Brand C |
Each row in the impressions table has the following 3 columns:
impression_id | date | campaign_id |
---|---|---|
1 | 02/01/2020 | 1 |
2 | 03/01/2020 | 1 |
3 | 09/01/2020 | 2
4 | 05/01/2020 | 3 |
5 | 06/01/2020 | 3 |
6 | 07/01/2020 | 4 |
Write a SQL query that calculates the number of impressions per campaign per day. The output should include the days with zero impressions. Round the number of impressions to the nearest integer without using a ROUND() function.
campaign_id | date | impressions |
---|---|---|
1 | 01/01/2020 | 0 |
1 | 02/01/2020 | 1 |
1 | 03/01/2020 | 1 |
1 | 04/01/2020 | 0 |
1 | 05/01/2020 | 0 |
2 | 05/01/2020 | 0 |
2 | 06/01/2020 | 0 |
2 | 07/01/2020 | 0 |
2 | 08/01/2020 | 0 |
2 | 09/01/2020 | 1 |
3 | 03/01/2020 | 0 |
3 | 04/01/2020 | 0 |
3 | 05/01/2020 | 1 |
3 | 06/01/2020 | 1 |
3 | 07/01/2020 | 0 |
4 | 04/01/2020 | 0 |
4 | 05/01/2020 | 0 |
4 | 06/01/2020 | 0 |
4 | 07/01/2020 | 1 |
4 | 08/01/2020 | 0 |
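One possible PostgreSQL solution is sketched below. The table names campaigns and impressions are assumptions, since the schema names aren't given above:

```sql
-- Build one row per (campaign, day), then fill in the daily impression counts.
-- Table names "campaigns" and "impressions" are assumed.
WITH campaign_days AS (
  SELECT
    c.campaign_id,
    d::date AS day
  FROM campaigns c
  CROSS JOIN LATERAL
    generate_series(c.start_date, c.end_date, INTERVAL '1 day') AS d
),
daily_counts AS (
  SELECT campaign_id, "date" AS day, COUNT(*) AS impressions
  FROM impressions
  GROUP BY campaign_id, "date"
)
SELECT
  cd.campaign_id,
  cd.day AS date,
  COALESCE(dc.impressions, 0) AS impressions  -- 0 for days with no impressions
FROM campaign_days cd
LEFT JOIN daily_counts dc
  ON dc.campaign_id = cd.campaign_id
 AND dc.day = cd.day
ORDER BY cd.campaign_id, cd.day;
```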
This query uses generate_series() to build one row per day between each campaign's start and end dates, pre-aggregates the daily impression counts, and LEFT JOINs the two. The COALESCE function substitutes 0 for NULL on days that have no impressions.
p.s. Window functions show up pretty frequently during SQL interviews, so practice the 27+ window function questions on DataLemur.
Given that DoubleVerify is a company focused on digital advertising, an appropriate question might involve analyzing advertisement engagement.
Let's consider a scenario where DoubleVerify wants to understand the daily engagement of each ad campaign and the total cost per campaign. The analyst is particularly interested in finding the top three campaigns with the highest costs and the average daily engagement for the past month.
Assume we have two database tables: one describing campaigns and their costs, and one logging daily engagements.
campaign_id | campaign_name | cost |
---|---|---|
1 | DV Launch | 5000 |
2 | Autumn Sale | 8000 |
3 | Black Friday | 10000 |
4 | Holiday Deal | 4000 |
5 | Spring Clearance | 3000 |
engagement_id | campaign_id | engagement_date | daily_engagements |
---|---|---|---|
1 | 1 | 01/12/2022 | 1000 |
2 | 1 | 02/12/2022 | 1500 |
3 | 2 | 01/12/2022 | 800 |
4 | 2 | 02/12/2022 | 1200 |
5 | 3 | 01/12/2022 | 2000 |
6 | 3 | 02/12/2022 | 2500 |
7 | 4 | 01/12/2022 | 500 |
8 | 4 | 02/12/2022 | 700 |
9 | 5 | 01/12/2022 | 300 |
10 | 5 | 02/12/2022 | 400 |
The goal is to write a SQL query to calculate the average daily engagements and total cost for each campaign, then identify the top three costly campaigns in descending order of cost.
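A minimal sketch follows, assuming the tables are named campaigns and engagements (the exact names aren't given above):

```sql
-- Average daily engagements and total cost per campaign over the past month,
-- keeping the top 3 by cost. Table names are assumed.
SELECT
  c.campaign_id,
  c.campaign_name,
  c.cost AS total_cost,
  AVG(e.daily_engagements) AS avg_daily_engagements
FROM campaigns c
JOIN engagements e
  ON e.campaign_id = c.campaign_id
WHERE e.engagement_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY c.campaign_id, c.campaign_name, c.cost
ORDER BY c.cost DESC
LIMIT 3;
```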
This query first joins the campaigns and engagements tables on the campaign_id column. It then filters to engagements that happened within the past month. Using GROUP BY, it calculates the average daily engagements and total cost for each campaign over that period. Finally, ORDER BY and LIMIT are used to find the top three costliest ad campaigns.
The BETWEEN and IN operators are both used to filter data based on certain criteria, but they work in different ways. BETWEEN selects values within a range, while IN selects values that match a list.
For instance, if you have a table called employees that contains each employee's salary along with the country they reside in, you could use the BETWEEN operator to find all employees who make between two bounds – say, $80k and $160k:
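```sql
-- The "employees" table and the lower salary bound are illustrative
SELECT *
FROM employees
WHERE salary BETWEEN 80000 AND 160000;
```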
To find all employees who reside in France or Germany, you could use the IN operator:
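```sql
-- Matches any employee whose country appears in the list
SELECT *
FROM employees
WHERE country IN ('France', 'Germany');
```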
DoubleVerify is an organization that provides a variety of measurements to ensure the quality and effectiveness of digital advertising. Suppose you are asked to calculate the average fraud score per advertiser over the past year, based on the data in a table of per-ad fraud scores.
The table has the following schema:
advertiser_id | ad_id | timestamp | fraud_score |
---|---|---|---|
101 | 450 | 2022-08-01 14:05:00 | 65 |
101 | 451 | 2022-05-02 08:10:00 | 33 |
105 | 452 | 2021-12-24 16:15:00 | 10 |
105 | 453 | 2022-02-01 12:00:00 | 15 |
102 | 454 | 2022-07-01 10:00:00 | 89 |
The task is to find the average fraud_score per advertiser_id in the table.
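Here's a sketch in PostgreSQL – the table name fraud_scores is an assumption, since it isn't given above:

```sql
-- Average fraud score per advertiser over the past year.
-- Table name "fraud_scores" is assumed.
SELECT
  advertiser_id,
  AVG(fraud_score) AS avg_fraud_score
FROM fraud_scores
WHERE "timestamp" >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY advertiser_id;
```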
This query restricts the data to the past year, groups the remaining records by advertiser_id using the GROUP BY clause, and then applies the AVG() function to compute the average fraud score within each group.
To practice a very similar question, try the interactive Facebook Advertiser Status Question, which also involves SQL calculations on advertiser data, or the Amazon Average Review Ratings Question, which similarly requires computing averages (of ratings rather than fraud scores).
A database view is a virtual table that is created based on the results of a SELECT statement, and provides you a customized, read-only version of your data that you can query just like a regular table.
Views in SQL can help you enforce data security requirements by hiding sensitive columns from certain users. A materialized view can also improve performance by pre-computing and storing the results of an expensive intermediate query, which can be faster than re-executing that query each time; the trade-off is that a materialized view must be refreshed when the underlying data changes. A regular (non-materialized) view stores no data – it simply re-runs its defining SELECT whenever you query it.
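For illustration, here's a minimal view that exposes campaign metadata while hiding any other columns – the table and view names are hypothetical:

```sql
-- Hypothetical names: expose campaign metadata, hide everything else
CREATE VIEW campaigns_public AS
SELECT campaign_id, campaign_name, start_date, end_date
FROM campaigns;

-- Query the view just like a regular table
SELECT * FROM campaigns_public WHERE campaign_id = 1;
```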
DoubleVerify is a company that ensures the quality and effectiveness of digital advertising. Let's assume they assign a quality score to each ad impression. To give advertisers deeper insight, write a query that calculates the average quality score per campaign on a monthly basis for 2022.
Here's our sample 'ad_impressions' table:
impression_id | campaign_id | impression_date | quality_score |
---|---|---|---|
8341 | 305 | 01/05/2022 00:00:00 | 87 |
9802 | 562 | 01/13/2022 00:00:00 | 78 |
8333 | 305 | 02/21/2022 00:00:00 | 90 |
6962 | 785 | 03/26/2022 00:00:00 | 82 |
4517 | 785 | 03/25/2022 00:00:00 | 85 |
We want our output to resemble the following format:
mth | campaign | avg_quality_score |
---|---|---|
1 | 305 | 87.00 |
1 | 562 | 78.00 |
2 | 305 | 90.00 |
3 | 785 | 83.50 |
Here is a PostgreSQL query that solves this:
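```sql
-- Average quality score per campaign per month, for 2022 only
SELECT
  EXTRACT(MONTH FROM impression_date) AS mth,
  campaign_id AS campaign,
  ROUND(AVG(quality_score), 2) AS avg_quality_score
FROM ad_impressions
WHERE EXTRACT(YEAR FROM impression_date) = 2022
GROUP BY mth, campaign
ORDER BY mth, avg_quality_score DESC;
```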
This query first filters out the impressions for the year 2022. Then, it groups impressions by month and campaign, calculating the average quality score for each grouping. Finally, it orders the result by month and the average quality score in descending order, so one could easily spot which campaign performed the best in each month.
Some similarities between unique and non-unique indexes include:
- Both speed up reads by letting the database locate matching rows without scanning the entire table.
- Both add overhead to writes (INSERT, UPDATE, DELETE), since the index must be maintained alongside the table.
- Both can be defined on a single column or on multiple columns.

Some differences between unique and non-unique indexes include:
- A unique index enforces that no two rows share the same value(s) in the indexed column(s); a non-unique index permits duplicates.
- Unique indexes are typically created automatically to back PRIMARY KEY and UNIQUE constraints, whereas non-unique indexes exist purely for query performance.
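A quick illustration, reusing the ad_impressions table from earlier (the index names are hypothetical):

```sql
-- Unique index: rejects a second row with the same impression_id
CREATE UNIQUE INDEX idx_impressions_id ON ad_impressions (impression_id);

-- Non-unique index: duplicates allowed; exists purely to speed up lookups
CREATE INDEX idx_impressions_campaign ON ad_impressions (campaign_id);
```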
DoubleVerify is a company that builds software to measure and improve digital ad engagement. You are given a campaigns table which contains digital ad campaigns run by DoubleVerify. Each record has a campaign_id, campaign_name, start_date, end_date, and customer_name. Your task is to find all campaigns that contain the word "Boost" in their name.
campaign_id | campaign_name | start_date | end_date | customer_name |
---|---|---|---|---|
1 | Digital Boost | 06/08/2022 00:00:00 | 06/18/2022 00:00:00 | Walmart |
2 | Summer Sale | 06/10/2022 00:00:00 | 06/25/2022 00:00:00 | Amazon |
3 | Boost Performance | 06/15/2022 00:00:00 | 06/30/2022 00:00:00 | Ebay |
4 | Holiday Savings | 06/20/2022 00:00:00 | 07/05/2022 00:00:00 | Best Buy |
5 | Back to School Boost | 06/25/2022 00:00:00 | 07/10/2022 00:00:00 | Target |
This PostgreSQL query returns all rows in the campaigns table where campaign_name contains the word "Boost". The % wildcard on either side of "Boost", combined with LIKE, matches any string that contains "Boost" anywhere within it.
campaign_id | campaign_name | start_date | end_date | customer_name |
---|---|---|---|---|
1 | Digital Boost | 06/08/2022 00:00:00 | 06/18/2022 00:00:00 | Walmart |
3 | Boost Performance | 06/15/2022 00:00:00 | 06/30/2022 00:00:00 | Ebay |
5 | Back to School Boost | 06/25/2022 00:00:00 | 07/10/2022 00:00:00 | Target |
Assume that you are a data analyst for DoubleVerify and you are asked to analyze customer data with respect to their orders. We have two tables: a customers table and an orders table.
The customers table has the columns customer_id, first_name, last_name, email, and signup_date. The orders table contains order_id, customer_id, product_id, order_date, and quantity.
Write a SQL query to find the top 5 customers who have placed the most orders, indicating the full name, email, and the total quantity of all orders placed.
Here is the sample data for the two tables:
customer_id | first_name | last_name | email | signup_date |
---|---|---|---|---|
1 | John | Doe | john.doe@mail.com | 2018-04-07 |
2 | Jane | Smith | jane.smith@mail.com | 2016-02-29 |
3 | Mary | Johnson | mary.johnson@mail.com | 2020-11-20 |
4 | James | Brown | james.brown@mail.com | 2019-07-08 |
5 | Patricia | Jones | patricia.jones@mail.com | 2021-01-15 |
order_id | customer_id | product_id | order_date | quantity |
---|---|---|---|---|
1001 | 1 | 50001 | 2021-10-17 | 2 |
1002 | 2 | 69852 | 2020-02-11 | 1 |
1003 | 2 | 50001 | 2020-05-19 | 3 |
1004 | 1 | 69852 | 2020-10-20 | 5 |
1005 | 3 | 50001 | 2021-04-22 | 2 |
1006 | 4 | 69852 | 2019-12-12 | 1 |
1007 | 2 | 50001 | 2021-01-03 | 4 |
1008 | 5 | 69852 | 2022-02-07 | 4 |
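A sketch of one solution, assuming the tables are named customers and orders:

```sql
-- Top 5 customers by total quantity ordered. Table names are assumed.
SELECT
  CONCAT(c.first_name, ' ', c.last_name) AS full_name,
  c.email,
  SUM(o.quantity) AS total_quantity
FROM customers c
JOIN orders o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
ORDER BY total_quantity DESC
LIMIT 5;
```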
This query joins the customers table with the orders table on customer_id and sums (using SUM()) the total quantity of orders each customer has placed. It then sorts the results in descending order of total quantity and limits the output to the top 5 customers, selecting each customer's full name, email, and total quantity ordered.
The CONCAT() function stitches first_name and last_name together to produce the full name.
Because joins come up so often during SQL interviews, take a stab at this SQL join question from Spotify.
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Facebook ad campaign data that DoubleVerify ran:
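```sql
-- Illustrative schema; the table and column names are assumptions
CREATE TABLE facebook_ad_campaigns (
  campaign_id   INTEGER PRIMARY KEY,  -- unique identifier for each campaign
  campaign_name VARCHAR(255),
  start_date    DATE,
  end_date      DATE
);
```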
The campaign_id column uniquely identifies each row in the table, and the PRIMARY KEY constraint ensures that no two rows have the same campaign_id. This helps maintain the integrity of the data by preventing duplicate rows.
The primary key is also an important part of the table because it allows you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table containing data on the results of the campaigns.
As an ad verification company, DoubleVerify collects information about impressions/ad views. They have asked you to write a query that calculates a "compression" metric for each campaign_id, defined as the absolute difference between the maximum and minimum view_counts for the campaign, divided by the maximum view_counts, rounded to 2 decimal places.
This metric gives a sense of how evenly an ad campaign's impressions are distributed across different advertisers. Provide a SQL query that calculates this "compression" metric.
campaign_id | advertiser_id | view_counts |
---|---|---|
1 | 143 | 4250 |
1 | 362 | 3800 |
1 | 687 | 4500 |
2 | 123 | 3500 |
2 | 224 | 3600 |
2 | 785 | 3700 |
3 | 411 | 2500 |
3 | 654 | 3000 |
campaign_id | compression_metric |
---|---|
1 | 0.16 |
2 | 0.05 |
3 | 0.17 |
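Here's a sketch, with the input table name ad_views assumed (it isn't given above):

```sql
-- Table name "ad_views" is assumed
SELECT
  campaign_id,
  ROUND(
    ABS(MAX(view_counts) - MIN(view_counts))::NUMERIC / MAX(view_counts),
    2
  ) AS compression_metric
FROM ad_views
GROUP BY campaign_id;
```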
This query uses GROUP BY to compute per-campaign statistics, then combines MAX(), MIN(), ABS(), and ROUND() with basic arithmetic to produce the compression metric. ABS() guarantees a positive difference, MAX(view_counts) serves as the denominator per the metric's definition, and the cast to NUMERIC ensures proper (non-integer) division before ROUND() trims the result to 2 decimal places.
To practice a very similar question, try the interactive Alibaba Compressed Mean Question, which also requires aggregating and rounding multi-column data, or the JP Morgan Chase Cards Issued Difference Question, which likewise involves computing differences between maximum and minimum values within a group.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the above DoubleVerify SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Netflix, Google, and Amazon.
Each DataLemur SQL question has hints to guide you, full answers, and – most importantly – an interactive coding environment, so you can run your SQL query right in the browser and have it executed.
To prep for the DoubleVerify SQL interview, it can also be helpful to practice SQL problems from other tech companies.
In case your SQL foundations are weak, don't jump right into solving questions – first strengthen your SQL foundations with this SQL tutorial for Data Analytics.
This tutorial covers topics including CASE/WHEN statements and joining multiple tables – both of these show up routinely during DoubleVerify interviews.
Besides SQL interview questions, there are other question categories to prepare for in the DoubleVerify Data Science Interview.
To prepare for DoubleVerify Data Science interviews, read the book Ace the Data Science Interview.