At Cloudflare, SQL is used across the company to analyze network security and web traffic data. The company even offers an external service called D1, which lets developers create serverless databases. Unsurprisingly, Cloudflare almost always asks SQL questions during interviews for Data Science, Data Engineering, and Data Analytics jobs.
So, to help you prepare for the Cloudflare SQL interview, we've curated 10 Cloudflare SQL interview questions – can you solve them?
Let's say you are given a table where each row includes a timestamp of a request reaching Cloudflare's edge, its status code, and the corresponding Cloudflare data center (POP). Write a SQL query to find the total number of requests from each POP over the last 30 days, along with the percentage of each status code (200, 403, 500, etc.) relative to that POP's total. Assume today is '2022-09-01'.
timestamp | pop | status_code |
---|---|---|
08/08/2022 09:10:13 | SFO | 200 |
08/13/2022 11:55:48 | LAX | 403 |
08/20/2022 16:31:07 | NYC | 500 |
08/28/2022 06:11:19 | SFO | 200 |
08/29/2022 23:55:55 | LAX | 500 |
08/25/2022 14:20:48 | NYC | 200 |
pop | request_count | status_200_perc | status_403_perc | status_500_perc |
---|---|---|---|---|
SFO | 2 | 100.00 | 0.00 | 0.00 |
LAX | 2 | 0.00 | 50.00 | 50.00 |
NYC | 2 | 50.00 | 0.00 | 50.00 |
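One possible PostgreSQL solution (the table name requests is assumed here, since it isn't given in the question):

```sql
-- 'requests' is an assumed table name; the columns come from the sample data above
WITH pop_totals AS (
  SELECT pop, COUNT(*) AS request_count
  FROM requests
  WHERE timestamp >= DATE '2022-09-01' - INTERVAL '30 days'
  GROUP BY pop
)
SELECT
  r.pop,
  t.request_count,
  ROUND(COUNT(CASE WHEN r.status_code = 200 THEN 1 END) * 100.0 / t.request_count, 2) AS status_200_perc,
  ROUND(COUNT(CASE WHEN r.status_code = 403 THEN 1 END) * 100.0 / t.request_count, 2) AS status_403_perc,
  ROUND(COUNT(CASE WHEN r.status_code = 500 THEN 1 END) * 100.0 / t.request_count, 2) AS status_500_perc
FROM requests r
JOIN pop_totals t ON t.pop = r.pop
WHERE r.timestamp >= DATE '2022-09-01' - INTERVAL '30 days'
GROUP BY r.pop, t.request_count;
```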
This query uses a common table expression (CTE) to get the total number of requests for each POP over the last 30 days. In the main query, we use conditional aggregation, placing a CASE expression inside the COUNT function, to count each status code. Each percentage is the ratio of a specific status code's count to the POP's total request count, multiplied by 100 and rounded to two decimal places for readability.
To practice a similar window function question on DataLemur's free interactive SQL code editor, try this Google SQL Interview Question.
Assume you are a Data Scientist at Cloudflare and have the following two tables: a users table and a page_views table.
Calculate the median page views per day for each country for the month of September 2022.
user_id | country | account_creation_date |
---|---|---|
1 | USA | 2022-01-01 |
2 | USA | 2022-02-01 |
3 | Germany | 2022-03-01 |
4 | Germany | 2022-01-01 |
5 | India | 2022-02-01 |
user_id | date | page_views |
---|---|---|
1 | 2022-09-01 | 3 |
2 | 2022-09-01 | 5 |
3 | 2022-09-01 | 2 |
1 | 2022-09-02 | 4 |
5 | 2022-09-02 | 1 |
4 | 2022-09-03 | 3 |
2 | 2022-09-03 | 6 |
1 | 2022-09-04 | 1 |
3 | 2022-09-04 | 2 |
5 | 2022-09-04 | 3 |
The solution requires joining the page_views table with the users table while aggregating the median page views per day by country for September 2022.
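A sketch of the query, assuming the tables are named users and page_views (those names aren't shown above):

```sql
SELECT
  u.country,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY daily.page_views) AS median_page_views
FROM (
  -- page views for each day by each user, limited to September 2022
  SELECT user_id, date, page_views
  FROM page_views
  WHERE date BETWEEN '2022-09-01' AND '2022-09-30'
) AS daily
JOIN users u ON u.user_id = daily.user_id
GROUP BY u.country;
```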
This SQL block first creates a subquery that represents the page views for each day by each user. It then joins this processed data with the users table to find each user's country. Finally, the PERCENTILE_CONT function calculates the median page views for each country. Because the median is robust to outliers, this approach stays accurate even when page-view counts are skewed across users and days.
There are several steps you can take to troubleshoot a slow SQL query.
First things first, figure out why the query is slow! You can use the EXPLAIN and EXPLAIN ANALYZE commands in PostgreSQL to identify performance bottlenecks. After that, you can start changing your query, depending on the source of the performance issue.
Generally, adding indexes can speed up queries. Denormalizing your tables might also help by removing slow joins.
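For example, on a hypothetical requests table (the table and column names here are just illustrative):

```sql
-- Show the query plan with actual run times and row counts:
EXPLAIN ANALYZE
SELECT * FROM requests WHERE pop = 'SFO';

-- If the plan shows a sequential scan over a large table,
-- an index on the filtered column may speed things up:
CREATE INDEX idx_requests_pop ON requests (pop);
```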
Lastly, you could always just upgrade your hardware! Sometimes the time and money spent tuning a query would be better spent on more powerful database servers.
While this is a surface-level approach to optimizing SQL query performance, it should do the trick for Data Analyst and Data Science interviews at Cloudflare. Data Engineers should dig deeper into query plans and database tuning before the interview.
Cloudflare is an American web-infrastructure and website-security company that provides content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services. You are given a table named 'data_processed' that records the number of gigabytes of data processed by Cloudflare each day. Write a query to find the average amount of data processed per day over a given month.
id | date | data_gb |
---|---|---|
1 | 2020-07-01 | 50 |
2 | 2020-07-02 | 54 |
3 | 2020-07-03 | 70 |
4 | 2020-07-04 | 32 |
5 | 2020-07-05 | 64 |
6 | 2020-07-06 | 46 |
7 | 2020-07-07 | 57 |
8 | 2020-07-08 | 60 |
9 | 2020-07-09 | 55 |
10 | 2020-07-10 | 56 |
month | avg_data_gb |
---|---|
7 | 54.4 |
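One way to write this in PostgreSQL:

```sql
SELECT
  EXTRACT(MONTH FROM date) AS month,
  AVG(data_gb) AS avg_data_gb
FROM data_processed
WHERE EXTRACT(MONTH FROM date) = 7  -- limit the data to July
GROUP BY month;
```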
This PostgreSQL query first extracts the month from the date using the EXTRACT function. The condition in the WHERE clause limits the selection to the 7th month (July). The AVG function then averages the data_gb values, and the result is grouped by month, returning the average number of gigabytes processed per day for July.
To practice a very similar question, try this interactive Amazon Server Utilization Time Question, which is similar for calculating total data processing, or this Alibaba Compressed Mean Question, which is similar for calculating average data value.
Denormalization is a technique used to improve the read performance of a database, typically at the expense of some write performance.
By adding redundant copies of data or grouping data together in a way that does not follow normalization rules, denormalization improves the performance and scalability of a database by eliminating costly join operations, which is important for OLAP use cases that are read-heavy and have minimal updates/inserts.
Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services. Suppose you have been hired as a data analyst at Cloudflare, and your task for today is to calculate the average data traffic per service (product) on a monthly basis for the year 2022. This will provide important insights into the most data-intensive services Cloudflare provides.
For this task, assume we have a table of service usage records with the following columns:
service_id | user_id | data_usage (GB) | usage_date |
---|---|---|---|
101 | 111 | 10 | 02/01/2022 |
101 | 112 | 15 | 02/02/2022 |
102 | 111 | 200 | 03/15/2022 |
101 | 113 | 5 | 02/06/2022 |
101 | 111 | 15 | 03/01/2022 |
103 | 114 | 300 | 01/21/2022 |
102 | 115 | 100 | 01/11/2022 |
103 | 116 | 350 | 02/28/2022 |
Our goal is to create a result table with the following structure:
month | service_id | average_data (GB) |
---|---|---|
1 | 102 | 100.00 |
1 | 103 | 300.00 |
2 | 101 | 10.00 |
2 | 103 | 350.00 |
3 | 101 | 15.00 |
3 | 102 | 200.00 |
Given the problem, here's the PostgreSQL query we could use to calculate the average data traffic by service per month:
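The sketch below assumes the table is named service_usage and the usage column is data_usage, since those names aren't shown above:

```sql
SELECT
  EXTRACT(MONTH FROM usage_date) AS month,
  service_id,
  ROUND(AVG(data_usage), 2) AS average_data
FROM service_usage  -- assumed table name
WHERE EXTRACT(YEAR FROM usage_date) = 2022
GROUP BY month, service_id
ORDER BY month, service_id;
```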
This query first filters data from the year 2022. Then, using the GROUP BY clause, it aggregates data traffic per month and per service. The AVG function calculates the average data traffic, and the ORDER BY clause sorts the output by month and service_id.
To explain the difference between a primary key and foreign key, let's look at some data from Cloudflare's marketing analytics database, which stores data from its Google Ads campaigns:
ad_id | campaign_id | keyword | click_count |
---|---|---|---|
1 | 100 | Cloudflare pricing | 10 |
2 | 100 | Cloudflare reviews | 15 |
3 | 101 | Cloudflare alternatives | 7 |
4 | 101 | buy Cloudflare | 12 |
In this table, ad_id could be a primary key. It is unique for each row in the table and cannot contain null values.
campaign_id could be a foreign key. It references the campaign_id of the Google Ads campaign that each ad belongs to, establishing a relationship between the ads and their campaigns. This foreign key allows you to easily query the table to find out which ads belong to a specific campaign, or which campaign a specific ad belongs to.
It is also possible for a table to have multiple foreign keys that reference primary keys in different tables. For example, the table could have additional foreign keys for the id of the ad group that each ad belongs to, and the id of the Google Ads account that the campaigns belong to.
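To make this concrete, here's a minimal sketch of how these keys might be declared (the table and column names are illustrative, not Cloudflare's actual schema):

```sql
CREATE TABLE campaigns (
  campaign_id   INTEGER PRIMARY KEY,  -- unique, non-null identifier
  campaign_name TEXT
);

CREATE TABLE google_ads (
  ad_id       INTEGER PRIMARY KEY,          -- primary key: unique per ad
  campaign_id INTEGER REFERENCES campaigns, -- foreign key to campaigns
  keyword     TEXT,
  click_count INTEGER
);
```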
Cloudflare maintains a large database of customer records. Your task is to find all customers that are using Cloudflare services in cities starting with the letter 'S'.
customer_id | first_name | last_name | service_id | city |
---|---|---|---|---|
1 | John | Smith | 101 | San Francisco |
2 | Jane | Doe | 102 | Seattle |
3 | Tom | Brown | 103 | Sacramento |
4 | Emma | Johnson | 101 | New York |
5 | Harry | White | 102 | Boston |
service_id | service_name |
---|---|
101 | DNS |
102 | DDoS Protection |
103 | Content Delivery Network |
We want to find customers using Cloudflare services in cities starting with the letter 'S', so the output should be:
customer_id | first_name | last_name | service_name | city |
---|---|---|---|---|
1 | John | Smith | DNS | San Francisco |
2 | Jane | Doe | DDoS Protection | Seattle |
3 | Tom | Brown | Content Delivery Network | Sacramento |
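One way to write this query:

```sql
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  s.service_name,
  c.city
FROM customers c
JOIN services s ON s.service_id = c.service_id
WHERE c.city LIKE 'S%';  -- matches any city starting with 'S'
```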
The above SQL query joins the 'customers' table with the 'services' table on 'service_id'. It then uses the LIKE keyword to filter for customers in cities that start with the letter 'S'. The '%' character in the LIKE pattern matches any sequence of characters, so 'S%' matches any city name beginning with 'S'.
Suppose you are given two tables, customers and products. The customers table contains information about the company's customers, including their customer_id, name, signup_date, and plan_id. The products table includes details about the products offered by the company, such as the product_id, plan_id, and product_name.
Your task is to write a SQL query that lists each customer along with the name of the product they are on.
customer_id | name | signup_date | plan_id |
---|---|---|---|
8732 | John Doe | 2021-11-03 | 1001 |
4756 | Jane Smith | 2021-03-05 | 1002 |
9782 | Mary Johnson | 2021-07-15 | 1001 |
1635 | James Brown | 2021-09-20 | 1003 |
product_id | plan_id | product_name |
---|---|---|
50001 | 1001 | Basic |
69852 | 1002 | Advanced |
92370 | 1003 | Premium |
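One way to write this query, assuming the tables are named customers and products:

```sql
SELECT
  c.customer_id,
  c.name,
  p.product_name
FROM customers c
JOIN products p ON p.plan_id = c.plan_id;
```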
The PostgreSQL query above uses the JOIN clause to combine rows from customers and products based on the field common to both tables, plan_id. The output is a new table that contains the customer_id and name (from the customers table) and the product_name (from the products table) for each customer.
Since joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question.
Database denormalization is when you add redundancy to a database, breaking the usual normalization rules (specified by the 1st, 2nd, and 3rd normal forms). There are a few reasons to denormalize a database:
Improved performance: Denormalization often reduces the number of costly join operations needed to retrieve data. This is helpful when the database is used for OLAP (Online Analytical Processing) workloads, as joins can be expensive and slow.
Simplification: Denormalization can also be used to simplify the design of a database by reducing the number of tables and relationships that need to be managed. This can make it easier to understand and maintain the database.
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
While denormalization can be a useful tool for improving performance and scalability, keep in mind that it can make update and delete operations more complex, because the duplicated data makes it harder to maintain data integrity. So, to avoid headaches, start with a well-normalized database design and consider denormalization only if it's truly necessary for your performance and scalability needs. In other words, denormalization can be a bit of a wild card, so handle it with care!
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the above Cloudflare SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Netflix, Google, and Amazon.
Each DataLemur SQL question has hints to guide you, detailed solutions, and most importantly, an online SQL coding environment so you can write your SQL query right in the browser and have it executed.
To prep for the Cloudflare SQL interview, it is also wise to solve SQL questions from other tech companies.
But if your SQL foundations are weak, don't jump right into solving questions – learn SQL with this SQL interview tutorial first.
This tutorial covers SQL concepts such as LEAD window function and math functions in SQL – both of these come up frequently during SQL job interviews at Cloudflare.
Beyond writing SQL queries, the Cloudflare Data Science Interview tests several other topics as well.
To prepare for Cloudflare Data Science interviews, read the book Ace the Data Science Interview, which covers these other interview topics in depth.