At Fastly, SQL is crucial for analyzing vast amounts of network traffic data and managing the databases behind its real-time edge computing services. That's why Fastly almost always evaluates jobseekers on SQL problems in interviews for Data Science, Data Engineering, and Data Analytics jobs.
So, to help you prepare for the Fastly SQL interview, we've collected 10 Fastly SQL interview questions – how many can you solve?
Fastly is a cloud computing services provider. They have a wide variety of customers using their services ranging from individuals to large enterprises. In order to better understand and service their most valuable customers, they need an analysis of their "VIP" users. These VIP users are those who are using the most bandwidth.
Define a VIP user as one who consumes more than 1TB (Terabyte) of bandwidth per month on average. Write a SQL query to identify these VIP users, and provide a list of these users sorted by the total bandwidth they consumed over the last 12 months.
usage_id | user_id | month_year | bandwidth_used_TB |
---|---|---|---|
101 | 1 | 2021-08 | 500 |
102 | 2 | 2021-08 | 1200 |
103 | 3 | 2021-08 | 1000 |
104 | 1 | 2021-09 | 1500 |
105 | 2 | 2021-09 | 1100 |
106 | 3 | 2021-09 | 900 |
107 | 1 | 2021-10 | 600 |
108 | 2 | 2021-10 | 2500 |
109 | 3 | 2021-10 | 1000 |
user_id | total_used_last_12_months |
---|---|
2 | 4800 |
3 | 2900 |
1 | 2600 |
Here's a PostgreSQL query to solve the problem:
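The prompt doesn't name the table, so this sketch assumes it's called `bandwidth_usage`:

```sql
-- bandwidth_usage is an assumed table name; month_year is stored as 'YYYY-MM' text
SELECT
  user_id,
  SUM(bandwidth_used_TB) AS total_used_last_12_months
FROM bandwidth_usage
WHERE TO_DATE(month_year, 'YYYY-MM') >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY user_id
HAVING AVG(bandwidth_used_TB) > 1   -- VIP: more than 1TB per month on average
ORDER BY total_used_last_12_months DESC;
```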
In this query, we first filter for rows that fall within the last 12 months. We then aggregate the rows by user_id and calculate the total bandwidth each user consumed within the last 12 months. We use the HAVING clause to retain only users who consume more than 1TB of bandwidth per month on average. Finally, we sort the resulting users by the total bandwidth they used within the last 12 months.
To practice a similar customer analytics SQL question where you can solve it interactively and have your SQL solution instantly graded, try this Walmart Labs SQL Interview Question:
Fastly is a content delivery network (CDN) that helps businesses handle high-traffic websites. One common analytical question might be to understand the bandwidth usage of different clients to aid in capacity planning and pricing adjustment.
Suppose Fastly has a database table tracking client bandwidth usage. Each row records the total bytes a client transferred in a specific month.
Consider the following table:
client_id | month_year | total_bytes |
---|---|---|
123 | 2022-01 | 50000 |
123 | 2022-02 | 60000 |
265 | 2022-01 | 70000 |
265 | 2022-02 | 80000 |
123 | 2022-03 | 40000 |
As an interviewer, I would like to ask you to write a SQL query that calculates the average monthly bandwidth usage for each client, using a window function.
The result should look something like this:
client_id | average_monthly_bandwidth |
---|---|
123 | 50000.00 |
265 | 75000.00 |
The SQL query for this problem might look like this:
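A sketch, assuming the table is named `client_bandwidth` (the prompt doesn't specify a name):

```sql
-- client_bandwidth is an assumed table name
SELECT DISTINCT
  client_id,
  -- window function: average computed across all rows sharing the same client_id
  ROUND(AVG(total_bytes) OVER (PARTITION BY client_id), 2) AS average_monthly_bandwidth
FROM client_bandwidth
ORDER BY client_id;
```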
This query uses the AVG() aggregate function together with the OVER clause to create a window function partitioned by client_id. This means that for each row in the table, the average is computed over the group of rows sharing the same client_id. Finally, the result is sorted by client_id.
P.S. Window functions show up pretty frequently during SQL interviews, so practice the 27+ window function questions on DataLemur.
The COALESCE() function can be used to replace NULL values with a specified value. For instance, if you were a Data Analyst at a company working on a customer analytics project to identify the most engaged customers, you might have access to the following engagement data:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | moderately_active | not_opted_in |
201 | un-subscribed | NULL |
301 | NULL | not_opted_in |
401 | not_active | very_active |
501 | very_active | very_active |
601 | NULL | NULL |
Before you could proceed with your analysis, you would need to remove the NULLs and replace them with the default email engagement level (not_active) and the default sms engagement level (not_opted_in).
To do this, you'd run the following query:
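A sketch, assuming the engagement data lives in a table named `customers`:

```sql
-- customers is an assumed table name
SELECT
  customer_id,
  COALESCE(email_engagement, 'not_active')  AS email_engagement,
  COALESCE(sms_engagement, 'not_opted_in')  AS sms_engagement
FROM customers;
```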
This would get you the following output:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | moderately_active | not_opted_in |
201 | un-subscribed | not_opted_in |
301 | not_active | not_opted_in |
401 | not_active | very_active |
501 | very_active | very_active |
601 | not_active | not_opted_in |
Fastly is a cloud computing services provider, and one of the main services it sells is a content delivery network (CDN). Assume your client is a company that wants to assess the daily traffic flowing through Fastly's CDN to its main website.
You have been tasked with writing a SQL query to calculate the average number of unique daily users on the company website for each month. Each row of the traffic data represents one unique user's visit on a particular day.
traffic_id | visit_date | user_id |
---|---|---|
20001 | 2022-06-01 | 3345 |
20002 | 2022-06-01 | 4321 |
20003 | 2022-06-01 | 5564 |
20004 | 2022-06-02 | 5564 |
20005 | 2022-06-02 | 3345 |
20006 | 2022-06-02 | 7667 |
20007 | 2022-07-01 | 3245 |
20008 | 2022-07-01 | 5564 |
20009 | 2022-07-02 | 3345 |
20010 | 2022-07-02 | 4344 |
month | average_daily_users |
---|---|
6 | 3.00 |
7 | 2.00 |
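Here's one way to write it in PostgreSQL, assuming the traffic table is named `site_traffic` (the prompt doesn't name it):

```sql
-- site_traffic is an assumed table name
SELECT
  EXTRACT(MONTH FROM visit_date) AS month,
  ROUND(AVG(daily_users), 2)     AS average_daily_users
FROM (
  -- subquery: count distinct users per day
  SELECT
    visit_date,
    COUNT(DISTINCT user_id) AS daily_users
  FROM site_traffic
  GROUP BY visit_date
) AS daily_counts
GROUP BY EXTRACT(MONTH FROM visit_date)
ORDER BY month;
```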
In this query, we first create a subquery to get the number of unique users each day. The COUNT(DISTINCT user_id) function counts the distinct users on each day, and this data is grouped by visit_date. Then, in the outer query, we calculate the average daily users per month using the AVG() function and GROUP BY.
To practice a very similar question try this interactive Facebook Active User Retention Question which is similar for calculating unique active users or this Facebook App Click-through Rate (CTR) Question which is similar for assessing daily traffic.
Joins in SQL allow you to combine data from different tables based on a shared key or set of keys.
Four JOIN types are available in SQL: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN. For an example of each one, say you had sales data exported from Fastly's Salesforce CRM stored in a PostgreSQL database, with access to a table of sales records and a table of customers, as sketched below.
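A quick sketch of all four, using hypothetical `sales` and `customers` tables joined on `customer_id`:

```sql
-- sales and customers are hypothetical table names for illustration
SELECT * FROM sales s INNER JOIN customers c ON s.customer_id = c.customer_id;       -- only rows that match in both tables
SELECT * FROM sales s LEFT JOIN customers c ON s.customer_id = c.customer_id;        -- every sale, plus matching customer data (or NULLs)
SELECT * FROM sales s RIGHT JOIN customers c ON s.customer_id = c.customer_id;       -- every customer, plus matching sales data (or NULLs)
SELECT * FROM sales s FULL OUTER JOIN customers c ON s.customer_id = c.customer_id;  -- every row from both tables, matched where possible
```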
Fastly is a company that provides CDN, streaming, and security services. They want to track the click-through rate of their emails about a new product update. The click-through rate is defined as the number of unique users who clicked the link in the email (clicked = 'yes') and visited the product update page, divided by the total number of unique users who received the email (delivered = 'yes'). Can you write a query to calculate this rate?
The 'email_logs' table contains each instance of a user receiving an email. A unique user is identified by their 'user_id' and all fields are assumed to be not null.
user_id | email_id | delivered | clicked |
---|---|---|---|
4782 | 10678 | 'yes' | 'no' |
7533 | 10679 | 'yes' | 'yes' |
5411 | 10680 | 'yes' | 'no' |
2638 | 10681 | 'yes' | 'yes' |
9375 | 10682 | 'yes' | 'yes' |
1892 | 10683 | 'yes' | 'no' |
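One way to write this in PostgreSQL, using the email_logs table described above:

```sql
SELECT
  ROUND(
    -- unique users who clicked, divided by unique users who received the email
    100.0 * COUNT(DISTINCT CASE WHEN clicked = 'yes' THEN user_id END)
          / COUNT(DISTINCT CASE WHEN delivered = 'yes' THEN user_id END),
    2
  ) AS click_through_rate_pct
FROM email_logs;
```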
This query first counts the distinct number of users who clicked the email link and visited the product update page. It then counts the total number of users who received the email. The two counts are divided to calculate the click-through rate and the result is converted to percentage format by multiplying by 100. The result is finally rounded to two decimal places for presentation.
To solve a related SQL interview question on DataLemur's free online SQL code editor, attempt this SQL interview question asked by Facebook:
There are several steps you can take to troubleshoot a slow SQL query.
First things first, figure out why the query is slow! You can use the EXPLAIN and EXPLAIN ANALYZE commands in PostgreSQL to identify any performance bottlenecks. After that, you can start changing your query, depending on the source of the performance issue.
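For example, a minimal sketch (reusing the hypothetical `bandwidth_usage` table from the first question):

```sql
-- EXPLAIN shows the planner's estimated query plan;
-- EXPLAIN ANALYZE actually runs the query and reports real timings
EXPLAIN ANALYZE
SELECT user_id, SUM(bandwidth_used_TB)
FROM bandwidth_usage
GROUP BY user_id;
```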
Generally, adding indexes can speed up queries. De-normalizing your tables might also help, by removing slow joins.
Lastly, you could always just upgrade your hardware! Time and money spent hand-tuning query performance might be better spent on more powerful database servers!
While this is a very surface-level approach to optimizing SQL query performance, it should do the trick for Data Analyst and Data Science interviews at Fastly. Data Engineers should know a bit more about query plans and indexing before the interview.
As an engineer at Fastly, a company that provides cloud computing services to businesses, you are asked to find the maximum bandwidth usage per customer per month, to understand the load on Fastly's services.
Use the schema and data below for the problem.
usage_id | customer_id | time_stamp | bandwidth_in_GB |
---|---|---|---|
01 | 123 | 2022-06-08 00:00:00 | 23.5 |
02 | 265 | 2022-06-10 00:00:00 | 30.3 |
03 | 123 | 2022-06-18 00:00:00 | 28.1 |
04 | 265 | 2022-07-26 00:00:00 | 22.2 |
05 | 123 | 2022-07-05 00:00:00 | 20.5 |
month | customer | max_bandwidth |
---|---|---|
6 | 123 | 28.1 |
6 | 265 | 30.3 |
7 | 123 | 20.5 |
7 | 265 | 22.2 |
Here is a PostgreSQL query that solves the problem.
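The prompt doesn't name the table, so this sketch assumes it's called `customer_usage`:

```sql
-- customer_usage is an assumed table name
SELECT
  EXTRACT(MONTH FROM time_stamp) AS month,
  customer_id                    AS customer,
  MAX(bandwidth_in_GB)           AS max_bandwidth
FROM customer_usage
GROUP BY EXTRACT(MONTH FROM time_stamp), customer_id
ORDER BY month, customer;
```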
In the query, we extract the month from time_stamp and group the data by customer_id and month. We then find the max bandwidth for each group and sort the result by month and customer.
As part of Fastly's analytics team, you have been tasked with finding all the customers in the 'Customers' table whose 'email' contains the string 'fastly'. This will help identify customers who may have created their email addresses with the brand name in them and could be potential brand advocates.
customer_id | first_name | last_name | email |
---|---|---|---|
123 | John | Smith | johnsmith@gmail.com |
265 | Sarah | Doe | sarahdoe@fastlymail.com |
362 | Brian | Taylor | briantaylor@hotmail.com |
192 | Kelly | Lee | kelly.fastly.lee@yahoo.com |
981 | Andrew | Brown | andrewbrown@gmail.ca |
customer_id | first_name | last_name | email |
---|---|---|---|
265 | Sarah | Doe | sarahdoe@fastlymail.com |
192 | Kelly | Lee | kelly.fastly.lee@yahoo.com |
Using PostgreSQL, you can use the LIKE keyword to filter records. The query will look like this:
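Assuming the email column is simply named `email`:

```sql
SELECT customer_id, first_name, last_name, email
FROM Customers
WHERE email LIKE '%fastly%';  -- % matches zero or more characters on either side
```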
This PostgreSQL query would return all records from the Customers table where the email field contains the string 'fastly'. The '%' on either side of 'fastly' in the SQL query is a wildcard that matches zero or more characters. This means it will match email addresses where 'fastly' appears anywhere in the string.
Normal forms are guidelines that are used to help design a relational database in a way that minimizes redundancy and ensures the integrity of the data. The 3 most commonly used normal forms are the 1st, 2nd, and 3rd normal forms. Here's a brief explanation of each:
1st Normal Form (1NF) is all about keeping it simple - each column should only have one value and there should be no repeating groups of data.
2nd Normal Form (2NF) is about organization - your database should already be in 1NF and all the non-key columns should depend on the primary key. This means that each non-key column should be completely dependent on the entire primary key, not just part of it.
3rd Normal Form (3NF) is about independence - if your database is already in 2NF, then all the non-key columns should not depend on each other. They should be self-sufficient and not rely on other non-key columns.
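To make 3NF concrete, here's a toy sketch with hypothetical order data:

```sql
-- Not in 3NF: customer_city depends on customer_id, a non-key column
-- (a transitive dependency on the primary key order_id)
CREATE TABLE orders_unnormalized (
  order_id      INT PRIMARY KEY,
  customer_id   INT,
  customer_city TEXT
);

-- In 3NF: the transitive dependency is split into its own table
CREATE TABLE customers_3nf (
  customer_id   INT PRIMARY KEY,
  customer_city TEXT
);

CREATE TABLE orders_3nf (
  order_id    INT PRIMARY KEY,
  customer_id INT REFERENCES customers_3nf (customer_id)
);
```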
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Fastly SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions, which come from FAANG companies and tech startups.
Each interview question has hints to guide you, detailed solutions, and most importantly, there's an online SQL code editor so you can code up your SQL query right in the browser and have it graded.
To prep for the Fastly SQL interview, it's also a great idea to practice interview questions from other tech companies.
But if your SQL coding skills are weak, don't jump right into solving questions – first strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like UNION vs. joins and filtering data with WHERE – both of these pop up frequently in Fastly SQL assessments.
Beyond writing SQL queries, the other question categories tested in the Fastly Data Science Interview are:
The best way to prepare for Fastly Data Science interviews is by reading Ace the Data Science Interview. The book's got: