At Akamai, SQL is used across the company for analyzing traffic patterns across Akamai's distributed network, for querying of their vast web logs dataset for anomaly detection, and in supporting their Managed Databases product. That's why Akamai frequently asks SQL problems during interviews for Data Science and Data Engineering positions.
So, to help you study for the Akamai SQL interview, here’s 9 Akamai SQL interview questions – able to answer them all?
Akamai is a leading content delivery and cloud services provider. In this exercise, we have access to two tables from Akamai's database. The first table records basic information about customers, and the second table records every single interaction each customer has with Akamai's systems.
Your task is to write a SQL query to identify the top 5 customers who generated the most web traffic in the last month. For this exercise, we'll determine a customer's "activity" level by the number of entries their ID has in the table for the last month.
Positions at Akamai can involve analyzing customer engagement and identifying key players or "power users". Being able to create a list of these important customers can aid in targeted marketing or special customer care programs.
customer_id | name | sign_up_date |
---|---|---|
123 | ABC Corp | 2022-01-01 00:00:00 |
456 | DEF Inc | 2022-03-05 00:00:00 |
789 | GHI Ltd | 2022-05-10 00:00:00 |
321 | JKL Corp | 2022-07-15 00:00:00 |
654 | MNO Inc | 2022-08-20 00:00:00 |
traffic_id | customer_id | date |
---|---|---|
111 | 123 | 2022-09-01 00:00:00 |
222 | 123 | 2022-09-05 00:00:00 |
333 | 456 | 2022-09-10 00:00:00 |
444 | 123 | 2022-09-15 00:00:00 |
555 | 789 | 2022-09-20 00:00:00 |
This query first creates a Common Table Expression ( clause) to count the number of interactions each customer has made in the last month. Then it joins this CTE with the table to get the names of the customers, and sorts by to get the top 5 most active customers.
Remember, gives the first day of the current month at 00:00:00, so we subtract an interval of '1 month' to get the same date-time but for the previous month.
To practice a related super-user data analysis question on DataLemur's free online SQL coding environment, try this Microsoft Teams Power User SQL Interview Question:
Akamai Technologies is a Content Delivery Network (CDN) and cloud service provider. They handle huge amounts of web and video traffic for their clients. As a data analyst at Akamai, let's assume that you have been given the task to analyze the traffic volume handled by each service over different months. You need to write a SQL query to find out the total traffic (in GB) handled by each service per month.
Please note, traffic is calculated per request and we need to aggregate it per service per month.
request_id | service_id | request_date | traffic_volume_gb |
---|---|---|---|
101 | 1 | 06/08/2022 | 1.2 |
102 | 2 | 06/10/2022 | 2.5 |
103 | 1 | 06/18/2022 | 0.6 |
104 | 2 | 07/26/2022 | 1.8 |
105 | 1 | 07/05/2022 | 1.4 |
month | service_id | total_traffic_gb |
---|---|---|
6 | 1 | 1.80 |
6 | 2 | 2.50 |
7 | 1 | 1.40 |
7 | 2 | 1.80 |
To keep it simple we are only considering data for 2 months.
This SQL query first extracts the month from the request_date and then uses the SUM window function to calculate the total traffic per service per month. The PARTITION BY clause is used to split the data into partitions based on the service_id and month. The results are then ordered by month and service_id.
To practice a related window function SQL problem on DataLemur's free online SQL coding environment, try this Google SQL Interview Question:
For all practical purposes, and do NOT produce the same results.
While both are similar in that they combine two tables, you can think of joins as increasing the width of the resulting table (you'll have more columns in the result set for a left/inner/right join), whereas a union is used to combine rows which increases the height of the result set but keeps the column count the same.
If your familiar with set operations, you can think of a as set addition, whereas a is set multiplication.
Akamai is one of the leading Content Delivery Network (CDN) and cloud service providers. Besides providing services to deliver and accelerate content, another critical aspect of their operation is to monitor network traffic and service response times. For this question, we'll consider a simple scenario where Akamai wants to analyse the traffic of various client requests and the response time they are experiencing.
Imagine Akamai has two tables:
Design a SQL query that computes the volume of data served for each client each month, along with the average response time.
client_id | client_name |
---|---|
1 | ClientA |
2 | ClientB |
3 | ClientC |
data_id | client_id | date_time | file_size_MB | response_time_ms |
---|---|---|---|---|
1 | 1 | 02/01/2022 00:00:00 | 35 | 200 |
2 | 1 | 02/01/2022 01:00:00 | 50 | 250 |
3 | 2 | 02/01/2022 00:00:00 | 40 | 150 |
4 | 3 | 02/01/2022 00:00:00 | 45 | 180 |
5 | 1 | 03/01/2022 00:00:00 | 30 | 210 |
month | client_name | total_data_MB | avg_response_time_ms |
---|---|---|---|
2 | ClientA | 85 | 225 |
2 | ClientB | 40 | 150 |
2 | ClientC | 45 | 180 |
3 | ClientA | 30 | 210 |
To achieve this, we first need to join the 'clients' and the 'website_data' tables on the 'client_id'.
Next, group by month and client name, then sum the 'file_size_MB' and take the mean of 'response_time_ms' for each group.
This query provides a monthly summary of the total data served per client, along with the average response time, which can be valuable for performance monitoring and identifying areas for improvement.
To explain the difference between a primary key and foreign key, let's start with some data from Akamai's marketing analytics database which stores data from Google Ads campaigns:
:
+------------+------------+------------+------------+ | ad_id | campaign_id| keyword | click_count| +------------+------------+------------+------------+ | 1 | 100 | Akamai pricing | 10 | | 2 | 100 | Akamai reviews | 15 | | 3 | 101 | Akamai alternatives | 7 | | 4 | 101 | buy Akamai | 12 | +------------+------------+------------+------------+
In this table, could be a primary key. It is unique for each row in the table and cannot contain null values.
could be a foreign key. It references the of the Google Ads campaign that each ad belongs to, establishing a relationship between the ads and their campaigns. This foreign key allows you to easily query the table to find out which ads belong to a specific campaign, or to find out which campaigns a specific ad belongs to.
It is also possible for a table to have multiple foreign keys that reference different primary keys in other tables. For example, the table could have additional foreign keys for the of the ad group that each ad belongs to, and the of the Google Ads account that the campaigns belong to.
"What is the average bandwidth consumed each week by each customer using Akamai's CDN solution?"
Akamai is well known for its globally-distributed content delivery network (CDN), so this type of question could be relevant for a position related to their core business.
Here are some possible sample tables relating to the aforementioned problem:
usage_id | customer_id | week_end_date | total_bandwidth_gb |
---|---|---|---|
101 | 123 | 06/05/2022 00:00:00 | 50 |
102 | 123 | 06/12/2022 00:00:00 | 75 |
103 | 123 | 06/19/2022 00:00:00 | 60 |
104 | 456 | 06/05/2022 00:00:00 | 25 |
105 | 456 | 06/12/2022 00:00:00 | 40 |
106 | 789 | 06/19/2022 00:00:00 | 80 |
107 | 789 | 06/26/2022 00:00:00 | 65 |
customer_id | average_bandwidth_gb |
---|---|
123 | 61.67 |
456 | 32.50 |
789 | 72.50 |
This query computes the average bandwidth used per customer over all weeks recorded. To specify a particular week or range of weeks, you could modify the and clauses accordingly. For example, to compute the average over only the week ending June 19th, you could use:
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating total usage or this Microsoft Supercloud Customer Question which is similar for tracking customer usage.
The / operator is used to remove to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that is available in PostgreSQL and SQL Server, while MINUS is available in MySQL and Oracle (but don't stress about knowing which DBMS supports what exact commands since the interviewers at Akamai should be lenient!).
Here's a PostgreSQL example of using EXCEPT to find all of Akamai's Facebook video ads with more than 10k views that aren't also being run on YouTube:
If you want to retain duplicates, you can use the EXCEPT ALL operator instead of EXCEPT. The EXCEPT ALL operator will return all rows, including duplicates.
As a network services provider, Akamai deals with large volumes of data involving network usage patterns. Assume you have a table that captures the usage of network bandwidth every 10 minutes for various servers across the globe. You need to compute the average usage of network bandwidth for each server for each day, rounded to two decimal places. The daily usage is computed as the absolute difference between the maximum and minimum usage for that day.
Also, you need to compute the square root of the maximum daily usage for each server and provide it along with the daily averages.
entry_id | server_id | entry_time | bandwidth_usage_mb |
---|---|---|---|
2001 | S1 | 2022-07-06 10:00:00 | 580.26 |
2002 | S2 | 2022-07-06 10:00:00 | 720.15 |
2003 | S1 | 2022-07-06 10:10:00 | 635.35 |
2004 | S1 | 2022-07-06 10:20:00 | 720.58 |
2005 | S2 | 2022-07-06 10:20:00 | 880.26 |
2006 | S1 | 2022-07-07 10:00:00 | 700.34 |
2007 | S2 | 2022-07-07 10:10:00 | 890.57 |
day | server_id | avg_usage | sqrt_max_usage |
---|---|---|---|
2022-07-06 | S1 | 70.16 | 26.85 |
2022-07-06 | S2 | 80.05 | 29.67 |
2022-07-07 | S1 | 0.00 | 26.46 |
2022-07-07 | S2 | 0.00 | 29.84 |
The clause is used here to categorize data by day and server_id. The bandwidth_usage statistics for each server each day are calculated using aggregate functions like MAX, MIN, AVG, ABS and SQRT to perform mathematical operations.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating total usage of a resource or this Twitter Tweets' Rolling Averages Question which is similar for calculating rolling averages over time.
As a content delivery network and cloud service provider, Akamai handles a large amount of web traffic data. Imagine you are a data engineer at Akamai and you're working with timestamp data from the table. Your task? Write a SQL query to calculate the average number of daily requests per service for each month.
log_id | timestamp | service_id | requests |
---|---|---|---|
1 | 2022-01-01 00:00:01 | 100 | 4000 |
2 | 2022-01-02 12:01:00 | 100 | 2500 |
3 | 2022-01-15 10:04:00 | 200 | 2000 |
4 | 2022-01-20 08:30:01 | 100 | 6000 |
5 | 2022-02-12 14:20:10 | 200 | 5000 |
6 | 2022-02-30 03:00:00 | 100 | 4500 |
7 | 2022-02-15 06:15:10 | 200 | 3000 |
Expected Output:
month | service_id | avg_daily_requests |
---|---|---|
1 | 100 | 4166.67 |
1 | 200 | 5000 |
2 | 100 | 4500 |
2 | 200 | 2666.67 |
Here's a PostgreSQL query that could be used to answer that question:
This query does the following:
Since there's no WHERE clause, the query will return data for all months and services in the table.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Akamai SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the earlier Akamai SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Google, Uber, and Microsoft.
Each SQL question has hints to guide you, detailed solutions and crucially, there is an online SQL coding environment so you can right online code up your query and have it checked.
To prep for the Akamai SQL interview you can also be useful to solve SQL problems from other tech companies like:
However, if your SQL foundations are weak, forget about diving straight into solving questions – improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers SQL topics like SQL joins with practice exercises and joining a table to itself – both of which come up frequently during SQL job interviews at Akamai.
Besides SQL interview questions, the other types of problems to practice for the Akamai Data Science Interview are:
The best way to prepare for Akamai Data Science interviews is by reading Ace the Data Science Interview. The book's got: