At DigitalOcean, SQL is used day-to-day to analyze customer behavior patterns and improve its computing and infrastructure services. That's why DigitalOcean often evaluates jobseekers on SQL questions in interviews for Data Analytics, Data Science, and Data Engineering roles.
As such, to help you study for the DigitalOcean SQL interview, this blog covers 11 DigitalOcean SQL interview questions – can you answer each one?
Given the user and activity log tables, write a query to identify the 'power users' for DigitalOcean. For this question, define a power user as one who has done more than 100 activities on more than three unique days within the past 30 days.
Please refer to the following sample data:
user_id | username | signup_date |
---|---|---|
1 | user1 | 2021-07-01 |
2 | user2 | 2021-06-10 |
3 | user3 | 2021-05-27 |
4 | user4 | 2021-04-20 |
log_id | user_id | activity_date | activity_type |
---|---|---|---|
101 | 1 | 2022-06-01 | created droplet |
102 | 1 | 2022-06-04 | deleted droplet |
103 | 2 | 2022-06-02 | created snapshot |
104 | 2 | 2022-06-02 | created droplet |
106 | 3 | 2022-06-01 | created droplet |
107 | 3 | 2022-06-01 | deleted droplet |
108 | 4 | 2022-06-03 | upgraded droplet |
110 | 4 | 2022-06-04 | downgraded droplet |
Your PostgreSQL query could look like this:
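A sketch of one possible approach; the table names users and logs are assumptions, since the prompt's table names aren't shown:

```sql
-- A sketch; the table names users and logs are assumptions
WITH daily_activity AS (
  SELECT
    user_id,
    activity_date,
    COUNT(*) AS activity_count
  FROM logs
  WHERE activity_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY user_id, activity_date
),
power_users AS (
  SELECT user_id
  FROM daily_activity
  WHERE activity_count > 100                  -- more than 100 activities in a single day
  GROUP BY user_id
  HAVING COUNT(DISTINCT activity_date) > 3    -- on more than three unique days
)
SELECT u.*
FROM users u
JOIN power_users p ON p.user_id = u.user_id;
```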
This query first selects all user activity counts per day for each user who has been active in the past 30 days. It then filters this list for users who have logged more than 100 activities in one day. From this filtered list, it counts unique days of activity and keeps only those users with more than three active days. Finally, it returns the user information from the original users table for these power users.
To practice another SQL customer analytics question where you can code right in the browser and have your SQL solution automatically checked, try this Walmart SQL Interview Question:
As an engineer at DigitalOcean, you are given access to the "user_actions" table which tracks every action (sign up, create a droplet, delete a droplet, etc) performed by a user on a daily basis.
The "user_actions" table is formatted as below:
action_id | user_id | action_type | action_date | droplet_id |
---|---|---|---|---|
1001 | 101 | Sign Up | 2022-06-01 | NULL |
1002 | 101 | Create Droplet | 2022-06-02 | 5000 |
1003 | 101 | Delete Droplet | 2022-06-07 | 5000 |
1004 | 102 | Sign Up | 2022-06-03 | NULL |
1005 | 102 | Create Droplet | 2022-06-05 | 6000 |
1006 | 102 | Delete Droplet | 2022-07-01 | 6000 |
Your task is to write a SQL query that returns the average daily actions per user for a given month. Include only users who performed at least one action during the specified month. Exclude the user's sign up action in your calculation.
Expected output for June 2022:
user_id | average_daily_actions |
---|---|
101 | 1 |
102 | 0.5 |
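A sketch of one possible query, hard-coded here for June 2022; the user_actions table name comes from the prompt:

```sql
-- A sketch, hard-coded for June 2022
WITH daily_actions AS (
  SELECT
    user_id,
    action_date,
    COUNT(action_id) AS actions_per_day
  FROM user_actions
  WHERE action_type <> 'Sign Up'              -- exclude the sign-up action
    AND action_date >= DATE '2022-06-01'
    AND action_date <  DATE '2022-07-01'
  GROUP BY user_id, action_date
)
SELECT
  user_id,
  AVG(actions_per_day) AS average_daily_actions
FROM daily_actions
GROUP BY user_id;
```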
This query first creates a CTE that aggregates the action counts per user per day for the given month, excluding sign-up actions. Then, the outer query takes the average of those daily action counts per user.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur.
In SQL, a join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data.
There are four distinct types of JOINs: INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, and FULL (OUTER) JOIN.
(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join.
LEFT (OUTER) JOIN: Retrieves all the records/rows from the left table and the matched records/rows from the right table.
RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right table and the matched records/rows from the left table.
FULL (OUTER) JOIN: Retrieves all the records/rows from both tables, matching them where possible and filling in NULLs where there is no match.
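As a quick illustration, the sketch below uses hypothetical customers and invoices tables; swapping LEFT JOIN for INNER JOIN would drop customers without invoices:

```sql
-- Hypothetical tables for illustration: customers(customer_id, name), invoices(invoice_id, customer_id)
SELECT c.customer_id, c.name, i.invoice_id
FROM customers c
LEFT JOIN invoices i
  ON i.customer_id = c.customer_id;  -- every customer is kept; invoice_id is NULL where there is no match
```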
DigitalOcean offers cloud services to its customers, which are typically businesses and software developers. Suppose you're tasked with designing a new PostgreSQL database to optimize data retrieval performance, specifically for tracking what services each customer is using and how much they're being billed for them.
Given those requirements, how would you design this database schema?
For the sake of this question, let's consider that right now we have the following tables:
customer_id | name | sign_up_date |
---|---|---|
1 | Alice | 2020-06-20 |
2 | Bob | 2021-10-07 |
3 | Charlie | 2022-01-13 |
service_id | name | monthly_cost |
---|---|---|
1 | Droplet | 5.00 |
2 | Spaces | 2.00 |
3 | Kubernetes | 10.00 |
customer_id | service_id | start_date |
---|---|---|
1 | 1 | 2020-06-20 |
2 | 1 | 2021-11-25 |
1 | 3 | 2022-02-15 |
2 | 3 | 2022-05-20 |
3 | 2 | 2022-02-05 |
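One way to express that design as DDL is sketched below; the table names (customers, services, customer_services) and column types are assumptions based on the sample data:

```sql
-- Sketch of a possible schema; names and types are assumptions based on the sample data
CREATE TABLE customers (
  customer_id   INT PRIMARY KEY,
  name          TEXT NOT NULL,
  sign_up_date  DATE NOT NULL
);

CREATE TABLE services (
  service_id    INT PRIMARY KEY,
  name          TEXT NOT NULL,
  monthly_cost  NUMERIC(10, 2) NOT NULL
);

-- Junction table: which customer uses which service, and since when
CREATE TABLE customer_services (
  customer_id   INT REFERENCES customers (customer_id),
  service_id    INT REFERENCES services (service_id),
  start_date    DATE NOT NULL,
  PRIMARY KEY (customer_id, service_id)
);
```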
To get the total cost each customer has incurred since they started using each service, you could use the following PostgreSQL query:
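Here is a sketch of one such query, using the assumed table names from the DDL sketch above:

```sql
-- A sketch; table names customers, services, and customer_services are assumptions
SELECT
  c.customer_id,
  c.name AS customer_name,
  s.name AS service_name,
  (s.monthly_cost *
     (EXTRACT(YEAR  FROM AGE(CURRENT_DATE, cs.start_date)) * 12
    + EXTRACT(MONTH FROM AGE(CURRENT_DATE, cs.start_date))))::NUMERIC(10, 2) AS total_cost
FROM customer_services cs
JOIN customers c ON c.customer_id = cs.customer_id
JOIN services  s ON s.service_id  = cs.service_id
ORDER BY total_cost DESC;
```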
This query joins the subscription table with the customer and service tables. It then uses the AGE() function to calculate how many months each customer has been using each service, multiplies that duration by the service's monthly cost to get the total cost each customer has incurred, and sorts the result in descending order by total cost.
A foreign key is a field in a database table that links to the primary key of another table, establishing a connection between the two tables. For example, let's analyze DigitalOcean's Google Ads campaigns data:
ad_id | campaign_id | keyword | click_count |
---|---|---|---|
1 | 201 | DigitalOcean reviews | 120 |
2 | 202 | DigitalOcean pricing | 150 |
3 | 101 | buy DigitalOcean | 65 |
4 | 101 | DigitalOcean alternatives | 135 |
Here, campaign_id is a foreign key that references the primary key of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaign a specific ad belongs to.
The ads table may also have multiple foreign keys that reference primary keys in other tables. For example, additional foreign keys could link each ad to its ad group and to the Google Ads account that the campaigns belong to.
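As a hedged illustration, here is a minimal DDL sketch of how such a foreign key could be declared; the table names (campaigns, google_ads) and column types are assumptions:

```sql
-- Hypothetical tables for illustration only
CREATE TABLE campaigns (
  campaign_id   INT PRIMARY KEY,
  campaign_name TEXT
);

CREATE TABLE google_ads (
  ad_id       INT PRIMARY KEY,
  campaign_id INT REFERENCES campaigns (campaign_id),  -- foreign key to the campaigns table
  keyword     TEXT,
  click_count INT
);
```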
Our company, DigitalOcean, provides various cloud infrastructure resources to customers, like droplets (virtual private servers), databases, storage, etc. We maintain a database of all customer transactions (subscriptions and cancellations).
Write an SQL query that retrieves all active customers in June 2022. Active customers are defined as those who have 'subscribed' to at least one service, without cancelling it before the end of June 2022.
transaction_id | customer_id | transaction_date | service_id | transaction_type |
---|---|---|---|---|
1155 | 2022 | 05/25/2022 00:00:00 | A101 | subscribe |
1729 | 2024 | 05/29/2022 00:00:00 | B202 | subscribe |
2058 | 2022 | 07/01/2022 00:00:00 | A101 | cancel |
3114 | 2025 | 06/05/2022 00:00:00 | C303 | subscribe |
4591 | 2025 | 07/02/2022 00:00:00 | C303 | cancel |
The SQL query would look something like this for a PostgreSQL database:
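A sketch; the table name transactions is an assumption, since the sample data doesn't show it:

```sql
-- A sketch; the table name transactions is an assumption
SELECT DISTINCT s.customer_id
FROM transactions s
WHERE s.transaction_type = 'subscribe'
  AND s.transaction_date < DATE '2022-07-01'       -- subscribed by the end of June 2022
  AND NOT EXISTS (
      SELECT 1
      FROM transactions c
      WHERE c.customer_id      = s.customer_id
        AND c.service_id       = s.service_id
        AND c.transaction_type = 'cancel'
        AND c.transaction_date < DATE '2022-07-01' -- no cancellation before the end of June
  );
```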
This SQL query filters for 'subscribe' transactions dated on or before the end of June 2022 and uses NOT EXISTS subquery logic to drop any customer/service pair that also has a 'cancel' transaction in that window. The customers returned by this query are considered 'active' for June.
To find records in one table that aren't in another, you can use a LEFT JOIN and check for NULL values in the right-side table.
Here is an example using two tables, DigitalOcean employees and DigitalOcean managers:
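A sketch; the shared manager_id column is an assumption for illustration:

```sql
-- A sketch; the shared manager_id column is an assumption
SELECT e.*
FROM employees e
LEFT JOIN managers m
  ON e.manager_id = m.manager_id
WHERE m.manager_id IS NULL;   -- keep only employees with no matching manager row
```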
This will return all rows from DigitalOcean employees where there is no matching row in managers, based on the join column used in the ON clause.
You can also use the EXCEPT operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
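A sketch using the same assumed manager_id column:

```sql
-- A sketch; the manager_id column is an assumption
SELECT manager_id FROM employees
EXCEPT
SELECT manager_id FROM managers;
```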
This will retrieve all rows from employees that do not appear in managers. The EXCEPT operator works by returning the rows that are returned by the first query, but not by the second.
Please note that EXCEPT is not supported by all DBMS systems, such as MySQL and Oracle (however, you can achieve a similar outcome with NOT IN or NOT EXISTS, or with the MINUS operator in Oracle).
As a cloud infrastructure provider, DigitalOcean might be interested in knowing the average resource usage (like CPU, Memory utilization) of their servers. Given a table storing hourly logs of resource usage of various servers, can you write a SQL query to find the average CPU and Memory usage across all hours for each server?
log_id | server_id | log_time | cpu_usage | memory_usage |
---|---|---|---|---|
1 | A | 08/20/2022 08:00:00 | 10 | 20 |
2 | A | 08/20/2022 09:00:00 | 12 | 26 |
3 | B | 08/20/2022 08:00:00 | 15 | 35 |
4 | B | 08/20/2022 09:00:00 | 14 | 30 |
5 | A | 08/20/2022 10:00:00 | 11 | 25 |
server | avg_cpu_usage | avg_memory_usage |
---|---|---|
A | 11.00 | 23.67 |
B | 14.50 | 32.50 |
We need to group by server_id while calculating the AVG of each column (cpu_usage and memory_usage), as shown below.
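A sketch; the table name server_logs is an assumption:

```sql
-- A sketch; the table name server_logs is an assumption
SELECT
  server_id AS server,
  ROUND(AVG(cpu_usage), 2)    AS avg_cpu_usage,
  ROUND(AVG(memory_usage), 2) AS avg_memory_usage
FROM server_logs
GROUP BY server_id
ORDER BY server_id;
```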
In this solution, the SQL AVG() function returns the average value of an expression; here it finds the average cpu_usage and memory_usage for each server. The GROUP BY clause gathers together all rows that share the same server_id so that the aggregate functions can be applied per server.
To practice a very similar question, try this interactive Amazon Server Utilization Time Question, which is similar because it requires calculations involving servers' running time, or this Microsoft Supercloud Customer Question, which is similar because it focuses on customers' utilization of cloud services.
DigitalOcean provides cloud services that developers use to scale and deploy applications that run simultaneously on multiple computers. A major resource they offer is called a "droplet", which refers to a scalable virtual private server.
The question is: "For each customer, what is the average number of droplets they used per month throughout 2021?"
product_id corresponds to a specific droplet.
usage_id | user_id | start_date | end_date | product_id |
---|---|---|---|---|
6171 | 123 | 06/01/2021 00:00:00 | 06/30/2021 00:00:00 | 50001 |
7802 | 265 | 06/01/2021 00:00:00 | 06/15/2021 00:00:00 | 69852 |
5293 | 123 | 06/18/2021 00:00:00 | 07/03/2021 00:00:00 | 50001 |
6352 | 192 | 07/26/2021 00:00:00 | 07/31/2021 00:00:00 | 69852 |
4517 | 123 | 07/05/2021 00:00:00 | 07/30/2021 00:00:00 | 69852 |
month | user_id | avg_droplets |
---|---|---|
6 | 123 | 2.00 |
6 | 265 | 1.00 |
7 | 123 | 2.00 |
7 | 192 | 1.00 |
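A sketch that follows the explanation below; the table name droplet_usage is an assumption:

```sql
-- A sketch; the table name droplet_usage is an assumption
WITH monthly_usage AS (
  SELECT
    EXTRACT(MONTH FROM start_date) AS month,
    user_id,
    COUNT(product_id) AS count_product
  FROM droplet_usage
  WHERE EXTRACT(YEAR FROM start_date) = 2021
  GROUP BY EXTRACT(MONTH FROM start_date), user_id
)
SELECT
  month,
  user_id,
  ROUND(AVG(count_product), 2) AS avg_droplets
FROM monthly_usage
GROUP BY month, user_id
ORDER BY month, user_id;
```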
This query first extracts the month from the start date and then, for each user per month, counts the number of usage entries, which corresponds to the number of droplets used. That intermediate result's count_product column reflects how many droplets each user used in each month. Finally, the outer query averages the number of droplets used by each user per month.
The EXCEPT / MINUS operator is used to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that EXCEPT is available in PostgreSQL and SQL Server, while MINUS is available in Oracle (but don't stress about knowing which DBMS supports which exact commands, since the interviewers at DigitalOcean should be lenient!).
Here's a PostgreSQL example of using EXCEPT to find all of DigitalOcean's Facebook video ads with more than 10k views that aren't also being run on YouTube:
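A sketch; the facebook_ads and youtube_ads tables and their columns are assumptions for illustration:

```sql
-- A sketch; the tables and columns are assumptions for illustration
SELECT ad_creative_id
FROM facebook_ads
WHERE ad_type = 'video'
  AND view_count > 10000
EXCEPT
SELECT ad_creative_id
FROM youtube_ads
WHERE ad_type = 'video';
```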
If you want to retain duplicates, you can use the EXCEPT ALL operator instead of EXCEPT. The EXCEPT ALL operator will return all rows, including duplicates.
Given information about the usage of the DigitalOcean services by customers, such as the products (services) they use, the amount of 'bandwidth' they consume, and the dates of such usage, construct a PostgreSQL query to fetch the monthly average consumption for each product for the year 2022.
The customers table is structured as follows:
customer_id | customer_name | customer_email |
---|---|---|
100 | Alice | alice@example.com |
101 | Bob | bob@example.com |
102 | Charlie | charlie@example.com |
103 | Dave | dave@example.com |
The products table is structured as follows:
product_id | product_name |
---|---|
1 | Droplet |
2 | Spaces |
3 | Kubernetes |
4 | Databases |
The usage table is structured as follows:
customer_id | product_id | usage_date | bandwidth |
---|---|---|---|
100 | 1 | 01-22-2022 | 50 |
100 | 1 | 01-25-2022 | 60 |
101 | 2 | 02-13-2022 | 70 |
102 | 3 | 02-15-2022 | 80 |
103 | 4 | 03-21-2022 | 90 |
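A sketch; the table names usage and products are assumptions based on the sample data above:

```sql
-- A sketch; the table names usage and products are assumptions
SELECT
  EXTRACT(MONTH FROM u.usage_date) AS month,
  p.product_name,
  ROUND(AVG(u.bandwidth), 2) AS avg_bandwidth
FROM usage u
JOIN products p ON p.product_id = u.product_id
WHERE EXTRACT(YEAR FROM u.usage_date) = 2022
GROUP BY month, p.product_name
ORDER BY month, avg_bandwidth DESC;
```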
This SQL query uses an inner join to combine data from the usage and product tables. The EXTRACT function pulls the month out of usage_date, and the query is restricted to the year 2022 so the data can be grouped by month. The AVG function then calculates the average bandwidth consumed for each product per month, and the result is sorted by month and by average bandwidth in descending order so that products with higher usage appear first.
Because join questions come up routinely during SQL interviews, practice this interactive Snapchat SQL Interview question using JOINS:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews.
Beyond just solving the above DigitalOcean SQL interview questions, you should also solve the 200+ tricky SQL questions on DataLemur, which come from companies like Facebook, Google, and VC-backed startups.
Each exercise has hints to guide you and full answers, and, best of all, there's an online SQL coding environment so you can instantly run your query and have it graded.
To prep for the DigitalOcean SQL interview, it can also be a great idea to practice SQL problems from other tech companies like:
But if your SQL skills are weak, don't worry about going right into solving questions – strengthen your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers topics including working with string/text data and how window functions work – both of which pop up frequently in DigitalOcean SQL interviews.
Beyond writing SQL queries, the other types of questions tested in the DigitalOcean Data Science Interview are:
The best way to prepare for DigitalOcean Data Science interviews is by reading Ace the Data Science Interview. The book's got: