logo

11 DigitalOcean SQL Interview Questions (Updated 2024)

Updated on

March 3, 2024

At DigitalOcean, SQL is used day-to-day for analyzing customer behavior patterns to improve computing & infra services. For this reason DigitalOcean usually evaluates jobseekers on SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.

As such, to help you study for the DigitalOcean SQL interview, this blog covers 11 DigitalOcean SQL interview questions – can you answer each one?

11 DigitalOcean SQL Interview Questions

SQL Question 1: Identify Power Users from DigitalOcean's User Logs

Given the tables and , write a query to identify the 'power users' for DigitalOcean. For this question, define a power user as one who has done more than 100 activities on more than three unique days within the past 30 days.

Please refer to the following sample data:

Example Input:
user_idusernamesignup_date
1user12021-07-01
2user22021-06-10
3user32021-05-27
4user42021-04-20
Example Input:
log_iduser_idactivity_dateactivity_type
10112022-06-01created droplet
10212022-06-04deleted droplet
10322022-06-02created snapshot
10422022-06-02created droplet
10632022-06-01created droplet
10732022-06-01deleted droplet
10842022-06-03upgraded droplet
11042022-06-04downgraded droplet

Answer:

Your PostgreSQL query could look like this:


This query first selects all user activity counts per day for each user who has been active in the past 30 days. It then filters this list for users who have logged more than 100 activities in one day. From this filtered list, it counts unique days of activity and keeps only those users with more than three active days. Finally, it returns the user information from the original users table for these power users.

To practice another SQL customer analytics question where you can code right in the browser and have your SQL solution automatically checked, try this Walmart SQL Interview Question: Walmart Labs SQL Interview Question

SQL Question 2: Analyze User Behavior with Window Functions

As an engineer at DigitalOcean, you are given access to the "user_actions" table which tracks every action (sign up, create a droplet, delete a droplet, etc) performed by a user on a daily basis.

The "user_actions" table is formatted as below:

Example Input:
action_iduser_idaction_typeaction_datedroplet_id
1001101Sign Up2022-06-01NULL
1002101Create Droplet2022-06-025000
1003101Delete Droplet2022-06-075000
1004102Sign Up2022-06-03NULL
1005102Create Droplet2022-06-056000
1006102Delete Droplet2022-07-016000

Your task is to write a SQL query that returns the average daily actions per user for a given month. Include only users who performed at least one action during the specified month. Exclude the user's sign up action in your calculation.

Expected output for June 2022:

Example Output:
user_idaverage_daily_actions
1011
1020.5

Answer:


This query first creates a CTE that aggregates the action counts per user per day, excluding the day the user signed up. Then, the outer query takes the average daily actions per user for the given month.

Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur

SQL Interview Questions on DataLemur

SQL Question 3: What are the various types of joins used in SQL?

In SQL, a join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data.

There are four distinct types of JOINs: , , , and .

(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join.


LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.


RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.


FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.


DigitalOcean SQL Interview Questions

SQL Question 4: Database Design for DigitalOcean's Customer and Services data

DigitalOcean offers cloud services to their customers, which are typically businesses and software developers. Suppose you're tasked with designing a new PostgreSQL database to optimize data retrieval performance specifically, for tracking what services each customer is using and how much they're being billed for it.

Here are the requirements:

  • Each customer can have multiple services.
  • Each service can be used by multiple customers.
  • For each customer and service, we need to track the date the service was started and the monthly cost.

How would you design this database schema?

For the sake of this question, let's consider that right now we have the following tables:

customer_idnamesign_up_date
1Alice2020-06-20
2Bob2021-10-07
3Charlie2022-01-13
service_idnamemonthly_cost
1Droplet5.00
2Spaces2.00
3Kubernetes10.00
customer_idservice_idstart_date
112020-06-20
212021-11-25
132022-02-15
232022-05-20
322022-02-05

Answer:

To get the total cost each customer has incurred since they started using each service, you could use the following PostgreSQL query:


This query first joins the table with the and tables. It then calculates how long each customer has been using each service in months using the function. Finally, it calculates the total cost each customer has incurred for each service by multiplying the with the duration in months. It sorts the result in descending order by .

SQL Question 5: In database design, what do foreign keys do?

A foreign key is a field in a database table that links to the primary key of another table, establishing a connection between the two tables. For example, let's analyze DigitalOcean's Google Ads campaigns data:

:

+------------+------------+------------+------------+ | ad_id | campaign_id| keyword | click_count| +------------+------------+------------+------------+ | 1 | 201 | DigitalOcean reviews | 120 | | 2 | 202 | DigitalOcean pricing | 150 | | 3 | 101 | buy DigitalOcean | 65 | | 4 | 101 | DigitalOcean alternatives | 135 | +------------+------------+------------+------------+

is a foreign key that connects to the of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaigns a specific ad belongs to.

The table may also have multiple foreign keys that reference primary keys in other tables. For example, and foreign keys could be used to link each ad to its ad group and the Google Ads account that the campaigns belong to, respectively.

SQL Question 6: Fetch Active Customers of DigitalOcean Services

Our company, DigitalOcean, provides various cloud infrastructure resources to customers, like droplets (virtual private servers), databases, storage, etc. We maintain a database of all customer transactions (subscriptions and cancellations).

Write an SQL query that retrieves all active customers in June 2022. Active customers are defined as those who have 'subscribed' to at least one service, without cancelling it before the end of June 2022.

Example Input:

transaction_idcustomer_idtransaction_dateservice_idtransaction_type
1155202205/25/2022 00:00:00A101subscribe
1729202405/29/2022 00:00:00B202subscribe
2058202207/01/2022 00:00:00A101cancel
3114202506/05/2022 00:00:00C303subscribe
4591202507/02/2022 00:00:00C303cancel

Answer:

The SQL query would look something like this for a PostgreSQL database:


This SQL query filters for subscription transactions that occur in June and for which there are no corresponding 'cancel' transactions in the same month. We make use of the 'NOT EXISTS' subquery logic to filter these out. The customers and services returned by this query are considered 'active' for June.

SQL Question 7: How do you determine which records in one table are not present in a second table?

To find records in one table that aren't in another, you can use a and check for values in the right-side table.

Here is an example using two tables, DigitalOcean employees and DigitalOcean managers:


This will return all rows from DigitalOcean employees where there is no matching row in managers based on the column.

You can also use the operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:


This will retrieve all rows from employees that do not appear in managers. The operator works by retreivingthe rows that are returned by the first query, but not by the second.

Please note that is not supported by all DBMS systems, such as MySQL and Oracle (however, you can use the operator to achieve a similar outcome).

SQL Question 8: Calculate the average resource usage for individual servers

As a cloud infrastructure provider, DigitalOcean might be interested in knowing the average resource usage (like CPU, Memory utilization) of their servers. Given a table storing hourly logs of resource usage of various servers, can you write a SQL query to find the average CPU and Memory usage across all hours for each server?

Example Input:
log_idserver_idlog_timecpu_usagememory_usage
1A08/20/2022 08:00:001020
2A08/20/2022 09:00:001226
3B08/20/2022 08:00:001535
4B08/20/2022 09:00:001430
5A08/20/2022 10:00:001125
Example Output:
serveravg_cpu_usageavg_memory_usage
A11.0023.67
B14.5032.50

Answer:

We would need to group by the server_id while we are calculating the AVG for each column (cpu_usage and memory_usage).


In this solution, The SQL AVG() function is used to return the average value of an expression. Here we are finding the average of cpu_usage and memory_usage for each server. The GROUP BY clause gathers all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be applied to the one or more columns.

To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for requiring calculations involving servers' running time or this Microsoft Supercloud Customer Question which is similar for focusing on customers' utilization of cloud services.

SQL Question 9: Average Usage of Droplets per Customer in a Month

DigitalOcean provides cloud services that developers use to scale and deploy applications that run simultaneously on multiple computers. A major resource they offer is called a "droplet", which refers to a scalable virtual private server.

The question is: "For each customer, what is the average number of droplets they used per month throughout 2021?"

Example Input:

Product_id corresponds to a specific droplet.

usage_iduser_idstart_dateend_dateproduct_id
617112306/01/2021 00:00:0006/30/2021 00:00:0050001
780226506/01/2021 00:00:0006/15/2021 00:00:0069852
529312306/18/2021 00:00:0007/03/2021 00:00:0050001
635219207/26/2021 00:00:0007/31/2021 00:00:0069852
451712307/05/2021 00:00:0007/30/2021 00:00:0069852

Example Output:

monthuser_idavg_droplets
61232.00
62651.00
71232.00
71921.00

Answer:


This query first extracts the month from the start date and then, for each user per month, counts the number of usage entries, which corresponds to the number of droplets used by the user. It gives us a subquery with count_product reflecting how many droplets each user used every month. Finally, the outer query calculates the average number of droplets used by every user in each month.

SQL Question 10: What does the operator do?

The / operator is used to remove to return all rows from the first SELECT statement that are not returned by the second SELECT statement.

Note that is available in PostgreSQL and SQL Server, while MINUS is available in MySQL and Oracle (but don't stress about knowing which DBMS supports what exact commands since the interviewers at DigitalOcean should be lenient!).

Here's a PostgreSQL example of using EXCEPT to find all of DigitalOcean's Facebook video ads with more than 10k views that aren't also being run on YouTube:


If you want to retain duplicates, you can use the EXCEPT ALL operator instead of EXCEPT. The EXCEPT ALL operator will return all rows, including duplicates.

SQL Question 11: Aggregate Average Consumption Data

Given information about the usage of the DigitalOcean services by customers, such as the products (services) they use, the amount of 'bandwidth' they consume, and the dates of such usage, construct a PostgreSQL query to fetch the monthly average consumption for each product for the year 2022.

The table is structured as follows:

Example Input:
customer_idcustomer_namecustomer_email
100Alicealice@example.com
101Bobbob@example.com
102Charliecharlie@example.com
103Davedave@example.com

The table is structured as follows:

Example Input:
product_idproduct_name
1Droplet
2Spaces
3Kubernetes
4Databases

The table is structured as follows:

Example Input:
customer_idproduct_idusage_datebandwidth
100101-22-202250
100101-25-202260
101202-13-202270
102302-15-202280
103403-21-202290

Answer:


This SQL query uses an inner join to combine relevant data from the and tables. The DATE_PART/EXTRACT function extracts the month and year from to group the data by month for the year 2022. The AVG function is then used to calculate the average bandwidth consumed for each product per month. The result is sorted by month and average bandwidth in descending order to promptly display products with higher usage.

Because join questions come up routinely during SQL interviews, practice this interactive Snapchat SQL Interview question using JOINS: Snapchat JOIN SQL interview question

DigitalOcean SQL Interview Tips

The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the above DigitalOcean SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Facebook, Google, and VC-backed startups. DataLemur SQL Interview Questions

Each exercise has hints to guide you, full answers and best of all, there is an online SQL coding environment so you can instantly run your query and have it graded.

To prep for the DigitalOcean SQL interview you can also be a great idea to practice SQL problems from other tech companies like:

But if your SQL skills are weak, don't worry about going right into solving questions – strengthen your SQL foundations with this DataLemur SQL tutorial.

Free SQL tutorial

This tutorial covers topics including working with string/text data and how window functions work – both of which pop up frequently in DigitalOcean SQL interviews.

DigitalOcean Data Science Interview Tips

What Do DigitalOcean Data Science Interviews Cover?

Beyond writing SQL queries, the other types of questions tested in the DigitalOcean Data Science Interview are:

DigitalOcean Data Scientist

How To Prepare for DigitalOcean Data Science Interviews?

The best way to prepare for DigitalOcean Data Science interviews is by reading Ace the Data Science Interview. The book's got:

  • 201 Interview Questions from FAANG tech companies
  • A Crash Course on Python, SQL & ML
  • Amazing Reviews (900+ 5-star reviews on Amazon)

Ace the Data Science Interview by Nick Singh Kevin Huo