At Dropbox, SQL is used quite frequently for analyzing product usage data to inform feature development. Because of this, Dropbox almost always asks SQL problems in interviews for Data Science and Data Engineering positions.
As such, to help you ace the Dropbox SQL interview, we've collected 11 Dropbox SQL interview questions – can you solve them?
Given the customer and file activity database for Dropbox, identify the power users who upload files frequently to the platform.
Below are the sample tables with sample data:
user_id | user_name | signup_date |
---|---|---|
101 | James | 2019-04-21 |
102 | Emma | 2020-01-19 |
103 | Olivia | 2021-03-05 |
104 | John | 2019-08-15 |
upload_id | user_id | upload_date | file_size (MB) |
---|---|---|---|
201 | 101 | 2022-06-10 | 350 |
202 | 101 | 2022-06-12 | 120 |
203 | 102 | 2022-06-14 | 500 |
204 | 103 | 2022-06-16 | 150 |
205 | 101 | 2022-06-18 | 320 |
We are interested in users who have uploaded more than 1000 MB of files in the past month.
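One way to write this query (a sketch only — the `users` and `uploads` table names and the exact date window are assumptions based on the samples above):

```sql
-- Total MB uploaded per user in the past month, joined back to get user names
SELECT u.user_name,
       t.total_upload_mb
FROM (
    SELECT user_id,
           SUM(file_size) AS total_upload_mb
    FROM uploads
    WHERE upload_date >= CURRENT_DATE - INTERVAL '1 month'
    GROUP BY user_id
) t
JOIN users u ON u.user_id = t.user_id
WHERE t.total_upload_mb > 1000;
```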
This query first calculates each user's total upload size over the past month in a subquery. It then joins that result with the users table to get the user names, and finally filters for users whose cumulative upload size exceeds 1000 MB.
To solve a related customer analytics question on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question:
For this question, you work at Dropbox and you're interested in analyzing user feedback. The database stores reviews which users submit for various products, where products are identified by their product_id. In the table, each review has a number of stars (0-5) - this is how users rate products. Also, each review has a submit_date (timestamp), user_id, and review_id.
Write a SQL query that calculates the average rating (stars), for each product (product_id), for each month. The submit_date (a timestamp) field should be used to determine the month. The results should be ordered by year-month in ascending order, and then by product_id in ascending order.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06-08-2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06-10-2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06-18-2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07-26-2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07-05-2022 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
2022-06 | 50001 | 3.50 |
2022-06 | 69852 | 4.00 |
2022-07 | 69852 | 2.50 |
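A query along these lines produces that output (a sketch; the `reviews` table name is assumed):

```sql
-- Average stars per product per month, formatted as YYYY-MM
SELECT TO_CHAR(submit_date, 'YYYY-MM') AS mth,
       product_id                      AS product,
       ROUND(AVG(stars), 2)            AS avg_stars
FROM reviews
GROUP BY TO_CHAR(submit_date, 'YYYY-MM'), product_id
ORDER BY mth, product_id;
```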
This query groups the reviews by month and product_id, using the TO_CHAR function to format submit_date as 'YYYY-MM' (year and month) and the AVG function to compute the average stars for each group. Results are ordered first by month and then by product_id, both ascending.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur.
Both the LEAD() and LAG() window functions are used to access a row at a specific offset from the current row.
However, the LEAD() function retrieves a value from a row that follows the current row, whereas the LAG() function retrieves a value from a row that precedes the current row.
Often, the offset for both functions is 1, which gives access to the immediately following/preceding row. Here's a SQL query example:
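For instance, a sketch using a hypothetical daily_uploads(upload_date, total_uploads) table:

```sql
-- Compare each day's upload count with the previous and next day's
SELECT upload_date,
       total_uploads,
       LAG(total_uploads, 1)  OVER (ORDER BY upload_date) AS prev_day_uploads,
       LEAD(total_uploads, 1) OVER (ORDER BY upload_date) AS next_day_uploads
FROM daily_uploads;
```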
You are tasked to design a simple SQL database for Dropbox. Dropbox is a hosting service where users can store and share files and folders. They can also track their share activities. For simplification, consider there's only one type of user and there are two tables - users and activity.
The users table has columns user_id (unique identifier for users), email (email of the user), and date_joined (when the user joined Dropbox).
The activity table has columns activity_id (unique identifier for the action), sender_id (the user who shared the file, related to user_id in the users table), receiver_id (the user with whom the file was shared, related to user_id in the users table), file_id (unique identifier for the file), file_size (size of the file in MB), and date_shared (when the file was shared).
Write a SQL query that returns the email of the user who has shared the most files and the total number of files they have shared.
user_id | email | date_joined |
---|---|---|
1 | john@gmail.com | 04/03/2015 |
2 | sarah@yahoo.com | 10/21/2017 |
3 | emma@hotmail.com | 09/12/2020 |
activity_id | sender_id | receiver_id | file_id | file_size | date_shared |
---|---|---|---|---|---|
101 | 1 | 2 | 5001 | 13.5 | 08/30/2021 |
102 | 1 | 3 | 5002 | 30.2 | 09/05/2021 |
103 | 2 | 1 | 5003 | 7.8 | 09/12/2021 |
104 | 1 | 2 | 5004 | 16.7 | 09/15/2021 |
105 | 3 | 1 | 5005 | 24.5 | 09/18/2021 |
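A query along these lines works (a sketch; the `users` and `activity` table names are assumed from the samples):

```sql
-- Email of the user who shared the most files, with their share count
SELECT u.email,
       COUNT(*) AS total_files_shared
FROM activity a
JOIN users u ON u.user_id = a.sender_id
GROUP BY u.email
ORDER BY total_files_shared DESC
LIMIT 1;
```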
This query joins the users and activity tables on the user_id and sender_id columns to relate users with their share activities. It then groups the resulting records by email and counts the number of records per user with COUNT(*). The result is ordered in descending order (ORDER BY ... DESC) and limited to the top row (LIMIT 1) to get the email of the user who has shared the most files and the total number of files they have shared.
In SQL, a join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data.
There are four distinct types of JOINs: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join.
LEFT (OUTER) JOIN: Retrieves all the rows from the left table and the matched rows from the right table.
RIGHT (OUTER) JOIN: Retrieves all the rows from the right table and the matched rows from the left table.
FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.
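As an illustration, a sketch using the hypothetical `users` and `uploads` tables from earlier:

```sql
-- LEFT JOIN keeps every user, even those with no uploads (upload_id is NULL)
SELECT u.user_id,
       u.user_name,
       up.upload_id
FROM users u
LEFT JOIN uploads up ON up.user_id = u.user_id;
```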
Your task is to help Dropbox identify active users in its database. An active user is defined as a user who has uploaded at least one file within the last 30 days and has a premium account. Filter out these users from the given users and files tables.
user_id | username | email | account_type |
---|---|---|---|
1001 | john_doe | john_doe@gmail.com | Premium |
1002 | mary_smith | mary_smith@yahoo.com | Free |
1003 | sam_brown | sam_brown@hotmail.com | Premium |
1004 | jane_green | jane_green@gmail.com | Premium |
1005 | tom_black | tom_black@yahoo.com | Free |
file_id | user_id | upload_date |
---|---|---|
3001 | 1001 | 07/20/2022 |
3002 | 1002 | 07/28/2022 |
3003 | 1003 | 06/20/2022 |
3004 | 1004 | 07/10/2022 |
3005 | 1001 | 07/28/2022 |
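A sketch of such a query (the `users` and `files` table names are assumed from the samples):

```sql
-- Premium users with at least one upload in the last 30 days
SELECT DISTINCT u.user_id,
       u.username
FROM users u
JOIN files f ON f.user_id = u.user_id
WHERE u.account_type = 'Premium'
  AND f.upload_date >= CURRENT_DATE - INTERVAL '30 days';
```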
This PostgreSQL query joins the users and files tables on user_id to associate each user with their respective files. A WHERE clause with AND then applies both conditions: it filters for users with a 'Premium' account type who have at least one file uploaded within the last 30 days.
One way to find duplicates is to use a GROUP BY clause and then a HAVING COUNT(*) > 1 condition to find groups with more than one row.
You could also use a window function such as ROW_NUMBER() to flag repeated rows:
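For example, both approaches sketched against a hypothetical files(file_id, file_name) table:

```sql
-- GROUP BY / HAVING: file names that appear more than once
SELECT file_name,
       COUNT(*) AS occurrences
FROM files
GROUP BY file_name
HAVING COUNT(*) > 1;

-- ROW_NUMBER(): number each copy within a file_name; rows with rn > 1 are duplicates
SELECT file_id,
       file_name,
       ROW_NUMBER() OVER (PARTITION BY file_name ORDER BY file_id) AS rn
FROM files;
```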
As an employee of Dropbox, you are frequently tasked with analyzing the usage patterns of its users. In this scenario, you are given a 'files' table. Each row represents a file stored in Dropbox, including the name of the file, the size of the file (in KB), and the user_id of the user who stored it.
Can you write a PostgreSQL query to determine the average size of the files stored on Dropbox?
file_id | user_id | file_name | size_kb |
---|---|---|---|
1 | 101 | "test.txt" | 50 |
2 | 101 | "rock.mp3" | 7000 |
3 | 102 | "image.png" | 500 |
4 | 103 | "time.csv" | 10 |
5 | 104 | "something.mp4" | 40000 |
average_size_kb |
---|
9512 |
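The query itself is a one-liner (a sketch; the `files` table name matches the sample above):

```sql
-- Average file size in KB across all stored files
SELECT AVG(size_kb) AS average_size_kb
FROM files;
```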
This PostgreSQL query calculates the average size of the files stored in Dropbox. This result can then be utilized to understand how much storage each file typically consumes. The AVG function calculates the average value of a particular column. Here, it's being applied to the size_kb column of the files table.
To practice a very similar question, try this interactive Alibaba Compressed Mean Question, which also involves calculating a mean value, or this Facebook Average Review Ratings Question, which similarly requires computing an average.
Dropbox heavily focuses on digital marketing and converting users to paid customers. Let's assume each user starts with a free account and can then click ads or upgrade links leading to a paid product page. Imagine you're given two tables: one recording every time a user views an ad or upgrade link (views) and one recording every time a user actually clicks (clicks). Calculate the click-through rate (CTR) for each ad.
view_id | user_id | view_date | ad_id |
---|---|---|---|
101 | 57 | 06/08/2022 00:00:00 | 202 |
102 | 36 | 06/10/2022 00:00:00 | 500 |
103 | 57 | 06/18/2022 00:00:00 | 202 |
104 | 42 | 07/26/2022 00:00:00 | 300 |
105 | 57 | 07/05/2022 00:00:00 | 202 |
click_id | user_id | click_date | ad_id |
---|---|---|---|
6201 | 57 | 06/08/2022 00:00:00 | 202 |
7942 | 36 | 06/10/2022 00:00:00 | 500 |
7523 | 57 | 06/18/2022 00:00:00 | 202 |
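A sketch of the CTR calculation (the `views` and `clicks` table names are assumptions based on the samples):

```sql
-- CTR per ad: distinct clicking users divided by distinct viewing users
SELECT v.ad_id,
       COUNT(DISTINCT c.user_id)::DECIMAL
         / NULLIF(COUNT(DISTINCT v.user_id), 0) AS ctr
FROM views v
LEFT JOIN clicks c
  ON c.user_id = v.user_id
 AND c.ad_id   = v.ad_id
GROUP BY v.ad_id;
```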
This query starts by joining the views and clicks tables on the user_id and ad_id columns. It then groups by ad_id and counts the number of unique views and clicks for each ad; COUNT(DISTINCT ...) ensures that multiple views or clicks from the same user on the same ad are counted only once. The click-through rate (CTR) is then calculated by dividing the number of clicks by the number of views, with the NULLIF function used to prevent division-by-zero errors.
To practice a similar problem about calculating rates, solve this TikTok SQL question within DataLemur's online SQL coding environment:
Database denormalization is when you add redundancy to a database, breaking typical normalization rules (specified by the 1st, 2nd, and 3rd normal forms). There are a few reasons to denormalize a database:
Improved performance: Denormalization often reduces the number of costly join operations needed to retrieve data. This is helpful when the database is used for OLAP (Online Analytical Processing) workloads, as joins can be expensive and slow.
Simplification: Denormalization can also be used to simplify the design of a database by reducing the number of tables and relationships that need to be managed. This can make it easier to understand and maintain the database.
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
While denormalization can be a useful tool for improving performance and scalability, it's important to keep in mind that it can make update and delete operations more complex. This is because denormalization can create data duplicates, which can make it harder to maintain data integrity. So, to avoid any headaches, it's a good idea to start with a well-normalized database design and then consider denormalization only if it's absolutely necessary for your specific performance and scalability needs. In other words, denormalization can be a bit of a wild card, so it's best to handle it with care!
As a data analyst at Dropbox, you are given the task to analyze user behavior. Given the users and usage tables shown below, can you write a SQL query that calculates the average storage used per user each month?
user_id | username | registration_date |
---|---|---|
1 | userA | 2021-02-01 |
2 | userB | 2021-05-01 |
3 | userC | 2021-09-01 |
usage_id | user_id | usage_date | storage_used |
---|---|---|---|
101 | 1 | 2021-07-10 | 50 |
102 | 1 | 2021-07-20 | 55 |
103 | 2 | 2021-07-15 | 60 |
104 | 2 | 2021-07-25 | 75 |
105 | 3 | 2021-07-30 | 85 |
Your output should have the format:
month | year | avg_storage |
---|---|---|
7 | 2021 | 65 |
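One way to produce that shape of output (a sketch; the `usage` table name and the use of EXTRACT are assumptions):

```sql
-- Average storage_used per calendar month across all usage rows
SELECT EXTRACT(MONTH FROM usage_date) AS month,
       EXTRACT(YEAR  FROM usage_date) AS year,
       AVG(storage_used)              AS avg_storage
FROM usage
GROUP BY EXTRACT(YEAR FROM usage_date), EXTRACT(MONTH FROM usage_date)
ORDER BY year, month;
```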
This query extracts the month and year from usage_date and groups by these extracted values to calculate the average storage used per month. A date function such as EXTRACT (or TO_CHAR) pulls the month and year out of the date for grouping, and the AVG function performs the average calculation. The result is then ordered by year and month.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the earlier Dropbox SQL interview questions, you should also solve the 200+ tricky SQL questions on DataLemur, which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each problem on DataLemur has multiple hints, fully explained answers, and a discussion board to see how others solved it. Best of all, there's an interactive coding environment so you can instantly run your SQL query and have it graded.
To prep for the Dropbox SQL interview it is also helpful to solve SQL problems from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers SQL concepts such as using ORDER BY and LEAD/LAG – both of which come up routinely during SQL job interviews at Dropbox.
In addition to SQL interview questions, the other topics to prepare for the Dropbox Data Science Interview are:
The best way to prepare for Dropbox Data Science interviews is by reading Ace the Data Science Interview. The book's got: