11 Dropbox SQL Interview Questions (Updated 2024)

Updated on

March 1, 2024

At Dropbox, SQL is used quite frequently for analyzing product usage data to inform feature development. Because of this, Dropbox almost always asks SQL problems in interviews for Data Science and Data Engineering positions.

As such, to help you ace the Dropbox SQL interview, we've collected 11 Dropbox SQL interview questions – can you solve them?

11 Dropbox SQL Interview Questions

SQL Question 1: Analyze Power Users on Dropbox

Given the customer and file activity database for Dropbox, identify the power users who upload files frequently to the platform.

Below are the sample tables with sample data:

example input:
user_iduser_namesignup_date
101James2019-04-21
102Emma2020-01-19
103Olivia2021-03-05
104John2019-08-15
example input:
upload_iduser_idupload_datefile_size (MB)
2011012022-06-10350
2021012022-06-12120
2031022022-06-14500
2041032022-06-16150
2051012022-06-18320

We are interested in users who have uploaded more than 1000 MB of files in the past month.

Answer:


This query first calculates the total upload size for each user in the past month in the subquery. It then joins this data with the table to get the user names, and finally filters for users who have an uploaded cumulative file size of more than 1000 MB.

To solve a related customer analytics question on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question: Microsoft SQL Interview Question: Teams Super User

SQL Question 2: Monthly Average Rating Per Product

For this question, you work at Dropbox and you're interested in analyzing user feedback. The database stores reviews which users submit for various products, where products are identified by their product_id. In the table, each review has a number of stars (0-5) - this is how users rate products. Also, each review has a submit_date (timestamp), user_id, and review_id.

Write a SQL query that calculates the average rating (), for each product (), for each month. The submit_date () field should be used to determine the month. The results should be ordered by year-month in ascending order, and then product_id in ascending order.

Example Input:
review_iduser_idsubmit_dateproduct_idstars
617112306-08-2022 00:00:00500014
780226506-10-2022 00:00:00698524
529336206-18-2022 00:00:00500013
635219207-26-2022 00:00:00698523
451798107-05-2022 00:00:00698522
Example Output:
mthproductavg_stars
2022-06500013.50
2022-06698524.00
2022-07698522.50

Answer:


This query groups the reviews by month and product_id and calculates the average stars for each group. It uses the TO_CHAR function to format the submit_date in the form 'YYYY-MM', which represents the year and month. Ordering is done first by month in ascending order and then by product_id in ascending order. The AVG function calculates the average stars for each group.

Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur

DataLemur Window Function SQL Questions

SQL Question 3: How does the LEAD() function differ from the LAG() function?

{#Question-3}

Both the and window functions are used to access a row at a specific offset from the current row.

However, the function retrieves a value from a row that follows the current row, whereas the function retrieves a value from a row that precedes the current row.

Often, the offset for both functions is 1, which gives access to the immediately following/preceding row. Here's a SQL query example:


Dropbox SQL Interview Questions

SQL Question 4: Dropbox File Sharing Management

You are tasked to design a simple SQL database for Dropbox. Dropbox is a hosting service where users can store and share files and folders. They can also track their share activities. For simplification, consider there's only one type of user and there are two tables - and .

The table has columns (unique identifier for users), (email of the user), and (when the user joined Dropbox).

The table has columns (unique identifier for the action), (the user who shared the file, related to in table), (the user with whom the file was shared, related to in table), (unique identifier for the file), (size of the file in MB), and (when the file was shared).

Write a SQL query that returns the email of the user who has shared the most files and the total number of files they have shared.

Example Input:
user_idemaildate_joined
1john@gmail.com04/03/2015
2sarah@yahoo.com10/21/2017
3emma@hotmail.com09/12/2020
Example Input:
activity_idsender_idreceiver_idfile_idfile_sizedate_shared
10112500113.508/30/2021
10213500230.209/05/2021
1032150037.809/12/2021
10412500416.709/15/2021
10531500524.509/18/2021

Answer:


This query joins the and the table on the and columns to relate the users with their share activities. It then groups the resulting records by and counts the number of records per user with . The result is ordered in descending order () and limited to the top result () to get the email of the user who has shared the most files and the total number of files they have shared.

SQL Question 5: Could you provide a list of the join types in SQL and explain what each one does?

In SQL, a join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data.

There are four distinct types of JOINs: , , , and .

(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join.


LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.


RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.


FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.


SQL Question 6: Filter Active Dropbox Users

Your task is to help Dropbox identify active users in its database. An active user is defined as a user who has uploaded at least one file within the last 30 days and has a premium account. Filter out these users from the given and tables.

Example Input:
user_idusernameemailaccount_type
1001john_doejohn_doe@gmail.comPremium
1002mary_smithmary_smith@yahoo.comFree
1003sam_brownsam_brown@hotmail.comPremium
1004jane_greenjane_green@gmail.comPremium
1005tom_blacktom_black@yahoo.comFree
Example Input:
file_iduser_idupload_date
3001100107/20/2022
3002100207/28/2022
3003100306/20/2022
3004100407/10/2022
3005100107/28/2022

Answer:


This PostgreSQL query joins the and tables on to associate each user with their respective files. Then, it uses a WHERE clause with AND to apply the given conditions. It filters for users who are of 'Premium' account type and who have at least one file uploaded within the last 30 days.

SQL Question 7: What are some ways you can identify duplicates in a table?

One way to find duplicatesis to use a clause and then use to find groups


You could also use the operator:


SQL Question 8: Calculate The Average Size Of The Files Stored Within Dropbox

As an employee of Dropbox, we are frequently tasked with analyzing the usage patterns of our users. In this scenario, you are given a 'files' table. Each row represents a file stored in Dropbox, which includes the name of the file, size of the file (in KB) and the user_id of the user who stored the file.

Can you write a PostgreSQL query to determine the average size of the files stored on Dropbox?

Example Input:
file_iduser_idfile_namesize_kb
1101"test.txt"50
2101"rock.mp3"7000
3102"image.png"500
4103"time.csv"10
5104"something.mp4"40000
Example Output:
average_size_kb
9432

Answer:


This PostgreSQL query calculates the average size of the files stored in Dropbox. This result can then be utilized to understand how much storage each file typically consumes. The AVG function calculates the average value of a particular column. Here, it's being applied to the size_kb column of the files table.

To practice a very similar question try this interactive Alibaba Compressed Mean Question which is similar for calculating the mean value or this Facebook Average Review Ratings Question which is similar for determining the average.

SQL Question 9: Analyzing Click-Through-Rates in Dropbox

Dropbox heavily focuses on digital marketing and conversion to paid customers. Let's assume each user starts with a free account and then can click ads or upgrade links to a paid product page. Imagine you're given two tables, one for every time a user views an ad or upgrade link () and one for every time a user has actually clicked (). Calculate the click-through-rate (CTR) for each ad.

Example Input
view_iduser_idview_datead_id
1015706/08/2022 00:00:00202
1023606/10/2022 00:00:00500
1035706/18/2022 00:00:00202
1044207/26/2022 00:00:00300
1055707/05/2022 00:00:00202
Example Input:
click_iduser_idclick_datead_id
62015706/08/2022 00:00:00202
79423606/10/2022 00:00:00500
75235706/18/2022 00:00:00202

Answer:


This query starts by joining the and table on the user_id and ad_id columns. It then groups by the ad_id and counts the number of unique views and clicks for each ad. The COUNT DISTINCT is used to ensure that multiple views or clicks from the same user on the same ad are only counted once. The click-through-rate (CTR) is then calculated by dividing the number of clicks by the number of views. The NULLIF function is used to prevent division by zero errors.

To practice a similar problem about calculating rates, solve this TikTok SQL question within DataLemur's online SQL coding environment: TikTok SQL Interview Question

SQL Question 10: What is database denormalization, and when is it a good idea to consider it?

Database denormalization is when you add redundancy to a database, and break typical normalization rules (specified by 1st, 2nd, 3rd normal forms). There's a few reasons to denormalize a database:

  • Improved performance: Denormalization often reduces the the number of costly join operations that are needed to retrieve data. This is helpful when the database is being used for OLAP (Online Analytical Processing) use cases, as joins can be expensive and slow.

  • Simplification: Denormalization can also be used to simplify the design of a database by reducing the number of tables and relationships that need to be managed. This can make it easier to understand and maintain the database.

  • Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.

While denormalization can be a useful tool for improving performance and scalability, it's important to keep in mind that it can make update and delete operations more complex. This is because denormalization can create data duplicates, which can make it harder to maintain data integrity. So, to avoid any headaches, it's a good idea to start with a well-normalized database design and then consider denormalization only if it's absolutely necessary for your specific performance and scalability needs. In other words, denormalization can be a bit of a wild card, so it's best to handle it with care!

SQL Question 11: Calculate the Average Storage Used Per User

As a data analyst at Dropbox, you are given the task to analyze user behavior. Given the and tables shown below, can you write a SQL query that calculates the average storage used per user each month?

Example Input:
user_idusernameregistration_date
1userA2021-02-01
2userB2021-05-01
3userC2021-09-01
Example Input:
usage_iduser_idusage_datestorage_used
10112021-07-1050
10212021-07-2055
10322021-07-1560
10422021-07-2575
10532021-07-3085

Your output should have the format:

Example:
monthyearavg_storage
7202170

Answer:


This query extracts the month and year from the and groups by these extracted values to calculate the average storage used per month. The function is used to convert the date into a string in the specified format for grouping and the function is used for average calculation. The result is then ordered by year and month.

Preparing For The Dropbox SQL Interview

The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the earlier Dropbox SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Amazon, Microsoft, Meta, and smaller tech companies. DataLemur SQL Interview Questions

Each problem on DataLemur has multiple hints, fully explained answers along with a discussion board to see how others solved it and best of all, there's an interactive coding environment so you can instantly run your SQL query answer and have it graded.

To prep for the Dropbox SQL interview it is also helpful to solve SQL problems from other tech companies like:

However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this SQL interview tutorial.

Interactive SQL tutorial

This tutorial covers SQL concepts such as using ORDER BY and LEAD/LAG – both of which come up routinely during SQL job interviews at Dropbox.

Dropbox Data Science Interview Tips

What Do Dropbox Data Science Interviews Cover?

In addition to SQL interview questions, the other topics to prepare for the Dropbox Data Science Interview are:

Dropbox Data Scientist

How To Prepare for Dropbox Data Science Interviews?

The best way to prepare for Dropbox Data Science interviews is by reading Ace the Data Science Interview. The book's got:

  • 201 Interview Questions from FAANG (FB, Apple, Amazon, Netflix, Google)
  • A Refresher on Product Analytics, SQL & ML
  • Great Reviews (900+ 5-star reviews on Amazon)

Acing Data Science Interview

© 2024 DataLemur, Inc

Career Resources

Free 9-Day Data Interview Crash CourseFree SQL Tutorial for Data AnalyticsSQL Interview Cheat Sheet PDFUltimate SQL Interview GuideAce the Data Job Hunt Video CourseAce the Data Science InterviewBest Books for Data Analysts