logo

10 Uber SQL Interview Questions (Updated 2024)

Updated on

April 16, 2024

Uber manages a fleet of Cassandra database servers which store petabytes of data, and handle millions of queries per second. To manage this database fleet, and extract insights from their data, Uber uses SQL across every business unit.

That's why Uber evaluates jobseekers on SQL coding questions during interviews for Uber Data Analyst, Uber Business Analyst, Data Science, and BI jobs. For example, one Redditor said they applied for a Data Science role, and it was "3 medium level SQL heavily focused on window functions (PARTITION OVER/LEAD/LAG)."

So, if you're studying for a SQL Assessment, we've collected 10 real Uber SQL interview questions to practice, which were asked recently at Uber – how many can you solve?

Uber SQL Interview Questions

10 Uber SQL Interview Questions

SQL Question 1: Uber Rider's 3rd Trip

Assume you are given the table below on Uber transactions made by users. Write a query to obtain the third transaction of every user. Output the user id, spend and transaction date.

Table:
Column NameType
user_idinteger
spenddecimal
transaction_datetimestamp
Example Input:
user_idspendtransaction_date
111100.5001/08/2022 12:00:00
11155.0001/10/2022 12:00:00
12136.0001/18/2022 12:00:00
14524.9901/26/2022 12:00:00
11189.6002/05/2022 12:00:00
Example Output:
user_idspendtransaction_date
11189.6002/05/2022 12:00:00

To practice this question interactively, and run SQL queries to this Uber problem directly in the browser, go here:

Uber SQL Interview Question

Answer:

First, we obtain the order of transaction numbers for each user. We can do this by using the window function where we the all transactions by and the transaction_date.


Here's how the first 5 rows of output looks like:

user_idspendtransaction_daterow_num
111100.5001/08/2022 12:00:001
11155.0001/10/2022 12:00:002
11189.6002/05/2022 12:00:003
12136.0001/18/2022 12:00:001
12122.2004/01/2022 12:00:002

From there on, we can simply convert the query into a subquery and filter for the users' third transaction which is their third transaction sorted by the transaction date (and is denoted as ).


If you aren't familiar with the ROW_NUM window function, see this tutorial, or try to solve the many window function questions on DataLemur:

Window Function Questions on DataLemur

SQL Question 2: Average Duration of Rides Per City

As a data analyst at Uber, you are tasked with analyzing the ride data to answer the following question - what is the average duration of rides for each city categorized by day of the week?

The duration is categorized into time slots: morning (5:00 to 11:59), afternoon (12:00 to 16:59), evening (17:00 to 20:59), and night (21:00 to 4:59).

Example Input:

ride_idcity_idstart_timeend_time
001NY2022-05-01 07:30:002022-05-01 07:45:00
002SF2022-05-01 12:45:002022-05-01 13:15:00
003LA2022-05-01 17:00:002022-05-01 17:25:00
004SF2022-05-02 09:30:002022-05-02 09:45:00
005LA2022-05-02 21:00:002022-05-02 21:20:00
006NY2022-05-03 06:30:002022-05-03 06:50:00
007LA2022-05-03 14:30:002022-05-03 14:55:00
008SF2022-05-03 19:00:002022-05-03 19:30:00

Answer:


In the query above, a CTE (Common Table Expression) is created to add the and data. The SELECT statement uses the window function AVG() over the time_slot, city_id, and day_of_week to calculate the average duration for each category. The ORDER BY clause is used to order the result.

SQL Question 3: Can you explain the meaning of database denormalization?

Denormalization is a technique used to improve the read performance of a database, typically at the expense of some write performance.

By adding redundant copies of data or grouping data together in a way that does not follow normalization rules, denormalization improves the performance and scalability of a database by eliminating costly join operations, which is important for OLAP use cases that are read-heavy and have minimal updates/inserts.

Uber SQL Interview Questions

SQL Question 4: Find the Most Used Vehicle Type by Uber's Customers In the Past Year

Uber has a diverse range of vehicles from bikes, scooters, to premium luxury cars. In order to cater their services better, Uber wants to understand their customers' preference. The task is to write a SQL query that filters out the most used vehicle type by Uber's customers in the past year. To provide a more holistic view, the results should also exclude rides that were cancelled by either the driver or the user.

The database contains two tables: and . The table has a record of all rides taken place and the table provides information about the various types of vehicles offered by Uber.

Example Input:

ride_iduser_idvehicle_type_idstart_timeend_timecancelled
880316102352021-07-01 08:15:002021-07-01 08:45:00false
880326102412021-07-01 09:15:002021-07-01 09:45:00false
880336102522021-07-01 10:15:002021-07-01 10:45:00true
880346102652021-07-01 11:15:002021-07-01 11:45:00false
880356102732021-07-01 12:15:002021-07-01 12:45:00false

Example Input:

type_idvehicle_type
1Bike
2Car
3SUV
4Luxury Car
5Scooter

Answer:


This query joins the table with the table based on , then it filters only completed rides () from the past year. The result is grouped by and the number of total rides for each type is counted. In the end, we order the result in descending order by the number of rides and limit the output to the most used vehicle type.

To solve a similar problem about calculating rates, try this SQL interview question from TikTok within DataLemur's interactive SQL code editor:

TikTok SQL Interview Question

SQL Question 5: What is normalization?

Normalization involves breaking up your tables into smaller, more specialized ones and using primary and foreign keys to define relationships between them. Not only does this make your database more flexible and scalable, it also makes it easier to maintain. Plus, normalization helps to keep your data accurate by reducing the chance of inconsistencies and errors.

The only downside is now is that your queries will involve more joins, which are slow AF and often a DB performance bottleneck.

SQL Question 6: Calculating Conversion Rates for Uber

Assume that you are a data analyst at Uber. The marketing team runs several digital ads attracting users to book a cab. You are required to analyze the click-through conversion rates, i.e., from viewing an ad to booking a cab.

You have two tables - and .

Example input:
click_iduser_idad_idclick_date
101750007/01/2022 00:00:00
102860007/02/2022 00:00:00
103750007/03/2022 00:00:00
104860007/04/2022 00:00:00
105970007/05/2022 00:00:00
Example input:
booking_iduser_idbooking_date
201707/01/2022 00:00:00
202707/03/2022 00:00:00
203907/05/2022 00:00:00
2041007/06/2022 00:00:00

The task is to find the click-through conversion rate for each ad, which is the number of bookings made after viewing an ad, divided by the total number of clicks for that ad (i.e., ).

Answer:

In PostgreSQL, you can use the following query:


The query joins the table with the table on and condition that the ad click occurred before the booking. It then groups by and calculates the conversion rate as the count of bookings divided by the count of clicks. We used the typecast to get fractional results instead of integers.

Please note that this logic assumes no other factors influencing a user's decision to book a cab other than seeing these ads.

To solve a related CTR problem on DataLemur's free interactive SQL code editor, try this SQL interview question asked by Facebook:

SQL interview question asked by Facebook

SQL Question 7: Could you describe a self-join and provide a scenario in which it would be used?

A self-join is a type of join in which a table is joined to itself. To perform a self-join, you need to specify the table name twice in the FROM clause, and give each instance of the table a different alias. You can then join the two instances of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.

Self-joins are the go-to technique for any data analysis that involves pairs of the same thing, like identifying pairs of products that are frequently purchased together like in this Walmart SQL interview question.

For another example, say you were doing an HR analytics project and needed to analyze how much all Uber employees in the same department interact with each other. Here's a self-join query you could use to retrieve all pairs of Uber employees who work in the same department:


This query returns all pairs of Uber employees who work in the same department, and excludes pairs where the employee's id is the same (since this would represent the same Uber employee being paired with themselves).

SQL Question 8: Average ratings per Driver per City

As a data analyst for Uber, you are asked to determine each driver's average ratings for each city. This will help Uber to monitor performance and perhaps highlight any problems that might be arising in any specific city.

We have two tables, and .

Sample Input:
ride_iddriver_idcityfare_amount
101201New York25.50
102202San Francisco18.00
103203Chicago22.75
104201San Francisco30.00
105202New York20.00
Sample Input:
ride_idrating
1014.3
1024.1
1034.8
1044.7
1053.9

Your task is to write a SQL query that will create a summary table with each driver's average ratings for each city.

Example Output:
driver_idcityavg_rating
201New York4.3
202San Francisco4.1
203Chicago4.8
201San Francisco4.7
202New York3.9

Answer:


This query joins the table with table on . It then groups the resulting table by and , applying the AVG function to the column. This gives the average rating for each driver in each city.

SQL Question 9: Retrieve Users with Gmail IDs

As an SQL analyst at Uber, you are assigned to filter out the customers who have registered using their Gmail IDs. You are given a database named 'users'. The records in this table contain multiple email domains. You need to write an SQL query that filters only those records where the 'email' field contains 'gmail.com'.

Example Input:
user_idfull_nameregistration_dateemail
7162John Doe05/04/2019johndoe@gmail.com
7625Jane Smith11/09/2020janesmith@yahoo.com
5273Steve Johnson06/20/2018stevejohnson@gmail.com
6322Emily Davis08/14/2021emilydavis@hotmail.com
4812Olivia Brown09/30/2019oliviabrown@gmail.com
Example Output:
user_idfull_nameregistration_dateemail
7162John Doe05/04/2019johndoe@gmail.com
5273Steve Johnson06/20/2018stevejohnson@gmail.com
4812Olivia Brown09/30/2019oliviabrown@gmail.com

Answer:


This query uses the LIKE operator in SQL with the wildcard character '%' to filter all records where the 'email' field contains 'gmail.com'. The '%' character is a wildcard in SQL that matches any sequence of characters. When used in conjunction with the LIKE operator, it allows you to perform pattern matching on your data. In this case, we're interested in any email addresses that end with 'gmail.com', regardless of what characters precede it.

SQL Question 10: Ride-Sharing Database Design

A Data Engineer interviewing at Uber was asked about the most important tables when it comes to Uber's ride-sharing service. Specifically, they were asked:

  • What tables would you use?
  • How would these tables relate to each other?
  • What columns would you index on?
  • What bottlenecks might occur that affect QPS (queries-per-second)?

Answer:

In an interview, you'll want to draw an entity-relationship-diagram (ERD) similar to this: Uber Database Design Interview

For a more comprehensive solution, checkout this Uber/Lyft System Design which goes more in-depth into this!

How To Prepare for the Uber SQL Interview

The key to acing a Uber SQL interview is to practice, practice, and then practice some more! In addition to solving the above Uber SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like FAANG (FB, Apple, Amazon, Netflix, Google).

DataLemur Questions

Each DataLemur SQL question has multiple hints, detailed solutions and best of all, there's an online SQL coding environment so you can right online code up your SQL query and have it executed.

To prep for the Uber SQL interview it is also a great idea to practice interview questions from other tech companies like:

But if your SQL query skills are weak, forget about going right into solving questions – improve your SQL foundations with this SQL interview tutorial.

SQL tutorial for Data Scientists & Analysts

This tutorial covers things like CASE/WHEN/ELSE statements and turning a subquery into a CTE – both of these show up routinely during Uber SQL assesments.

Uber Data Science Interview Tips

What Do Uber Data Science Interviews Cover?

For the Uber Data Science Interview, in addition to SQL query questions, the other types of questions to practice:

Uber Data Scientist

How To Prepare for Uber Data Science Interviews?

To prepare for Uber Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions sourced from companies like Uber, Lyft, Doordash
  • a refresher on Stats, SQL & ML
  • over 900+ 5-star reviews on Amazon

Acing Data Science Interview

Here's a testimonial from someone who actually read the book and got an Uber Data Science job in 2022: Uber Data Science Interview Testimonial