Back to questions
This is the same question as problem #11 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below on Uber transactions made by users. Write a query to obtain the third transaction of every user. Output the user id, spend and transaction date.
Column Name | Type |
---|---|
user_id | integer |
spend | decimal |
transaction_date | timestamp |
user_id | spend | transaction_date |
---|---|---|
111 | 100.50 | 01/08/2022 12:00:00 |
111 | 55.00 | 01/10/2022 12:00:00 |
121 | 36.00 | 01/18/2022 12:00:00 |
145 | 24.99 | 01/26/2022 12:00:00 |
111 | 89.60 | 02/05/2022 12:00:00 |
user_id | spend | transaction_date |
---|---|---|
111 | 89.60 | 02/05/2022 12:00:00 |
The dataset you are querying against may have different input & output - this is just an example!
p.s. for more Uber SQL interview tips & problems, check out the Uber SQL Interview Guide
First, we obtain the order of transaction numbers for each user. We can do this by using the window function where we the all transactions by and the transaction_date.
Here's how the first 5 rows of output looks like:
user_id | spend | transaction_date | row_num |
---|---|---|---|
111 | 100.50 | 01/08/2022 12:00:00 | 1 |
111 | 55.00 | 01/10/2022 12:00:00 | 2 |
111 | 89.60 | 02/05/2022 12:00:00 | 3 |
121 | 36.00 | 01/18/2022 12:00:00 | 1 |
121 | 22.20 | 04/01/2022 12:00:00 | 2 |
From there on, we can simply convert the query into a subquery and filter for the users' third transaction which is their third transaction sorted by the transaction date (and is denoted as ).
Results:
user_id | spend | transaction_date |
---|---|---|
111 | 89.60 | 02/05/2022 12:00:00 |
121 | 67.90 | 04/03/2022 12:00:00 |
Apart from using subquery to solve this question, you can also use a CTE. Do you know the differences between a subquery and a CTE?
A CTE is a temporary data set to be used as part of a query and it exists during the entire query session. A subquery is a nested query. It’s a query within a query and unlike CTE, it can be used within that query only. Read here and here for more understanding.
Both methods give the same output and perform fairly similarly. Differences are CTE is reusable during the entire session and more readable, whereas subquery can be used in FROM and WHERE clauses and can act as a column with a single value. We share more resources here (1, 2, 3) on their use cases.
Solution #1: Using Subquery
Solution #2: Using CTE