Back to questions
This is the same question as problem #13 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given a table on Walmart user transactions. Based on their most recent transaction date, write a query that retrieve the users along with the number of products they bought.
Output the user's most recent transaction date, user ID, and the number of products, sorted in chronological order by the transaction date.
Column Name | Type |
---|---|
product_id | integer |
user_id | integer |
spend | decimal |
transaction_date | timestamp |
product_id | user_id | spend | transaction_date |
---|---|---|---|
3673 | 123 | 68.90 | 07/08/2022 12:00:00 |
9623 | 123 | 274.10 | 07/08/2022 12:00:00 |
1467 | 115 | 19.90 | 07/08/2022 12:00:00 |
2513 | 159 | 25.00 | 07/08/2022 12:00:00 |
1452 | 159 | 74.50 | 07/10/2022 12:00:00 |
transaction_date | user_id | purchase_count |
---|---|---|
07/08/2022 12:00:00 | 115 | 1 |
07/08/2022 12:00:000 | 123 | 2 |
07/10/2022 12:00:00 | 159 | 1 |
The dataset you are querying against may have different input & output - this is just an example!
To determine the latest transaction date for each user, we use the window function.
By partitioning the data by user ID and ordering it by transaction date in descending order, we assign a rank to each transaction date. The rank 1 represents the latest transaction date.
Next, we filter the records to include only the transactions with the highest rank (rank 1). This ensures that we retain only the latest transaction for each user.
In this step, we select the transaction date, user ID, and product ID from the filtered records. Additionally, we include a count of the product IDs to determine the number of products purchased in each transaction.