Back to questions
This is the same question as problem #9 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given a table containing Etsy user transactions. Write a query that retrieves the customers whose first transaction was valued at $50 or more. Output the total number of users who meet this criteria.
Instructions:
Effective June 14th, 2023, the solution and hints have been revised.
| Column Name | Type |
|---|---|
| transaction_id | integer |
| user_id | integer |
| spend | decimal |
| transaction_date | timestamp |
| transaction_id | user_id | spend | transaction_date |
|---|---|---|---|
| 759274 | 111 | 49.50 | 02/03/2022 00:00:00 |
| 850371 | 111 | 51.00 | 03/15/2022 00:00:00 |
| 615348 | 145 | 36.30 | 03/22/2022 00:00:00 |
| 137424 | 156 | 151.00 | 04/04/2022 00:00:00 |
| 248475 | 156 | 87.00 | 04/16/2022 00:00:00 |
| users |
|---|
| 1 |
Among all the users, only user ID 156 had their first transaction valued at over $50.
The dataset you are querying against may have different input & output - this is just an example!
Let's get to it!
Step 1: Getting the ordering of customer purchases
We'll start by determining the order of customer purchases. To do this, we use the window function. It helps us retrieve the value of a specific column from the first row within a group of rows. Think of it as picking out the initial value from a particular column for each group or partition in our data.
Step 2: Filtering customers with their first purchase valued at $50 or more
Next, we filter the customers based on their first purchase. We can accomplish this using either a Common Table Expression (CTE) or a subquery. The CTE/subquery includes the column, allowing us to consider the purchase value.
We select customers whose first purchase, indicated by the spend value, is $50 or more.
Method #2: Using RANK() function
Alternatively, we can achieve the same result by using the RANK() function.