This is the same question as problem #7 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below containing information on user purchases. Write a query to obtain the number of users who purchased the same product on two or more different days. Output the number of unique users.
PS. On 26 Oct 2022, we expanded the data set, thus the official output may vary from before.
Table schema:

Column Name | Type |
---|---|
user_id | integer |
product_id | integer |
quantity | integer |
purchase_date | datetime |

Example input:

user_id | product_id | quantity | purchase_date |
---|---|---|---|
536 | 3223 | 6 | 01/11/2022 12:33:44 |
827 | 3585 | 35 | 02/20/2022 14:05:26 |
536 | 3223 | 5 | 03/02/2022 09:33:28 |
536 | 1435 | 10 | 03/02/2022 08:40:00 |
827 | 2452 | 45 | 04/09/2022 00:00:00 |

Example output:

repeat_purchasers |
---|
1 |
The dataset you are querying against may have different input & output - this is just an example!
Objective: Obtain the number of unique users who purchased the same product on two or more different days.
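Because `purchase_date` is in timestamp format, we first convert it into date format by appending a date cast to `purchase_date` (in PostgreSQL, `::DATE`). This is needed to fulfill the condition of "purchases on two or more different days": two purchases made at different times on the same day must count as one day.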
By counting the number of distinct `purchase_date` values (as dates) grouped by `user_id` and `product_id`, we are counting the number of different days on which each user purchased each product. A count greater than 1 constitutes a repeat purchase of the same product, which is expressed in the `HAVING` clause.
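
To make the grouping step concrete, here is a minimal sketch of the intermediate result before any filtering. It assumes the table is named `purchases` (the name is not shown in the problem statement above, so treat it as hypothetical) and that the database supports the PostgreSQL-style `::DATE` cast. The alias `purchase_days` is introduced only for illustration.

```sql
-- Assumptions: table name purchases (hypothetical) and PostgreSQL ::DATE cast.
-- One row per (user, product) pair with the number of distinct purchase days.
SELECT
  user_id,
  product_id,
  COUNT(DISTINCT purchase_date::DATE) AS purchase_days
FROM purchases
GROUP BY user_id, product_id
ORDER BY user_id, product_id;
```

On the example input, only user 536 with product 3223 has two distinct purchase days (01/11/2022 and 03/02/2022); every other pair has one, so only that pair would survive a "more than 1" filter.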
Finally, we retrieve the number of unique users in the final query by applying `DISTINCT` inside the `COUNT` function.
Solution #1: Using Subquery (As per the explanation above)
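
The original code for this solution is not shown here, so the following is a sketch of what a subquery version could look like rather than the official answer. It assumes a table named `purchases` (hypothetical) and the PostgreSQL `::DATE` cast; the subquery alias `repeat_purchases` is also an illustrative choice.

```sql
-- Assumptions: table name purchases (hypothetical) and PostgreSQL ::DATE cast.
SELECT COUNT(DISTINCT user_id) AS repeat_purchasers
FROM (
  -- Keep only (user, product) pairs bought on two or more different days.
  SELECT user_id
  FROM purchases
  GROUP BY user_id, product_id
  HAVING COUNT(DISTINCT purchase_date::DATE) > 1
) AS repeat_purchases;
```

The outer `COUNT(DISTINCT user_id)` matters because a user who repeat-purchases several different products appears once per product in the subquery but should be counted only once.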
Solution #2: Using CTE
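
As a sketch under the same assumptions (a hypothetical table name `purchases` and the PostgreSQL `::DATE` cast), the same logic can be written with a common table expression. The CTE name `repeat_purchases` is illustrative.

```sql
-- Assumptions: table name purchases (hypothetical) and PostgreSQL ::DATE cast.
WITH repeat_purchases AS (
  -- (user, product) pairs bought on two or more different days.
  SELECT user_id, product_id
  FROM purchases
  GROUP BY user_id, product_id
  HAVING COUNT(DISTINCT purchase_date::DATE) > 1
)
SELECT COUNT(DISTINCT user_id) AS repeat_purchasers
FROM repeat_purchases;
```

This is the same filtering step moved into a named block, which some readers find easier to follow than a nested subquery.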
Solution #3: Using Self-Join
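
A self-join version might look like the sketch below, again assuming a hypothetical table name `purchases` and the PostgreSQL `::DATE` cast. Instead of grouping, it pairs each purchase with another purchase by the same user of the same product on a different day.

```sql
-- Assumptions: table name purchases (hypothetical) and PostgreSQL ::DATE cast.
SELECT COUNT(DISTINCT p1.user_id) AS repeat_purchasers
FROM purchases AS p1
INNER JOIN purchases AS p2
  ON p1.user_id = p2.user_id
  AND p1.product_id = p2.product_id
  -- Different calendar days, not merely different timestamps.
  AND p1.purchase_date::DATE <> p2.purchase_date::DATE;
```

Any user with at least one matching pair of rows appears in the join result, and `COUNT(DISTINCT ...)` collapses the duplicate matches into one count per user.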