Histogram of Users and Purchases
This is the same question as problem #13 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the below table on transactions from users. Bucketing users based on their latest transaction date, write a query to obtain the number of users who made a purchase and the total number of products bought for each transaction date.
Output the transaction date, number of users and number of products.
First, we obtain the latest transaction date for each user. This can be done with a common table expression (CTE) or subquery using a window function to rank the transaction dates per user and per product.
Then, using the results from the previous CTE or subquery, we both the user ids and product ids where the rank is 1 (which refers to the latest transaction) while grouping by each transaction date.
A CTE is a temporary data set to be used as part of a query and it exists during the entire query session. A subquery is a nested query. It’s a query within a query and unlike CTE, it can be used within that query only. Read here and here for more details.
Both methods give the same output and perform fairly similarly. Differences are CTE is reusable during the entire session and more readable, whereas subquery can be used in FROM and WHERE clauses and can act as a column with a single value. We share more resources here (1, 2, 3 on their use cases.
Below are two methods of solving the question using CTE and subquery.
Solution #1: Using CTE
Solution #2: Using Subquery