Back to questions
This is the same question as problem #33 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below containing information on user transactions on Stripe. Write a query to obtain all of the users' rolling 3-day earnings. Output the user ID, transaction date and rolling 3-day earnings.
User 525's rolling 3-day earning of 571.20 is a cumulative total of the 2 previous days (124.20 + 324.00) and current day (123.00).
The dataset you are querying against may have different input & output - this is just an example!
If this is your first time with the rolling sum, don't worry – we'll take you through the solution step-by-step!
Objective: Find the rolling 3-day earnings for each user by transaction dates.
The table is recorded by individual transactions and not grouped by the date, so we have to sum the earnings by user for each date.
The query should look something like this:
The output for user 423:
Keep in mind that you do not need to order the results using at this stage, because the above query will be converted into a Common Table Expression (CTE).
Next, we'll wrap the query into a CTE called to reuse the output; then, we can calculate rolling 3-day earnings using the field (see below).
Let's break down the rolling sum calculation, where is essentially used as a window function.
Taken alone, the block looks like this:
Steps within the clause:
Now, if you're familiar with using the window function, I'm sure you've heard of as well. Do you know why we're using the former instead of the latter?
Have a run with this query and we'll explain the differences to you.
Output for user 423 only:
field is the rolling 3-day earnings using (as used in the solution) whereas field uses .
Let's focus on the 3rd row, 01/04/2022. Using RANGE, we're accounting for the previous 2 days, 01/03/2022 401.38 and 01/04/2022 123.45 and the current day's earnings, 01/03/2022 401.38 = 524.83. The days are taken into account and not the rows.
If we used ROWS, we'd have to consider the previous 2 rows and the current row's earnings which are 01/01/2022 240.13, 01/03/2022 401.38 and 01/04/2022 123.45 = 764.96. Rows don't directly correspond to days, so it doesn't make sense to pull 3 rows and assume that they represent 3 distinct days.
As the question is asking for a 3-day rolling sum, it's more appropriate to use RANGE, not ROWS.
We hope you learned something new in this question, and that you had as much fun as we did!