Back to questions
This is the same question as problem #10 in the SQL Chapter of Ace the Data Science Interview!
Given a table of tweet data over a specified time period, calculate the 3-day rolling average of tweets for each user. Output the user ID, tweet date, and rolling averages rounded to 2 decimal places.
Notes:
Effective April 7th, 2023, the problem statement, solution and hints for this question have been revised.
Column Name | Type |
---|---|
user_id | integer |
tweet_date | timestamp |
tweet_count | integer |
user_id | tweet_date | tweet_count |
---|---|---|
111 | 06/01/2022 00:00:00 | 2 |
111 | 06/02/2022 00:00:00 | 1 |
111 | 06/03/2022 00:00:00 | 3 |
111 | 06/04/2022 00:00:00 | 4 |
111 | 06/05/2022 00:00:00 | 5 |
user_id | tweet_date | rolling_avg_3d |
---|---|---|
111 | 06/01/2022 00:00:00 | 2.00 |
111 | 06/02/2022 00:00:00 | 1.50 |
111 | 06/03/2022 00:00:00 | 2.00 |
111 | 06/04/2022 00:00:00 | 2.67 |
111 | 06/05/2022 00:00:00 | 4.00 |
The dataset you are querying against may have different input & output - this is just an example!
To obtain the average rolling average tweet count for each user, we use the following query which calculates the average tweet count for each user ID and date using window function.
Output showing the first 5 rows:
user_id | tweet_date | tweet_count | rolling_avg |
---|---|---|---|
111 | 06/01/2022 00:00:00 | 2 | 2.0000000000000000 |
111 | 06/02/2022 00:00:00 | 1 | 1.5000000000000000 |
111 | 06/03/2022 00:00:00 | 3 | 2.0000000000000000 |
111 | 06/04/2022 00:00:00 | 4 | 2.5000000000000000 |
111 | 06/05/2022 00:00:00 | 5 | 3.0000000000000000 |
The output shows the rolling average tweet count for the cumulative number of days.
Calculating the rolling average tweet count:
However, this is not what we need as we want to calculate the rolling average over a 3-day period. To achieve this, we add the expression to the window function.
To calculate the rolling average tweet count by 3-day period, we modify the previous query by adding the expression to the window function.
Output showing the first 5 rows:
user_id | tweet_date | tweet_count | rolling_avg |
---|---|---|---|
111 | 06/01/2022 00:00:00 | 2 | 2.0000000000000000 |
111 | 06/02/2022 00:00:00 | 1 | 1.5000000000000000 |
111 | 06/03/2022 00:00:00 | 3 | 2.0000000000000000 |
111 | 06/04/2022 00:00:00 | 4 | 2.6666666666666667 |
111 | 06/05/2022 00:00:00 | 5 | 4.0000000000000000 |
This query outputs the rolling average tweet count by 3-day period, as required by the question.
Calculating the rolling averages using
Finally, we round up the rolling averages to the nearest 2 decimal points by incorporating the function into the previous query.