logo

Back to questions

Tweets' Rolling Averages [Twitter SQL Interview Question]

Medium

This is the same question as problem #10 in the SQL Chapter of Ace the Data Science Interview!

The table below contains information about tweets over a given period of time. Calculate the 3-day rolling average of tweets published by each user for each date that a tweet was posted. Output the user id, tweet date, and rolling averages rounded to 2 decimal places.

Important Assumptions:

  • Rows in this table are consecutive and ordered by date.
  • Each row represents a different day
  • A day that does not correspond to a row in this table is not counted. The most recent day is the next row above the current row.

Note: Rolling average is a metric that helps us analyze data points by creating a series of averages based on different subsets of a dataset. It is also known as a moving average, running average, moving mean, or rolling mean.

Table:

Column NameType
tweet_idinteger
user_idinteger
tweet_datetimestamp

Example Input:

tweet_iduser_idtweet_date
21425211106/01/2022 12:00:00
73925211106/01/2022 12:00:00
84640211106/02/2022 12:00:00
24142525406/02/2022 12:00:00
13737411106/04/2022 12:00:00

Example Output:

user_idtweet_daterolling_avg_3days
11106/01/2022 12:00:002.00
11106/02/2022 12:00:001.50
11106/04/2022 12:00:001.33
25406/02/2022 12:00:001.00

Explanation

User 111 made 2 tweets on 06/01/2022, and 1 tweet the next day. By 06/02/2022, the user had made in total 3 tweets over the course of 2 days; thus, the rolling average is 3/2=1.5. By 06/04/2022, there are 4 tweets that were made during 3 days: 4/3 = 1.33 rolling average.

The dataset you are querying against may have different input & output - this is just an example!

PostgreSQL 14