Back to questions
You're given a table containing Facebook posts from users who have posted at least twice. Write a query to determine the average number of days between each user’s posts specifically during 2021.
Output user ID along with the average number of days (rounded to the nearest integer) between their posts.
Effective June 8th, 2023, the problem statement and solution have been updated.
Column Name | Type |
---|---|
user_id | integer |
post_id | integer |
post_date | timestamp |
post_content | text |
user_id | post_id | post_date | post_content |
---|---|---|---|
004239 | 784254 | 07/04/2021 11:00:00 | Happy 4th of July! |
151652 | 599415 | 07/10/2021 12:00:00 | Need a hug |
151652 | 111766 | 07/12/2021 19:00:00 | I'm so done with covid - need travelling ASAP! |
151652 | 994156 | 07/15/2021 12:00:00 | Does anyone have an extra iPhone charger to sell? |
661093 | 442560 | 07/08/2021 14:00:00 | Just going to cry myself to sleep after watching Marley and Me. |
661093 | 624356 | 07/29/2021 13:00:00 | Bed. Class 8-12. Work 12-3. Gym 3-5 or 6. Then class 6-10. Another day that's gonna fly by. I miss my girlfriend |
user_id | avg_hiatus_days |
---|---|
151652 | 3 |
661093 | 21 |
Explanation: Let's consider User 151652 as an example. Between their first and second post, there is a time difference of 2 days. Similarly, between their second and third post, there is a time difference of 3 days. By calculating the average of these time differences, we find that the average number of days between User 661903's posts rounded to the nearest integer is 3 days.
The dataset you are querying against may have different input & output - this is just an example!
Let's break down the solution into 2 steps:
Step 1: Find the days between posts
To calculate the days between posts, we first need to determine the most recent (aliased as ) and earliest (aliased as ) dates for each user.
We'll use the and functions respectively and group the results by . Additionally, we'll filter the posts for the year 2021 and include only users who have posted at least twice.
Here's the results:
user_id | max_date | min_date | post_count |
---|---|---|---|
151652 | 12/01/2021 11:00:00 | 01/28/2021 11:00:00 | 3 |
661093 | 09/08/2021 10:00:00 | 02/14/2021 11:00:00 | 3 |
Click here to learn about the difference between and clauses.
Step 2: Calculate the average days between posts
To calculate the average days between posts, we use the output from Step 1 and wrap it in a Common Table Expression (CTE) called . This allows us to reuse the results to find the average hiatus between posts for each user.
In the final query, we calculate the difference in days between the most recent and earliest dates using to extract the day value. We then divide this by the minus 1 as we subtract 1 to account for the intervals between posts.
Finally, the result is rounded to the nearest integer.
And, here's the final results:
user_id | avg_hiatus_days |
---|---|
151652 | 154 |
661093 | 103 |