Back to questions
This is the same question as problem #23 in the SQL Chapter of Ace the Data Science Interview!
Assume you have the table below containing information on Facebook user actions. Write a query to obtain the active user retention in July 2022. Output the month (in numerical format 1, 2, 3) and the number of monthly active users (MAUs).
Hint: An active user is a user who has user action ("sign-in", "like", or "comment") in the current month and last month.
Column Name | Type |
---|---|
user_id | integer |
event_id | integer |
event_type | string ("sign-in, "like", "comment") |
event_date | datetime |
user_id | event_id | event_type | event_date |
---|---|---|---|
445 | 7765 | sign-in | 05/31/2022 12:00:00 |
742 | 6458 | sign-in | 06/03/2022 12:00:00 |
445 | 3634 | like | 06/05/2022 12:00:00 |
742 | 1374 | comment | 06/05/2022 12:00:00 |
648 | 3124 | like | 06/18/2022 12:00:00 |
month | monthly_active_users |
---|---|
6 | 1 |
In June 2022, there was only one monthly active user (MAU), 445.
Note: We are showing you output for June 2022 as the user_actions table only have event_dates in June 2022. You should work out the solution for July 2022.
The dataset you are querying against may have different input & output - this is just an example!
In order to calculate the active user retention, we will have to check for each user whether they were active in the current month versus last month.
We can use a 2-step approach:
Step 1
Let's start with finding users who were active in the last month.
This is a tricky question, so take your time and read through the solution a few times if you have to.
Sharing the code here with the interpretation below.
Important: Bear in mind that you will not able to run this query on its own because it is referencing to another table curr_month which do not reside in this query. We will show you the query to run later. For now, just try to understand the logic behind the solution.
Step 2
Using operator, we find for users in the current month which also exists in the subquery, which represents active users in the last month (in step 1). Note that the user_actions table is aliased as curr_month (to say "Hey, this is current month's user information!").
To bucket days into its month, we extract the month information by using . Then, we use a over user id to obtain the monthly active user count for the month.
Now you can run this query in the editor.
Results:
month | monthly_active_users |
---|---|
7 | 2 |
In July 2022, there are only 2 distinct users who are active in the last month and current month.
PostgreSQL 14