Back to questions
This is the same question as problem #31 in the SQL Chapter of Ace the Data Science Interview!
Imagine you're provided with a table containing information about user logins on Facebook in 2022. Write a query that determines the number of reactivated users for a given month. Reactivated users are those who were inactive the previous month but logged in during the current month.
Output the month in numerical format along with the count of reactivated users.
Here's some important assumptions to consider:
As of Aug 4th, 2023, we have carefully reviewed the feedback received and made necessary updates to the solution.
Column Name | Type |
---|---|
user_id | integer |
login_date | datetime |
user_id | login_date |
---|---|
123 | 02/22/2022 12:00:00 |
112 | 03/15/2022 12:00:00 |
245 | 03/28/2022 12:00:00 |
123 | 05/01/2022 12:00:00 |
725 | 05/25/2022 12:00:00 |
mth | reactivated_users |
---|---|
2 | 1 |
3 | 2 |
5 | 2 |
In February 2022, we have 1 reactivated user, user 123. They had previously logged in during 2021 and recently reactivated their account in February 2022.
In March 2022, we have 2 reactivated users. Users 112 and 245 were inactive in the previous months but logged in during March 2022, indicating that they reactivated their accounts.
Moving on to May 2022, we still have 2 reactivated users. User 123, who had previously reactivated in February 2022, reactivated their account again in May 2022. Additionally, user 725, who was inactive in the previous months, logged in during May 2022, indicating that they reactivated their account once again.
The dataset you are querying against may have different input & output - this is just an example!
We're given a table containing information about user logins on Facebook in 2022.
The task is to find the number of reactivated users for a given month. Reactivated users are those who were inactive in the previous month but logged in during the current month.
The problem provides important assumptions to consider:
The problem requires us to output the month in numerical format along with the count of reactivated users. To achieve this, we need to extract the month from the and count the distinct for each month.
The solution uses a query with a subquery to find reactivated users. The subquery checks if there is no previous login record for a user in the previous month. If a previous login record does not exist, it means the user is reactivated in the current month.
The main query selects the month and the count of reactivated users for each month:
The subquery checks for the absence of a previous login record for a user in the previous month:
Here, is used to filter only those records where the subquery returns no rows. The subquery looks for a user's previous login record in the table, and the condition ensures we're checking for the previous month's login.
Finally, we integrate the main query with the subquery to count the distinct , and then we group and order the results by the month.