Back to questions
New TikTok users sign up with their emails. They confirmed their signup by replying to the text confirmation to activate their accounts. Users may receive multiple text messages for account confirmation until they have confirmed their new account.
Write a query to find the activation rate of the users. Round the percentage to 2 decimal places.
Definitions:
As of 29 Nov 2022, the term confirmation rate is changed to activation rate to reflect the nature of the new user activation process.
Column Name | Type |
---|---|
email_id | integer |
user_id | integer |
signup_date | datetime |
email_id | user_id | signup_date |
---|---|---|
125 | 7771 | 06/14/2022 00:00:00 |
236 | 6950 | 07/01/2022 00:00:00 |
433 | 1052 | 07/09/2022 00:00:00 |
Column Name | Type |
---|---|
text_id | integer |
email_id | integer |
signup_action | varchar |
text_id | email_id | signup_action |
---|---|---|
6878 | 125 | Confirmed |
6920 | 236 | Not Confirmed |
6994 | 236 | Confirmed |
'Confirmed' in means the user has activated their account and successfully completed the signup process.
confirm_rate |
---|
0.67 |
67% of users have successfully completed their signup and activated their accounts. The remaining 33% have not yet replied to the text to confirm their signup.
The dataset you are querying against may have different input & output - this is just an example!
The following steps are suggested to solve this question:
First, we use a LEFT JOIN to connect the and the tables. Take note that you should not use an - we'll explain more below.
Then, we keep the users who successfully signed up and received a 'Confirmed' text confirmation in the column located in the table.
Your output should look something like this:
user_id | signup_action |
---|---|
7771 | Confirmed |
6950 | Confirmed |
1052 | NULL |
Let's break down the above query:
Take note that the might not have a matching value in the table for 2 reasons:
Why and not ?
In these cases, the will create a new row that contains . The value for the column will simply be because no match for was found in the table.
If we replaced the with , the output would show only the overlap between the two tables with no values. It would look something like this:
user_id | signup_action |
---|---|
7771 | Confirmed |
6950 | Confirmed |
We’re missing the user ID 1052 because the user did not initiate any text-confirmation activity, and has no corresponding value in the table.
Therefore, this would give us an incorrect activation rate, as relevant users might be excluded from the calculation completely.
Next, we separate the users who have activated vs. those who have not. Using a conditional statement, we assign 1 to the users with non-null values (users who have activated) and 0 to the users with null values (users who have not activated).
After wrapping the above query into a Common Table Expression (CTE), we calculate the signup activation rate by referring to the following formula.
You can also use a subquery, but CTEs tend to be the best practice.
Signup Activation Rate = Users who activated their accounts / All users in the table
So far, our output looks like this:
user_id | signup |
---|---|
7771 | 1 |
6950 | 1 |
1052 | 0 |
Among the 3 users in the output, 2 of them have activated their accounts.
To avoid integer division and have the results in a decimal format, one of the integers used in the formula, either the denominator or numerator must be cast into a type. Read here to learn more about division issues in PostgreSQL.
Finally, the function is used to truncate the result to 2 decimal places, as specified in the instructions.
Solution #2: Without CTE
PostgreSQL 14