Back to questions
New TikTok users sign up with their emails. They confirmed their signup by replying to the text confirmation to activate their accounts. Users may receive multiple text messages for account confirmation until they have confirmed their new account.
A senior analyst is interested to know the activation rate of specified users in the table. Write a query to find the activation rate. Round the percentage to 2 decimal places.
Definitions:
Assumptions:
Effective April 4th 2023, we added an assumption to the question to provide additional clarity.
Column Name | Type |
---|---|
email_id | integer |
user_id | integer |
signup_date | datetime |
email_id | user_id | signup_date |
---|---|---|
125 | 7771 | 06/14/2022 00:00:00 |
236 | 6950 | 07/01/2022 00:00:00 |
433 | 1052 | 07/09/2022 00:00:00 |
Column Name | Type |
---|---|
text_id | integer |
email_id | integer |
signup_action | varchar |
text_id | email_id | signup_action |
---|---|---|
6878 | 125 | Confirmed |
6920 | 236 | Not Confirmed |
6994 | 236 | Confirmed |
'Confirmed' in means the user has activated their account and successfully completed the signup process.
confirm_rate |
---|
0.67 |
67% of users have successfully completed their signup and activated their accounts. The remaining 33% have not yet replied to the text to confirm their signup.
The dataset you are querying against may have different input & output - this is just an example!
To start, we will use a LEFT JOIN to connect the and tables. It's important to note that an won't work for this question and we'll explain why shortly.
We'll keep only the users who successfully signed up and received a 'Confirmed' text confirmation in the column located in the table.
Your output should look something like this:
email_id | email_id |
---|---|
125 | 125 |
236 | 236 |
433 | |
450 |
Why is Necessary in this Query?
Now, it's important to note that not every in the table will have a matching value in the table, and this is where the comes into play.
When we perform a , all the rows from the left table ( in this case) are returned along with matching rows from the right table ( in this case). If there is no match for a particular row in the right table, then the columns from the right table will be NULL.
If we were to use an , only the matching rows between the two tables would be returned, effectively filtering out any values from the table that do not have a corresponding match in the table.
Output from using an :
email_id | email_id |
---|---|
125 | 125 |
236 | 236 |
This could result in relevant users being excluded from the calculation completely, which is not what we want.
Signup Activation Rate = Number of users who confirmed their accounts / Number of users in the table
With the given formula, we expressed them in the query below.
But wait, did you get an activation rate of '0' when you ran the query? That's because dividing an integer with another integer would sometimes result in '0'.
To avoid this, we'll need to cast either the denominator or the numerator to DECIMAL type.
Finally, the function is used to truncate the result to 2 decimal places, as specified in the instructions.