Back to questions
This is the same question as problem #25 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given tables with information on Snapchat users, including their ages and time spent sending and opening snaps.
Write a query to obtain a breakdown of the time spent sending vs. opening snaps as a percentage of total time spent on these activities grouped by age group. Round the percentage to 2 decimal places in the output.
Notes:
Effective April 15th, 2023, the solution has been updated and optimised.
Column Name | Type |
---|---|
activity_id | integer |
user_id | integer |
activity_type | string ('send', 'open', 'chat') |
time_spent | float |
activity_date | datetime |
activity_id | user_id | activity_type | time_spent | activity_date |
---|---|---|---|---|
7274 | 123 | open | 4.50 | 06/22/2022 12:00:00 |
2425 | 123 | send | 3.50 | 06/22/2022 12:00:00 |
1413 | 456 | send | 5.67 | 06/23/2022 12:00:00 |
1414 | 789 | chat | 11.00 | 06/25/2022 12:00:00 |
2536 | 456 | open | 3.00 | 06/25/2022 12:00:00 |
Column Name | Type |
---|---|
user_id | integer |
age_bucket | string ('21-25', '26-30', '31-25') |
user_id | age_bucket |
---|---|
123 | 31-35 |
456 | 26-30 |
789 | 21-25 |
age_bucket | send_perc | open_perc |
---|---|---|
26-30 | 65.40 | 34.60 |
31-35 | 43.75 | 56.25 |
Using the age bucket 26-30 as example, the time spent sending snaps was 5.67 and the time spent opening snaps was 3.
To calculate the percentage of time spent sending snaps, we divide the time spent sending snaps by the total time spent on sending and opening snaps, which is 5.67 + 3 = 8.67.
So, the percentage of time spent sending snaps is 5.67 / (5.67 + 3) = 65.4%, and the percentage of time spent opening snaps is 3 / (5.67 + 3) = 34.6%.
The dataset you are querying against may have different input & output - this is just an example!
We start by joining the table with the table using the field as common identifier. We then filter the results by 'send' and 'open' because we are interested in focusing on sending and opening snaps. Finally, we group the results by to aggregate the data.
Next, we calculate the total time spent on sending and opening snaps using aggregate function along with FILTER clause for each activity type ('send' and 'open').
Here's how the output looks like:
age_bucket | send_perc | open_perc |
---|---|---|
21-25 | 6.24 | 5.25 |
26-30 | 13.91 | 3.00 |
31-35 | 3.50 | 5.75 |
Next, we convert the query into a Common Table Expression (CTE) called . Within the CTE, we calculate the percentages of time spent on sending and opening snaps using the formula:
Percentage of time spent on sending/opening snaps = Time spent on sending/opening snaps / Total time spent on sending and opening snaps
This formula will help us determine the proportion of time spent on sending or opening snaps relative to the total time spent on both activities.
age_bucket | send_perc | open_perc |
---|---|---|
21-25 | 0.54308093994778067885 | 0.45691906005221932115 |
26-30 | 0.82259018332347723241 | 0.17740981667652276759 |
31-35 | 0.37837837837837837838 | 0.62162162162162162162 |
Finally, we round the percentages to 2 decimal places. Ensure that you're wrapping the expression correctly.
Lastly, we ensure that the calculated percentages are rounded to 2 decimal places using the function with the appropriate precision specified.
Here's the results:
age_bucket | send_perc | open_perc |
---|---|---|
21-25 | 54.31 | 45.69 |
26-30 | 82.26 | 17.74 |
31-35 | 37.84 | 62.16 |
Here's a summary of the results:
A useful tip to keep in mind is that when dividing two integers using the division operator in PostgreSQL, only the integer part of the result is considered, disregarding any remainder (e.g., .123, .352). To avoid integer division and ensure that the resulting values are converted into decimal values, it is important to multiply the values by 100.0 before performing the division. This will ensure that the calculated percentages are accurate and expressed as decimal values.
Solution #2: Using CTE and CASE Statement
PostgreSQL 14