Back to questions
This is the same question as problem #25 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the tables below containing information on Snapchat users, their ages, and their time spent sending and opening snaps. Write a query to obtain a breakdown of the time spent sending vs. opening snaps (as a percentage of total time spent on these activities) for each age group.
Output the age bucket and percentage of sending and opening snaps. Round the percentage to 2 decimal places.
Notes:
Column Name | Type |
---|---|
activity_id | integer |
user_id | integer |
activity_type | string ('send', 'open', 'chat') |
time_spent | float |
activity_date | datetime |
activity_id | user_id | activity_type | time_spent | activity_date |
---|---|---|---|---|
7274 | 123 | open | 4.50 | 06/22/2022 12:00:00 |
2425 | 123 | send | 3.50 | 06/22/2022 12:00:00 |
1413 | 456 | send | 5.67 | 06/23/2022 12:00:00 |
1414 | 789 | chat | 11.00 | 06/25/2022 12:00:00 |
2536 | 456 | open | 3.00 | 06/25/2022 12:00:00 |
Column Name | Type |
---|---|
user_id | integer |
age_bucket | string ('21-25', '26-30', '31-25') |
user_id | age_bucket |
---|---|
123 | 31-35 |
456 | 26-30 |
789 | 21-25 |
age_bucket | send_perc | open_perc |
---|---|---|
26-30 | 65.40 | 34.60 |
31-35 | 43.75 | 56.25 |
For the age bucket 26-30, the time spent sending snaps was 5.67 and opening 3. The percent of time sending snaps was 5.67/(5.67+3)=65.4%, and the percent of time opening snaps was 3/(5.67+3)=34.6%.
The dataset you are querying against may have different input & output - this is just an example!
First, we join the field from the table on the corresponding field with the table. Next, we filter for 'send' and 'open' because we have to find the percentage of sending and opening snaps and then group by .
Subsequently, we obtain the total time spent on sending and opening snaps using conditional statements for each activity type ('send' and 'open') and also find the total time spent.
Here's what the output looks like:
age_bucket | send_timespent | open_timespent | total_timespent |
---|---|---|---|
21-25 | 6.24 | 5.25 | 11.49 |
26-30 | 13.91 | 3.00 | 16.91 |
31-35 | 3.50 | 5.75 | 9.25 |
Then, we convert the query into a CTE called and using the formula below, we calculate the percentages of the time spent sending and opening snaps.
Percentage of time spend on sending/opening snaps = Time spent on sending/opening snaps / Total time spent on sending and opening snaps
Tip: When dividing 2 integers, the division / operator in PostgreSQL only considers the integer part of the results and does not deal with the remainder (ie. .123, .352). Hence, to avoid integer division which happens when we multiply by 100 and to convert the numerical values into decimal values, we multiply the values by 100.0.
Results:
age_bucket | send_perc | open_perc |
---|---|---|
21-25 | 54.31 | 45.69 |
26-30 | 82.26 | 17.74 |
31-35 | 37.84 | 62.16 |
Solution #1: Using CASE Statement & CTE (as above)
Solution #2: Using FILTER
PostgreSQL 14