At Snapchat, SQL does the heavy lifting for analyzing in-app behavior for product improvements and to help advertisers analyze their Return-on-ad-spend (ROAS) for Snap ad campaigns. Because of this, Snap LOVES to ask SQL coding questions during interviews for Data Science, Data Engineering and Data Analytics jobs. That's why we've curated 9 Snap SQL interview questions to practice, which are similar to recently asked questions at Snap.
Assume you're given tables with information on Snapchat users, including their ages and time spent sending and opening snaps.
Write a query to obtain a breakdown of the time spent sending vs. opening snaps as a percentage of total time spent on these activities grouped by age group. Round the percentage to 2 decimal places in the output.
Notes:
Interactively explore the dataset, and write code directly in the browser to solve this Snapchat Join SQL question:
Column Name | Type |
---|---|
activity_id | integer |
user_id | integer |
activity_type | string ('send', 'open', 'chat') |
time_spent | float |
activity_date | datetime |
activity_id | user_id | activity_type | time_spent | activity_date |
---|---|---|---|---|
7274 | 123 | open | 4.50 | 06/22/2022 12:00:00 |
2425 | 123 | send | 3.50 | 06/22/2022 12:00:00 |
1413 | 456 | send | 5.67 | 06/23/2022 12:00:00 |
1414 | 789 | chat | 11.00 | 06/25/2022 12:00:00 |
2536 | 456 | open | 3.00 | 06/25/2022 12:00:00 |
Column Name | Type |
---|---|
user_id | integer |
age_bucket | string ('21-25', '26-30', '31-25') |
user_id | age_bucket |
---|---|
123 | 31-35 |
456 | 26-30 |
789 | 21-25 |
age_bucket | send_perc | open_perc |
---|---|---|
26-30 | 65.40 | 34.60 |
31-35 | 43.75 | 56.25 |
Using the age bucket 26-30 as example, the time spent sending snaps was 5.67 and the time spent opening snaps was 3.
To calculate the percentage of time spent sending snaps, we divide the time spent sending snaps by the total time spent on sending and opening snaps, which is 5.67 + 3 = 8.67.
So, the percentage of time spent sending snaps is 5.67 / (5.67 + 3) = 65.4%, and the percentage of time spent opening snaps is 3 / (5.67 + 3) = 34.6%.
To see a step-by-step explanation of this Snapchat problem, click here.
Snap (the parent company of Snapchat) wants to identify its VIP users - the users who are most active on the platform. They decide to define a VIP user as someone who sends more than 100 snaps per week. As an interviewee, your task is to write a SQL query that will generate a list of VIP users for the past 4 weeks.
snap_id | user_id | snap_date |
---|---|---|
103 | 567 | 2022-09-01 |
206 | 503 | 2022-09-02 |
317 | 567 | 2022-09-02 |
401 | 784 | 2022-09-03 |
562 | 567 | 2022-09-03 |
618 | 78 | 2022-09-03 |
725 | 503 | 2022-09-04 |
851 | 784 | 2022-09-05 |
998 | 567 | 2022-09-05 |
Assume today's date is '2022-09-06'.
This query starts by selecting the user_id and count of snaps from the snaps table for each user who sent a snap in the past 4 weeks. Then, it checks the count and filters out any users who sent less than or equal to 400 snaps in that time (since 4 weeks equals nearly four we are considering). The result is a list of only those users who sent more than 400 snaps in the past 4 weeks, essentially our VIP users according to Snap's definition.
In case this marketing problem was interesting, checkout how this blog on how marketing analysts use SQL.
Snap Inc. has a table, , that records the daily number of views each snap (an ephemeral photo or video capture) receives. This table's columns are (which identifies the snap), (the date the snap was viewed), and (the number of views that day).
Your task is to write an SQL query that determines the running total of views each snap has received, as well as the number of views each snap received in the past week (7 days).
snap_id | date | views |
---|---|---|
101 | 2022-08-01 | 5000 |
101 | 2022-08-02 | 4500 |
101 | 2022-08-03 | 5500 |
102 | 2022-08-01 | 6000 |
102 | 2022-08-02 | 6500 |
snap_id | date | total_views_to_date | weekly_views |
---|---|---|---|
101 | 2022-08-01 | 5000 | 5000 |
101 | 2022-08-02 | 9500 | 9500 |
101 | 2022-08-03 | 15000 | 15000 |
102 | 2022-08-01 | 6000 | 6000 |
102 | 2022-08-02 | 12500 | 12500 |
This query first calculates the running total of the views for each snap using the function over a window that's defined by the and ordered by . The window function ensures that the sum is computed for each snap individually, and the clause ensures the sum is cumulative up to the current date.
The query then calculates the number of views each snap received in the past week. It uses a similar approach but specifies a different window frame: the current row and the 6 preceding rows. This results in a 7-day window of views, which is equivalent to the past week.
Note: This solution assumes that the table contains an entry for each day for each snap. If a snap doesn't receive any views on a particular day, and thus has no row for that day in the table, the calculation may include data from more than 7 days ago. You may need to preprocess the data to include zero-view rows for such days, or adjust the window frame criteria to suit your data.
p.s. Window functions show up super often during Snap SQL interviews, so practice the 27+ window function questions on DataLemur
combines the results from two or more SELECT statements, but only returns the rows that are exactly the same between the two sets.
For example, say you were doing an HR Analytics project for Snap, and had access to Snap's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use operator to find all employees who also show up in the contractors table:
Snap wants to understand user engagement, specifically they want to identify users who have been active in the last 30 days (as of 2021-08-31), have more than 5 friends, and have sent at least 20 messages.
user_id | last_active_date | friend_count | message_count |
---|---|---|---|
123 | 2021-08-30 | 10 | 30 |
265 | 2021-08-01 | 4 | 40 |
362 | 2021-07-01 | 10 | 50 |
192 | 2021-08-01 | 6 | 19 |
981 | 2021-08-30 | 6 | 20 |
user_id |
---|
123 |
981 |
Here's the SQL query which selects users based on the conditions:
In this query, we have filtered out the users based on three conditions:
The users satisfying all three conditions are selected.
Before we address vs. question, let's address the clause which is used in a statement to group rows into a set of summary rows. It's typically used in conjunction with aggregate functions such as:
The clause is used to filter the groups created by the clause. It's similar to the clause, but it is used to specify conditions on the groups created by the clause, rather than on the individual rows of the table.
For example:
This query retrieves the total salary for each department and groups the rows by department. The clause then filters the groups to include only Snap departments where the total salary is greater than $1 million
Note that the clause must be used after the clause and can only reference columns listed in the clause or aggregated columns selected by the statement.
To track the performance of Snapchat ad campaigns, it is vital to analyze the click-through-rate (CTR). The click-through-rate is calculated by dividing the number of users who clicked on an ad by the number of total impressions (times the ad was shown).
Given the following data about Snap's ads, can you calculate the click-through-rate for each campaign?
camp_id | camp_name |
---|---|
1 | Summer Sale |
2 | Winter Wonderland |
3 | Back to School |
impression_id | user_id | timestamp | camp_id |
---|---|---|---|
1001 | 4501 | 06/15/2022 17:20:00 | 1 |
1012 | 4502 | 06/16/2022 15:30:00 | 1 |
1023 | 4503 | 06/17/2022 10:25:00 | 2 |
1034 | 4504 | 06/18/2022 12:45:00 | 2 |
1045 | 4505 | 06/19/2022 23:33:00 | 3 |
click_id | user_id | timestamp | camp_id |
---|---|---|---|
2011 | 4501 | 06/15/2022 17:21:00 | 1 |
2022 | 4502 | 06/17/2022 15:31:30 | 1 |
2033 | 4503 | 06/17/2022 10:26:00 | 2 |
This SQL query joins the table with the table and table, based on the campaign id. It then calculates the CTR as the total count of distinct click ids divided by the total count of distinct impression ids for each campaign. The function is used to convert the number of clicks to a floating point number to allow for decimal division.
p.s. if you happen to be a Machine Learning nerd, you might like this related article from Snap's engineering blog on how they do ML for Ad Rankings which includes Ad CTR as a ranking signal in terms of ad relevancy
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Snapchat ad campaign data:
The CampaignID column is used to uniquely identify each row in the table, and the constraint ensures that there are no duplicate CampaignID values. This helps to maintain the accuracy of the data by preventing duplicate rows. The primary key is also an important part of the table because it enables you to easily identify and reference specific campaigns in your Snap Ad data. You can use it to join to other tables in the database, such as a table with data on the results of the campaigns.
Snap, the parent company of Snapchat, has numerous users who engage with different activities on the app platform. They would like to have an analysis of user engagement in terms of the total number of hours spent on the app by month and by activity.
Consider the following data available in the table "user_activity":
activity_id | user_id | activity_date | activity_name | hours_spent |
---|---|---|---|---|
101 | 1010 | 06/08/2022 | Videochat | 1.5 |
102 | 2050 | 06/10/2022 | Snap Map | 2.0 |
103 | 1010 | 06/18/2022 | Stories | 0.5 |
104 | 4150 | 07/26/2022 | Snap Map | 2.5 |
105 | 2050 | 07/05/2022 | Videochat | 3.0 |
The question is to form a SQL query that will provide the total hours spent on each activity per month.
This SQL query will group the data by the month in which the activity was performed and by the name of the activity, summing the total number of hours spent on each activity. The result will provide an overview of which activities are most engaging for users within each month.
mon | activity_name | sum |
---|---|---|
6 | Videochat | 1.5 |
6 | Snap Map | 2.0 |
6 | Stories | 0.5 |
7 | Snap Map | 2.5 |
7 | Videochat | 3.0 |
Please note: The SQL syntax for extracting the month might be different depending on the SQL dialect used. The above SQL is based on PostgreSQL.
As a data analyst at Snap, you are given access to two tables - and . The table contains information about the users such as , , and . The table, on the other hand, keeps track of all friendships, specifying the who sent the friend request, who received it and the of the friend request.
Write a SQL query to find all users who have signed up in the last 30 days and have sent at least one friend request which has been accepted.
user_id | user_name | sign_up_date | last_log_in_date |
---|---|---|---|
1 | John Doe | 2022-09-01 00:00:00 | 2022-09-14 00:00:00 |
2 | Jane Doe | 2022-09-10 00:00:00 | 2022-09-15 00:00:00 |
3 | Harry Potter | 2022-08-28 00:00:00 | 2022-09-16 00:00:00 |
4 | Hermione Granger | 2022-08-20 00:00:00 | 2022-09-15 00:00:00 |
user_id1 | user_id2 | status |
---|---|---|
1 | 2 | Accepted |
1 | 3 | Pending |
3 | 4 | Accepted |
2 | 4 | Rejected |
This SQL query does a join on the and table on the common . It then filters for recent users who have signed up in the last 30 days. From this result, it further filters for users with at least one friend request accepted. We are using function to calculate date 30 days ago from the current date and clause to filter grouped data.
The key to acing a Snap SQL interview is to practice, practice, and then practice some more! Beyond just solving the above Snap SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Airbnb, and Amazon.
Each DataLemur SQL question has hints to guide you, fully explained answers along with a discussion board to see how others solved it and crucially, there's an online SQL coding environment so you can easily right in the browser your SQL query answer and have it executed.
To prep for the Snap SQL interview you can also be useful to solve SQL questions from other consumer-tech/ ad-tech companies like:
However, if your SQL skills are weak, forget about diving straight into solving questions – refresh your SQL knowledge with this interactive SQL tutorial.
This tutorial covers things like handling date/timestamp data and INTERCEPT/EXCEPT – both of these come up often in Snap SQL assessments.
For the Snap Data Science Interview, in addition to SQL query questions, the other types of questions which are covered:
To prepare for Snap Data Science interviews read the book Ace the Data Science Interview because it's got: