Facebook/Meta LOVES to ask SQL coding questions during interviews for Data Analyst, Data Science, and Data Engineering jobs. I know this first-hand, because I used to work at FB, and I also co-authored Ace the Data Science Interview with Kevin Huo (an Ex-Facebook Data Scientist).
Because we know the Meta/Facebook SQL interview process intimately well, we curated 9 real Facebook/Meta SQL interview questions to practice which come from recent Meta interviews.
Given a table of Facebook posts, for each user who posted at least twice in 2024, write a SQL query to find the number of days between each user’s first post of the year and last post of the year in the year 2024. Output the user and number of the days between each user's first and last post.
You can solve this SQL problem interactively and run your solution directly on DataLemur:
Column Name | Type |
---|---|
user_id | integer |
post_id | integer |
post_date | timestamp |
post_content | text |
user_id | post_id | post_date | post_content |
---|---|---|---|
151652 | 599415 | 07/10/2024 12:00:00 | Need a hug |
661093 | 624356 | 07/29/2024 13:00:00 | Bed. Class 8-12. Work 12-3. Gym 3-5 or 6. Then class 6-10. Another day that's gonna fly by. I miss my girlfriend |
004239 | 784254 | 07/04/2024 11:00:00 | Happy 4th of July! |
661093 | 442560 | 07/08/2024 14:00:00 | Just going to cry myself to sleep after watching Marley and Me. |
151652 | 111766 | 07/12/2024 19:00:00 | I'm so done with covid - need traveling ASAP! |
user_id | days_between |
---|---|
151652 | 2 |
661093 | 21 |
First, we can use the and aggregate functions on the column to retrieve the earliest and latest post dates, and substract one from another accordingly.
To calculate the difference for each user, we the results by , and then filter for posts made in the year 2024. To do so, we use the function to extract the year from the column.
In the final step, to exclude users who have posted only once during the year, we apply the clause with a condition greater than 1.
This yields the final solution:
p.s. DataLemur Premium users can attempt part 2 of the problem which is much harder.
A Facebook power user is defined as someone who posts a ton, and gets a lot of reactions on their post. For the purpose of this question, consider a Facebook power user as someone who posts at least twice a day and receives an average of 150 comments and/or reactions per post.
Write a SQL query to return the IDs of all Facebook power users, along with the number of posts, and the average number of reactions per post.
Use the following tables "user_post" and "post_interactions":
user_id | post_id | post_date |
---|---|---|
1 | 1001 | 2024-09-01 |
1 | 1002 | 2024-09-01 |
2 | 1003 | 2024-09-02 |
2 | 1004 | 2024-09-03 |
1 | 1005 | 2024-09-02 |
post_id | comments | reactions |
---|---|---|
1001 | 75 | 200 |
1002 | 85 | 250 |
1004 | 60 | 90 |
1005 | 100 | 150 |
1003 | 50 | 70 |
The clause works similarly to the clause, but it is used to filter the groups of rows created by the clause rather than the rows of the table themselves.
For example, say you were analyzing Facebook ads data:
This query retrieves the total sales from all ads in each region, and uses the clause to only sales made after January 1, 2024. The rows are then grouped by region and the clause filters the groups to include only those with total sales greater than $500k.
Assume you're given a table containing information on Facebook user actions. Write a SQL query to obtain number of monthly active users (MAUs) in July 2022, including the month in numerical format "1, 2, 3".
Hint: An active user is defined as a user who has performed actions such as 'sign-in', 'like', or 'comment' in both the current month and the previous month.
You can type up + execute your SQL query interactively to this problem on DataLemur:
Column Name | Type |
---|---|
user_id | integer |
event_id | integer |
event_type | string ("sign-in, "like", "comment") |
event_date | datetime |
user_id | event_id | event_type | event_date |
---|---|---|---|
445 | 7765 | sign-in | 06/31/2022 12:00:00 |
742 | 6458 | sign-in | 07/03/2022 12:00:00 |
445 | 3634 | like | 07/05/2022 12:00:00 |
742 | 1374 | comment | 07/05/2022 12:00:00 |
648 | 3124 | like | 07/18/2022 12:00:00 |
month | monthly_active_users |
---|---|
6 | 1 |
In July 2022, there was only one monthly active user (MAU) with the 445.
For a full step-by-step explanation of the problem, click here.
In SQL, a join generally retrieves rows from multiple tables and combines them into a single result set. For an example of the difference between a left vs. right join, suppose you had a table of Facebook users and Facebook posts.
A retrieves all rows from the left table (in this case, the users table) and any matching rows from the right table (the posts table). If there is no match in the right table, NULL values will be returned for the right table's columns.
A combines all rows from the right table (in this case, the posts table) and any matching rows from the left table (the users table). If there is no match in the left table, NULL values will be displayed for the left table's columns.
Facebook wants to recommend new friends to people who show interest in attending 2 or more of the same private Facebook events.
Notes:
Column Name | Type |
---|---|
user_a_id | integer |
user_b_id | integer |
status | enum ('friends', 'not_friends') |
Each row of this table indicates the status of the friendship between user_a_id and user_b_id.
user_a_id | user_b_id | status |
---|---|---|
111 | 333 | not_friends |
222 | 333 | not_friends |
333 | 222 | not_friends |
222 | 111 | friends |
111 | 222 | friends |
333 | 111 | not_friends |
Column Name | Type |
---|---|
user_id | integer |
event_id | integer |
event_type | enum ('public', 'private') |
attendance_status | enum ('going', 'maybe', 'not_going') |
event_date | date |
user_id | event_id | event_type | attendance_status | event_date |
---|---|---|---|---|
111 | 567 | public | going | 07/12/2022 |
222 | 789 | private | going | 07/15/2022 |
333 | 789 | private | maybe | 07/15/2022 |
111 | 234 | private | not_going | 07/18/2022 |
222 | 234 | private | going | 07/18/2022 |
333 | 234 | private | going | 07/18/2022 |
user_a_id | user_b_id |
---|---|
222 | 333 |
333 | 222 |
Users 222 and 333 who are not friends have shown interest in attending 2 or more of the same private events.
To find pairs of friends to be recommended to each other if they're interested in attending 2 or more of the same private events we'll:
This leads to the following query:
For a full step-by-step solution, and to run the code yourself, subscribe to DataLemur premium to unlock this problem:
Also, if you're a nerd for Machine Learning, you can learn more about Facebook's People-You-May-Know (PYMK) algorithm here.
Constraints are just rules for your DBMS to follow when updating/inserting/deleting data.
Say you had a table of Facebook employees, and their salaries, job titles, and performance review data. Here's some examples of SQL constraints you could implement:
NOT NULL: This constraint could be used to ensure that certain columns in the employee table, such as the employee's first and last name, cannot contain NULL values.
UNIQUE: This constraint could be used to ensure that the employee ID is unique. This would prevent duplicate entries in the employee table.
PRIMARY KEY: This constraint could be used to combine the NOT NULL and UNIQUE constraints to create a primary key for the employee table. The employee ID could serve as the primary key.
FOREIGN KEY: This constraint could be used to establish relationships between the employee table and other tables in the database. For example, you could use a foreign key to link the employee ID to the department ID in a department table to track which department each employee belongs to.
CHECK: This constraint could be used to ensure that certain data meets specific conditions. For example, you could use a CHECK constraint to ensure that salary values are always positive numbers.
DEFAULT: This constraint could be used to specify default values for certain columns. For example, you could use a DEFAULT constraint to set the employee hire date to the current date if no value is provided when a new employee is added to the database.
As a data analyst at Facebook, you are asked to find the average number of shares per post for each user.
In the user_posts table, each row represents a post by a user. Each user may have zero or more posts.
In the post_shares table, each row represents a share of a post. Each post may have zero or more shares.
Please write a SQL query to find the average number of shares per post for each user.
post_id | user_id | post_text | post_date |
---|---|---|---|
1 | 1 | Hello world! | 06/08/2022 00:00:00 |
2 | 2 | What a beautiful day! | 06/10/2022 00:00:00 |
3 | 1 | Hope everyone is having a good day! | 06/18/2022 00:00:00 |
4 | 3 | Facebook is amazing! | 07/26/2022 00:00:00 |
5 | 2 | Enjoying a great meal! | 07/05/2022 00:00:00 |
share_id | post_id | share_date |
---|---|---|
1 | 1 | 06/09/2022 00:00:00 |
2 | 2 | 06/11/2022 00:00:00 |
3 | 1 | 06/19/2022 00:00:00 |
4 | 1 | 06/29/2022 00:00:00 |
5 | 3 | 07/27/2022 00:00:00 |
user_id | avg_shares_per_post |
---|---|
1 | 1.67 |
2 | 0.50 |
3 | 0.00 |
This query works by first finding the total number of shares for each post by grouping on in the table. Then, it joins this with on to get the user_id of the user who made each post. It finally averages the total number of shares for each user using the function. Null values are replaced with zero so that users who have no shares do not return null for .
In case this marketing problem was interesting, checkout how this blog on how marketing analysts use SQL.
Assume you have an events table on Facebook app analytics. Write a SQL query to calculate the click-through rate (CTR) for the app in 2022 and round the results to 2 decimal places.
Definition and note:
Before reading our solution, practice this Meta SQL question interactively:
Column Name | Type |
---|---|
app_id | integer |
event_type | string |
timestamp | datetime |
app_id | event_type | timestamp |
---|---|---|
123 | impression | 07/18/2022 11:36:12 |
123 | impression | 07/18/2022 11:37:12 |
123 | click | 07/18/2022 11:37:42 |
234 | impression | 07/18/2022 14:15:12 |
234 | click | 07/18/2022 14:16:12 |
app_id | ctr |
---|---|
123 | 50.00 |
234 | 100.00 |
We first find the number of clicks and impressions using the statement to assign a value of 1 for 'click' events and 0 for other events.
Then, to calculate the percentage of click-through rate (CTR) we divide the number of clicks by the number of impressions and multiplying by 100.0, rounded to 2 decimal places using the function.
This yields the following solution:
The key to acing a Facebook SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier Facebook SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like FAANG (FB, Apple, Amazon, Netflix, Google).
Each exercise has hints to guide you, full answers and crucially, there is an online SQL coding environment so you can easily right in the browser your SQL query and have it graded.
To prep for the Facebook SQL interview you can also be useful to practice SQL questions from other tech companies like:
In case your SQL coding skills are weak, forget about jumping right into solving questions – improve your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers SQL topics like aggregate window functions and math functions – both of which show up routinely during Facebook interviews.
For the Facebook Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
The best way to prepare for Facebook Data Science interviews is by reading Ace the Data Science Interview. The book's got:
Also check out these 6 ways Meta uses Data Science to boost their profits!