At Pinterest, SQL does the heavy lifting for analyzing user engagement trends and for managing their MySQL fleet. Because of this, Pinterest almost always asks SQL query questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
In case you're trying to prepare for the SQL Assessment, here’s 11 Pinterest SQL interview questions to practice, which are similar to recently asked questions at Pinterest – can you solve them?
Assume you're given a table containing information about user sessions, including the start and end times of each session. Write a query to retrieve the user session(s) that occur concurrently with the other user sessions.
Output the session ID and the number of concurrent user sessions, sorted in descending order.
Assumptions:
sessions
Table:Column Name | Type |
---|---|
session_id | integer |
start_time | datetime |
end_time | datetime |
sessions
Example Input:session_id | start_time | end_time |
---|---|---|
746382 | 01/02/2024 12:00:00 | 02/01/2024 16:48:00 |
143145 | 01/02/2024 14:25:00 | 02/01/2024 15:05:00 |
134514 | 01/02/2024 15:23:00 | 02/01/2024 18:15:00 |
242354 | 01/02/2024 21:34:00 | 03/01/2024 00:11:00 |
143256 | 01/06/2024 06:55:00 | 01/06/2024 09:05:00 |
session_id | concurrent_sessions |
---|---|
746382 | 2 |
143256 | 1 |
Session 746382 has 2 concurrent sessions: session_id 134514 and 242354.
These comparisons indicate that session 746382 overlaps with both sessions 134514 and 242354 since its end time is earlier than the end times of both concurrent sessions.
DataLemur Premium subscribers can try this problem interactively in the browser:
SELECT sessions_1.session_id, COUNT(sessions_2.session_id) AS concurrent_sessions FROM sessions AS sessions_1 INNER JOIN sessions AS sessions_2 ON sessions_1.session_id != sessions_2.session_id AND (sessions_2.start_time BETWEEN sessions_1.start_time AND sessions_1.end_time OR sessions_1.start_time BETWEEN sessions_2.start_time AND sessions_2.end_time) GROUP BY sessions_1.session_id ORDER BY concurrent_sessions DESC;
For a step-by-step solution, subscribe to DataLemur Premium to unlock this Pinterest SQL Question & Solution.
Write a SQL query to find the average number of pins a user pins, and the difference between the current average pin count and the previous month average pin count for specific users.
Given the pins
table representing the pins users added to their boards.
pins
Example Input:pin_id | user_id | pin_date |
---|---|---|
7543 | 123 | 06/02/2024 |
9528 | 265 | 06/11/2024 |
7023 | 123 | 06/30/2024 |
9875 | 192 | 07/12/2024 |
3654 | 123 | 07/25/2024 |
7564 | 981 | 07/28/2024 |
mth | user_id | avg_monthly_pin | pin_change |
---|---|---|---|
6 | 123 | 2 | NULL |
6 | 265 | 1 | NULL |
7 | 123 | 1 | -1 |
7 | 192 | 1 | NULL |
7 | 981 | 1 | NULL |
SELECT EXTRACT(MONTH FROM pin_date) as month, user_id, COUNT(*) OVER(PARTITION BY user_id, EXTRACT(MONTH FROM pin_date)) as avg_monthly_pin, COUNT(*) OVER(PARTITION BY user_id, EXTRACT(MONTH FROM pin_date)) - LAG(COUNT(*)) OVER(PARTITION BY user_id order by EXTRACT(MONTH FROM pin_date)) as pin_change FROM pins
This query calculates the total number of pins a users pins each month by using the COUNT OVER PARTITION BY
clause. To calculate the difference in pins from the previous month we use the LAG
and count(*) OVER PARTITION BY user_id order by EXTRACT(MONTH FROM pin_date)
to retrieve the count from the previous row (which is the previous month due to our order by clause), and subtract it from the current month's count.
To practice a similar window function interview problem which uses RANK() on DataLemur's free online SQL code editor, solve this Amazon SQL question asked in a BI Engineer interview:
Cross join and natural join are like two sides of a coin in the world of SQL.
Cross joins is like the wild and reckless cousin who creates a giant new table by combining every row from table A with every row from table B, no questions asked, no common key needed.
Natural joins are like the more refined and selective cousin who only combines rows from multiple tables if they have something in common (i.e., common columns/keys).
While cross join doesn't discriminate and will create a massive table if given the chance, natural join is more selective and only returns a table with the number of rows equal to the number of matching rows in the input tables. So, choose your JOIN wisely!
Pinterest users can create and share content in the form of personal "boards". Each board is categorized under a specific theme, such as "Home Decor" or "Travel".
Write a SQL query to find the top 3 most popular boards in the "Home Decor" category.
Users
Example Input:user_id | username |
---|---|
101 | 'jane_doe' |
102 | 'john_doe' |
103 | 'doe_rayme' |
Boards
Example Input:board_id | user_id | category | followers_count |
---|---|---|---|
2001 | 101 | 'Home Decor' | 500 |
2002 | 102 | 'Travel' | 800 |
2003 | 101 | 'Cooking' | 300 |
2004 | 103 | 'Home Decor' | 1000 |
2005 | 102 | 'Home Decor' | 700 |
SELECT u.username, b.board_id, b.followers_count FROM Boards b INNER JOIN Users u ON b.user_id = u.user_id WHERE b.category = 'Home Decor' ORDER BY b.followers_count DESC LIMIT 3;
This SQL query joins Users
and Boards
tables on user_id
, and filters out boards that are not in the 'Home Decor' category. It then orders the output by followers_count
in descending order, and finally limits the output to just the top 3 board.
To find records in one table that aren't in another, you can use a LEFT JOIN
and check for NULL
values in the right-side table.
Here's an example using two tables, Pinterest employees and Pinterest managers:
SELECT * FROM pinterest_employees LEFT JOIN pinterest_managers ON pinterest_employees.id = pinterest_managers.id WHERE pinterest_managers.id IS NULL;
This query returns all rows from Pinterest employees where there is no matching row in managers based on the id
column.
You can also use the EXCEPT
operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
SELECT * FROM pinterest_employees EXCEPT SELECT * FROM pinterest_managers
This will return all rows from employees that are not in managers. The EXCEPT
operator works by returning the rows that are returned by the first query, but not by the second.
Note that EXCEPT
isn't supported by all DBMS systems, like in MySQL and Oracle (but have no fear, since you can use the MINUS
operator to achieve a similar result).
As a data analyst for Pinterest, we are interested in getting insight into our users' activity and interests. For this reason, we would like to write a SQL query which lists all users who have pinned at least 5 objects in "Cooking" category and have not pinned any objects in "Gardening" category within the past 30 days.
Assume we have a 'users' table and 'pins' table with the following fields and data:
users
Example Input:user_id | username | account_created_date |
---|---|---|
001 | mary123 | 2021-01-21 |
002 | cookFan99 | 2019-05-15 |
003 | pinterestLover | 2018-09-01 |
004 | pinner | 2024-01-30 |
pins
Example Input:pin_id | user_id | category | pinned_date |
---|---|---|---|
1001 | 001 | Cooking | 2024-07-15 |
1002 | 001 | Cooking | 2024-07-16 |
1003 | 001 | Cooking | 2024-07-17 |
1004 | 002 | Gardening | 2024-07-18 |
1005 | 001 | Cooking | 2024-07-15 |
1006 | 001 | Cooking | 2024-07-16 |
1007 | 001 | Gardening | 2024-07-17 |
1008 | 003 | Cooking | 2024-07-16 |
SELECT u.user_id, u.username FROM users AS u JOIN ( SELECT p.user_id FROM pins AS p WHERE p.category = 'Cooking' AND p.pinned_date > DATEADD(day, -30, GETDATE()) GROUP BY p.user_id HAVING COUNT(p.pin_id) >= 5 ) AS cooking_users ON u.user_id = cooking_users.user_id WHERE NOT EXISTS ( SELECT 1 FROM pins AS p2 WHERE p2.user_id = u.user_id AND p2.category = 'Gardening' AND p2.pinned_date > DATEADD(day, -30, GETDATE()) );
This query first filters users who have pinned at least 5 objects in 'Cooking' category in the subquery named cooking_users
. The main query then checks if these users have pinned any object in 'Gardening' category within the last 30 days. If not, those users will be included in the final output.
A cross-join, also known as a cartesian join, is a JOIN that produces the cross-product of two tables. In a cross-join, each row from the first table is matched with every row from the second table, resulting in a new table with a row for each possible combination of rows from the two input tables.
Let's say you were building a Machine Learning model that attempts to score the probability of a customer purchasing a Pinterest product. Before working in Pandas and Tensorflow, you might want to do some Exploratory Data Analysis (EDA) in SQL, and generate all pairs of customers and Pinterest products.
Here's a cross-join query you could run:
SELECT customers.id AS customer_id, pinterest_products.id AS product_id FROM customers CROSS JOIN pinterest_products;
Cross-joins are useful for generating all possible combinations, but they can also create huge tables if you're not careful. For instance, if you had 10,000 potential customers and Pinterest had 500 different product SKUs, the resulting cross-join would have 5 million rows!
For Pinterest, we could consider a situation where we want to find the average number of pins places on each board by a specific user.
Given a pins
table representing pins created by users and a boards
table representing boards created by users, find out the average number of pins per board for a specific user.
pins
Example Input:pin_id | user_id | board_id | pin_date |
---|---|---|---|
201 | 123 | 1 | 06/08/2024 00:00:00 |
231 | 123 | 1 | 06/10/2024 00:00:00 |
456 | 123 | 2 | 06/18/2024 00:00:00 |
789 | 123 | 2 | 07/26/2024 00:00:00 |
567 | 123 | 3 | 07/05/2024 00:00:00 |
boards
Example Input:board_id | user_id | board_name |
---|---|---|
1 | 123 | Travel |
2 | 123 | Recipes |
3 | 123 | Fashion |
4 | 789 | Travel |
5 | 789 | Recipes |
SELECT user_id, AVG(pins_count) AS avg_pins FROM ( SELECT pins.user_id, boards.board_id, COUNT(pins.pin_id) AS pins_count FROM boards INNER JOIN pins ON boards.board_id = pins.board_id WHERE pins.user_id = 123 GROUP BY user_id, board_id ) AS user_boards GROUP BY user_id;
This SQL block joins the pins
and boards
tables where the user_id is 123, groups the data by user and board, and calculates the count of pins for each board. It then finds the average count of pins across all boards for the user. It will provide the average number of pins per board for the user with id 123.
You are a data analyst at Pinterest. The marketing department has asked you to analyze their digital ads. Specifically, they are interested in the click-through rates for each ad.
Here is what the data looks like:
ad_clicks
Table:click_id | user_id | ad_id | click_date |
---|---|---|---|
2675 | 145 | 1002 | 07/10/2024 |
3826 | 265 | 1356 | 07/15/2024 |
1749 | 362 | 1002 | 07/20/2024 |
7032 | 654 | 1356 | 07/25/2024 |
8917 | 295 | 1258 | 07/30/2024 |
ad_views
Table:view_id | user_id | ad_id | view_date |
---|---|---|---|
1937 | 145 | 1002 | 07/10/2024 |
5296 | 265 | 1356 | 07/15/2024 |
6123 | 362 | 1258 | 07/20/2024 |
9152 | 654 | 1002 | 07/25/2024 |
7134 | 295 | 1356 | 07/30/2024 |
Compute the Click Through Rate (CTR) for each ad for the month of July 2024. CTR is calculated by dividing the total number of clicks that an ad receives by the total number of views, then multiplying by 100.
Consider the following SQL query:
SELECT a.ad_id, (COUNT(c.click_id)/ COUNT(v.view_id))*100 as CTR FROM ad_clicks c JOIN ad_views v ON c.user_id = v.user_id AND c.ad_id = v.ad_id JOIN ads a ON a.ad_id = c.ad_id WHERE EXTRACT(MONTH FROM click_date) = 7 AND EXTRACT(YEAR FROM click_date) = 2024 GROUP BY a.ad_id
In this query, we join the ad_clicks
table with the ad_views
table on user_id
and ad_id
to get the total number of clicks and views per ad. We filter out any clicks that did not occur in July 2024. Finally, we compute the CTR by dividing the count of clicks by the count of views and multiply by 100 for the percentage.
To learn more about Pinter
BETWEEN
is used to select rows that match a range of values, whereas the IN
operator checks for values in a specified list of values.
For example, say you were a Data Analyst at Pinterest and had a table of advertising campaign data.
To find campaigns with between 10k in spend, you could use BETWEEN`:
SELECT * FROM pinterest_ad_campaigns WHERE spend BETWEEN 500 AND 10000;
To find ad campaigns that were run on Facebook and Google's Display Network, you could use IN
:
SELECT * FROM pinterest_ad_campaigns WHERE ad_platform IN ("fb", "google_display");
To practice a similar problem about calculating rates, solve this SQL interview question from TikTok on DataLemur's interactive coding environment:
Pinterest maintains a vast database of users, boards, and pin ideas. For this SQL question, we want to understand the user interactions in the context of our broad topics available on Pinterest.
Your task is to write a SQL query that filters the user interaction records for the boards that are related to "Food" and "Diy" topics. Specifically, find all user interactions where the board name starts with "Food" or "Diy."
user_interactions
Example Input:interaction_id | user_id | board_id | board_name | interaction_date | interaction_type |
---|---|---|---|---|---|
111 | 001 | 101 | Food_For_Health | 12/01/2024 | Pin |
112 | 002 | 102 | DIY_Home_Decor | 12/04/2024 | Like |
113 | 003 | 103 | Exercise_Regimens | 12/08/2024 | Pin |
114 | 004 | 104 | DIY_Gardening | 12/12/2024 | Comment |
115 | 005 | 105 | Food_Cake_Recipes | 12/15/2024 | Pin |
SELECT * FROM user_interactions WHERE board_name LIKE 'Food%' OR board_name LIKE 'Diy%';
This SQL query filters the interactions and returns all records where the board_name
column starts with either "Food" or "Diy". The '%' in the LIKE statement acts as a wildcard and matches any sequence of characters that follow the specified string.
The best way to prepare for a Pinterest SQL interview is to practice, practice, practice. In addition to solving the above Pinterest SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Netflix, Google, and Amazon.
Each exercise has hints to guide you, fully explained answers along with a discussion board to see how others solved it and crucially, there's an online SQL coding environment so you can right online code up your SQL query and have it checked.
To prep for the Pinterest SQL interview you can also solve SQL problems from other tech companies like:
But if your SQL foundations are weak, forget about jumping right into solving questions – improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as aggregate functions and Union vs. UNION ALL – both of which show up often during Pinterest interviews.
For the Pinterest Data Science Interview, beyond writing SQL queries, the other types of questions which are covered:
The best way to prepare for Pinterest Data Science interviews is by reading Ace the Data Science Interview. The book's got: