At Pinterest, SQL does the heavy lifting for analyzing user engagement trends and for managing their MySQL fleet. Because of this, Pinterest almost always asks SQL query questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
In case you're trying to prepare for the SQL Assessment, here’s 11 Pinterest SQL interview questions to practice, which are similar to recently asked questions at Pinterest – can you solve them?
Assume you're given a table containing information about user sessions, including the start and end times of each session. Write a query to retrieve the user session(s) that occur concurrently with the other user sessions.
Output the session ID and the number of concurrent user sessions, sorted in descending order.
Assumptions:
Column Name | Type |
---|---|
session_id | integer |
start_time | datetime |
end_time | datetime |
session_id | start_time | end_time |
---|---|---|
746382 | 01/02/2024 12:00:00 | 02/01/2024 16:48:00 |
143145 | 01/02/2024 14:25:00 | 02/01/2024 15:05:00 |
134514 | 01/02/2024 15:23:00 | 02/01/2024 18:15:00 |
242354 | 01/02/2024 21:34:00 | 03/01/2024 00:11:00 |
143256 | 01/06/2024 06:55:00 | 01/06/2024 09:05:00 |
session_id | concurrent_sessions |
---|---|
746382 | 2 |
143256 | 1 |
Session 746382 has 2 concurrent sessions: session_id 134514 and 242354.
These comparisons indicate that session 746382 overlaps with both sessions 134514 and 242354 since its end time is earlier than the end times of both concurrent sessions.
DataLemur Premium subscribers can try this problem interactively in the browser:
For a step-by-step solution, subscribe to DataLemur Premium to unlock this Pinterest SQL Question & Solution.
Write a SQL query to find the average number of pins a user pins, and the difference between the current average pin count and the previous month average pin count for specific users.
Given the table representing the pins users added to their boards.
pin_id | user_id | pin_date |
---|---|---|
7543 | 123 | 06/02/2024 |
9528 | 265 | 06/11/2024 |
7023 | 123 | 06/30/2024 |
9875 | 192 | 07/12/2024 |
3654 | 123 | 07/25/2024 |
7564 | 981 | 07/28/2024 |
mth | user_id | avg_monthly_pin | pin_change |
---|---|---|---|
6 | 123 | 2 | NULL |
6 | 265 | 1 | NULL |
7 | 123 | 1 | -1 |
7 | 192 | 1 | NULL |
7 | 981 | 1 | NULL |
This query calculates the total number of pins a users pins each month by using the clause. To calculate the difference in pins from the previous month we use the and to retrieve the count from the previous row (which is the previous month due to our order by clause), and subtract it from the current month's count.
To practice a similar window function interview problem which uses RANK() on DataLemur's free online SQL code editor, solve this Amazon SQL question asked in a BI Engineer interview:
Cross join and natural join are like two sides of a coin in the world of SQL.
Cross joins is like the wild and reckless cousin who creates a giant new table by combining every row from table A with every row from table B, no questions asked, no common key needed.
Natural joins are like the more refined and selective cousin who only combines rows from multiple tables if they have something in common (i.e., common columns/keys).
While cross join doesn't discriminate and will create a massive table if given the chance, natural join is more selective and only returns a table with the number of rows equal to the number of matching rows in the input tables. So, choose your JOIN wisely!
Pinterest users can create and share content in the form of personal "boards". Each board is categorized under a specific theme, such as "Home Decor" or "Travel".
Write a SQL query to find the top 3 most popular boards in the "Home Decor" category.
user_id | username |
---|---|
101 | 'jane_doe' |
102 | 'john_doe' |
103 | 'doe_rayme' |
board_id | user_id | category | followers_count |
---|---|---|---|
2001 | 101 | 'Home Decor' | 500 |
2002 | 102 | 'Travel' | 800 |
2003 | 101 | 'Cooking' | 300 |
2004 | 103 | 'Home Decor' | 1000 |
2005 | 102 | 'Home Decor' | 700 |
This SQL query joins and tables on , and filters out boards that are not in the 'Home Decor' category. It then orders the output by in descending order, and finally limits the output to just the top 3 board.
To find records in one table that aren't in another, you can use a and check for values in the right-side table.
Here's an example using two tables, Pinterest employees and Pinterest managers:
This query returns all rows from Pinterest employees where there is no matching row in managers based on the column.
You can also use the operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
This will return all rows from employees that are not in managers. The operator works by returning the rows that are returned by the first query, but not by the second.
Note that isn't supported by all DBMS systems, like in MySQL and Oracle (but have no fear, since you can use the operator to achieve a similar result).
As a data analyst for Pinterest, we are interested in getting insight into our users' activity and interests. For this reason, we would like to write a SQL query which lists all users who have pinned at least 5 objects in "Cooking" category and have not pinned any objects in "Gardening" category within the past 30 days.
Assume we have a 'users' table and 'pins' table with the following fields and data:
user_id | username | account_created_date |
---|---|---|
001 | mary123 | 2021-01-21 |
002 | cookFan99 | 2019-05-15 |
003 | pinterestLover | 2018-09-01 |
004 | pinner | 2024-01-30 |
pin_id | user_id | category | pinned_date |
---|---|---|---|
1001 | 001 | Cooking | 2024-07-15 |
1002 | 001 | Cooking | 2024-07-16 |
1003 | 001 | Cooking | 2024-07-17 |
1004 | 002 | Gardening | 2024-07-18 |
1005 | 001 | Cooking | 2024-07-15 |
1006 | 001 | Cooking | 2024-07-16 |
1007 | 001 | Gardening | 2024-07-17 |
1008 | 003 | Cooking | 2024-07-16 |
This query first filters users who have pinned at least 5 objects in 'Cooking' category in the subquery named . The main query then checks if these users have pinned any object in 'Gardening' category within the last 30 days. If not, those users will be included in the final output.
A cross-join, also known as a cartesian join, is a JOIN that produces the cross-product of two tables. In a cross-join, each row from the first table is matched with every row from the second table, resulting in a new table with a row for each possible combination of rows from the two input tables.
Let's say you were building a Machine Learning model that attempts to score the probability of a customer purchasing a Pinterest product. Before working in Pandas and Tensorflow, you might want to do some Exploratory Data Analysis (EDA) in SQL, and generate all pairs of customers and Pinterest products.
Here's a cross-join query you could run:
Cross-joins are useful for generating all possible combinations, but they can also create huge tables if you're not careful. For instance, if you had 10,000 potential customers and Pinterest had 500 different product SKUs, the resulting cross-join would have 5 million rows!
For Pinterest, we could consider a situation where we want to find the average number of pins places on each board by a specific user.
Given a table representing pins created by users and a table representing boards created by users, find out the average number of pins per board for a specific user.
pin_id | user_id | board_id | pin_date |
---|---|---|---|
201 | 123 | 1 | 06/08/2024 00:00:00 |
231 | 123 | 1 | 06/10/2024 00:00:00 |
456 | 123 | 2 | 06/18/2024 00:00:00 |
789 | 123 | 2 | 07/26/2024 00:00:00 |
567 | 123 | 3 | 07/05/2024 00:00:00 |
board_id | user_id | board_name |
---|---|---|
1 | 123 | Travel |
2 | 123 | Recipes |
3 | 123 | Fashion |
4 | 789 | Travel |
5 | 789 | Recipes |
This SQL block joins the and tables where the user_id is 123, groups the data by user and board, and calculates the count of pins for each board. It then finds the average count of pins across all boards for the user. It will provide the average number of pins per board for the user with id 123.
You are a data analyst at Pinterest. The marketing department has asked you to analyze their digital ads. Specifically, they are interested in the click-through rates for each ad.
Here is what the data looks like:
click_id | user_id | ad_id | click_date |
---|---|---|---|
2675 | 145 | 1002 | 07/10/2024 |
3826 | 265 | 1356 | 07/15/2024 |
1749 | 362 | 1002 | 07/20/2024 |
7032 | 654 | 1356 | 07/25/2024 |
8917 | 295 | 1258 | 07/30/2024 |
view_id | user_id | ad_id | view_date |
---|---|---|---|
1937 | 145 | 1002 | 07/10/2024 |
5296 | 265 | 1356 | 07/15/2024 |
6123 | 362 | 1258 | 07/20/2024 |
9152 | 654 | 1002 | 07/25/2024 |
7134 | 295 | 1356 | 07/30/2024 |
Compute the Click Through Rate (CTR) for each ad for the month of July 2024. CTR is calculated by dividing the total number of clicks that an ad receives by the total number of views, then multiplying by 100.
Consider the following SQL query:
In this query, we join the table with the table on and to get the total number of clicks and views per ad. We filter out any clicks that did not occur in July 2024. Finally, we compute the CTR by dividing the count of clicks by the count of views and multiply by 100 for the percentage.
To learn more about Pinter
is used to select rows that match a range of values, whereas the operator checks for values in a specified list of values.
For example, say you were a Data Analyst at Pinterest and had a table of advertising campaign data.
To find campaigns with between 10k in spend, you could use BETWEEN`:
To find ad campaigns that were run on Facebook and Google's Display Network, you could use :
To practice a similar problem about calculating rates, solve this SQL interview question from TikTok on DataLemur's interactive coding environment:
Pinterest maintains a vast database of users, boards, and pin ideas. For this SQL question, we want to understand the user interactions in the context of our broad topics available on Pinterest.
Your task is to write a SQL query that filters the user interaction records for the boards that are related to "Food" and "Diy" topics. Specifically, find all user interactions where the board name starts with "Food" or "Diy."
interaction_id | user_id | board_id | board_name | interaction_date | interaction_type |
---|---|---|---|---|---|
111 | 001 | 101 | Food_For_Health | 12/01/2024 | Pin |
112 | 002 | 102 | DIY_Home_Decor | 12/04/2024 | Like |
113 | 003 | 103 | Exercise_Regimens | 12/08/2024 | Pin |
114 | 004 | 104 | DIY_Gardening | 12/12/2024 | Comment |
115 | 005 | 105 | Food_Cake_Recipes | 12/15/2024 | Pin |
This SQL query filters the interactions and returns all records where the column starts with either "Food" or "Diy". The '%' in the LIKE statement acts as a wildcard and matches any sequence of characters that follow the specified string.
The best way to prepare for a Pinterest SQL interview is to practice, practice, practice. In addition to solving the above Pinterest SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Netflix, Google, and Amazon.
Each exercise has hints to guide you, fully explained answers along with a discussion board to see how others solved it and crucially, there's an online SQL coding environment so you can right online code up your SQL query and have it checked.
To prep for the Pinterest SQL interview you can also solve SQL problems from other tech companies like:
But if your SQL foundations are weak, forget about jumping right into solving questions – improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as aggregate functions and Union vs. UNION ALL – both of which show up often during Pinterest interviews.
For the Pinterest Data Science Interview, beyond writing SQL queries, the other types of questions which are covered:
The best way to prepare for Pinterest Data Science interviews is by reading Ace the Data Science Interview. The book's got: