Data Analysts & Data Scientists at New York Times uses SQL to extract insights from readership data to understand consumer behavior, such as which articles are most popular, how users interact with the website, and what topics are trending. They also use it to manage the digital ads based on readers' preferences and engagement, which is why, New York Times asks prospective hires SQL interview questions.
To help you study for the New York Times SQL interview, here's 11 NYT SQL interview questions in this blog.
This is the same question as problem #3 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given the table on user viewership categorised by device type where the three types are laptop, tablet, and phone.
Write a query that calculates the total viewership for laptops and mobile devices where mobile is defined as the sum of tablet and phone viewership. Output the total viewership for laptops as and the total viewership for mobile devices as .
Column Name | Type |
---|---|
user_id | integer |
device_type | string ('laptop', 'tablet', 'phone') |
view_time | timestamp |
user_id | device_type | view_time |
---|---|---|
123 | tablet | 01/02/2022 00:00:00 |
125 | laptop | 01/07/2022 00:00:00 |
128 | laptop | 02/09/2022 00:00:00 |
129 | phone | 02/09/2022 00:00:00 |
145 | tablet | 02/24/2022 00:00:00 |
laptop_views | mobile_views |
---|---|
2 | 3 |
Practice this NYT SQL Interview Question on our FREE interactive coding platform!
Given a table of New York Times employee salary data, write a SQL query to find the top 3 highest earning employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Code your solution to this question directly within the browser on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
f the solution above is confusing, you can find a step-by-step solution here: Top 3 Department Salaries.
Normalization is the process of dividing a larger table into smaller tables, to eliminate redundancy and dependency. Although there are 5 levels of normalization (normal forms), the 3 most important ones that you should know for the New York Times SQL interview are:
The New York Times wants to analyze the monthly activity of their online articles. They would like to understand how many unique readers they have for each article every month. You are given a table which tracks each time a user opens an article. Every row in the dataset means that a user has read an article. Duplicate entries are possible and indicate that a user read the article multiple times. Write a SQL query to return the month, the article_id and the number of unique readers for that article for that month.
Consider below is the dataset:
read_id | user_id | read_date | article_id |
---|---|---|---|
1001 | 123 | 01/15/2022 | 101 |
1002 | 234 | 01/19/2022 | 101 |
1003 | 123 | 01/28/2022 | 101 |
1004 | 123 | 01/28/2022 | 102 |
1005 | 345 | 02/01/2022 | 103 |
1006 | 456 | 02/02/2022 | 104 |
1007 | 345 | 02/05/2022 | 105 |
1008 | 345 | 02/07/2022 | 103 |
month | article | unique_users |
---|---|---|
1 | 101 | 2 |
1 | 102 | 1 |
2 | 103 | 1 |
2 | 104 | 1 |
2 | 105 | 1 |
In PostgreSQL, you can utilize the clause to get the month from a date. Hence, your SQL query might look like:
This query counts the unique users (denoted by unique ) separated by and . The clause allows for unique counts within each grouping.
To solve another window function question on DataLemur's free online SQL code editor, solve this Google SQL Interview Question:
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes:
You've been hired as a database designer for the New York Times. Your task is to model a database to keep track of magazine subscriptions, articles, and authors. Each author can write many articles, and multiple authors can collaborate on a single article. An article can belong to multiple magazine issues. A subscription can access multiple magazine issues.
Design a database schema and construct a PostgreSQL query to find all the authors who wrote more than three articles in any magazine issue a subscriber has access to. Assume all articles are in the English language.
subscription_id | subscriber_name | magazine_id |
---|---|---|
4 | John Doe | 2 |
7 | Jane Doe | 1 |
8 | Jack Doe | 3 |
issue_id | magazine_id | publication_date |
---|---|---|
100 | 1 | 2021-07-10 |
101 | 2 | 2021-08-10 |
102 | 3 | 2021-09-10 |
article_id | issue_id | author_id |
---|---|---|
500 | 100 | 45 |
501 | 101 | 45 |
502 | 101 | 46 |
503 | 100 | 46 |
504 | 102 | 45 |
505 | 102 | 47 |
author_id | author_name |
---|---|
45 | Author1 |
46 | Author2 |
47 | Author3 |
This SQL query begins by joining the subscriptions, magazine issues, articles, and authors tables. The join is based on the common columns between these tables, constructing a mega table which includes subscription information, articles, magazine issues, and authors all at once. Afterward, it groups the data based on subscription and author, and calculates the count of distinct articles for each author. Finally, the HAVING clause filters out authors who have written more than 3 articles for any magazine issues that a subscriber has access to.
Using a join in SQL, you can retrieve data from multiple tables and merge the results into a single table.
In SQL, there are four distinct types of JOINs. To demonstrate each kind, Imagine you had two database tables: an table that contains data on Google Ads keywords and their bid amounts, and a table with information on product sales and the Google Ads keywords that drove those sales.
: An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.
: A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.
: A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.
: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.
As a data analyst at the New York Times, you have been asked to determine the average number of shares per article for the last month. The company wants to understand the reach of their articles on social platforms. Assume we have a table with the columns , , and , and another table with the columns , , , and .
Here's a sample of the and tables:
article_id | publish_date | title |
---|---|---|
1 | 2023-01-01 | New Year's Festivities Around the World |
2 | 2023-01-01 | Economic Outlook for 2023 |
3 | 2023-01-02 | The Resurgence of Physical Books |
share_id | article_id | share_date | social_platform |
---|---|---|---|
101 | 1 | 2023-01-01 | |
102 | 1 | 2023-01-01 | |
103 | 1 | 2023-01-02 | |
104 | 2 | 2023-01-01 | |
105 | 2 | 2023-01-03 | |
106 | 3 | 2023-01-02 |
You can determine the average number of shares per article for the last month with the following PostgreSQL query:
This query first selects the and from the table, and computes the average number of share IDs () from the table. The operation is used to merge the and tables based on matching s. The clause restricts the data to articles published in the last month. The clause allows calculating the average number of shares per article.
To practice a very similar question try this interactive New York Times Laptop vs. Mobile Viewership Question which is similar for dealing with article analytics or this Facebook Average Post Hiatus (Part 1) Question which is similar for querying date-based statistics.
Assuming that New York Times (NYT) wants to calculate the click-through-rate (CTR) for its articles. Each time an article is served on the homepage, it is counted as an impression. If a user clicks on the article to read it, it is counted as a click.
Calculate the CTR as the total number of clicks on an article divided by the total number of impressions, for articles served in the top slot on the homepage, on an hourly basis.
impression_id | article_id | impression_time |
---|---|---|
1 | 101 | 2022-07-01 08:00:00 |
2 | 101 | 2022-07-01 08:15:00 |
3 | 102 | 2022-07-01 08:30:00 |
4 | 102 | 2022-07-01 09:00:00 |
5 | 101 | 2022-07-01 09:15:00 |
click_id | article_id | click_time |
---|---|---|
1 | 101 | 2022-07-01 08:05:00 |
2 | 101 | 2022-07-01 08:20:00 |
3 | 102 | 2022-07-01 08:35:00 |
4 | 102 | 2022-07-01 09:05:00 |
5 | 101 | 2022-07-01 09:20:00 |
Here, we use a LEFT JOIN to combine with , because we want to keep all impressions (the served articles) even if there are no corresponding clicks. We use date_trunc('hour') to round down the impression_time and click_time to an hour, so that clicks and impressions within the same hour are counted together. For each combination of hour and article, we calculate the total number of clicks and impressions, and then calculate the CTR by dividing total clicks by total impressions. Finally, we order the results by hour and ctr in descending order.
To solve a similar problem about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL coding environment:
One way to find duplicate records in a table is by using , and then seeing which groups have more than one occurence:
Another way is by using the operator:
You are given two tables, and . The table records different users' subscription status at the New York Times. The table details the various subscription plans the company offers.
Your task is to write a SQL query to find the total revenue generated by each subscription plan per year.
Here is a sample representation of the and tables:
user_id | subscription_id | start_date | end_date |
---|---|---|---|
101 | 301 | 2020-01-01 | 2021-01-01 |
102 | 301 | 2020-05-15 | 2021-05-15 |
103 | 302 | 2020-07-01 | 2021-07-01 |
104 | 302 | 2020-08-15 | 2021-08-15 |
105 | 303 | 2020-10-01 | 2021-10-01 |
subscription_id | subscription_plan | price_per_month |
---|---|---|
301 | Basic | 15 |
302 | Premium | 25 |
303 | Deluxe | 35 |
Here's a SQL query to solve this question:
The above SQL query is joining the and tables using the field. We're grouping by the subscription plan and year, and calculating the total revenue by multiplying the number of subscriptions with the monthly price of each plan and the number of months in a year (assuming each subscription lasts exactly a year). The query is filtered to only include records where the is greater than the , to avoid including cancelled subscriptions.
Since joins come up frequently during SQL interviews, try an interactive Spotify JOIN SQL question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier New York Times SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Airbnb, and Amazon.
Each interview question has multiple hints, step-by-step solutions and best of all, there's an online SQL code editor so you can easily right in the browser your SQL query answer and have it checked.
To prep for the New York Times SQL interview you can also be useful to solve SQL problems from other media companies like:
Stay ahead of the curve with The New York Times' in-depth coverage of Artificial Intelligence trends and breakthroughs!
In case your SQL skills are weak, forget about diving straight into solving questions – refresh your SQL knowledge with this free SQL tutorial.
This tutorial covers things like filtering with LIKE and sorting data with ORDER BY – both of which come up frequently in SQL interviews at New York Times.
In addition to SQL query questions, the other question categories covered in the New York Times Data Science Interview are:
To prepare for New York Times Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't ignore the behavioral interview – prep for that using this list of behavioral interview questions for Data Scientists.