logo

11 New York Times SQL Interview Questions (Updated 2024)

Updated on

August 11, 2024

Data Analysts & Data Scientists at New York Times uses SQL to extract insights from readership data to understand consumer behavior, such as which articles are most popular, how users interact with the website, and what topics are trending. They also use it to manage the digital ads based on readers' preferences and engagement, which is why, New York Times asks prospective hires SQL interview questions.

To help you study for the New York Times SQL interview, here's 11 NYT SQL interview questions in this blog.

New Yorkt Times SQL Interview Questions

11 NYT SQL Interview Questions

SQL Question 1: Laptop vs. Mobile Viewership

This is the same question as problem #3 in the SQL Chapter of Ace the Data Science Interview!

Assume you're given the table on user viewership categorised by device type where the three types are laptop, tablet, and phone.

Write a query that calculates the total viewership for laptops and mobile devices where mobile is defined as the sum of tablet and phone viewership. Output the total viewership for laptops as and the total viewership for mobile devices as .

Table

Column NameType
user_idinteger
device_typestring ('laptop', 'tablet', 'phone')
view_timetimestamp

Example Input

user_iddevice_typeview_time
123tablet01/02/2022 00:00:00
125laptop01/07/2022 00:00:00
128laptop02/09/2022 00:00:00
129phone02/09/2022 00:00:00
145tablet02/24/2022 00:00:00

Example Output

laptop_viewsmobile_views
23

Answer:


Practice this NYT SQL Interview Question on our FREE interactive coding platform!

New York Times SQL Interview Question

SQL Question 2: Top Department Salaries

Given a table of New York Times employee salary data, write a SQL query to find the top 3 highest earning employees in each department.

New York Times Example Input:

employee_idnamesalarydepartment_id
1Emma Thompson38001
2Daniel Rodriguez22301
3Olivia Smith20001
4Noah Johnson68002
5Sophia Martinez17501
8William Davis68002
10James Anderson40001

Example Input:

department_iddepartment_name
1Data Analytics
2Data Science

Example Output:

department_namenamesalary
Data AnalyticsJames Anderson4000
Data AnalyticsEmma Thompson3800
Data AnalyticsDaniel Rodriguez2230
Data ScienceNoah Johnson6800
Data ScienceWilliam Davis6800

Code your solution to this question directly within the browser on DataLemur:

Top 3 Department Salaries

Answer:

We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.


f the solution above is confusing, you can find a step-by-step solution here: Top 3 Department Salaries.

SQL Question 3: What are the three different normal forms?

Normalization is the process of dividing a larger table into smaller tables, to eliminate redundancy and dependency. Although there are 5 levels of normalization (normal forms), the 3 most important ones that you should know for the New York Times SQL interview are:

  1. First Normal Form (1NF): Remove a table's duplicate columns, and make sure each value in the column is a singular value (no containers or lists of data). Each row of table should have a unique identifier as well.
  2. Second Normal Form (2NF): A table is in 2NF if it meets all requirements of the 1NF the non-key columns are dependent only on the primary key. You do this by separating subsets of columns subsets, and associating the tables by using primary/foreign keys.
  3. Third Normal Form (3NF): The table should be in 2NF and there shouldn't be any dependency on any non-key attributes (meaning a primary key should be the only thing needed to identify a row).

NYT SQL Interview Questions

SQL Question 4: Analyzing Article Reads by Month

The New York Times wants to analyze the monthly activity of their online articles. They would like to understand how many unique readers they have for each article every month. You are given a table which tracks each time a user opens an article. Every row in the dataset means that a user has read an article. Duplicate entries are possible and indicate that a user read the article multiple times. Write a SQL query to return the month, the article_id and the number of unique readers for that article for that month.

Consider below is the dataset:

Example Input:
read_iduser_idread_datearticle_id
100112301/15/2022101
100223401/19/2022101
100312301/28/2022101
100412301/28/2022102
100534502/01/2022103
100645602/02/2022104
100734502/05/2022105
100834502/07/2022103
Expected Output:
montharticleunique_users
11012
11021
21031
21041
21051

Answer:

In PostgreSQL, you can utilize the clause to get the month from a date. Hence, your SQL query might look like:


This query counts the unique users (denoted by unique ) separated by and . The clause allows for unique counts within each grouping.

To solve another window function question on DataLemur's free online SQL code editor, solve this Google SQL Interview Question:

Google SQL Interview Question

SQL Question 5: What is a database index, and what are the different types of indexes?

A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.

There are several types of indexes:

  • unique & non-inuqie indexes
  • primary & composite indexes
  • clustered & non-clustered indexes

SQL Question 6: Design and Query a Database for New York Times Subscriptions and Articles

You've been hired as a database designer for the New York Times. Your task is to model a database to keep track of magazine subscriptions, articles, and authors. Each author can write many articles, and multiple authors can collaborate on a single article. An article can belong to multiple magazine issues. A subscription can access multiple magazine issues.

Design a database schema and construct a PostgreSQL query to find all the authors who wrote more than three articles in any magazine issue a subscriber has access to. Assume all articles are in the English language.

Example Input:
subscription_idsubscriber_namemagazine_id
4John Doe2
7Jane Doe1
8Jack Doe3
Example Input:
issue_idmagazine_idpublication_date
10012021-07-10
10122021-08-10
10232021-09-10
Example Input:
article_idissue_idauthor_id
50010045
50110145
50210146
50310046
50410245
50510247
Example Input:
author_idauthor_name
45Author1
46Author2
47Author3

Answer:


This SQL query begins by joining the subscriptions, magazine issues, articles, and authors tables. The join is based on the common columns between these tables, constructing a mega table which includes subscription information, articles, magazine issues, and authors all at once. Afterward, it groups the data based on subscription and author, and calculates the count of distinct articles for each author. Finally, the HAVING clause filters out authors who have written more than 3 articles for any magazine issues that a subscriber has access to.

SQL Question 7: What are the various types of joins used in SQL?

Using a join in SQL, you can retrieve data from multiple tables and merge the results into a single table.

In SQL, there are four distinct types of JOINs. To demonstrate each kind, Imagine you had two database tables: an table that contains data on Google Ads keywords and their bid amounts, and a table with information on product sales and the Google Ads keywords that drove those sales.

  • : An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.

  • : A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.

  • : A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.

  • : A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.

SQL Question 8: Average Number of Shares per Article

As a data analyst at the New York Times, you have been asked to determine the average number of shares per article for the last month. The company wants to understand the reach of their articles on social platforms. Assume we have a table with the columns , , and , and another table with the columns , , , and .

Here's a sample of the and tables:

:
article_idpublish_datetitle
12023-01-01New Year's Festivities Around the World
22023-01-01Economic Outlook for 2023
32023-01-02The Resurgence of Physical Books
:
share_idarticle_idshare_datesocial_platform
10112023-01-01Twitter
10212023-01-01Facebook
10312023-01-02Instagram
10422023-01-01LinkedIn
10522023-01-03Twitter
10632023-01-02Facebook

Answer:

You can determine the average number of shares per article for the last month with the following PostgreSQL query:


This query first selects the and from the table, and computes the average number of share IDs () from the table. The operation is used to merge the and tables based on matching s. The clause restricts the data to articles published in the last month. The clause allows calculating the average number of shares per article.

To practice a very similar question try this interactive New York Times Laptop vs. Mobile Viewership Question which is similar for dealing with article analytics or this Facebook Average Post Hiatus (Part 1) Question which is similar for querying date-based statistics.

SQL Question 9: Click-through-rate for NY Times Articles

Assuming that New York Times (NYT) wants to calculate the click-through-rate (CTR) for its articles. Each time an article is served on the homepage, it is counted as an impression. If a user clicks on the article to read it, it is counted as a click.

Calculate the CTR as the total number of clicks on an article divided by the total number of impressions, for articles served in the top slot on the homepage, on an hourly basis.

Example Input:
impression_idarticle_idimpression_time
11012022-07-01 08:00:00
21012022-07-01 08:15:00
31022022-07-01 08:30:00
41022022-07-01 09:00:00
51012022-07-01 09:15:00
Example Input:
click_idarticle_idclick_time
11012022-07-01 08:05:00
21012022-07-01 08:20:00
31022022-07-01 08:35:00
41022022-07-01 09:05:00
51012022-07-01 09:20:00

Answer:


Here, we use a LEFT JOIN to combine with , because we want to keep all impressions (the served articles) even if there are no corresponding clicks. We use date_trunc('hour') to round down the impression_time and click_time to an hour, so that clicks and impressions within the same hour are counted together. For each combination of hour and article, we calculate the total number of clicks and impressions, and then calculate the CTR by dividing total clicks by total impressions. Finally, we order the results by hour and ctr in descending order.

To solve a similar problem about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL coding environment:

TikTok SQL Interview Question

SQL Question 10: How do you identify duplicated data in a table?

One way to find duplicate records in a table is by using , and then seeing which groups have more than one occurence:


Another way is by using the operator:


SQL Question 11: Joining and Analyzing User Subscription Data

You are given two tables, and . The table records different users' subscription status at the New York Times. The table details the various subscription plans the company offers.

Your task is to write a SQL query to find the total revenue generated by each subscription plan per year.

Here is a sample representation of the and tables:

Example Input:
user_idsubscription_idstart_dateend_date
1013012020-01-012021-01-01
1023012020-05-152021-05-15
1033022020-07-012021-07-01
1043022020-08-152021-08-15
1053032020-10-012021-10-01
Example Input:
subscription_idsubscription_planprice_per_month
301Basic15
302Premium25
303Deluxe35

Answer:

Here's a SQL query to solve this question:


The above SQL query is joining the and tables using the field. We're grouping by the subscription plan and year, and calculating the total revenue by multiplying the number of subscriptions with the monthly price of each plan and the number of months in a year (assuming each subscription lasts exactly a year). The query is filtered to only include records where the is greater than the , to avoid including cancelled subscriptions.

Since joins come up frequently during SQL interviews, try an interactive Spotify JOIN SQL question:

Spotify JOIN SQL question

Preparing For The New York Times SQL Interview

The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier New York Times SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Airbnb, and Amazon.

DataLemur Questions

Each interview question has multiple hints, step-by-step solutions and best of all, there's an online SQL code editor so you can easily right in the browser your SQL query answer and have it checked.

To prep for the New York Times SQL interview you can also be useful to solve SQL problems from other media companies like:

Stay ahead of the curve with The New York Times' in-depth coverage of Artificial Intelligence trends and breakthroughs!

In case your SQL skills are weak, forget about diving straight into solving questions – refresh your SQL knowledge with this free SQL tutorial.

DataLemur SQL Course

This tutorial covers things like filtering with LIKE and sorting data with ORDER BY – both of which come up frequently in SQL interviews at New York Times.

NYT Data Science Interview Tips

What Do New York Times Data Science Interviews Cover?

In addition to SQL query questions, the other question categories covered in the New York Times Data Science Interview are:

New York Times Data Scientist

How To Prepare for New York Times Data Science Interviews?

To prepare for New York Times Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions sourced from companies like Google, Tesla, & Goldman Sachs
  • a refresher covering Python, SQL & ML
  • over 1000+ reviews on Amazon & 4.5-star rating

Acing Data Science Interview

Don't ignore the behavioral interview – prep for that using this list of behavioral interview questions for Data Scientists.