logo

8 Yelp SQL Interview Questions (Updated 2024)

Updated on

March 3, 2024

At Yelp, SQL is crucial for analyzing customer reviews for trends and managing data integrity of reviews and restaurant information. So, it shouldn't surprise you that Yelp typically asks SQL coding questions during interviews for Data Science and Data Engineering positions.

So, to help you study for the Yelp SQL interview, here’s 8 Yelp SQL interview questions – able to solve them?

Yelp SQL Interview

8 Yelp SQL Interview Questions

SQL Question 1: Find the Most Active Reviewers on Yelp

For Yelp, a "whale" or VIP user might be one who frequently reviews businesses since their reviews can significantly drive traffic and the platform's credibility. For this exercise, let's presume we want to identify users who have submitted the most reviews in the past year.

Suppose we have a table containing all reviews made on Yelp structured as below:

Example Input:
review_iduser_idsubmit_date
112306/08/2021
212306/10/2021
336206/18/2021
419207/26/2021
512307/05/2021
Example Input:
user_idusername
123JohnDoe
265JaneSmith
362JimBrown
192JillWhite
481JoeBlack

Answer:


This query joins the and tables on , filters for reviews submitted in the past year using a WHERE clause, groups the remaining rows by and , and counts the number of rows (reviews) per group. In the end, it orders by the count in descending order and limits the output to the top 10 users.

The query will return a table where each row corresponds to a user and includes their user_id, username, and the number of reviews they have submitted this past year, ordered from the user with the highest review count to the user with the 10th highest count.

To solve a related customer analytics question on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question: Microsoft SQL Interview Question: Teams Super User

SQL Question 2: Average Review Rating Per Month

In this problem, you are provided with a table called , which contains information of user reviews for different products on Yelp. Each row represents one user's review for a specific product.

The columns include:

  • : A unique ID for each review;
  • : The user's unique ID;
  • : The date and exact time the review was submitted;
  • : The ID of the product being reviewed;
  • : The rating score given by the user, from 1 to 5 stars.

Write a SQL query to find the average rating (stars) of each product per month.

Example Input:
review_iduser_idsubmit_dateproduct_idstars
61711232022-06-08 00:00:00500014
78022652022-06-10 00:00:00698524
52933622022-06-18 00:00:00500013
63521922022-07-26 00:00:00698523
45179812022-07-05 00:00:00698522

Answer:

Here's how you might write the query:


In this query, we use the EXTRACT function to get the month from the . Then we use the AVG function with a window function to get the average stars given by users for each product in each month. The window is defined by the PARTITION BY clause, which divides the result into groups based on and . The ORDER BY clause is used to sort the result by month and product ID.

Pro Tip: Window functions are a frequent SQL interview topic, so practice every window function problem on DataLemur

DataLemur SQL Questions

SQL Question 3: What are database views, and when would you use them?

A database view is a virtual table that is created based on the results of a SELECT statement, and provides you a customized, read-only version of your data that you can query just like a regular table.

You'd want to use a view for a few reasons:

  • views allow you to create a simpler versions of your data based on whose running the query (such as hiding un-important columns/rows from business analysts if they're just random noisy artifricats of your Data Infrastructure pipelines)
  • views can help you comply with information security requirements by hiding sensitive data from certain users (important for regulated industries like govermnet and healthcare!)
  • views often improve performance for complicated queries by pre-computing the results and saving them in a view (which is often faster than re-executing the original query)... just be careful since static views don't update if the underlying data changes!

Yelp SQL Interview Questions

SQL Question 4: Analyzing Yelp's Restaurant Click-Through Rates

Yelp is a platform that allows users to search for and review various types of businesses. One business type they focus on is restaurants. They often need to analyze click-through rates. Suppose Yelp defines click-through rate (CTR) as the number of users who clicked a restaurant's detail page divided by the number of users who viewed any restaurant's list page.

To analyze the click-through rate, you are given two tables, and . The table logs each time a user views a page on Yelp. The table logs each time a user clicks on a restaurant's detail page from the list page. For simplicity, let’s assume that all page views and clicks in these tables are from the same date.

You are tasked to calculate the click-through rate for this particular day.

Example Input:
view_iduser_idpage_type
101123list
102456list
103123list
104789detail
105456list
Example Input:
click_iduser_idrestaurant_id
20112350001
20245669852
20312350001
20478969852
20512350001

Answer:

Using PostgreSQL, the SQL query for this calculation would be as follows:


This query counts the total number of clicks from the table and divides it by the total number of list page views from the table. The function is used to change the count into a float, allowing for decimal division and a more accurate rate. The result is the click-through-rate for restaurants on Yelp for the specified day.

To practice a similar problem about calculating rates, solve this SQL interview question from TikTok on DataLemur's interactive coding environment: SQL interview question from TikTok

SQL Question 5: Can you explain the concept of a constraint in SQL?

SQL constraints are used to specify rules for the data in a table. They can be applied to single or multiple fields in a table when the table is created, or after the table has been created using the ALTER TABLE command.

For example, say you had a database that stores ad campaign data from Yelp's Google Analytics account.

Here's what some constraints could look like:


SQL Question 6: Find Restaurants based on Name Patterns

As a Yelp data analyst, your task is to search the database and retrieve all restaurant records where the name begins with a specific string. For instance, in order to fetch all restaurants starting with "Piz", you'll have to form a query using the SQL LIKE keyword.

Example Input:

restaurant_idnamelocation_idcuisineavg_rating
105Pizza Hut172Italian3.9
232Pizzeria Locale293Italian4.2
322Domino's Pizza172Italian3.7
488Pizzagate324Italian4.6
521Taco Bell293Mexican3.8

Example Output:

restaurant_idnamelocation_idcuisineavg_rating
105Pizza Hut172Italian3.9
232Pizzeria Locale293Italian4.2
322Domino's Pizza172Italian3.7
488Pizzagate324Italian4.6

Answer:


This SQL query uses the keyword in combination with which is a wildcard character in SQL, to filter rows in the restaurants table based on the condition specified. The symbol matches zero or more characters. Therefore, 'Piz%' matches any string that starts with 'Piz'. It returns all columns of the restaurants which have names starting with 'Piz'. This pattern can be replaced by any other desirable pattern based on the requirement.

SQL Question 7: What is a self-join?

A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.

For example, say you had website visitor data for Yelp, exported from the company's Google Analytics account. In support of the web-dev team, you had to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to generate all pairs of URLs, but needed to avoid pairs where both the URLs were the same since that's not a valid pair.

The self-join query would like the following:


This query returns the url of each page () along with the url of the page that referred to it (). The self-join is performed using the field, which specifies the id of the page that referred the visitor to the current page, and avoids any pages that referred themself (aka data anomalies).

SQL Question 8: Computing Average Ratings and Count of Reviews, Rounded to Nearest Whole Number

You are given a table named , with columns , , , , and . Here, each row represents a review given by a user (identified by ) to a business (identified by ), with a star rating (out of 5) the user gave the business.

Write a PostgreSQL query to calculate the following information for each business: The average star rating, rounded to two decimal places, and the total count of reviews submitted for the business. For the count of reviews, round this number down to the nearest whole integer.

Also, make sure to exclude businesses with an average star rating below 2.

Example Input:
review_iduser_idsubmit_datebusiness_idstars
617112306/08/2022600014
780226506/10/2022698521
529336206/12/2022600013
635219207/02/2022698523
451798107/04/2022698522

Answer:


The function is used to find the average star rating for each business, which is then rounded to 2 decimal places using the function. The function is used to find the total number of reviews for each business, which is then rounded down to the nearest integer using the function. The clause is used to filter out businesses with an average star rating of less than 2.

To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating average and grouping by business or this Twitter Histogram of Tweets Question which is similar for counting instances related to unique ids.

Yelp SQL Interview Tips

The key to acing a Yelp SQL interview is to practice, practice, and then practice some more! Beyond just solving the above Yelp SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Google, Microsoft and Silicon Valley startups. DataLemur Questions

Each SQL question has hints to guide you, fully explained answers along with a discussion board to see how others solved it and most importantly, there's an interactive SQL code editor so you can right online code up your query and have it checked.

To prep for the Yelp SQL interview you can also be helpful to solve SQL questions from other tech companies like:

But if your SQL coding skills are weak, don't worry about going right into solving questions – go learn SQL with this DataLemur SQL tutorial.

DataLemur SQL tutorial

This tutorial covers SQL concepts such as CASE/WHEN statements and INTERCEPT/EXCEPT – both of these show up often during SQL interviews at Yelp.

Yelp Data Science Interview Tips

What Do Yelp Data Science Interviews Cover?

In addition to SQL query questions, the other types of questions to practice for the Yelp Data Science Interview are:

Yelp Data Scientist

How To Prepare for Yelp Data Science Interviews?

The best way to prepare for Yelp Data Science interviews is by reading Ace the Data Science Interview. The book's got:

  • 201 Interview Questions from tech companies like Google & Microsoft
  • A Crash Course covering Stats, ML, & Data Case Studies
  • Great Reviews (900+ 5-star reviews on Amazon)

Ace the Data Science Interview by Nick Singh Kevin Huo