At Yelp, SQL is crucial for analyzing customer reviews for trends and managing data integrity of reviews and restaurant information. So, it shouldn't surprise you that Yelp typically asks SQL coding questions during interviews for Data Science and Data Engineering positions.
So, to help you study for the Yelp SQL interview, here’s 8 Yelp SQL interview questions – able to solve them?
Assume you are given the table below containing information on user reviews. Write a query to obtain the number and percentage of businesses that are top rated. A top-rated busines is defined as one whose reviews contain only 4 or 5 stars.
Output the number of businesses and percentage of top rated businesses rounded to the nearest integer.
Assumption:
Column Name | Type |
---|---|
business_id | integer |
review_id | integer |
review_stars | integer |
review_date | datetime |
business_id | review_id | review_stars | review_date |
---|---|---|---|
532 | 1234 | 5 | 07/13/2022 12:00:00 |
824 | 1452 | 3 | 07/13/2022 12:00:00 |
819 | 2341 | 5 | 07/13/2022 12:00:00 |
716 | 1325 | 4 | 07/14/2022 12:00:00 |
423 | 1434 | 2 | 07/14/2022 12:00:00 |
business_count | top_rated_pct |
---|---|
3 | 60 |
This is the same question as problem #27 in the SQL Chapter of Ace the Data Science Interview!
To solve this question on DataLemur's free interactive SQL code editor, try this Yelp SQL interview question:
In this problem, you are provided with a table called , which contains information of user reviews for different products on Yelp. Each row represents one user's review for a specific product.
The columns include:
Write a SQL query to find the average rating (stars) of each product per month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
Here's how you might write the query:
In this query, we use the EXTRACT function to get the month from the . Then we use the AVG function with a window function to get the average stars given by users for each product in each month. The window is defined by the PARTITION BY clause, which divides the result into groups based on and . The ORDER BY clause is used to sort the result by month and product ID.
Pro Tip: Window functions are a frequent SQL interview topic, so practice every window function problem on DataLemur
A database view is a virtual table that is created based on the results of a SELECT statement, and provides you a customized, read-only version of your data that you can query just like a regular table.
You'd want to use a view for a few reasons:
Yelp is a platform that allows users to search for and review various types of businesses. One business type they focus on is restaurants. They often need to analyze click-through rates. Suppose Yelp defines click-through rate (CTR) as the number of users who clicked a restaurant's detail page divided by the number of users who viewed any restaurant's list page.
To analyze the click-through rate, you are given two tables, and . The table logs each time a user views a page on Yelp. The table logs each time a user clicks on a restaurant's detail page from the list page. For simplicity, let’s assume that all page views and clicks in these tables are from the same date.
You are tasked to calculate the click-through rate for this particular day.
view_id | user_id | page_type |
---|---|---|
101 | 123 | list |
102 | 456 | list |
103 | 123 | list |
104 | 789 | detail |
105 | 456 | list |
click_id | user_id | restaurant_id |
---|---|---|
201 | 123 | 50001 |
202 | 456 | 69852 |
203 | 123 | 50001 |
204 | 789 | 69852 |
205 | 123 | 50001 |
Using PostgreSQL, the SQL query for this calculation would be as follows:
This query counts the total number of clicks from the table and divides it by the total number of list page views from the table. The function is used to change the count into a float, allowing for decimal division and a more accurate rate. The result is the click-through-rate for restaurants on Yelp for the specified day.
To practice a similar problem about calculating rates, solve this SQL interview question from TikTok on DataLemur's interactive coding environment:
SQL constraints are used to specify rules for the data in a table. They can be applied to single or multiple fields in a table when the table is created, or after the table has been created using the ALTER TABLE command.
For example, say you had a database that stores ad campaign data from Yelp's Google Analytics account.
Here's what some constraints could look like:
As a Yelp data analyst, your task is to search the database and retrieve all restaurant records where the name begins with a specific string. For instance, in order to fetch all restaurants starting with "Piz", you'll have to form a query using the SQL LIKE keyword.
restaurant_id | name | location_id | cuisine | avg_rating |
---|---|---|---|---|
105 | Pizza Hut | 172 | Italian | 3.9 |
232 | Pizzeria Locale | 293 | Italian | 4.2 |
322 | Domino's Pizza | 172 | Italian | 3.7 |
488 | Pizzagate | 324 | Italian | 4.6 |
521 | Taco Bell | 293 | Mexican | 3.8 |
restaurant_id | name | location_id | cuisine | avg_rating |
---|---|---|---|---|
105 | Pizza Hut | 172 | Italian | 3.9 |
232 | Pizzeria Locale | 293 | Italian | 4.2 |
322 | Domino's Pizza | 172 | Italian | 3.7 |
488 | Pizzagate | 324 | Italian | 4.6 |
This SQL query uses the keyword in combination with which is a wildcard character in SQL, to filter rows in the restaurants table based on the condition specified. The symbol matches zero or more characters. Therefore, 'Piz%' matches any string that starts with 'Piz'. It returns all columns of the restaurants which have names starting with 'Piz'. This pattern can be replaced by any other desirable pattern based on the requirement.
A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
For example, say you had website visitor data for Yelp, exported from the company's Google Analytics account. In support of the web-dev team, you had to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to generate all pairs of URLs, but needed to avoid pairs where both the URLs were the same since that's not a valid pair.
The self-join query would like the following:
This query returns the url of each page () along with the url of the page that referred to it (). The self-join is performed using the field, which specifies the id of the page that referred the visitor to the current page, and avoids any pages that referred themself (aka data anomalies).
You are given a table named , with columns , , , , and . Here, each row represents a review given by a user (identified by ) to a business (identified by ), with a star rating (out of 5) the user gave the business.
Write a PostgreSQL query to calculate the following information for each business: The average star rating, rounded to two decimal places, and the total count of reviews submitted for the business. For the count of reviews, round this number down to the nearest whole integer.
Also, make sure to exclude businesses with an average star rating below 2.
review_id | user_id | submit_date | business_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 | 60001 | 4 |
7802 | 265 | 06/10/2022 | 69852 | 1 |
5293 | 362 | 06/12/2022 | 60001 | 3 |
6352 | 192 | 07/02/2022 | 69852 | 3 |
4517 | 981 | 07/04/2022 | 69852 | 2 |
The function is used to find the average star rating for each business, which is then rounded to 2 decimal places using the function. The function is used to find the total number of reviews for each business, which is then rounded down to the nearest integer using the function. The clause is used to filter out businesses with an average star rating of less than 2.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating average and grouping by business or this Twitter Histogram of Tweets Question which is similar for counting instances related to unique ids.
The key to acing a Yelp SQL interview is to practice, practice, and then practice some more! Beyond just solving the above Yelp SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Google, Microsoft and Silicon Valley startups.
Each SQL question has hints to guide you, fully explained answers along with a discussion board to see how others solved it and most importantly, there's an interactive SQL code editor so you can right online code up your query and have it checked.
To prep for the Yelp SQL interview you can also be helpful to solve SQL questions from other tech companies like:
But if your SQL coding skills are weak, don't worry about going right into solving questions – go learn SQL with this DataLemur SQL tutorial.
This tutorial covers SQL concepts such as CASE/WHEN statements and INTERCEPT/EXCEPT – both of these show up often during SQL interviews at Yelp.
In addition to SQL query questions, the other types of questions to practice for the Yelp Data Science Interview are:
The best way to prepare for Yelp Data Science interviews is by reading Ace the Data Science Interview. The book's got: