At Airbnb, SQL is used day-to-day for analyzing customer behavior to improve property recommendations and monitoring system performance for seamless booking experiences. Unsurprisingly this is why Airbnb LOVES to ask SQL problems in interviews for Data Analyst, Data Science, and BI jobs.
In case you want to ace the SQL Assessment, we've curated 11 Airbnb SQL interview questions to practice, which are similar to recently asked questions at Airbnb – able to answer them all?
The strategy team in Airbnb is trying to analyze the impact of Covid-19 during 2021. To do so, they need you to write a query that outputs the average vacant days across the AirBnbs in 2021. Some properties have gone out of business, so you should only analyze rentals that are currently active. Round the results to a whole number.
Assumptions:
Column Name | Type |
---|---|
listing_id | integer |
checkin_date | date |
checkout_date | date |
listing_id | checkin_date | checkout_date |
---|---|---|
1 | 08/17/2021 00:00:00 | 08/19/2021 00:00:00 |
1 | 08/19/2021 00:00:00 | 08/25/2021 00:00:00 |
2 | 08/19/2021 00:00:00 | 09/22/2021 00:00:00 |
3 | 12/23/2021 00:00:00 | 01/05/2022 00:00:00 |
Column Name | Type |
---|---|
listing_id | integer |
is_active | integer |
listing_id | is_active |
---|---|
1 | 1 |
2 | 0 |
3 | 1 |
avg_vacant_days |
---|
357 |
Average vacant days are 355 days. (357 + 353 / 2).
The dataset you are querying against may have different input & output - this is just an example!
To solve this question join DataLemur Premium to try this Airbnb SQL interview question:
Given the table with columns: , , , , , write a SQL query to get the average rating of each Airbnb property listing per month. The column represents when the review was submitted. The column represents the unique ID of the Airbnb property, and represents the rating given by the user where 1 is the lowest and 5 is the highest rating.
review_id | user_id | submit_date | listing_id | stars |
---|---|---|---|---|
6171 | 123 | 01/02/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 01/15/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 01/22/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 02/05/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 02/10/2022 00:00:00 | 69852 | 2 |
This SQL query will first group the data by month and listing_id. For each group, it calculates the average rating. The function is used to get the month from . The function is used to calculate the average stars rating. In the end, it orders the results by and .
mth | listing_id | avg_stars |
---|---|---|
1 | 50001 | 3.50 |
1 | 69852 | 4.00 |
2 | 69852 | 2.50 |
p.s. Window functions show up super frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
The strategy team in Airbnb is trying to analyze the impact of Covid-19 during 2021. To do so, they need you to write a query that outputs the average vacant days across the AirBnbs in 2021. Some properties have gone out of business, so you should only analyze rentals that are currently active. Round the results to a whole number.
Assumptions:
Column Name | Type |
---|---|
listing_id | integer |
checkin_date | date |
checkout_date | date |
listing_id | checkin_date | checkout_date |
---|---|---|
1 | 08/17/2021 00:00:00 | 08/19/2021 00:00:00 |
1 | 08/19/2021 00:00:00 | 08/25/2021 00:00:00 |
2 | 08/19/2021 00:00:00 | 09/22/2021 00:00:00 |
3 | 12/23/2021 00:00:00 | 01/05/2022 00:00:00 |
Column Name | Type |
---|---|
listing_id | integer |
is_active | integer |
listing_id | is_active |
---|---|
1 | 1 |
2 | 0 |
3 | 1 |
avg_vacant_days |
---|
357 |
To solve this question on DataLemur's free interactive site, try this Airbnb SQL interview question:
You're a data analyst at Airbnb and you've been tasked with retrieving housing data from specific cities. You want to find all Airbnb listings in San Francisco and New York that have at least 10 reviews and an average rating equal to or above 4.5.
Assume you have two tables: a table with the ID of the housing, its name, city, and the total number of reviews; and a table with the ID of the listing, the review ID, the rating, and the date submitted.
listing_id | name | city | reviews_count |
---|---|---|---|
10001 | "Central Loft" | "San Francisco" | 15 |
10002 | "Cozy Apartment" | "New York" | 20 |
10003 | "Sunny Studio" | "San Francisco" | 8 |
10004 | "Stylish Suite" | "Las Vegas" | 13 |
10005 | "Dreamy Duplex" | "New York" | 5 |
listing_id | review_id | stars | submit_date |
---|---|---|---|
10001 | 15001 | 4.5 | 2022-06-08 |
10001 | 15002 | 5.0 | 2022-06-10 |
10002 | 15003 | 4.0 | 2022-06-18 |
10002 | 15004 | 5.0 | 2022-07-26 |
10003 | 15005 | 3.5 | 2022-07-05 |
10004 | 15006 | 4.5 | 2022-06-08 |
10005 | 15007 | 3.0 | 2022-06-10 |
The above query works as follows:
By joining the tables, filtering, grouping, and running the average function, we're able to get the desired listings in those cities with the specific review conditions.
The UNIQUE constraint is used to ensure the uniqueness of the data in a column or set of columns in a table. It prevents the insertion of duplicate values in the specified column or columns and helps to ensure the integrity and reliability of the data in the database.
For example, say you were on the Marketing Analytics team at Airbnb and were doing some automated keyword research:
Your keyword database might store SEO data like this:
In this example, the UNIQUE constraint is applied to the "keyword" field to ensure that each keyword is unique. This helps to ensure the integrity of the data in the database and prevents errors that could occur if two rows had the same keyword.
As an analyst at Airbnb, one of the most useful insights you could provide would be to understand the average number of guests per booking across locations. For this question, we would like you to write a SQL query that will find the average number of guests per booking in each city.
booking_id | property_id | guests | booking_date |
---|---|---|---|
101 | 4523 | 3 | 01/01/2022 |
102 | 9871 | 2 | 01/05/2022 |
103 | 4523 | 4 | 02/10/2022 |
104 | 7452 | 1 | 02/20/2022 |
105 | 9871 | 3 | 03/01/2022 |
property_id | city |
---|---|
4523 | New York |
9871 | Los Angeles |
7452 | Chicago |
city | average_guests |
---|---|
New York | 3.5 |
Los Angeles | 2.5 |
Chicago | 1.0 |
In this query, we first join the and tables on , allowing us to access both the and columns in the same query. We then group by , so we get a separate row for each city in our output. For each group, we calculate the average number of guests.
To practice a very similar question try this interactive Robinhood Cities With Completed Trades Question which is similar for requiring SQL analysis grouped by city or this Amazon Average Review Ratings Question which is similar for needing an average calculation.
While both types of databases are used to store data (no duh!), relational databases and non-relational (also known as NoSQL databases) differ in a few important ways, most importantly on the way data is stored. Relational databases use a data model consisting of tables and rows, while NoSQL databases use a variety of data models, including document, key-value, columnar, and graph storage formats.
This added flexibilty makes NoSQL databases great for non-tabular data (like hierarchal data or JSON data), or data where the type/format is constantly evolving. With this added flexibility, comes one big weakness – you won't get ACID-compliance. That means, unlike relational databases which are typically adhere to the ACID properties (atomic, consistent, isolated, and durable), you don't get as strong guarentees with most non-relational databases.
The scenario is that Airbnb wants to analyze the click-through conversion rates (CTRs) of their listings. The CTR is calculated by dividing the number of bookings by the number of listing views, giving a proportion of views that resulted in a booking.
Consider you have two tables: one showing all the views for a listing () and another one showing all bookings ().
view_id | user_id | visit_date | listing_id |
---|---|---|---|
101 | 10 | 7/08/2022 | 1001 |
102 | 12 | 7/08/2022 | 1002 |
103 | 14 | 7/09/2022 | 1001 |
104 | 10 | 7/10/2022 | 1003 |
105 | 13 | 7/11/2022 | 1002 |
booking_id | user_id | booking_date | listing_id |
---|---|---|---|
201 | 10 | 7/09/2022 | 1001 |
202 | 12 | 7/10/2022 | 1002 |
203 | 15 | 7/12/2022 | 1003 |
204 | 13 | 7/13/2022 | 1002 |
205 | 12 | 7/14/2022 | 1001 |
The question is to write a SQL query to find the CTR for every unique listing in July 2022.
Here is a PostgreSQL query answer:
The query utilises two subqueries ( and ) to calculate the counts of views and bookings respectively for each listing for the month of July. The final query joins these subqueries via a LEFT JOIN (to include those listings which were viewed but never booked), and calculates the CTR as . The function is used to avoid division by zero error.
To practice a similar SQL problem on DataLemur's free online SQL coding environment, solve this Meta SQL interview question:
As a data analyst for Airbnb, you've been asked to determine the city that has had the most bookings (reservations) in the past year. You are given two tables - a 'bookings' table with booking IDs, user IDs, listing IDs, and booking dates, and a 'listings' table with listing IDs, city locations, and host IDs.
Provide a SQL query that returns the city with the maximum number of bookings, along with the number of bookings.
booking_id | user_id | listing_id | booking_date |
---|---|---|---|
101 | 123 | 50001 | 06/08/2022 |
102 | 265 | 69852 | 06/10/2022 |
103 | 362 | 50001 | 06/18/2022 |
104 | 192 | 69852 | 07/26/2022 |
105 | 981 | 69852 | 07/05/2022 |
listing_id | city | host_id |
---|---|---|
50001 | Amsterdam | 876 |
69852 | Barcelona | 974 |
This query works by joining the 'bookings' table with the 'listings' table on the shared 'listing_id'. After joining, the 'city' in the 'listings' table is grouped by, and for each city, the number of bookings is counted (by counting the 'booking_id' in the 'bookings' table). The results are then ordered by the number of bookings in descending order, and finally, the top result (the city with the most bookings) is selected. undefined
The SQL command merges the results of multiple statements and keeps only those rows that are present in all sets.
For example, say you were doing an HR Analytics project for Airbnb, and had access to Airbnb's employees and contractors data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use operator to find all contractors who also show up in the employees table:
As an Airbnb data analyst, you have been asked to analyze the performance of hosts' listings in the past year. Your task is to identify the top 10 listings with the most bookings.
There are two tables involved. One is the 'hosts' table that provides details about each host and its respective listing. The other one is 'bookings' table that records each booking transaction.
The 'hosts' table:
host_id | listing_id | listing_name | city |
---|---|---|---|
101 | 201 | Penthouse | New York |
102 | 202 | Ocean View | San Francisco |
103 | 203 | Country House | Austin |
The 'bookings' table:
booking_id | user_id | listing_id | booking_date |
---|---|---|---|
301 | 401 | 201 | 06/08/2022 |
302 | 402 | 202 | 06/10/2022 |
303 | 403 | 202 | 07/10/2022 |
304 | 404 | 203 | 08/08/2022 |
305 | 405 | 201 | 09/08/2022 |
Both tables can be joined on the 'listing_id' field.
This query first joins the 'hosts' and 'bookings' tables based on the 'listing_id' field. It then groups the result by 'listing_id' and 'listing_name' from the 'hosts' table. For each group, it calculates the total number of bookings, and finally sorts the groups based on this count in descending order. It returns the top 10 listings with the most bookings.
Because joins come up frequently during SQL interviews, try an interactive SQL join question from Spotify:
The key to acing a Airbnb SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier Airbnb SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Facebook, Google, and VC-backed startups.
Each exercise has hints to guide you, full answers and best of all, there is an online SQL code editor so you can instantly run your SQL query answer and have it graded.
To prep for the Airbnb SQL interview you can also be useful to practice interview questions from other tech companies like:
But if your SQL foundations are weak, don't worry about jumping right into solving questions – refresh your SQL knowledge with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL concepts such as working with string/text data and creating summary stats with GROUP BY – both of which pop up frequently in Airbnb SQL interviews.
You should also think criticially about Airbnb's Database & System Design BEFORE you step into the interview.
For the Airbnb Data Science Interview, in addition to SQL query questions, the other types of questions to practice:
The best way to prepare for Airbnb Data Science interviews is by reading Ace the Data Science Interview. The book's got: