At Booking.com, MySQL is used often for analyzing customer booking trends and optimizing pricing strategies based on regional and seasonal data trends. Because of this, Booking asks SQL problems during interviews for Data Science, Analytics, and & Data Engineering jobs.
So, to help you prepare for a Booking.com SQL Assessment, here’s 10 Booking SQL interview questions to practice – can you solve them?
As a data analyst at Booking, one of your role is to identify VIP users. These are individuals who make several hotel bookings, particularly for high-end (5-star) properties monthly. The data is stored in two primary tables booking
and hotels
as shown below.
booking
Example Input:booking_id | user_id | booking_date | hotel_id |
---|---|---|---|
9001 | 567 | 2018-06-15 00:00:00 | 70001 |
9502 | 890 | 2018-06-25 00:00:00 | 80052 |
7704 | 123 | 2018-07-18 00:00:00 | 80052 |
6925 | 567 | 2018-07-01 00:00:00 | 70001 |
4289 | 234 | 2018-07-05 00:00:00 | 70001 |
hotels
Example Input:hotel_id | rank |
---|---|
70001 | 5 |
80052 | 4 |
79011 | 4 |
Given these two tables, write a query that returns each user along with the number of high-end hotel bookings they have made within each month. Assume that a high-end hotel has a rank of 5.
SELECT EXTRACT(MONTH FROM b.booking_date) AS month, b.user_id, COUNT(*) AS number_of_bookings FROM booking b JOIN hotels h ON h.hotel_id = b.hotel_id WHERE h.rank = 5 GROUP BY 1, 2 ORDER BY number_of_bookings DESC ;
In the above query, we join the booking
table with the hotels
table on the hotel_id
then filter out all bookings which are not for high-end hotels (WHERE h.rank = 5
). The result is grouped by month and user_id to give the total number of bookings per user each month. The result is then ordered in descending order of number_of_bookings to have the users with the highest number of bookings at the top.
To solve a related customer analytics question on DataLemur's free online SQL code editor, try this Microsoft Azure Cloud SQL Interview Question:
In the context of a booking company like Booking.com, we have received detailed user reviews about hotels over the years. The task is to analyze and calculate the average review score for each hotel per month. This will help to understand the performance trend and quality of each hotel over time.
You are provided with a hotel_reviews
table with hotel_id, user_id, submit_date and stars (score out of 5) columns. Write a SQL query to return the average monthly review score for each hotel.
hotel_reviews
Example Input:review_id | user_id | submit_date | hotel_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
mth | hotel_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
SELECT EXTRACT(MONTH FROM submit_date) AS mth, hotel_id, AVG(stars) as avg_stars FROM hotel_reviews GROUP BY mth, hotel_id;
This SQL query uses the GROUP BY clause to organize data into groups by month and hotel_id. Then, it uses the AVG() function to calculate the average rating for each group. EXTRACT(MONTH FROM submit_date) is used to get the month part from the date column ‘submit_date’ and 'hotel_id' identifies each individual hotel. Finally, it selects the calculated average rating 'avg_stars' for each hotel and for each month.
To solve a related window function SQL problem on DataLemur's free interactive coding environment, solve this Google SQL Interview Question:
Database views are created to provide customized, read-only versions of your data that you can query just like a regular table. So why even use one if they're so similar to a regular table?
Views are advantageous for several reasons:
Given the business needs of Booking.com, your task is to design a database schema to handle their inventory of hotels, rooms, customers, and bookings. Once you have a design in mind, generate some sample data for testing. Then, to prove the design's utility, please write a PostgreSQL query that returns a list of all hotels booked by a specific user during a specific time period.
hotels
Example Input:hotel_id | hotel_name | location |
---|---|---|
1 | Hotel California | California, USA |
2 | The Ritz | London, UK |
3 | Rydges | Sydney, AU |
rooms
Example Input:room_id | hotel_id | room_type | price_per_night |
---|---|---|---|
1 | 1 | Single | 70 |
2 | 1 | Double | 90 |
3 | 2 | Luxury Suite | 300 |
customers
Example Input:customer_id | name | |
---|---|---|
1 | John Doe | johndoe@email.com |
2 | Jane Smith | janesmith@email.com |
3 | James Brown | jamesbrown@email.com |
bookings
Example Input:booking_id | room_id | customer_id | check_in_date | check_out_date |
---|---|---|---|---|
1 | 1 | 1 | 2022-07-01 | 2022-07-07 |
2 | 3 | 2 | 2022-08-15 | 2022-08-19 |
3 | 2 | 1 | 2022-09-01 | 2022-09-10 |
SELECT DISTINCT h.hotel_name, h.location FROM hotels h JOIN rooms r ON h.hotel_id = r.hotel_id JOIN bookings b ON r.room_id = b.room_id WHERE b.customer_id = :customer_id AND b.check_in_date BETWEEN :date_start AND :date_end;
In this PostgreSQL query, we first join the hotels
, rooms
, and bookings
tables using their respective foreign keys. We then apply a WHERE
clause to filter the records by the given customer_id
and the date range. The DISTINCT
keyword ensures we don't get duplicate records in our result. The placeholders :customer_id
, :date_start
, and :date_end
would be replaced by the actual values during implementation.
undefined
A full outer join returns all rows from both tables, including any unmatched rows, whereas an inner join only returns rows that match the join condition between the two tables.
For an example of each one, say you had sales data exported from Booking's Salesforce CRM stored in a datawarehouse which had two tables: sales
and booking_customers
.
INNER JOIN
: retrieves rows from both tables where there is a match in the shared key or keys.
SELECT * FROM sales INNER JOIN booking_customers ON sales.customer_id = booking_customers.id
This query will return rows from the sales and booking_customers
tables that have matching customer id values. Only rows with matching customer_id
values will be included in the results.
FULL OUTER JOIN
: retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
Here is an example of a SQL full outer join using the sales and booking_customers
tables:
SELECT * FROM sales FULL OUTER JOIN booking_customers ON sales.customer_id = booking_customers.id
As part of the Booking team, we want to identify customers who have made a booking in the past month and who have also given a review score of more than 3 stars, for any of their bookings. Write an SQL query to accomplish this, by using a three-table join strategy that includes the customers
, bookings
, and reviews
tables.
customers
Sample Input:customer_id | first_name | last_name |
---|---|---|
101 | John | Doe |
102 | Jane | Smith |
103 | Mary | White |
bookings
Sample Input:booking_id | customer_id | booking_date |
---|---|---|
1001 | 101 | 2022-09-08 |
1002 | 102 | 2022-09-15 |
1003 | 101 | 2022-08-25 |
reviews
Sample Input:review_id | booking_id | star_rating |
---|---|---|
201 | 1001 | 5 |
202 | 1002 | 2 |
203 | 1003 | 5 |
SELECT DISTINCT c.customer_id, c.first_name, c.last_name FROM customers c JOIN bookings b ON b.customer_id = c.customer_id JOIN reviews r ON r.booking_id = b.booking_id WHERE b.booking_date >= CURRENT_DATE - INTERVAL '1 month' AND r.star_rating > 3;
This solution joins the bookings
and reviews
tables with the customers
table to find all bookings made within the past month (WHERE b.booking_date >= CURRENT_DATE - INTERVAL '1 month'
) and then filters by review_score (AND r.star_rating > 3
), thus identifying customers who have made a booking in the past month and who have also given a review score of more than 3 stars. The DISTINCT keyword is used to ensure we do not have duplicate customer entries in our result set.
undefined
NULLs are NOT the same as zero or blank spaces in SQL. NULLs are used to represent a missing value or the abscence of a value, whereas zero and blank space are legitimate values.
It's important to handle NULLs carefully, because they can mess up your analysis very easily. For example, if you compare a NULL value using the = operator, the result will always be NULL (because just like Drake, nothing be dared compared to NULL). That's why many data analysis in SQL start with removing NULLs using the COALESCE()
function.
Booking.com will often look at click-through rates of their digital ad campaigns to understand the effectiveness of the marketing activities. Given the ad_view_logs
and ad_click_logs
tables which track the viewing and clicking activities respectively of users on the digital ads that Booking.com run on various platforms, write a query to calculate the CTR (number of clicks/number of views) for each digital advertisement.
ad_view_logs
Example Input:view_id | user_id | view_date | ad_id |
---|---|---|---|
6001 | 121 | 06/08/2022 00:00:00 | 50101 |
6502 | 129 | 06/10/2022 00:00:00 | 69235 |
5623 | 135 | 06/18/2022 00:00:00 | 50101 |
5392 | 157 | 07/26/2022 00:00:00 | 69235 |
4577 | 165 | 07/05/2022 00:00:00 | 50101 |
ad_click_logs
Example Input:click_id | user_id | click_date | ad_id |
---|---|---|---|
7689 | 121 | 06/08/2022 00:00:00 | 50101 |
5029 | 155 | 06/10/2022 00:00:00 | 69235 |
3645 | 135 | 06/18/2022 00:00:00 | 50101 |
4587 | 173 | 07/26/2022 00:00:00 | 50101 |
6054 | 165 | 07/05/2022 00:00:00 | 50101 |
SELECT v.ad_id, COUNT(c.click_id) * 1.0 / COUNT(v.view_id) AS ctr FROM ad_view_logs v LEFT JOIN ad_click_logs c ON v.ad_id = c.ad_id AND v.user_id = c.user_id GROUP BY v.ad_id;
This SQL query first joins ad_click_logs
to ad_view_logs
to get corresponding lines of viewing and clicking ads. By grouping the result set by ad_id
, it then calculates the click-through-rate (CTR). The COUNT(c.click_id) * 1.0 / COUNT(v.view_id)
gives us the number of clicks per view, effectively giving us the click-through-rate for each ad. The * 1.0
is to convert the count to a float for a correct division result.
To practice a related SQL problem on DataLemur's free interactive SQL code editor, try this SQL interview question asked by Facebook:
As a data analyst at Booking.com, we have a customer records database with different data columns. One of it refers to the UserAgent details. This data helps us in understanding the customers from a technological view, analyzing which platform, device or browser the customers use to access our services.
Your task is to write a SQL query which filters out all customers that used an iPhone. Let's assume the data record in the "user_agent" column for iPhone users contains the string 'iphone'.
customers
Example Input:customer_id | user_name | user_email | user_agent |
---|---|---|---|
101 | John | john@mail.com | Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) |
102 | Jane | jane@mail.com | Mozilla/5.0 (Windows NT 10.0; Win64; x64) |
103 | Tom | tom@mail.com | Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) |
104 | Jerry | jerry@mail.com | Mozilla/5.0 (Windows NT 6.1; Win64; x64) |
105 | Sophia | sophia@mail.com | Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) |
SELECT * FROM customers WHERE LOWER(user_agent) LIKE '%iphone%';
This query will select all records from the "customers" table where the "user_agent" contains the term "iphone". The '%' symbol is a wildcard in SQL that matches any sequence of characters. Also, the function LOWER
is used to ensure the query is case insensitive.
undefined
When designing a database schema, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a car and a license plate - each car has one license plate, and each license plate belongs to one car.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. For example, a person can have multiple email addresses, but each email address only relates back to one person.
The key to acing a Booking SQL interview is to practice, practice, and then practice some more!
Besides solving the above Booking SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like FAANG and tech startups.
Each DataLemur SQL question has multiple hints, detailed solutions and crucially, there's an online SQL code editor so you can right online code up your SQL query and have it checked.
To prep for the Booking SQL interview you can also be wise to practice SQL problems from other tech companies like:
But if your SQL foundations are weak, don't worry about jumping right into solving questions – go learn SQL with this interactive SQL tutorial.
This tutorial covers SQL topics like joining a table to itself and sorting results with ORDER BY – both of which come up often during Booking SQL interviews.
For the Booking Data Science Interview, besides SQL questions, the other types of questions to practice:
To prepare for Booking Data Science interviews read the book Ace the Data Science Interview because it's got: