At Expedia Group, PostgreSQL is crucial for querying travel data for customer insights and to study the user acquisition funnel for inefficiences. That's why Expedia often tests SQL coding questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
To help you study for the Expedia SQL interview, we've collected 8 Expedia Group SQL interview questions – how many can you solve?
As a Data Analyst at Expedia, assume you need to find out who their most loyal and high spending users are to target them with personalized marketing campaigns. Specifically, Expedia is interested in users who have made a high amount of total booking cost throughout 2022.
Sample table:
booking_id | user_id | booking_date | hotel_id | total_cost |
---|---|---|---|---|
101 | 254 | 01/02/2022 | 1001 | 4000 |
102 | 365 | 02/10/2022 | 1003 | 2100 |
103 | 254 | 03/15/2022 | 1002 | 5300 |
104 | 400 | 04/01/2022 | 1003 | 1900 |
105 | 254 | 05/21/2022 | 1001 | 3400 |
106 | 500 | 06/12/2022 | 1002 | 2500 |
107 | 400 | 07/03/2022 | 1001 | 4100 |
108 | 254 | 08/17/2022 | 1003 | 5100 |
109 | 365 | 09/05/2022 | 1003 | 2000 |
110 | 254 | 10/06/2022 | 1001 | 5200 |
This query sums up the column for each for bookings made in the year 2022. It returns the and the summed cost (alias ) for users whose total cost is greater than 10,000. The results are ordered by the total cost in descending order, so the highest spending users appear first.
To solve a similar VIP customer analysis question on DataLemur's free interactive coding environment, try this Microsoft SQL Interview problem:
As an analytics professional at Expedia, one of your responsibilities will be to analyze customer reviews and ratings which are essential for the company's product improvement and customer satisfaction.
Our table consists of customer's reviews data with the following columns:
Your task is to write a SQL query to calculate the monthly average ratings(stars) for each product.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-08-06 | 50001 | 4 |
7802 | 265 | 2022-10-06 | 69852 | 4 |
5293 | 362 | 2022-18-06 | 50001 | 3 |
6352 | 192 | 2022-26-07 | 69852 | 3 |
4517 | 981 | 2022-05-07 | 69852 | 2 |
This query first groups the data by month (using the function on ) and . It then calculates the average star rating() per month per product() using the function. The result is ordered by and then .
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.5 |
6 | 69852 | 4.0 |
7 | 69852 | 2.5 |
For more window function practice, solve this Uber SQL problem within DataLemur's online SQL code editor:
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are few different types of indexes that can be used in a database:
For a concrete example, say you had a table of Expedia customer payments with the following columns:
Here's what a clustered index on the column would look like:
A clustered index on the column would determine the physical order of the records in the table based on the . This means that the records with the earliest values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the , as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
Expedia is a travel agency company, which provides hotel bookings among other services. For each booking, customers are asked to give a rating, which ranges from 1 to 5 stars. The higher the rating, the more satisfied the customer was.
As a data analyst at Expedia, you are tasked to find the average rating for each hotel. You have a 'ratings' table with the hotel_id, user_id, rating, and a 'hotels' table with the hotel_id, hotel_name.
Here is some example data:
In the query, the ratings table is joined with the hotels table on the hotel_id. Then the average rating for each hotel is calculated using the AVG function in combination with the GROUP BY clause.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for requiring average rating computation or this Facebook User's Third Transaction Question which is similar for working with user and transaction related data.
hile a correlated subquery relies on columns in the main query's FROM clause and cannot function independently, a non-correlated subquery operates as a standalone query and its results are integrated into the main query.
An example correlated sub-query:
This correlated subquery retrieves the names and salaries of Expedia employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).
An example non-correlated sub-query:
This non-correlated subquery retrieves the names and salaries of Expedia employees who make more than the average salary for the Data Science department (which honestly should be very few people since Data Scientists are awesome and deserve to be paid well).The subquery is considered independent of the main query can stand alone. Its output (the average salary for the Data Science department) is then used in the main query to filter the rows of the Expedia employees table.
Given a table which contains customer reviews for various hotels booked via Expedia, calculate the average rating of each hotel per month. Each review is associated with a specific hotel through and has a particular . For this question, you can assume that all dates are in the format 'YYYY-MM-DD'.
review_id | user_id | submit_date | hotel_id | rating |
---|---|---|---|---|
1 | 1 | 2022-05-01 | 1 | 4.5 |
2 | 2 | 2022-05-01 | 1 | 3.5 |
3 | 3 | 2022-05-02 | 2 | 4.0 |
4 | 4 | 2022-06-01 | 1 | 5.0 |
5 | 5 | 2022-06-02 | 2 | 4.5 |
month | hotel | avg_rating |
---|---|---|
5 | 1 | 4 |
5 | 2 | 4 |
6 | 1 | 5 |
6 | 2 | 4.5 |
This query extracts the month from the submit date of the review, then groups the results by the extracted month and . The aggregation function is applied to calculate the average rating per hotel per month.
When designing a database schema, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a car and a license plate - each car has one license plate, and each license plate belongs to one car.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. For example, a person can have multiple email addresses, but each email address only relates back to one person.
As a data analyst at Expedia's headquarters, you've been tasked with the job of analyzing the customer database and their bookings. Specifically, your boss wants you to get a list of all customers, their total number of trips booked, and the average rating for each trip. Join the customer and booking tables to get this information.
In PostgreSQL, you could use something like the following:
This query entails joining the customer table with the bookings table on the field. We then use the function to count the total number of bookings and the function to calculate the average rating for all trips booked by each customer. Finally, we group by , , and and order by . If any customer did not book a trip, that customer would be excluded from the result.
Since joins come up so often during SQL interviews, try this interactive Snapchat SQL Interview question using JOINS:
The key to acing a Expedia SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier Expedia SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Microsoft, Google, Amazon, and tech startups.
Each DataLemur SQL question has multiple hints, full answers and best of all, there's an interactive coding environment so you can easily right in the browser your SQL query and have it graded.
To prep for the Expedia SQL interview it is also helpful to solve SQL questions from other tech companies like:
But if your SQL skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this SQL interview tutorial.
This tutorial covers things like math functions and filtering data with WHERE – both of these come up routinely in Expedia SQL assessments.
In addition to SQL interview questions, the other types of questions to practice for the Expedia Data Science Interview are:
To prepare for Expedia Data Science interviews read the book Ace the Data Science Interview because it's got: