At Amadeus IT Group, SQL is used quite frequently for analyzing vast hotel-related datasets for insights and managing complex databases to improve the company's travel IT solutions. Unsurprisingly this is why Amadeus IT almost always asks SQL questions during interviews for Data Science, Data Engineering and Data Analytics jobs.
As such, to help you practice for the Amadeus IT SQL interview, we've collected 8 Amadeus IT Group SQL interview questions – how many can you solve?
Amadeus IT mantains a database of flight bookings as part of their Flight Search product. The dataset includes booking_id, booking_date, airline, and price. Write a SQL query to calculate the average booking price per month per airline using window function.
Note: For booking_date, assume it is provided in the format MM/DD/YYYY.
booking_id | booking_date | airline | price |
---|---|---|---|
1 | 01/02/2022 | Delta | 300 |
2 | 01/03/2022 | Delta | 400 |
3 | 01/05/2022 | American | 350 |
4 | 02/04/2022 | Delta | 320 |
5 | 02/06/2022 | American | 400 |
6 | 03/02/2022 | Delta | 500 |
7 | 03/05/2022 | American | 550 |
8 | 03/06/2022 | Delta | 450 |
month | airline | average_price |
---|---|---|
1 | Delta | 350 |
1 | American | 350 |
2 | Delta | 320 |
2 | American | 400 |
3 | Delta | 475 |
3 | American | 550 |
This SQL query calculates the average price per month for each airline. The window function is used here to calculate the average price. The clause divides the result set into partitions (groups) and the average price is calculated for each group (month and airline). The function extracts the month from the date value. Finally, the result is ordered by month and airline.
Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur
In Amadeus IT, well-structured data is vital to assist various businesses such as hotels, airlines, and travel agencies. Given a customers table, write a SQL query to filter the customer records who traveled in February 2022, flew with 'Air Company A', stayed in 'Hotel B', and spent more on flights than on accommodations.
customer_id | customer_name | travel_date | airline_company | hotel_name | flight_cost | accommodation_cost |
---|---|---|---|---|---|---|
1001 | Adam | 02/10/2022 | Air Company A | Hotel B | 500 | 300 |
1002 | Bruce | 03/15/2022 | Air Company A | Hotel B | 600 | 500 |
1003 | Charlie | 02/28/2022 | Air Company B | Hotel B | 400 | 300 |
1004 | David | 02/15/2022 | Air Company A | Hotel C | 200 | 400 |
1005 | Eve | 02/21/2022 | Air Company A | Hotel B | 800 | 700 |
You can leverage the , commands like this:
This query first extracts the month and year from the travel_date column and checks whether it's in February 2022. It then proceeds to check whether the airline_company is 'Air Company A', the hotel_name is 'Hotel B', and the flight_cost is greater than the accommodation_cost.
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables. The table with the foreign key is called the child table, while the table with the candidate key is called the parent or referenced table.
For example, consider a database with two tables: and customer_idcustomer_idcustomer_id` column in Amadeus IT customers table.
Given the flight status data of Amadeus IT, can you write a query to find the average delay in departure for each airline?
flight_id | airline_company | scheduled_departure | actual_departure |
---|---|---|---|
7865 | Air Canada | 07/12/2022 10:00:00 | 07/12/2022 10:30:00 |
3362 | British Airways | 07/14/2022 20:00:00 | 07/14/2022 20:50:00 |
2756 | Air Canada | 07/16/2022 06:00:00 | 07/16/2022 06:30:00 |
9245 | Lufthansa | 07/24/2022 14:00:00 | 07/24/2022 14:05:00 |
1742 | British Airways | 07/28/2022 18:00:00 | 07/28/2022 18:30:00 |
airline_company | avg_departure_delay |
---|---|
Air Canada | 15.0 |
British Airways | 25.0 |
Lufthansa | 5.0 |
The query first calculates the delay in departure for each flight (in minutes) by subtracting the scheduled_departure from the actual_departure, followed by averaging these delays over each airline_company using the AVG() function. Finally, the airline_company names and their corresponding average departure delays are selected.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating time-based metrics or this Facebook Active User Retention Question which is similar for analyzing user-based time data.
A database index is a data structure that provides a quick lookup of data in a column or columns of a table.
There are several types of indexes that can be used in a database:
Amadeus IT needs to filter customer records who have their emails hosted with specific providers. They want to find how many customers have their email hosted with and .
Given the table shown below, write a SQL query that counts the number of customers with emails hosted on and .
customer_id | first_name | last_name | |
---|---|---|---|
3412 | John | Doe | johndoe@gmail.com |
9352 | Jane | Smith | janesmith@yahoo.com |
6384 | Bob | Johnson | bob77@gmail.com |
9249 | Alice | Baker | aliceb@yahoo.com |
8462 | Charlie | Brown | cbrown@gmail.com |
email_host | count |
---|---|
gmail.com | 3 |
yahoo.com | 2 |
This query uses the LIKE keyword to filter email addresses that end with or . It then counts the number of customers for each specified email host using the COUNT function and GROUP BY clause.
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Amadeus IT, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example of each one, say you had sales data exported from Amadeus IT's Salesforce CRM stored in a datawarehouse which had two tables: and .
An (which is a type of natural join) combines the two tables on the common
This query will return rows from the sales and tables that have matching customer id values. Only rows with matching values will be included in the results.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
As a data analyst for Amadeus IT, you've been tasked with calculating the daily cancellation rate for several airlines for the month of July 2021. The cancellation rate should be measured as the number of cancelled flights divided by the total number of flights per day. Use the following tables: and .
flight_id | airline_id | flight_date | departure_city | arrival_city | status |
---|---|---|---|---|---|
101 | 1 | 07/01/2021 | New York | London | Completed |
102 | 1 | 07/01/2021 | Tokyo | Paris | Cancelled |
103 | 1 | 07/02/2021 | New York | Tokyo | Completed |
104 | 2 | 07/02/2021 | Paris | New York | Cancelled |
105 | 2 | 07/02/2021 | London | Tokyo | Completed |
airline_id | airline_name |
---|---|
1 | AirwayFly |
2 | SkyTravel |
Your task is to write SQL query to output a table with columns: , , .
airline_name | flight_date | cancellation_rate |
---|---|---|
AirwayFly | 2021-07-01 | 0.50 |
AirwayFly | 2021-07-02 | 0.00 |
SkyTravel | 2021-07-02 | 0.50 |
Here's the PostgreSQL query that will solve the problem:
The query joins the and tables on the column. It only considers flights that occurred in July 2021. It groups the flights by and and then calculates the cancellation rate as the number of cancelled flights (status = 'Cancelled') divided by the total number of flights. The resulting table shows the cancellation rate for each airline for each day in July 2021.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Amadeus IT SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each problem on DataLemur has multiple hints, step-by-step solutions and most importantly, there's an interactive coding environment so you can right in the browser run your SQL query answer and have it graded.
To prep for the Amadeus IT SQL interview it is also wise to practice SQL questions from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers SQL topics like filtering data with WHERE and filtering on multiple conditions using AND/OR/NOT – both of these show up often in Amadeus IT SQL interviews.
Beyond writing SQL queries, the other types of questions to prepare for the Amadeus IT Data Science Interview are:
To prepare for Amadeus IT Data Science interviews read the book Ace the Data Science Interview because it's got: