At American Airlines data engineers use SQL to analyze passenger flight data and to optimize flight schedules, while juggling conditions like air traffic and weather conditions. That's why American Airlines usually asks SQL coding interview questions to see which candidates are prepared to take on this responsibility.
Thus, to help you prep for the American Airlines SQL interview, we've collected 9 American Airlines Group SQL interview questions in this blog.
Based on the customer flight history for American Airlines, you are required to write a query that will identify the top 5 customers who have the highest average frequency of flights per month in the last year. A customer's flight frequency is counted by the number of flights they have taken in any given month.
Data for two tables are provided: customers
and flights
.
customers
is structured as follows:customer_id | full_name | membership_date |
---|---|---|
5437 | John Doe | 01/15/2015 |
9821 | Jane Smith | 07/29/2018 |
7373 | Emma Jones | 06/11/2019 |
8280 | Michael Brown | 10/26/2020 |
7353 | Olivia Taylor | 04/03/2016 |
flights
is structured as follows:flight_id | customer_id | flight_date |
---|---|---|
2547 | 5437 | 08/03/2021 |
8753 | 5437 | 08/15/2021 |
3672 | 9821 | 08/05/2021 |
5246 | 7373 | 08/23/2021 |
8362 | 8280 | 08/01/2021 |
A query that could solve this problem would look something like this:
SELECT c.customer_id, c.full_name, AVG(f.number_of_flights) AS avg_monthly_flights FROM customers c JOIN ( SELECT customer_id, COUNT(flight_id) AS number_of_flights, DATE_TRUNC('month', flight_date) AS month FROM flights WHERE flight_date >= (CURRENT_DATE - INTERVAL '1 year') GROUP BY customer_id, month ) f ON c.customer_id = f.customer_id GROUP BY c.customer_id, c.full_name ORDER BY avg_monthly_flights DESC LIMIT 5;
This query works by first creating a subquery (the portion of the query inside the round brackets) that groups the flights by customer and by month, and counts the number of flights for each group. It then joins this result back to the customers table to get the customer names, and calculates the average number of flights per month for each customer. The results are then ordered by this average in descending order and limited to the top 5 customers.
To work on a similar customer analytics SQL question where you can code right in the browser and have your SQL code instantly executed, try this Walmart SQL Interview Question:
Assume you had a table of American Airlines employee salary data. Write a SQL query to find the top three highest paid employees within each department.
employees
Example Input:employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department
Example Input:department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Write a SQL query for this interview question interactively on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
WITH ranked_salary AS ( SELECT name, salary, department_id, DENSE_RANK() OVER ( PARTITION BY department_id ORDER BY salary DESC) AS ranking FROM employee ) SELECT d.department_name, rs.name, rs.salary FROM ranked_salary AS rs INNER JOIN department AS d ON rs.department_id = d.department_id WHERE rs.ranking <= 3 ORDER BY d.department_id, rs.salary DESC, rs.name ASC;
If the code above is tough, you can find a step-by-step solution here: Top 3 Department Salaries.
A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
One common use case for self-joins is to compare data within a single table. For example, you might use a self-join to compare the salaries of employees within the same department, or to identify pairs of products that are frequently purchased together (like in this real Walmart SQL interview question)[https://datalemur.com/questions/frequently-purchased-pairs].
For a more concrete example, imagine you had website visitor data for American Airlines, exported from the company's Google Analytics account. To help the web-dev team, you needed to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to produce all pairs of URLs, but needed to exclude pairs where both URLs were the same since that is not a valid pair.
You could use the following self-join:
SELECT page1.url AS page_url, page2.url AS referred_from FROM google_analytics AS page1 JOIN google_analytics AS page2 ON page1.referrer_id = page2.id WHERE page1.id <> page2.id;
This query retrieves the url of each page (page1.url
) along with the url of the page that referred to it (page2.url
). The self-join is performed using the referrer_id field
, which identifies the id of the page that referred the visitor to the current page, and excludes any pages that referred to themselves (i.e., data anomalies since pages shouldn't refer to themself).
You are given a table flights
with information about flight departures and their delays. Each row represents a unique flight. For this scenario, a delay is considered anytime a flight departure is late by 15 minutes or more. American Airlines wants to analyze its flights' performance by evaluating the average delay from each airport for the past month.
The flights
table has the below structure:
flights
Example Input:flight_id | origin_airport | destination_airport | departure_date | delay_minutes |
---|---|---|---|---|
1 | LAX | DFW | 2021-09-05 13:15:00 | 20 |
2 | LAX | JFK | 2021-09-05 15:23:00 | 0 |
3 | DFW | ORD | 2021-09-08 08:50:00 | 35 |
4 | JFK | LAX | 2021-09-08 10:00:00 | 50 |
5 | LAX | ORD | 2021-09-10 16:10:00 | 20 |
Calculate the monthly average delay in minutes for each origin airport.
month | origin_airport | avg_delay_minutes |
---|---|---|
9 | LAX | 13.33 |
9 | DFW | 35 |
9 | JFK | 50 |
SELECT DATE_TRUNC('month', departure_date) as month, origin_airport, AVG(delay_minutes) as avg_delay_minutes FROM flights GROUP BY month, origin_airport ORDER BY month DESC, avg_delay_minutes DESC;
This SQL query uses the DATE_TRUNC
function to convert the departure_date
to the first day of the corresponding month, allowing us to group and compare flights by month. The AVG
function is then used to calculate the average delay for each airport within each month.
To practice another window function question on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question:
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For example, suppose you had a table of American Airlines orders and American Airlines customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an INNER JOIN
between the Orders and Customers tables would retrieve rows where the customer_id
in the Orders table matches the customer_id
in the Customers table.
FULL OUTER JOIN: A FULL OUTER JOIN
retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
You're a data analyst working for American Airlines. You have been tasked by your manager to filter out the customers who had travelled in the business class, who travelled from Dallas to New York during winter (December, January, February) of 2021, and who spent more than $1000 for the ticket.
A great example of this can be seen in American Airlines 2023 financial results!
Assuming that you have a flights
table with the following records:
flights
Example Input:passenger_id | flight_date | origin | destination | class | fare |
---|---|---|---|---|---|
4731 | 2021-01-12 | Dallas | New York | Business | 1250 |
5721 | 2021-01-20 | Dallas | New York | Economy | 800 |
3629 | 2021-12-05 | Dallas | Los Angeles | Business | 1000 |
8197 | 2021-02-18 | Dallas | New York | Business | 1500 |
9823 | 2021-02-25 | Miami | New York | Business | 1700 |
passenger_id | flight_date | origin | destination | class | fare |
---|---|---|---|---|---|
4731 | 2021-01-12 | Dallas | New York | Business | 1250 |
8197 | 2021-02-18 | Dallas | New York | Business | 1500 |
SELECT * FROM flights WHERE class = 'Business' AND origin = 'Dallas' AND destination = 'New York' AND DATE_PART('month', flight_date) IN (12, 1, 2) AND fare > 1000 AND DATE_PART('year', flight_date) = 2021;
This SQL query filters out the required details by using multiple WHERE conditions. DATE_PART function is used to filter out the winter months and year. These records are then displayed by the SELECT statement as the result of this query.
A primary key is a special type of column or set of columns in a database table that is used to identify each row in the table. It helps to ensure that all the rows are distinct and prevents null values. Additionally, primary keys play a crucial role in maintaining the accuracy of the data by making sure that there are no duplicate rows. They are also used to establish relationships between tables.
To create a primary key in a SQL database, you can use the PRIMARY KEY
constraint. For example, in the table american_airlines_employees
, the EmployeeID
column is the primary key.
CREATE TABLE american_airlines_employees ( EmployeeID INTEGER PRIMARY KEY, LastName VARCHAR(255), FirstName VARCHAR(255), Age INTEGER, Salary DECIMAL(8,2) );
As a data analyst at American Airlines, you have been asked to extract customer records to gain insights into travel trends. The database contains a passengers
table that ncludes columns such as id
, name
, address
, email
, and frequent_flyer
. The email column contains email addresses of all customers and the frequent_flyer
column indicates whether the customer is enrolled in the Frequent Flyer program.
Your task is to write a SQL query that filters out all customers whose email provider is 'gmail.com' and are enrolled in the Frequent Flyer program.
passengers
Example Input:id | name | address | frequent_flyer | |
---|---|---|---|---|
001 | John Doe | 123 Main Street | johndoe@gmail.com | yes |
002 | Jane Smith | 456 Oak Avenue | janesmith@yahoo.com | no |
003 | Peter Parker | 789 Elm Drive | peterparker@gmail.com | yes |
004 | Miles Morales | 321 Pine Road | milesmorales@hotmail.com | no |
005 | Tony Stark | 654 Maple Lane | tonystark@gmail.com | yes |
id | name | |
---|---|---|
001 | John Doe | johndoe@gmail.com |
003 | Peter Parker | peterparker@gmail.com |
005 | Tony Stark | tonystark@gmail.com |
SELECT id, name, email FROM passengers WHERE email LIKE '%gmail.com' AND frequent_flyer = 'yes';
This query selects the id
, name
, and email
of passengers where the email ends with "gmail.com" and the frequent_flyer
status is 'yes'. The '%' before 'gmail.com' is a wildcard character that matches any sequence of characters. The 'AND' operator ensures both conditions must be true. This way, we can identify frequent flyers who use Gmail.
Imagine you are working for American Airlines as a Data Analyst. Your manager gave you two tables: flights
and passengers
tables. The flights
table contains information about the various flights the company had over time and the passengers
table contains information about the passengers who flew on those flights. {#Question-9}
Now your manager wants to know which flight had the highest number of passengers from New York (NY) in the year 2021. How would you write an SQL query to get this information?
flights
Example Input:flight_id | flight_date | origin | destination | aircraft_type |
---|---|---|---|---|
1 | 2021-05-01 | NY | LA | Boeing 737 |
2 | 2021-07-06 | LA | NY | Airbus A380 |
3 | 2021-10-15 | NY | Miami | Airbus A320 |
4 | 2021-11-02 | Miami | NY | Boeing 747 |
passengers
Example Input:passenger_id | flight_id | name |
---|---|---|
101 | 1 | John Doe |
102 | 1 | Jane Smith |
103 | 2 | Peter Parker |
104 | 3 | Tony Stark |
105 | 4 | Clark Kent |
106 | 1 | Bruce Wayne |
SELECT f.flight_id, f.origin, f.destination, COUNT(p.passenger_id) as total_passengers FROM flights f JOIN passengers p ON f.flight_id = p.flight_id WHERE f.origin = 'NY' AND EXTRACT(YEAR FROM f.flight_date) = 2021 GROUP BY f.flight_id ORDER BY total_passengers DESC LIMIT 1;
The SQL join operation is used to combine rows from two or more tables based on a related column between them. Here, we joined the flights
and passengers
tables on flight_id
. We filtered the records by year of flight and origin using WHERE clause with EXTRACT function and match string. Afterwards we grouped the joined data by flight_id and counted the number of passengers per flight, ordering by the quantity. We only selected the top result with LIMIT 1
to return the flight with the most passengers from NY in 2021.
Because join questions come up frequently during SQL interviews, practice this Snapchat SQL Interview question using JOINS:
The key to acing a American Airlines SQL interview is to practice, practice, and then practice some more!
Beyond just solving the earlier American Airlines SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like Microsoft, Google, Amazon, and tech startups.
Each SQL question has multiple hints, step-by-step solutions and crucially, there is an online SQL code editor so you can right in the browser run your SQL query answer and have it checked.
To prep for the American Airlines SQL interview it is also a great idea to solve interview questions from other airlines like:
However, if your SQL skills are weak, don't worry about diving straight into solving questions – strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like filtering strings using LIKE and window functions like RANK() and ROW_NUMBER() – both of these come up frequently during SQL interviews at American Airlines.
In addition to SQL interview questions, the other types of questions tested in the American Airlines Data Science Interview include:
The best way to prepare for American Airlines Data Science interviews is by reading Ace the Data Science Interview. The book's got: