At Trainline, SQL is used day-to-day for extracting and analyzing railway booking data, and for managing and manipulating databases for customer trend prediction. So, it shouldn't surprise you that Trainline frequently asks SQL coding questions during interviews for Data Science and Data Engineering positions.
To help you study for the Trainline SQL interview, we've collected 10 Trainline SQL interview questions – able to answer them all?
Trainline, being a digital platform for selling train tickets, may be interested in understanding how their ticket sales have evolved over time for each route.
Let's say we have a simple database table, which contains the details of each ticket sold. The table schema and some sample data are as follows:
sale_id | route_id | sale_date | ticket_price |
---|---|---|---|
2321 | 1001 | 06/08/2022 00:00:00 | 50.00 |
3452 | 2001 | 06/10/2022 00:00:00 | 45.00 |
9563 | 1001 | 06/30/2022 00:00:00 | 55.00 |
4222 | 3001 | 07/01/2022 00:00:00 | 80.00 |
7245 | 2001 | 07/22/2022 00:00:00 | 46.00 |
8562 | 3001 | 07/25/2022 00:00:00 | 80.00 |
The SQL interview question could be: Write a SQL query to calculate the total monthly revenue and average ticket price for each route.
This question requires knowledge of SQL aggregate functions: summing and averaging over groups of records which fall within a specific date range (a month), grouped by each route.
The expected output may look as follows:
mth | route_id | total_revenue | avg_ticket_price |
---|---|---|---|
6 | 1001 | 105.00 | 52.50 |
6 | 2001 | 45.00 | 45.00 |
7 | 2001 | 46.00 | 46.00 |
7 | 3001 | 160.00 | 80.00 |
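One way to write this in PostgreSQL is sketched below (the table name ticket_sales is an assumption, since the question doesn't name the table):

```sql
-- Monthly revenue and average ticket price per route.
-- DATE_TRUNC('month', ...) buckets each sale into its calendar month.
SELECT
  EXTRACT(MONTH FROM DATE_TRUNC('month', sale_date)) AS mth,
  route_id,
  SUM(ticket_price) AS total_revenue,
  AVG(ticket_price) AS avg_ticket_price
FROM ticket_sales
GROUP BY DATE_TRUNC('month', sale_date), route_id
ORDER BY mth, total_revenue DESC;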
This query groups the data by month and route_id and then calculates the total and average ticket price for each group. By using the date_trunc function, we can easily group the data by month. The ORDER BY clause ensures that the result is sorted by month and then by total revenue in descending order.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur.
Imagine you are a data engineer at Trainline, a company that sells train tickets. Your task is to design a database for handling bookings. Essential entities to consider are stations, trains, and tickets.
Trainline's ticket booking system needs, at a minimum, tables for stations, trains, schedules, and tickets:
station_id | station_name |
---|---|
1 | London |
2 | Manchester |
3 | Glasgow |
train_id | train_name |
---|---|
1 | Galia Express |
2 | Victoria Line |
schedule_id | train_id | departure_station_id | arrival_station_id | departure_time | arrival_time |
---|---|---|---|---|---|
1 | 1 | 1 | 2 | 09:00:00 | 11:30:00 |
2 | 2 | 2 | 3 | 14:00:00 | 16:00:00 |
ticket_id | passenger_name | booking_date | departure_station_id | arrival_station_id | departure_time | arrival_time |
---|---|---|---|---|---|---|
1 | John Doe | 2022-01-01 | 1 | 2 | 09:00:00 | 11:30:00 |
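A possible schema for these four tables is sketched below (column types are assumptions based on the sample data above):

```sql
CREATE TABLE stations (
  station_id   INT PRIMARY KEY,
  station_name VARCHAR(100) NOT NULL
);

CREATE TABLE trains (
  train_id   INT PRIMARY KEY,
  train_name VARCHAR(100) NOT NULL
);

-- Each schedule links a train to a departure/arrival station pair.
CREATE TABLE schedules (
  schedule_id          INT PRIMARY KEY,
  train_id             INT REFERENCES trains(train_id),
  departure_station_id INT REFERENCES stations(station_id),
  arrival_station_id   INT REFERENCES stations(station_id),
  departure_time       TIME,
  arrival_time         TIME
);

CREATE TABLE tickets (
  ticket_id            INT PRIMARY KEY,
  passenger_name       VARCHAR(100),
  booking_date         DATE,
  departure_station_id INT REFERENCES stations(station_id),
  arrival_station_id   INT REFERENCES stations(station_id),
  departure_time       TIME,
  arrival_time         TIME
);
```

Using foreign keys back to stations and trains keeps the schema normalized: route and train details live in one place, and tickets reference them by ID.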
To answer how many tickets were sold between two stations in a specific time range:
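A sketch of such a query follows (the station IDs and dates are placeholders to substitute with your own values):

```sql
-- Count tickets sold between two stations within a date range.
SELECT COUNT(*) AS tickets_sold
FROM tickets
WHERE departure_station_id = 1          -- placeholder: departure station
  AND arrival_station_id   = 2          -- placeholder: arrival station
  AND booking_date BETWEEN '2022-01-01' AND '2022-03-31';  -- placeholder range
```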
This query will return the count of all tickets sold for a specific route in a specific time range. You replace the two station IDs and the start and end dates with your desired values. The BETWEEN operator in the WHERE clause will ensure you only consider tickets booked within the time range you want.
To find records in one table that aren't in another, you can use a LEFT JOIN and check for NULL values in the right-side table.
Here's an example using two tables, Trainline employees and Trainline managers:
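A sketch of this pattern is shown below (the join column manager_id is an assumed name for illustration):

```sql
-- Employees with no matching row in managers.
-- The LEFT JOIN keeps every employee; unmatched rows have NULL
-- in the managers columns, which the WHERE clause filters for.
SELECT e.*
FROM employees e
LEFT JOIN managers m
  ON e.manager_id = m.manager_id
WHERE m.manager_id IS NULL;
```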
This query returns all rows from Trainline employees where there is no matching row in managers based on the join column.
You can also use the EXCEPT operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
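A minimal sketch (again assuming an employee_id column shared by both tables):

```sql
-- Rows returned by the first SELECT that do not appear in the second.
SELECT employee_id FROM employees
EXCEPT
SELECT employee_id FROM managers;
```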
This will return all rows from employees that are not in managers. The EXCEPT operator works by returning the rows that are returned by the first query, but not by the second.
Note that EXCEPT isn't supported by all DBMS systems, like MySQL and Oracle (but have no fear, since you can use the MINUS operator to achieve a similar result).
As a data analyst for Trainline, your task is to filter the customer records for First Class bookings departing from London within the last 6 months that have not been cancelled within the past year.
You are provided with two tables, bookings and cancellations.
The bookings table is formatted as follows:
booking_id | customer_id | booking_date | departure_station | arrival_station | travel_class |
---|---|---|---|---|---|
1001 | 200 | 2022-01-20 | London | Manchester | First Class |
1002 | 201 | 2022-02-15 | Birmingham | London | Standard |
1003 | 200 | 2022-03-10 | London | Edinburgh | First Class |
1004 | 202 | 2022-04-25 | London | Bristol | Standard |
1005 | 203 | 2022-05-30 | Manchester | Birmingham | First Class |
The cancellations table is formatted as follows:
cancel_id | booking_id | cancel_date |
---|---|---|
501 | 1002 | 2022-02-16 |
502 | 1005 | 2022-06-01 |
503 | 1004 | 2022-04-27 |
504 | 1004 | 2022-05-01 |
Here is the SQL Postgres query to solve the above problem:
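A sketch of such a query follows (it assumes the conditions described above: London departures, First Class, booked in the last 6 months, and not cancelled within the past year):

```sql
-- Most recent qualifying booking's customer.
SELECT b.customer_id
FROM bookings b
WHERE b.booking_date >= CURRENT_DATE - INTERVAL '6 months'
  AND b.departure_station = 'London'
  AND b.travel_class = 'First Class'
  -- exclude bookings cancelled within the last year
  AND b.booking_id NOT IN (
    SELECT c.booking_id
    FROM cancellations c
    WHERE c.cancel_date >= CURRENT_DATE - INTERVAL '1 year'
  )
ORDER BY b.booking_date DESC
LIMIT 1;
```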
This query finds the customers who satisfy all the given conditions, providing the customer_id of the user who made their most recent booking under these conditions.
First, it filters for records with a booking date within the last 6 months, where the departure station is London and the travel class is First Class. It then excludes the bookings which appear in the cancellations table within the last year. The ORDER BY and LIMIT 1 clauses ensure that we retrieve only the most recent booking that meets these conditions.
The FOREIGN KEY constraint is used to establish a relationship between two tables in a database. This ensures the referential integrity of the data in the database.
For example, if you have a table of Trainline customers and an orders table, the customer_id column in the orders table could be a FOREIGN KEY that references the id column (which is the primary key) in the Trainline customers table.
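That example can be sketched in DDL like so (table and column names follow the description above):

```sql
CREATE TABLE customers (
  id   INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT,
  -- Rejects any order whose customer_id has no matching customers.id
  FOREIGN KEY (customer_id) REFERENCES customers(id)
);
```

With this constraint in place, the database will reject an INSERT into orders that references a non-existent customer, and (by default) a DELETE of a customer who still has orders.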
Trainline is a company that offers various routes and ticket bookings for trains and coaches. As a part of their digital marketing strategy, they often use ads that lead a user to their website or app. From there, they hope users will not just view the different routes and tickets, but also add them to their cart.
Given a table of ad clicks and a table of cart additions, your task is to calculate the click-through rate (CTR) from viewing a route to adding it to the cart by date. The first table logs every click on an ad that redirects to a route view, while the second logs every addition of a ticket to the cart.
click_id | user_id | click_date | route_id |
---|---|---|---|
1508 | 429 | 06/08/2022 | 30169 |
1782 | 713 | 06/10/2022 | 40590 |
1769 | 446 | 06/12/2022 | 30169 |
2295 | 145 | 06/15/2022 | 50702 |
2790 | 366 | 06/18/2022 | 40590 |
addition_id | user_id | add_date | route_id |
---|---|---|---|
2715 | 429 | 06/08/2022 | 30169 |
3463 | 713 | 06/10/2022 | 40590 |
3673 | 446 | 06/12/2022 | 30169 |
4953 | 145 | 06/15/2022 | 50702 |
5702 | 434 | 06/18/2022 | 40590 |
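A sketch of the CTR calculation is below (the table names clicks and cart_additions are assumptions, since the source doesn't name them):

```sql
-- CTR per date and route: additions matched to clicks by user,
-- route, and date, divided by total clicks, as a percentage.
SELECT
  c.click_date,
  c.route_id,
  COUNT(a.addition_id) * 100.0 / COUNT(c.click_id) AS ctr_pct
FROM clicks c
LEFT JOIN cart_additions a
  ON  c.user_id    = a.user_id
  AND c.route_id   = a.route_id
  AND c.click_date = a.add_date
GROUP BY c.click_date, c.route_id
ORDER BY c.click_date;
```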
To calculate the CTR, this query joins the clicks and cart-additions tables on the user_id, route_id, and date fields. Then it counts the number of tickets added to the cart and divides it by the number of ad clicks for each date and route. This value is then multiplied by 100 to get a percentage. The results are grouped by date and route_id, and ordered by date to provide a chronological view of CTR.
To practice another question about calculating rates, try this TikTok SQL question within DataLemur's online SQL code editor:
In a database, an index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the use of more storage space to maintain the index data structure.
There are several types of indexes that can be used in a database, including clustered and non-clustered indexes, unique indexes, and composite indexes that span multiple columns.
Suppose we want to find out the most popular routes by the number of tickets sold per month. Each ticket sale is logged to a sales table, which includes the train route id, ticket id, and the sale date.
sale_id | route_id | ticket_id | sale_date |
---|---|---|---|
2387 | 501 | 10301 | 06/08/2022 |
1293 | 702 | 13456 | 06/10/2022 |
5093 | 501 | 12872 | 06/18/2022 |
1132 | 702 | 14236 | 07/26/2022 |
4917 | 501 | 15641 | 07/05/2022 |
month | route | ticket_count |
---|---|---|
6 | 501 | 2 |
6 | 702 | 1 |
7 | 501 | 1 |
7 | 702 | 1 |
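One way to write this in PostgreSQL, using the sales table described above:

```sql
-- Tickets sold per route per month, most popular first.
SELECT
  EXTRACT(MONTH FROM sale_date) AS month,
  route_id AS route,
  COUNT(ticket_id) AS ticket_count
FROM sales
GROUP BY month, route
ORDER BY month, ticket_count DESC;
```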
This SQL query will group all ticket sales by month and route_id, count the number of tickets for each group, and order the results by month and the number of tickets in descending order. This gives us the most popular routes (identified by 'route') for each month.
Assume you are given a database of Trainline customers who have booked train tickets in the past. Your task is to find customers whose first name starts with 'M' and who have booked tickets on the 'London-Plymouth' route. The query should return the customer's first name, last name, and the date they booked the ticket.
customer_id | first_name | last_name |
---|---|---|
007 | Mark | Smith |
463 | John | Brown |
591 | Matthew | Jones |
812 | Mario | Rossi |
073 | Madeline | Archibald |
booking_id | customer_id | booking_date | route |
---|---|---|---|
521 | 007 | 06/08/2022 | London-Plymouth |
612 | 463 | 06/10/2022 | London-Bristol |
349 | 591 | 06/18/2022 | London-Plymouth |
724 | 812 | 07/26/2022 | London-Edinburgh |
965 | 073 | 07/05/2022 | London-Plymouth |
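A sketch of such a query, using the customers and bookings tables shown above:

```sql
-- Customers whose first name starts with 'M' who booked London-Plymouth.
SELECT c.first_name, c.last_name, b.booking_date
FROM customers c
JOIN bookings b
  ON c.customer_id = b.customer_id
WHERE c.first_name LIKE 'M%'        -- pattern match on the first name
  AND b.route = 'London-Plymouth';
```

The `LIKE 'M%'` pattern matches any first name beginning with 'M', so Mark, Matthew, Mario, and Madeline would all qualify.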
This SQL command joins the 'customers' and 'bookings' tables on the 'customer_id'. It then filters the result where the customer's first name starts with 'M' and the route is 'London-Plymouth'.
The DISTINCT keyword removes duplicates from a query's results.
Suppose you had a table of Trainline customers, and wanted to figure out which cities the customers lived in, but didn't want duplicate results.
customers table:
name | city |
---|---|
Akash | SF |
Brittany | NYC |
Carlos | NYC |
Diego | Seattle |
Eva | SF |
Faye | Seattle |
You could write a query like this to filter out the repeated cities:
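A minimal sketch, assuming the table is named customers:

```sql
-- Each city appears only once in the result.
SELECT DISTINCT city
FROM customers;
```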
Your result would be:
city |
---|
SF |
NYC |
Seattle |
The key to acing a Trainline SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier Trainline SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each problem on DataLemur has multiple hints, step-by-step solutions and, most importantly, an interactive coding environment so you can write your SQL query right in the browser and have it checked.
To prep for the Trainline SQL interview it is also helpful to practice SQL questions from other tech companies like:
But if your SQL foundations are weak, don't jump right into solving questions – go learn SQL with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers topics including CASE/WHEN/ELSE statements and handling missing data (NULLs) – both of these pop up often during Trainline SQL assessments.
In addition to SQL interview questions, the other types of questions tested in the Trainline Data Science Interview are:
The best way to prepare for Trainline Data Science interviews is by reading Ace the Data Science Interview. The book's got: