At Northern Trust, SQL is used day-to-day for analyzing financial data for trends and insights, and managing databases to improve asset management strategies. That's why Northern Trust almost always evaluates jobseekers on SQL problems during interviews for Data Science, Data Engineering and Data Analytics jobs.
So, to help you study, we've curated 11 Northern Trust SQL interview questions – can you answer each one?
Imagine you are a data analyst at Northern Trust. You have access to a table named "transactions" that records every transaction for all of the bank's customers. The schema of the table is as follows:
transaction_id | account_id | transaction_date | transaction_type | transaction_amount |
---|---|---|---|---|
123 | 1234 | 01/01/2022 00:00:00 | Deposit | 1000 |
124 | 1234 | 01/02/2022 00:00:00 | Withdrawal | 100 |
125 | 5678 | 01/01/2022 00:00:00 | Deposit | 2000 |
126 | 5678 | 01/01/2022 00:00:00 | Withdrawal | 500 |
127 | 5678 | 01/02/2022 00:00:00 | Deposit | 1500 |
For each account, you are tasked with analyzing the average deposit and withdrawal amounts on a per-month basis. Please write a SQL query for this scenario.
Here is a suitable PostgreSQL query to solve the problem:
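A sketch of one approach, assuming the table is named transactions (a GROUP BY would work equally well; the window-function form below matches the explanation that follows):

```sql
SELECT DISTINCT
    account_id,
    DATE_TRUNC('month', transaction_date) AS month,
    transaction_type,
    -- average amount within each (account, month, type) partition
    AVG(transaction_amount) OVER (
        PARTITION BY account_id,
                     DATE_TRUNC('month', transaction_date),
                     transaction_type
    ) AS avg_transaction_amount
FROM transactions
ORDER BY account_id, month, transaction_type;
```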
In this SQL statement, a window function is used to calculate the average deposit and withdrawal for each account on a monthly basis. The PARTITION BY clause ensures that the average is computed within each account, each month, and each transaction type (either deposit or withdrawal). Finally, the result is ordered by account_id, the month of the transaction, and transaction type for easier reading and analysis.
The result will include each account_id, month, transaction type (either deposit or withdrawal) and the corresponding average transaction amount for those categories.
To solve a similar window function question on DataLemur's free interactive coding environment, try this Google SQL Interview Question:
Given a table of Northern Trust employee salary information, write a SQL query to find employees who make more than their own boss.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis, who earns $7,000.
You can solve this interview question directly within the browser on DataLemur:
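One possible self-join solution (the table name employee is assumed from the sample data):

```sql
SELECT emp.employee_id,
       emp.name AS employee_name
FROM employee AS emp
-- join each employee row to their manager's row
JOIN employee AS mgr
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary;
```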
First, we perform a self-join, treating one copy of the employee table as the managers' table and the other copy as the employees' table. Then we use a WHERE clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the solution above is confusing, you can find a detailed solution with hints here: Employees Earning More Than Managers.
To clarify the distinction between a primary key and a foreign key, let's examine employee data from Northern Trust's HR database:
employee_id | first_name | last_name | manager_id |
---|---|---|---|
1 | Aubrey | Graham | 3 |
2 | Marshal | Mathers | 3 |
3 | Dwayne | Carter | 4 |
4 | Shawn | Carter | |
In this table, employee_id serves as the primary key. It uniquely identifies each employee and cannot be null.
manager_id functions as a foreign key, linking to the employee_id of the employee's manager. This establishes a relationship between Northern Trust employees and their managers, allowing for easy querying to find an employee's manager or see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, department_id and location_id foreign keys could be used to connect each employee to their respective department and location.
Northern Trust is a financial services company. For this scenario, you are tasked with finding all clients based in Illinois who joined after 2015 and have made a transaction of more than $200,000 in the last 6 months.
Here's an example of sample data:
client_id | client_name | state | join_date | account_balance |
---|---|---|---|---|
4356 | John Doe | Illinois | 2018-07-26 | $1,500,000 |
9872 | Jane Smith | New York | 2016-06-14 | $800,000 |
2351 | Robert Johnson | Illinois | 2017-03-01 | $750,000 |
5527 | Laura Davis | Illinois | 2018-05-15 | $1,250,000 |
transaction_id | client_id | transaction_date | amount |
---|---|---|---|
8765 | 4356 | 2022-01-23 | $120,000 |
9871 | 9872 | 2022-02-13 | $50,000 |
6589 | 2351 | 2022-02-20 | $220,000 |
3246 | 5527 | 2022-03-01 | $75,000 |
The question will involve using the SQL commands SELECT, FROM, WHERE and AND to filter the clients based on the required criteria.
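A sketch of such a query, assuming the tables are named clients and transactions:

```sql
SELECT DISTINCT c.client_id,
       c.client_name
FROM clients AS c
JOIN transactions AS t
  ON c.client_id = t.client_id
WHERE c.state = 'Illinois'
  AND c.join_date > '2015-12-31'          -- joined after 2015
  AND t.amount > 200000                   -- transaction over $200,000
  AND t.transaction_date >= CURRENT_DATE - INTERVAL '6 months';
```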
This query joins the clients and transactions tables on the client_id field. It then filters for clients who live in Illinois, joined after 2015, and have made a transaction of more than $200,000 in the last 6 months.
The INTERSECT command merges the results of multiple SELECT statements and keeps only those rows that are present in all result sets.
For example, say you were doing an HR Analytics project for Northern Trust, and had access to Northern Trust's employees and contractors data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use the INTERSECT operator to find all contractors who also show up in the employees table:
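A minimal sketch, assuming both tables share columns such as first_name, last_name, and email:

```sql
-- rows that appear in BOTH contractors and employees
SELECT first_name, last_name, email
FROM contractors
INTERSECT
SELECT first_name, last_name, email
FROM employees;
```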
As a financial institution, Northern Trust might be interested in finding the average transaction amount per account over a certain period of time for their analysis.
Let's assume that we have a table named "transactions" that tracks all the transactions for each account. Here is some sample data:
transaction_id | account_id | transaction_date | transaction_amount |
---|---|---|---|
1001 | AB01 | 01/05/2022 00:00:00 | 2000 |
1002 | AB02 | 01/06/2022 00:00:00 | 5000 |
1003 | AB01 | 01/07/2022 00:00:00 | 7000 |
1004 | AB03 | 01/08/2022 00:00:00 | 10000 |
1005 | AB02 | 01/09/2022 00:00:00 | 5000 |
To answer this question, we can use the AVG function in PostgreSQL in conjunction with the GROUP BY clause to find the average transaction amount per account_id. Our SQL query will look something like this:
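A sketch of the query, using the transactions table from the sample data:

```sql
SELECT account_id,
       AVG(transaction_amount) AS avg_transaction_amount
FROM transactions
GROUP BY account_id
ORDER BY account_id;
```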
This query first groups the table by account_id, and then for each group, it calculates the average transaction amount. The result will be a new table where each row corresponds to a unique account_id and has the corresponding average transaction amount.
For example, the average transaction amount for account 'AB01' is calculated by (2000+7000)/2 = 4500, as there are two transactions for this account with amounts 2000 and 7000. Similarly, the average for 'AB02' is (5000+5000)/2 = 5000. And for 'AB03' there is only one transaction with amount 10000 so the average is this amount itself.
account_id | avg_transaction_amount |
---|---|
AB01 | 4500 |
AB02 | 5000 |
AB03 | 10000 |
To practice a very similar question try this interactive Uber User's Third Transaction Question which is similar for dealing with transactions and SQL functions or this Stripe Repeated Payments Question which is similar for dealing with transaction times and SQL queries.
The HAVING clause serves as a filter for the groups created by the GROUP BY clause, similar to how the WHERE clause filters rows. However, HAVING is applied to groups rather than individual rows.
For example, say you were a data analyst at Northern Trust trying to understand how sales differed by region:
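A hypothetical example (the sales table and its column names are assumed for illustration):

```sql
SELECT region,
       SUM(sale_amount) AS total_sales
FROM sales
WHERE sale_date > '2023-01-01'      -- row-level filter, applied before grouping
GROUP BY region
HAVING SUM(sale_amount) > 400000;   -- group-level filter, applied after grouping
```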
This query retrieves the total sales for all products in each region. It uses the WHERE clause to include only sales made after January 1, 2023; the rows are then grouped by region, and the HAVING clause filters the groups to include only those with total sales greater than $400,000.
Northern Trust has been running a series of marketing campaigns to attract more clients to their digitized financial services. They want to analyze the effectiveness of their campaigns. For each email campaign, they track whether a client clicked on the link in the email (a "click") and whether that click led to the client signing up for a new service (a "conversion").
The marketing team has asked you to calculate the click-through rate (CTR), which is the number of clicks divided by the number of emails sent (expressed as a percentage), and the conversion rate, which is the number of conversions divided by the number of clicks (expressed as a percentage), for each campaign.
You are provided with two data tables:
campaign_id | emails_sent |
---|---|
101 | 5000 |
102 | 7500 |
103 | 6200 |
result_id | campaign_id | clicks | conversions |
---|---|---|---|
401 | 101 | 200 | 50 |
402 | 102 | 350 | 75 |
403 | 103 | 310 | 55 |
The company wants the answer table to look like this:
campaign_id | ctr | conversion_rate |
---|---|---|
101 | 4.00% | 25.00% |
102 | 4.67% | 21.43% |
103 | 5.00% | 17.74% |
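One possible PostgreSQL query, assuming the tables are named campaigns and campaign_results:

```sql
SELECT c.campaign_id,
       -- clicks / emails sent, as a percentage string
       ROUND(100.0 * r.clicks / c.emails_sent, 2) || '%' AS ctr,
       -- conversions / clicks, as a percentage string
       ROUND(100.0 * r.conversions / r.clicks, 2) || '%' AS conversion_rate
FROM campaigns AS c
JOIN campaign_results AS r
  ON c.campaign_id = r.campaign_id;
```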
This SQL query joins the two tables on the campaign_id column. For each campaign, it computes the click-through rate (CTR) by dividing the number of clicks by the number of emails sent and multiplying by 100 to convert it to a percentage. The conversion rate is calculated by dividing the number of conversions by the number of clicks and multiplying by 100. The results are formatted as text followed by the percent sign (%).
To practice another question about calculating rates, try this TikTok SQL question on DataLemur's online SQL code editor:
See the latest news from Northern Trust and stay up to date before the interview!
As Northern Trust is a financial services company, let's assume they maintain portfolios for clients. These portfolios could fluctuate in value over time as markets rise and fall. Given this, could you provide a breakdown of the average portfolio value by client and quarter for the year 2022?
Please make use of the following example tables:
portfolio_id | client_id | value | date |
---|---|---|---|
1011 | 30 | 100000.00 | 03/01/2022 |
1102 | 30 | 102000.00 | 06/30/2022 |
1202 | 45 | 80000.00 | 01/12/2022 |
1301 | 45 | 85000.00 | 03/14/2022 |
1402 | 45 | 90000.00 | 06/20/2022 |
1502 | 60 | 120000.00 | 01/05/2022 |
quarter | client_id | average_portfolio_value |
---|---|---|
Q1 | 30 | 100000.00 |
Q2 | 30 | 102000.00 |
Q1 | 45 | 82500.00 |
Q2 | 45 | 90000.00 |
Q1 | 60 | 120000.00 |
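A sketch of one approach, assuming the table is named portfolios:

```sql
SELECT 'Q' || EXTRACT(QUARTER FROM date)::text AS quarter,
       client_id,
       AVG(value) AS average_portfolio_value
FROM portfolios
WHERE EXTRACT(YEAR FROM date) = 2022   -- restrict to 2022
GROUP BY quarter, client_id
ORDER BY client_id, quarter;
```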
In this query, we extract the quarter from the date and group by this along with the client_id to calculate the average portfolio value for each client per quarter in 2022.
A foreign key is a column or set of columns in a table that references the primary key of another table. It is used to link the two tables together and to ensure that the data in the foreign key column is valid.
The foreign key constraint helps ensure the integrity of the data in the database by preventing the insertion of rows in the foreign key table that do not have corresponding entries in the primary key table. It also enforces the relationship between the two tables, and can be used to ensure that data is not deleted from the primary key table if there are still references to it in the foreign key table.
For example, if you have a table of Northern Trust customers and an orders table, the customer_id column in the orders table could be a foreign key that references the id column (which is the primary key) in the Northern Trust customers table.
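A minimal DDL sketch of that relationship (table and column types are hypothetical):

```sql
CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    -- foreign key: every order must reference an existing customer
    customer_id INTEGER NOT NULL REFERENCES customers (id),
    amount      NUMERIC(12, 2)
);
```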
As an analyst at Northern Trust, your task is to determine the total account balance for each customer. You are given two tables, Customers and Transactions.
The Customers table has the following schema:
CustomerID | FirstName | LastName | Email |
---|---|---|---|
1 | John | Doe | john.doe@example.com |
2 | Jane | Smith | jane.smith@example.com |
3 | Tom | Thompson | tom.thompson@example.com |
4 | Susie | Queue | susie.queue@example.com |
The Transactions table has the following schema:
TransactionID | CustomerID | TransactionDate | Amount |
---|---|---|---|
1001 | 1 | 2022-09-01 | 500.00 |
1002 | 1 | 2022-09-15 | -200.00 |
1003 | 2 | 2022-08-01 | 300.00 |
1004 | 3 | 2022-07-15 | 200.00 |
1005 | 2 | 2022-10-01 | -100.00 |
Write a SQL query to obtain the total balance for each customer using the data provided in the above tables. Remember, debit transactions will be represented as negative amounts.
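One possible query, assuming the tables are named Customers and Transactions as above:

```sql
SELECT c.CustomerID,
       c.FirstName,
       c.LastName,
       -- debits are negative, so a plain SUM yields the net balance
       SUM(t.Amount) AS TotalBalance
FROM Customers AS c
JOIN Transactions AS t
  ON c.CustomerID = t.CustomerID
GROUP BY c.CustomerID, c.FirstName, c.LastName;
```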
In the above query, we first join the Customers and Transactions tables on CustomerID. Then, we group the data by CustomerID, FirstName, and LastName, and calculate the total balance for each customer by summing the Amount column from the Transactions table. The result gives us the total account balance for each customer.
Since joins come up so often during SQL interviews, try an interactive SQL join question from Spotify:
The key to acing a Northern Trust SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier Northern Trust SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon.
Each SQL question has multiple hints, fully explained answers, and a discussion board to see how others solved it. Most importantly, there's an interactive coding environment so you can run your SQL query right in the browser and have it executed.
To prep for the Northern Trust SQL interview, it can also be useful to solve interview questions from other banking & financial services companies like:
In case your SQL coding skills are weak, forget about jumping right into solving questions – improve your SQL foundations with this free SQL tutorial.
This tutorial covers SQL topics like UNION vs. UNION ALL and using LIKE – both of these show up routinely during Northern Trust SQL assessments.
In addition to SQL interview questions, the other question categories covered in the Northern Trust Data Science Interview are:
To prepare for Northern Trust Data Science interviews read the book Ace the Data Science Interview because it's got: