At Bank of New York Mellon, SQL is typically used for analyzing transaction data to identify fraud patterns and for creating customer profiles for targeted marketing. That's why Bank of New York Mellon asks SQL query questions during interviews for Data Science, Analytics, and Data Engineering jobs.
So, to help you study, we've curated 10 Bank of New York Mellon SQL interview questions – how many can you solve?
A common requirement for a banking institution like Bank of New York Mellon is to identify its top-performing account holders. Here, a top-performing account holder can be considered a user who frequently carries out high-value transactions. Identifying these users is important to the business for managing risk and revenue.
Write a SQL query to extract the top 5 account holders who have carried out the highest total value of transactions in the last quarter.
account_id | user_id | account_type | opening_date |
---|---|---|---|
1 | 1001 | Savings | 2010-01-01 |
2 | 1002 | Checking | 2012-05-12 |
3 | 1003 | Savings | 2015-07-01 |
4 | 1004 | Checking | 2018-03-11 |
5 | 1005 | Savings | 2020-06-20 |
transaction_id | account_id | transaction_date | transaction_amount |
---|---|---|---|
5001 | 1 | 2022-07-01 | 10000 |
5002 | 2 | 2022-07-10 | 5000 |
5003 | 3 | 2022-07-15 | 12000 |
5004 | 2 | 2022-08-01 | 7000 |
5005 | 1 | 2022-08-10 | 15000 |
5006 | 4 | 2022-08-15 | 6000 |
5007 | 1 | 2022-09-01 | 12000 |
5008 | 3 | 2022-09-10 | 5000 |
5009 | 4 | 2022-09-15 | 10000 |
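One possible PostgreSQL solution, assuming the tables are named `accounts` and `transactions` as in the sample data above:

```sql
SELECT a.user_id,
       SUM(t.transaction_amount) AS total_transaction_value
FROM transactions t
JOIN accounts a
  ON t.account_id = a.account_id
-- keep only transactions from the last quarter
WHERE t.transaction_date BETWEEN '2022-07-01' AND '2022-09-30'
GROUP BY a.user_id
ORDER BY total_transaction_value DESC
LIMIT 5;
```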
This query joins the `transactions` table with the `accounts` table on `account_id` and filters the transactions to select only those which occurred in the last quarter (i.e., between 2022-07-01 and 2022-09-30). It then groups the data by `user_id` and sums up the `transaction_amount` for each user. The result is then ordered in descending order of total transaction value to get the top-performing users, and `LIMIT 5` is used to obtain only the top 5 users.
To practice another SQL customer analytics question, one you can solve interactively and have your SQL solution instantly graded, try this Walmart Labs SQL Interview Question:
Given a table of Bank of New York Mellon employee salary information, write a SQL query to find the top three highest paid employees within each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
You can solve this problem interactively on DataLemur:
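A possible PostgreSQL solution, assuming the tables are named `employee` and `department` as in the sample data:

```sql
WITH ranked_salaries AS (
  SELECT d.department_name,
         e.name,
         e.salary,
         -- rank salaries within each department; ties share a rank
         DENSE_RANK() OVER (
           PARTITION BY e.department_id
           ORDER BY e.salary DESC) AS salary_rank
  FROM employee e
  JOIN department d
    ON e.department_id = d.department_id
)
SELECT department_name, name, salary
FROM ranked_salaries
WHERE salary_rank <= 3
ORDER BY department_name, salary DESC;
```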
We use the DENSE_RANK() window function to rank each employee's salary within their department, with the highest salary in each department receiving rank 1 (ties share a rank). Then, we wrap this up in a CTE and filter for employees with a rank of 3 or lower.
If the solution above is confusing, you can find a step-by-step solution with hints here: Top 3 Department Salaries.
Read about BNY's Data & Analytics division and how they use it to enhance business operations.
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There are a few reasons to denormalize a database:
Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at Bank of New York Mellon. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.
Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.
As a data analyst at Bank of New York Mellon, one of your tasks is to analyze the monthly transaction volumes and balances for the various types of accounts. From the dataset given below, write a SQL query to find each account type, the month, total transactions for that month, and the average balance throughout the month.
Please use PostgreSQL for this query.
transaction_id | account_id | transaction_date | amount | account_type | balance |
---|---|---|---|---|---|
115 | 12345 | 01/08/2022 | 200 | savings | 1500 |
202 | 22345 | 01/10/2022 | 500 | checking | 1200 |
357 | 22345 | 01/12/2022 | 700 | checking | 900 |
512 | 12345 | 02/03/2022 | 300 | savings | 1200 |
678 | 12345 | 02/15/2022 | 100 | savings | 1100 |
month | account_type | total_transactions | average_balance |
---|---|---|---|
1 | savings | 1 | 1500.00 |
1 | checking | 2 | 1050.00 |
2 | savings | 2 | 1150.00 |
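One possible PostgreSQL query, assuming the table is named `transactions` as in the sample data:

```sql
SELECT EXTRACT(MONTH FROM transaction_date) AS month,
       account_type,
       COUNT(*) AS total_transactions,
       -- average of the balance snapshots recorded with each transaction
       ROUND(AVG(balance), 2) AS average_balance
FROM transactions
GROUP BY month, account_type
ORDER BY month, account_type;
```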
This PostgreSQL query first extracts the month from the `transaction_date` and groups the rows by that month together with the `account_type`. For each group, it counts the number of transactions and calculates the average balance. Finally, it orders the results by month and account type for a neat output.
To solve a related window function SQL problem on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question:
The `UNION` operator is used to combine the output of multiple `SELECT` statements into one big result!
For a concrete example, say you were doing an HR Analytics project for Bank of New York Mellon and needed to analyze both Bank of New York Mellon's employees and contractors who were hired after 2022 started. You could use `UNION` in the following way:
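A sketch of what that could look like, assuming hypothetical `employees` and `contractors` tables that share the same columns:

```sql
SELECT first_name, last_name, hire_date
FROM employees
WHERE hire_date >= '2022-01-01'

UNION

SELECT first_name, last_name, hire_date
FROM contractors
WHERE hire_date >= '2022-01-01';
```

Note that `UNION` removes duplicate rows from the combined result; `UNION ALL` would keep them.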
As a Data Engineer at Bank of New York Mellon, you are assigned the task of designing a relational database to track the bank's loans, their status (i.e., active, paid, defaulted), the customers who have taken out the loans, and the employees who handled the loan process.
Moreover, the management team would like to know the total amount loaned for a specific period (monthly, quarterly, annually, etc.) and the default rate.
You are asked to create this database design and provide an SQL query that can retrieve the total loan amount, and number of defaulted loans in the first quarter of 2022.
customer_id | firstname | lastname |
---|---|---|
1001 | John | Doe |
1002 | Jane | Smith |
1003 | Emily | Davis |
employee_id | firstname | lastname |
---|---|---|
2001 | Robert | Johnson |
2002 | Oliver | Taylor |
2003 | Sophia | Williams |
loan_id | status | issue_date | customer_id | employee_id | loan_amount |
---|---|---|---|---|---|
5001 | active | 01/01/2022 | 1001 | 2001 | 5000.00 |
5002 | defaulted | 02/01/2022 | 1001 | 2002 | 10000.00 |
5003 | paid | 03/01/2022 | 1002 | 2001 | 2000.00 |
5004 | active | 03/15/2022 | 1003 | 2003 | 7000.00 |
The PostgreSQL query is shown below:
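One possible query, assuming a `loans` table matching the sample data above:

```sql
SELECT EXTRACT(YEAR FROM issue_date) AS year,
       EXTRACT(QUARTER FROM issue_date) AS quarter,
       SUM(loan_amount) AS total_loan_amount,
       -- count only the loans whose status is 'defaulted'
       SUM(CASE WHEN status = 'defaulted' THEN 1 ELSE 0 END) AS defaulted_loans
FROM loans
WHERE EXTRACT(YEAR FROM issue_date) = 2022
  AND EXTRACT(QUARTER FROM issue_date) = 1
GROUP BY year, quarter;
```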
This query first extracts the year and quarter from the issue_date. It then selects only those rows that are within the first quarter of 2022.
Next, it sums up the loan_amount column to get the total loan amount and uses a CASE statement to count the number of defaulted loans.
Finally, it groups the result set by year and quarter to get the summary for each quarter of each year.
A DBMS (database management system), in order to ensure transactions are reliable and don't ruin the integrity of the data, tries to maintain the following ACID properties: Atomicity, Consistency, Isolation, and Durability.
To make this concept more concrete, here is what each of the ACID properties would mean in the context of banking transactions:

Atomicity: a funds transfer either debits one account and credits the other in full, or does nothing at all.

Consistency: a transaction never leaves the database in a state that violates its rules (e.g., balances dropping below an allowed minimum).

Isolation: two transfers running at the same time can't see each other's half-finished state.

Durability: once a transfer is committed, it survives a crash or power failure.
As part of Bank of New York Mellon's customer relations strategy, their data team needs to identify a certain group of customers based on specific conditions. The team would like to filter these customers from their large database. Using PostgreSQL, write a query that identifies customers who are located in either New York or California, have an account balance greater than $5,000, and opened their account more than a year ago.
Assume you have a table with the following layout:
customer_id | first_name | last_name | state | balance | account_created |
---|---|---|---|---|---|
10504 | John | Doe | New York | $10,000 | 2019-02-21 00:00:00 |
10403 | Jane | Smith | California | $15,000 | 2019-01-10 00:00:00 |
6032 | Oliver | Jones | Florida | $4,000 | 2020-04-01 00:00:00 |
2198 | Emma | Johnson | Texas | $6,000 | 2018-07-18 00:00:00 |
5093 | Noah | Williams | New York | $2,000 | 2021-07-08 00:00:00 |
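One way to express those filters, assuming the table is named `customers` and `balance` is stored as a numeric column:

```sql
SELECT *
FROM customers
WHERE state IN ('New York', 'California')   -- located in NY or CA
  AND balance > 5000                        -- balance greater than $5,000
  AND account_created < CURRENT_DATE - INTERVAL '1 year';  -- over a year old
```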
This query filters the customers by their location (`state`) and their `balance` in the `customers` table. The conditions used are `state IN ('New York', 'California')`, which finds customers located in either New York or California; `balance > 5000`, which keeps customers with a balance greater than $5,000; and a comparison of `account_created` against the date one year ago, which finds customers whose account is more than a year old.
As a data analyst at Bank of New York Mellon, you're asked to analyze the click-through rates of digital ads for their digital banking app. You have two tables, one recording impressions (i.e. the displaying of digital ads), and another recording click-throughs (i.e. the actual clicking of the ads).
The `impressions` table has the following columns:
impression_id | user_id | impression_date | ad_id |
---|---|---|---|
2071 | 523 | 08/10/2022 00:00:00 | 4001 |
4802 | 116 | 08/10/2022 00:10:00 | 4001 |
2343 | 934 | 08/11/2022 00:00:00 | 4002 |
7192 | 205 | 08/12/2022 00:00:00 | 4002 |
3057 | 463 | 08/13/2022 00:00:00 | 4001 |
The `clicks` table has the following columns:
click_id | user_id | click_date | ad_id |
---|---|---|---|
5071 | 523 | 08/10/2022 00:01:00 | 4001 |
4520 | 934 | 08/11/2022 00:01:00 | 4002 |
6554 | 463 | 08/13/2022 00:01:00 | 4001 |
7819 | 934 | 08/13/2022 01:00:00 | 4001 |
Write a SQL query to calculate the click-through rate (CTR) for each ad by dividing the number of unique clicks by the number of unique impressions, and format the result as a percentage.
Note: the query should implement a "same session" concept: a click counts toward an impression only if it occurred within a 30-minute window of that impression.
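One possible PostgreSQL query, assuming the tables are named `impressions` and `clicks` as in the sample data:

```sql
SELECT i.ad_id,
       -- unique clicking users divided by unique impressed users, as a percentage
       ROUND(100.0 * COUNT(DISTINCT c.user_id)
             / COUNT(DISTINCT i.user_id), 2) AS ctr_percentage
FROM impressions i
LEFT JOIN clicks c
  ON i.ad_id = c.ad_id
 AND i.user_id = c.user_id
 -- only count clicks within 30 minutes of the impression ("same session")
 AND c.click_date BETWEEN i.impression_date
                      AND i.impression_date + INTERVAL '30 minutes'
GROUP BY i.ad_id;
```

With the sample data above, ad 4001 has 3 unique impressed users and 2 of them clicked within the window, while ad 4002 has 2 impressed users and 1 click.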
This query works by first joining the `impressions` and `clicks` tables on `ad_id` and `user_id`, while requiring that the click event occurred within a 30-minute window of the impression. Then it groups by `ad_id` in order to calculate the click-through rate for each ad separately. The rate is calculated as the number of unique users who clicked divided by the number of unique users who saw an impression, expressed as a percentage.
To practice another question about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL coding environment:
The `INTERSECT` operator combines the results from two or more SELECT statements, but only returns the rows that are exactly the same between the two sets.
For example, say you were doing an HR Analytics project for Bank of New York Mellon, and had access to Bank of New York Mellon's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use the `INTERSECT` operator to find all employees who also show up in the contractors table:
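A sketch of what that could look like, assuming hypothetical `employees` and `contractors` tables with matching columns:

```sql
SELECT first_name, last_name
FROM employees

INTERSECT

SELECT first_name, last_name
FROM contractors;
```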
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Bank of New York Mellon SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each SQL question has hints to guide you and step-by-step solutions, and crucially, there's an online SQL coding environment so you can write your SQL query answer right in the browser and have it graded.
To prep for the Bank of New York Mellon SQL interview, it would also be wise to solve SQL questions from other investment banking companies like:
But if your SQL foundations are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.
This tutorial covers SQL topics like WHERE with AND/OR/NOT and HAVING – both of these show up often in SQL job interviews at Bank of New York Mellon.
Besides SQL interview questions, the other types of questions tested in the Bank of New York Mellon Data Science Interview include:
To prepare for Bank of New York Mellon Data Science interviews read the book Ace the Data Science Interview because it's got: