10 Bank of New York Mellon SQL Interview Questions (Updated 2024)

Updated on

June 30, 2024

At Bank of New York Mellon, SQL is typically used for analyzing transaction data to identify fraud patterns, and creating customer profiles for targeted marketing. For this reason Bank of New York Mellon asks SQL query questions during interviews for Data Science, Analytics, and & Data Engineering jobs.

So, to help you study, we've curated 10 Bank of New York Mellon SQL interview questions – how many can you solve?

Bank of New York Mellon SQL Interview Questions

10 Bank of New York Mellon SQL Interview Questions

SQL Question 1: Identify the Top Performing Account Holders

A common requirement for a banking institution like "Bank of New York Mellon" is to identify their top performing account holders. Here, a top performing account holder can be considered as a user who carries out high value transactions frequently. This would be very important for the business in terms of managing risks and revenue.

Write a SQL query to extract the top 5 account holders who have carried out the highest total value of transactions in the last quarter.

Example Input:
account_iduser_idaccount_typeopening_date
11001Savings2010-01-01
21002Checking2012-05-12
31003Savings2015-07-01
41004Checking2018-03-11
51005Savings2020-06-20
Example Input:
transaction_idaccount_idtransaction_datetransaction_amount
500112022-07-0110000
500222022-07-105000
500332022-07-1512000
500422022-08-017000
500512022-08-1015000
500642022-08-156000
500712022-09-0112000
500832022-09-105000
500942022-09-1510000

Answer:


This query joins table with table on and filters the transactions to select only those which occurred in the last quarter (i.e., between 2022-07-01 and 2022-09-30). It then groups the data by and sums up the for each user. The result is then ordered in descending order of to get the top performing users, and is used to obtain only the top 5 users.

To practice another SQL customer analytics question where you can solve it interactively and have your SQL solution instantly graded, try this Walmart Labs SQL Interview Question: Walmart Labs SQL Interview Question

SQL Question 2: Top 3 Salaries

Given a table of Bank of New York Mellon employee salary information, write a SQL query to find the top three highest paid employees within each department.

Bank of New York Mellon Example Input:

employee_idnamesalarydepartment_id
1Emma Thompson38001
2Daniel Rodriguez22301
3Olivia Smith20001
4Noah Johnson68002
5Sophia Martinez17501
8William Davis68002
10James Anderson40001

Example Input:

department_iddepartment_name
1Data Analytics
2Data Science

Example Output:

department_namenamesalary
Data AnalyticsJames Anderson4000
Data AnalyticsEmma Thompson3800
Data AnalyticsDaniel Rodriguez2230
Data ScienceNoah Johnson6800
Data ScienceWilliam Davis6800

You can solve this problem interactively on DataLemur:

Top 3 Department Salaries

Answer:

We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.


If the solution above is confusing, you can find a step-by-step solution with hints here: Top 3 Department Salaries.

Read about BNY's Data & Analytics division and how they use it to enhance business operations.

SQL Question 3: What is database denormalization, and when is it a good idea to consider it?

Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There's a few reasons to denormalize a database:

Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at Bank of New York Mellon. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.

Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).

Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.

Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.

Bank of New York Mellon SQL Interview Questions

SQL Question 4: Analyzing Monthly Transaction Volumes and Balances

As a data analyst at Bank of New York Mellon, one of your tasks is to analyze the monthly transaction volumes and balances for the various types of accounts. From the dataset given below, write a SQL query to find each account type, the month, total transactions for that month, and the average balance throughout the month.

Please use PostgreSQL for this query.

Example Input:
transaction_idaccount_idtransaction_dateamountaccount_typebalance
1151234501/08/2022200savings1500
2022234501/10/2022500checking1200
3572234501/12/2022700checking900
5121234502/03/2022300savings1200
6781234502/15/2022100savings1100
Example Output:
monthaccount_typetotal_transactionsaverage_balance
1savings11500.00
1checking21050.00
2savings21150.00

Answer:


This PostgreSQL query first extracts the month from the transaction_date and combines it with the account type to create partitions. It then counts the number of transactions per part and calculates the average balance of each part. Finally, it groups by month and account type and orders the results the same way for a neat output.

To solve a related window function SQL problem on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question: Google SQL Interview Question

SQL Question 5: What does do?

is used to combine the output of multiple statements into one big result!

For a concrete example, say you were doing an HR Analytics project for Bank of New York Mellon and needed to analyze both Bank of New York Mellon's employees and contractors who were hired after 2022 started. You could use in the following way:


SQL Question 6: Design a Database for Loan tracking in Bank of New York Mellon

As a Data Engineer in Bank of New York Mellon, you are assigned a task to design a relational database to track the bank's loans, their respective status (i.e., active, paid, defaulted), customers who have taken the loan, and the employees who have handled the loan process.

Moreover, the management team would like to know the total amount loaned for a specific period (monthly, quarterly, annually, etc.) and the default rate.

You are asked to create this database design and provide an SQL query that can retrieve the total loan amount, and number of defaulted loans in the first quarter of 2022.

Sample Input:
customer_idfirstnamelastname
1001JohnDoe
1002JaneSmith
1003EmilyDavis
Sample Input:
employee_idfirstnamelastname
2001RobertJohnson
2002OliverTaylor
2003SophiaWilliams
Sample Input:
loan_idstatusissue_datecustomer_idemployee_idloan_amount
5001active01/01/2022100120015000.00
5002defaulted02/01/20221001200210000.00
5003paid03/01/2022100220012000.00
5004active03/15/2022100320037000.00

Answer:

The SQL query for Postgres is mentioned below:


This query first extracts the year and quarter from the issue_date. It then selects only those rows that are within the first quarter of 2022.

Next, it sums up the loan_amount column to get the total loan amount and uses a CASE statement to count the number of defaulted loans.

Finally, it groups the result set by year and quarter to get the summary for each quarter of each year.

SQL Question 7: DBMS transactions are expected to follow the ACID properties. What are they, and what does each property mean?

A DBMS (database management system), in order to ensure transactions are relaible and don't ruin the integrity of the data, tries to mantain the following ACID properties: Atomicity, Consistency, Isolation, and Durability.

To make this concept more concrete, here is what each of the ACID properties would mean in the context of banking transactions:

  • Atomicity: a transaction is either completed fully, or not complete at all. For example, if a customer is transferring money from one account to another, the transaction should either transfer the full amount or none at all.
  • Consistency: a transaction will only be completed if it follows all database constraints and checks. For example, if a customer is withdrawing money from an account, the transaction should only be completed if the account has sufficient funds available, otherwise the transaction is rejected
  • Isolation: ensures that concurrent transactions are isolated from each other, so that the changes made by one transaction cannot be seen by another transaction. This isolation prevents race conditions, like two customers trying to withdraw money from the same account at the same time.
  • Durability: ensures that once a transaction has been committed and completed, the changes are permanent. A reset / shutdown of the database shouldn't erase someone's savings accounts!

SQL Question 8: Filtering Customer Records

As part of Bank of New York Mellon's customer relations strategy, their data team needs to identify a certain group of customers based on specific conditions. The team would like to filter these customers from their large database. Using , write a query that identifies customers that are:

  1. Located in New York or California.
  2. Have a balance greater than $5,000.
  3. Have an account that is more than 1 year old.

Assume you have a table with the following layout:

Example Input:
customer_idfirst_namelast_namestatebalanceaccount_created
10504JohnDoeNew York$10,0002019-02-21 00:00:00
10403JaneSmithCalifornia$15,0002019-01-10 00:00:00
6032OliverJonesFlorida$4,0002020-04-01 00:00:00
2198EmmaJohnsonTexas$6,0002018-07-18 00:00:00
5093NoahWilliamsNew York$2,0002021-07-08 00:00:00

Answer:


This query filters out the customers by their location () and their in the table. The conditions used are which finds customers located in either New York or California, which filters customers that have a balance greater than 5000 and finds customers whose account is more than a year old.

SQL Interview Question 9: Analyzing Digital Banking Ad Click-Through Rates

As a data analyst at Bank of New York Mellon, you're asked to analyze the click-through rates of digital ads for their digital banking app. You have two tables, one recording impressions (i.e. the displaying of digital ads), and another recording click-throughs (i.e. the actual clicking of the ads).

The table has the following columns:

  • (unique identifier for each impression)
  • (unique identifier for each user)
  • (date and time of the impression)
  • (unique identifier for each ad)
Example Input:
impression_iduser_idimpression_datead_id
207152308/10/2022 00:00:004001
480211608/10/2022 00:10:004001
234393408/11/2022 00:00:004002
719220508/12/2022 00:00:004002
305746308/13/2022 00:00:004001

The table has the following columns:

  • (unique identifier for each click)
  • (unique identifier for each user)
  • (date and time when the click happened)
  • (unique identifier for each ad)
Example Input:
click_iduser_idclick_datead_id
507152308/10/2022 00:01:004001
452093408/11/2022 00:01:004002
655446308/13/2022 00:01:004001
781993408/13/2022 01:00:004001

Write a SQL query to calculate the click-through rate (CTR) for each ad by dividing the number of unique clicks by the number of unique impressions, and format the result as a percentage.

RSVP: Considerations should be taken into account to function within a "same session" concept, considering a window of 30 minutes.

Solution:


This query works by first joining the and tables on and while considering the click event occurred within a 30 mins window of the impression. Then it groups by in order to calculate the click-through rate for each ad separately. The rate is calculated as the number of unique user clicks divided by the number of unique user impressions, and expressed as a percentage.

To practice another question about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL coding environment: TikTok SQL Interview Question

SQL Question 10: What does do, and when would you use this SQL command?

combines the results from two or more SELECT statements, but only returns the rows that are exactly the same between the two sets.

For example, say you were doing an HR Analytics project for Bank of New York Mellon, and had access to Bank of New York Mellon's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use operator to find all employees who also show up in the contractors table:


Preparing For The Bank of New York Mellon SQL Interview

The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Bank of New York Mellon SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Microsoft, Google, and Facebook. DataLemur SQL Interview Questions

Each SQL question has hints to guide you, step-by-step solutions and crucially, there's an online SQL coding environment so you can easily right in the browser your SQL query answer and have it graded.

To prep for the Bank of New York Mellon SQL interview you can also be wise to solve SQL questions from other investment banking companies like:

But if your SQL foundations are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.

SQL tutorial for Data Analytics

This tutorial covers SQL topics like WHERE with AND/OR/NOT and HAVING – both of these show up often in SQL job interviews at Bank of New York Mellon.

Bank of New York Mellon Data Science Interview Tips

What Do Bank of New York Mellon Data Science Interviews Cover?

Besides SQL interview questions, the other types of questions tested in the Bank of New York Mellon Data Science Interview include:

  • Statistics and Probability Questions
  • Python or R Programming Questions
  • Product Analytics Questions
  • ML Interview Questions
  • Behavioral Interview Questions

Bank of New York Mellon Data Scientist

How To Prepare for Bank of New York Mellon Data Science Interviews?

To prepare for Bank of New York Mellon Data Science interviews read the book Ace the Data Science Interview because it's got:

  • 201 interview questions taken from Facebook, Google, & Amazon
  • a crash course on Python, SQL & ML
  • over 1000+ reviews on Amazon & 4.5-star rating

Acing Data Science Interview

© 2024 DataLemur, Inc

Career Resources

Free 9-Day Data Interview Crash CourseFree SQL Tutorial for Data AnalyticsUltimate SQL Interview GuideAce the Data Job Hunt Video CourseAce the Data Science InterviewBest Books for Data Analysts