Data Science, Data Engineering, and Data Analytics employees at Citizens Bank use SQL for analyzing banking transactions and predicting customer behavior based on their financial interactions. That's why Citizens Bank asks prospective hires SQL interview questions.
So, to help you prepare, here are 8 Citizens Financial Group SQL interview questions – can you solve them?
As a data analyst at Citizens Bank, you are tasked with analyzing loan data. Write a SQL query that calculates the average loan rate per month for each type of loan product.
For this scenario, assume the "loans" table is formatted as follows:
loan_id | customer_id | loan_opened_date | loan_product_id | loan_rate |
---|---|---|---|---|
1023 | 506 | 2021-01-15 | 3240 | 4.5 |
2347 | 901 | 2021-02-09 | 9812 | 3.7 |
5432 | 189 | 2021-02-25 | 3240 | 5.0 |
6721 | 804 | 2021-03-13 | 9812 | 4.2 |
7319 | 431 | 2021-03-29 | 3240 | 4.7 |
The output should show the month, loan product id, and the average loan rate for that product in that month.
month | loan_product_id | avg_loan_rate |
---|---|---|
01 | 3240 | 4.5 |
02 | 3240 | 5.0 |
02 | 9812 | 3.7 |
03 | 3240 | 4.7 |
03 | 9812 | 4.2 |
Here is the SQL Query for PostgreSQL:
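One possible solution, using the column names from the sample table above:

```sql
SELECT
  TO_CHAR(loan_opened_date, 'MM') AS month,
  loan_product_id,
  AVG(loan_rate) AS avg_loan_rate
FROM loans
GROUP BY TO_CHAR(loan_opened_date, 'MM'), loan_product_id
ORDER BY month, loan_product_id;
```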
The above query calculates the average loan rate for each type of loan product in each month of our dataset. We first convert 'loan_opened_date' into a month value, then group (using GROUP BY) by 'loan_product_id' and 'month', applying the AVG function to calculate the average loan rate. Note that AVG can also be used as a window function with an OVER clause; window functions perform a calculation across a set of rows that are related to the current row.
To solve a related window function SQL problem on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question:
Given a table of Citizens Bank employee salary information, write a SQL query to find the 2nd highest salary amongst all the employees.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Write a SQL query for this interview question directly within the browser on DataLemur:
You can find a step-by-step solution with hints here: 2nd Highest Salary.
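One common approach (assuming the salary data lives in a table called employee, as in the sample above) is to filter out the maximum salary and take the maximum of what remains:

```sql
-- Find the highest salary that is strictly less than the overall maximum
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);
```

Another option is to use the DENSE_RANK window function and keep the rows ranked 2nd.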
The DISTINCT keyword added to a SELECT statement can be used to get records without duplicates.
For example, say you had a table of Citizens Bank customers:
name | city |
---|---|
Akash | SF |
Brittany | NYC |
Carlos | NYC |
Diego | Seattle |
Eva | SF |
Faye | Seattle |
Suppose you wanted to figure out which cities the customers lived in, but didn't want duplicate results, you could write a query like this:
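Assuming the table is named customers, the query could look like this:

```sql
-- Return each city only once, regardless of how many customers live there
SELECT DISTINCT city
FROM customers;
```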
Your result would be:
city |
---|
SF |
NYC |
Seattle |
As a data analyst for Citizens Bank, you are asked to find out the average loan amount based on different loan types. The bank offers different loan types such as home loans, auto loans, personal loans, etc. Your manager wants this information to understand the average amount customers usually borrow for each type of loan.
Provided below is a sample table of loans:
loan_id | customer_id | loan_type | loan_amount | loan_date |
---|---|---|---|---|
101 | 8745 | Home | 500000 | 01/22/2021 |
102 | 1842 | Auto | 20000 | 03/15/2021 |
103 | 7841 | Personal | 15000 | 04/18/2021 |
104 | 9271 | Home | 350000 | 05/25/2021 |
105 | 5461 | Auto | 30000 | 06/19/2021 |
Your task is to write a SQL query that generates the average loan amount for each loan type and sorts the average loan amount in descending order.
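A solution matching the description below, using the column names from the sample table:

```sql
-- Average loan amount per loan type, largest averages first
SELECT loan_type,
       AVG(loan_amount) AS avg_loan_amount
FROM loans
GROUP BY loan_type
ORDER BY avg_loan_amount DESC;
```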
This query first groups the loans by their type using the GROUP BY clause. It then calculates the average loan amount for each loan type using the AVG function. The result is then sorted in descending order by the average loan amount using the ORDER BY clause. The output of this query enables the bank to understand the average amount of money borrowed by their customers based on each loan type. This helps them to understand the preference of their customers based on loan amount for each loan type.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for grouping and finding top items or this Stripe Repeated Payments Question which is similar for handling and analyzing transaction data.
Take a look at the Citizens Bank career page and see what role might be the best fit for you!
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
For example, in a database that stores Citizens Bank sales analytics data, you might have separate tables for "customers," "orders," and "products," with foreign key constraints linking the tables together. This helps to ensure the integrity of the data and reduces redundancy, but it can also make queries that involve multiple tables more complex and slower to execute.
By de-normalizing the database, you can combine some of the data from separate tables into a single table, which can reduce the number of joins that are required to retrieve the data you need. This can speed up queries and improve the performance of your database.
However, it's important to carefully consider the trade-offs of de-normalization before making any changes to your database. De-normalization can make it more difficult to maintain the integrity and reliability of your data, and can also increase the risk of data redundancy. It's generally best to use de-normalization as a performance optimization technique only when necessary, and to carefully evaluate the benefits and drawbacks in the context of your specific database and workload.
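As an illustrative sketch (the table and column names here are hypothetical, not an actual Citizens Bank schema), denormalization might mean copying frequently queried customer fields onto the orders table so reads avoid a join:

```sql
-- Build a denormalized orders table that duplicates customer fields,
-- trading redundancy for fewer joins at query time
CREATE TABLE orders_denormalized AS
SELECT o.order_id,
       o.order_date,
       o.product_id,
       c.customer_id,
       c.customer_name,  -- duplicated from customers
       c.customer_city   -- duplicated from customers
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;
```

The cost of this design is that any change to a customer's name or city must now be propagated to every matching row in orders_denormalized.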
For Citizens Bank, suppose you have been given access to the database of customer records. Your task is to find all customers who live in the city of 'Boston'.
The table has the following structure:
customer_id | first_name | last_name | city | state | |
---|---|---|---|---|---|
1 | John | Doe | Boston | Massachusetts | johndoe@example.com |
2 | Jane | Smith | New York | New York | janesmith@example.com |
3 | Sarah | Brown | Boston | Massachusetts | sarahbrown@example.com |
4 | Mark | Thomas | San Francisco | California | markthomas@example.com |
5 | Emma | Williams | Seattle | Washington | emmawilliams@example.com |
Here is a SQL query that can find all customers in Boston:
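Assuming the table is named customers, a simple filter on the city column does the job:

```sql
-- Select every column for customers whose city is exactly 'Boston'
SELECT *
FROM customers
WHERE city = 'Boston';
```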
You will get all customer records where the "city" column value is "Boston".
customer_id | first_name | last_name | city | state | |
---|---|---|---|---|---|
1 | John | Doe | Boston | Massachusetts | johndoe@example.com |
3 | Sarah | Brown | Boston | Massachusetts | sarahbrown@example.com |
The result shows the customer details that match the specific condition 'Boston' in the city field.
The CHECK constraint is used to set a rule for the data in a column. If a row is inserted or updated and the data in the column does not follow the rule specified by the CHECK constraint, the operation will fail. The CHECK constraint is often used in conjunction with other constraints, such as NOT NULL or UNIQUE.
You might consider implementing the CHECK constraint in your database if you want to ensure that certain data meets specific conditions. This can be helpful for maintaining the quality and reliability of your data.
For example, you might use a CHECK constraint to ensure that a column contains only positive numbers, or that a date is within a certain range.
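An illustrative table definition (hypothetical column names) showing both of those rules:

```sql
-- CHECK constraints reject inserts/updates that violate the rule
CREATE TABLE loan_payments (
  payment_id     INTEGER PRIMARY KEY,
  payment_amount NUMERIC NOT NULL CHECK (payment_amount > 0),
  payment_date   DATE CHECK (payment_date BETWEEN '2000-01-01' AND '2099-12-31')
);
```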
As a data analyst at Citizens Bank, you are asked to analyze customer transaction behavior. Write a SQL query to find, for each customer, their Checking and Savings accounts along with the total transaction amount per account type. Use SQL JOIN operations to combine the customers table with the accounts table and the transactions table.
customer_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Smith |
3 | Mary | Johnson |
account_id | customer_id | account_type |
---|---|---|
1 | 1 | Checking |
2 | 1 | Savings |
3 | 2 | Checking |
4 | 3 | Savings |
account_id | transaction_id | transaction_amt |
---|---|---|
1 | 1 | 100 |
2 | 2 | 200 |
2 | 3 | 300 |
3 | 4 | 400 |
4 | 5 | 500 |
customer_id | first_name | last_name | account_type | total_transactions |
---|---|---|---|---|
1 | John | Doe | Checking | 100 |
1 | John | Doe | Savings | 500 |
2 | Jane | Smith | Checking | 400 |
3 | Mary | Johnson | Savings | 500 |
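A query that produces the expected output, assuming the three sample tables are named customers, accounts, and transactions:

```sql
SELECT c.customer_id,
       c.first_name,
       c.last_name,
       a.account_type,
       SUM(t.transaction_amt) AS total_transactions
FROM customers c
JOIN accounts a     ON a.customer_id = c.customer_id
JOIN transactions t ON t.account_id = a.account_id
GROUP BY c.customer_id, c.first_name, c.last_name, a.account_type
ORDER BY c.customer_id, a.account_type;
```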
This SQL query operates by first joining the customers table with the accounts table based on the customer ID. It then joins the result with the transactions table based on the account ID. It subsequently groups the records by customer ID, first name, last name, and account type, while summing the transaction amounts to find the total transactions for each account type per customer.
Because joins come up so often during SQL interviews, take a stab at this Spotify JOIN SQL question:
The key to acing a Citizens Bank SQL interview is to practice, practice, and then practice some more! Besides solving the above Citizens Bank SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Netflix, Airbnb, and Amazon.
Each DataLemur SQL question has hints to guide you, step-by-step solutions and best of all, there is an interactive coding environment so you can instantly run your SQL query answer and have it checked.
To prep for the Citizens Bank SQL interview, it can also be a great idea to solve interview questions from other banking & financial services companies like:
But if your SQL coding skills are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers topics like the LEAD window function and subqueries – both of which show up often in Citizens Bank interviews.
Beyond writing SQL queries, the other types of problems to practice for the Citizens Bank Data Science Interview include:
The best way to prepare for Citizens Bank Data Science interviews is by reading Ace the Data Science Interview. The book's got: