At PennyMac Financial Services, SQL is used for analyzing loan portfolio performance and managing customer databases. For this reason, PennyMac LOVES to ask SQL query questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
Thus, to help you study, we've curated 10 PennyMac Financial Services SQL interview questions – can you solve them?
At PennyMac, a mortgage company, multiple loan officers handle loans. For each loan officer, we want to analyze the total loan amount they handled and the average amount per loan they managed. Each loan has a unique id, a loan amount, and a loan officer in charge.
Write a SQL query that, given the sample loan data below, produces the output that follows. Order your results by rank in ascending order.
loan_id | loan_officer_id | loan_amount |
---|---|---|
101 | 1 | 250000 |
102 | 2 | 300000 |
103 | 1 | 500000 |
104 | 3 | 400000 |
105 | 2 | 200000 |
loan_officer_id | total_loan_amount | average_loan_amount | rank |
---|---|---|---|
1 | 750000 | 375000 | 1 |
2 | 500000 | 250000 | 2 |
3 | 400000 | 400000 | 3 |
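One way to write this query – a sketch that assumes the sample rows live in a table named `loans` with the columns shown above:

```sql
SELECT
  loan_officer_id,
  SUM(loan_amount) AS total_loan_amount,
  AVG(loan_amount) AS average_loan_amount,
  -- rank officers so the largest total portfolio gets rank 1
  RANK() OVER (ORDER BY SUM(loan_amount) DESC) AS rank
FROM loans
GROUP BY loan_officer_id
ORDER BY rank;
```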
In this query, we group the data by loan_officer_id and calculate the total and average loan amount for each loan officer. We then use the RANK() window function to rank the officers by total loan amount in descending order. The result is sorted in ascending order of rank.
To practice another window function question in DataLemur's free interactive SQL code editor, try this Amazon BI Engineer interview question:
Imagine there was a table of PennyMac employee salary data. Write a SQL query to find the top 3 highest paid employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
You can solve this question and run your code right in DataLemur's online SQL environment:
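One possible solution – a sketch assuming the sample data is stored in tables named `employee` and `department`:

```sql
WITH ranked AS (
  SELECT
    d.department_name,
    e.name,
    e.salary,
    -- rank salaries within each department, highest salary = rank 1;
    -- DENSE_RANK keeps ties (e.g. two 6800 salaries) at the same rank
    DENSE_RANK() OVER (
      PARTITION BY e.department_id
      ORDER BY e.salary DESC
    ) AS salary_rank
  FROM employee AS e
  JOIN department AS d
    ON e.department_id = d.department_id
)
SELECT department_name, name, salary
FROM ranked
WHERE salary_rank <= 3
ORDER BY department_name, salary DESC;
```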
We use the DENSE_RANK() window function to rank each employee's salary within their department, with higher salaries receiving lower rank numbers (ties share a rank). Then, we wrap this up in a CTE and filter for employees with a rank of 3 or lower.
If the code above is hard to understand, you can find a step-by-step solution with hints here: Top 3 Department Salaries.
Views are a lot like virtual tables: you can take a base table and customize what it exposes (such as hiding some data from non-admin users, or excluding certain columns or rows based on business requirements).
Here's the PostgreSQL syntax for creating a view based on data in an underlying table:
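For instance, a minimal sketch that builds a view over a hypothetical `loans` table, exposing only the non-sensitive columns:

```sql
-- the view behaves like a read-only table named loan_summary
CREATE VIEW loan_summary AS
SELECT
  loan_id,
  loan_officer_id,
  loan_amount
FROM loans;
```

Queries can then select from `loan_summary` exactly as they would from a base table, without seeing any columns the view omits.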
PennyMac is a leading mortgage loan company. It provides a number of services to its users. Suppose you are tasked with maintaining the database that stores information about the mortgages for its users. The database should keep track of who took out the mortgage, mortgage details such as loan number, amount, start date and end date, property details, and payment records.
Create suitable tables for this scenario and write a SQL query to find out which user has taken out the most number of loans with PennyMac.
user_id | first_name | last_name |
---|---|---|
123 | Alice | Brown |
456 | John | Smith |
mortgage_id | user_id | loan_number | loan_amount | start_date | end_date |
---|---|---|---|---|---|
1 | 123 | 0001 | 100000 | 2022-01-01 | 2032-01-01 |
2 | 456 | 0002 | 150000 | 2022-01-01 | 2032-01-01 |
3 | 123 | 0003 | 200000 | 2022-06-01 | 2032-06-01 |
property_id | mortgage_id | address | property_value |
---|---|---|---|
1 | 1 | 123 Main St | 100000 |
2 | 2 | 456 Main St | 150000 |
3 | 3 | 789 Main St | 200000 |
payment_id | mortgage_id | payment_date | payment_amount |
---|---|---|---|
1 | 1 | 2022-01-01 | 1000 |
2 | 1 | 2022-02-01 | 1000 |
3 | 2 | 2022-01-01 | 1500 |
4 | 2 | 2022-02-01 | 1500 |
5 | 3 | 2022-06-01 | 2000 |
6 | 3 | 2022-07-01 | 2000 |
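A sketch of the core schema and the query, assuming tables named `users` and `mortgages` with the columns shown in the sample data:

```sql
CREATE TABLE users (
  user_id    INT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50)
);

CREATE TABLE mortgages (
  mortgage_id INT PRIMARY KEY,
  user_id     INT REFERENCES users(user_id),
  loan_number VARCHAR(10),
  loan_amount NUMERIC,
  start_date  DATE,
  end_date    DATE
);

-- Which user has taken out the most loans?
SELECT
  u.user_id,
  u.first_name,
  u.last_name,
  COUNT(m.mortgage_id) AS loan_count
FROM users AS u
JOIN mortgages AS m
  ON u.user_id = m.user_id
GROUP BY u.user_id, u.first_name, u.last_name
ORDER BY loan_count DESC
LIMIT 1;
```

The `properties` and `payments` tables would be defined the same way, each with a foreign key back to `mortgages(mortgage_id)`.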
In this query, we first join the users and mortgages tables on the user_id field. We then group the results by user_id, first_name, and last_name and count the number of mortgages each user has taken out. We order the results by that count in descending order and limit the output to the top row to find the user who has taken out the most loans.
In SQL, both RANK() and DENSE_RANK() are used to rank rows within a result set. The key difference between the two functions is how they deal with two or more rows having the same value in the ranked column (aka how they break ties).
RANK(): When there's a tie, it leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to all three rows, and the next row will receive a rank of 5 (ranks 3 and 4 are skipped).
DENSE_RANK(): For ties, it does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows and gives the next distinct value the immediately following rank. For example, if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
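You can see both behaviors side by side in one query – a sketch assuming an `employee` table like the one in the salary question above:

```sql
SELECT
  name,
  salary,
  RANK()       OVER (ORDER BY salary DESC) AS rank,        -- skips ranks after ties
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank   -- no gaps after ties
FROM employee;
```

For two employees tied at the top salary, both columns show 1 for those rows; the next row then gets 3 from RANK() but 2 from DENSE_RANK().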
PennyMac, a finance company, would likely deal with loan-related data. Thus, you might be asked a question such as: "What is the Average Loan Amount taken by Customers every quarter?"
loan_id | customer_id | loan_date | loan_amount |
---|---|---|---|
001 | 101 | 01/04/2022 | 150000 |
002 | 102 | 03/02/2022 | 235000 |
003 | 101 | 04/05/2022 | 145000 |
004 | 103 | 05/22/2022 | 198000 |
005 | 102 | 05/25/2022 | 245000 |
006 | 101 | 06/06/2022 | 175000 |
We expect to calculate the average loan amount per quarter of the year.
quarter_of_year | avg_loan_amount |
---|---|
Q1 | 192500 |
Q2 | 190750 |
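A query sketch that produces this, assuming the sample rows live in a `loans` table whose `loan_date` column is a DATE:

```sql
SELECT
  -- EXTRACT(QUARTER ...) returns 1-4; prefix with 'Q' for the label
  'Q' || EXTRACT(QUARTER FROM loan_date) AS quarter_of_year,
  AVG(loan_amount) AS avg_loan_amount
FROM loans
GROUP BY 1
ORDER BY 1;
```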
This SQL query first assigns each loan to a quarter of the year by extracting the quarter from the loan date. It then calculates the average loan amount for each quarter using the AVG aggregate function. GROUP BY groups the results by quarter, and ORDER BY sorts the resulting rows in ascending order of the quarter of the year.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for requiring aggregation of data over a time period or this Alibaba Compressed Mean Question which is similar for focusing on calculation of averages.
A primary key is a special column (or set of columns) in a database table that uniquely identifies each row. It ensures that all rows are distinct and that the key columns never contain NULL values. Additionally, primary keys play a crucial role in maintaining data accuracy by preventing duplicate key values. They are also used to establish relationships between tables.
To create a primary key in a SQL database, you can use the PRIMARY KEY constraint. For example, in the employee table from the earlier question, the employee_id column would be the primary key.
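A minimal sketch of that table definition, assuming the columns from the earlier salary question:

```sql
CREATE TABLE employee (
  employee_id   INT PRIMARY KEY,  -- uniquely identifies each row; cannot be NULL
  name          VARCHAR(100),
  salary        INT,
  department_id INT
);
```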
You are a data analyst at PennyMac, a company providing home loans and refinancing services. The marketing department runs several digital ad campaigns to attract potential customers. You are tasked to analyse the click-through-rates of these campaigns.
You have the data in two tables - ad_campaigns and ad_clicks.
The ad_campaigns table contains information of each campaign:
campaign_id | campaign_start_date | campaign_end_date | total_impressions |
---|---|---|---|
1 | 01/01/2022 | 01/31/2022 | 10000 |
2 | 02/01/2022 | 02/28/2022 | 12000 |
3 | 03/01/2022 | 03/31/2022 | 15000 |
The ad_clicks table contains data of each click event:
click_id | campaign_id | click_date |
---|---|---|
1 | 1 | 01/01/2022 |
2 | 1 | 01/04/2022 |
3 | 1 | 01/10/2022 |
4 | 2 | 02/05/2022 |
5 | 2 | 02/20/2022 |
6 | 3 | 03/03/2022 |
7 | 3 | 03/10/2022 |
8 | 3 | 03/25/2022 |
The click-through-rate (CTR) is calculated as the total number of clicks that your ads receive divided by the total number of impressions (views) that your ads get.
Calculate the click-through-rate (CTR) for each campaign.
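One possible solution, using the table and column names from the sample data above:

```sql
SELECT
  c.campaign_id,
  -- clicks divided by impressions, scaled to a percentage;
  -- 100.0 forces decimal (not integer) division
  COUNT(k.click_id) * 100.0 / c.total_impressions AS ctr_percent
FROM ad_campaigns AS c
LEFT JOIN ad_clicks AS k
  ON c.campaign_id = k.campaign_id
GROUP BY c.campaign_id, c.total_impressions
ORDER BY c.campaign_id;
```

The LEFT JOIN ensures campaigns with zero clicks still appear, with a CTR of 0.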
This query joins ad_campaigns and ad_clicks on campaign_id and calculates the CTR for each campaign. It counts the number of clicks (COUNT of click_id) for each campaign and divides it by the campaign's total_impressions. The result is multiplied by 100 to express the CTR as a percentage. The output is ordered by campaign_id.
To practice a related problem in DataLemur's free interactive coding environment, try this Meta SQL interview question:
Check out PennyMac's career page and see what type of role would be the best fit for you!
As a part of the SQL Analyst team at PennyMac, you have been provided with a database of customer records. Your task is to write a SQL query that allows you to filter down the customer loan applications and find records that contain specific text in the 'description' field. For this task, filter for any record that contains the word 'refinance' in its description.
app_id | customer_id | application_date | loan_amt | description |
---|---|---|---|---|
1 | 1001 | 06/08/2022 | $300,000 | Refinance existing mortgage |
2 | 1002 | 06/10/2022 | $500,000 | New home loan application |
3 | 1003 | 06/18/2022 | $350,000 | Refinance mortgage for lower rate |
4 | 1004 | 07/26/2022 | $250,000 | Looking to refinance home |
5 | 1005 | 07/05/2022 | $450,000 | Applying for first-time home loan |
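A query sketch, assuming the sample rows live in a table named `loan_applications` (ILIKE is PostgreSQL's case-insensitive LIKE, so 'Refinance' matches too):

```sql
SELECT *
FROM loan_applications
WHERE description ILIKE '%refinance%';

-- For dialects without ILIKE, an equivalent is:
-- WHERE LOWER(description) LIKE '%refinance%';
```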
This SQL query filters the 'loan_applications' table and returns any record whose description contains the word 'refinance', matched case-insensitively so that 'Refinance' also qualifies. In the provided sample input, the query would return the records with app_id 1, 3, and 4. The '%' symbol in the LIKE clause operates as a wildcard that matches any sequence of characters, so 'refinance' can appear anywhere in the 'description' field.
The DISTINCT keyword in SQL allows you to select records that are unique, eliminating duplicates.
For example, if you had a table of PennyMac employees and wanted to see all the unique job titles currently at the company, you could write the following query:
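A minimal sketch, assuming the employee table has a hypothetical `job_title` column:

```sql
-- returns each job title once, no matter how many employees hold it
SELECT DISTINCT job_title
FROM employee;
```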
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the earlier PennyMac SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Netflix, Airbnb, and Amazon.
Each DataLemur SQL question has hints to guide you and step-by-step solutions, and best of all, there is an online SQL coding environment so you can run your SQL query answer right in the browser and have it graded.
To prep for the PennyMac SQL interview, it can also be helpful to solve SQL problems from other mortgage & loan companies like:
But if your SQL coding skills are weak, don't worry about jumping right into solving questions – go learn SQL with this SQL tutorial for Data Analytics.
This tutorial covers SQL concepts such as using LIKE and aggregate functions like SUM()/COUNT()/AVG() – both of these pop up frequently during SQL job interviews at PennyMac.
In addition to SQL query questions, the other types of problems tested in the PennyMac Data Science Interview include:
To prepare for PennyMac Data Science interviews read the book Ace the Data Science Interview because it's got: