At Fannie Mae, SQL is often used for managing and querying vast real estate databases and for data cleansing. So, it shouldn't surprise you that Fannie Mae almost always asks SQL questions during interviews for Data Analyst, Data Science, and BI jobs.
Thus, to help you prepare, we've curated 11 Fannie Mae SQL interview questions – can you answer each one?
Fannie Mae, widely known for its mission to provide mortgage financing, has a vast customer database. The business wants to identify its power users, i.e., the users who applied for the most mortgage loans in the last year; the higher the number of mortgage loans, the greater the user's value to the business. Can you write a SQL query to find the user with the highest number of mortgage loans applied for in the last year?
loan_id | user_id | loan_date | loan_amount |
---|---|---|---|
1071 | 23 | 10/08/2021 00:00:00 | 350000 |
2802 | 65 | 16/03/2021 00:00:00 | 500000 |
2923 | 23 | 18/06/2021 00:00:00 | 400000 |
3352 | 92 | 26/07/2021 00:00:00 | 200000 |
4510 | 91 | 05/05/2021 00:00:00 | 250000 |
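A minimal sketch of one possible PostgreSQL solution, assuming the table above is named loans:

```sql
SELECT
  user_id,
  COUNT(*) AS loan_count
FROM loans
WHERE EXTRACT(YEAR FROM loan_date) = 2021  -- keep only loans from the last year
GROUP BY user_id
ORDER BY loan_count DESC
LIMIT 1;  -- keep only the user with the most loans
```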
This query first filters the table to only contain loans from the last year (2021). It then groups the results by user and counts how many loans each user has. The results are then ordered by the number of loans in descending order, and the top user (i.e., the user with the most loans) is selected. Only the ID of this user and the number of loans they have are returned.
To practice a related customer analytics question on DataLemur's free online SQL code editor, try this Microsoft SQL Interview problem:
Suppose you had a table of Fannie Mae employee salary data. Write a SQL query to find the top three highest paid employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
You can solve this interview question directly within the browser on DataLemur:
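One way to approach it, sketched below with the table names employee and department assumed from the schemas above:

```sql
WITH ranked_salaries AS (
  SELECT
    d.department_name,
    e.name,
    e.salary,
    DENSE_RANK() OVER (
      PARTITION BY e.department_id
      ORDER BY e.salary DESC
    ) AS salary_rank  -- rank salaries within each department, highest first
  FROM employee AS e
  INNER JOIN department AS d
    ON e.department_id = d.department_id
)
SELECT
  department_name,
  name,
  salary
FROM ranked_salaries
WHERE salary_rank <= 3  -- keep the top three salaries per department
ORDER BY department_name, salary DESC, name;
```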
We use the DENSE_RANK() window function to rank each employee's salary within their department, with higher salaries receiving lower rank numbers (ties share the same rank). Then, we wrap this in a CTE and filter for employees with a rank of 3 or lower.
If the code above is confusing, you can find a step-by-step solution with hints here: Top 3 Department Salaries.
BTW Fannie Mae has a blog page! Read up on all the latest updates directly from the source.
While both types of indexes improve the performance of SQL queries by providing a faster way to lookup rows of data, a unique index enforces the uniqueness of the indexed columns, meaning that no duplicate values are allowed in the indexed columns. On the other hand, a non-unique index allows duplicate values in the indexed columns.
Here is an example of a unique index on a column of a table of Fannie Mae employees:
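A minimal sketch, assuming the table is named employees and the indexed column is email:

```sql
-- Hypothetical table/column names: enforce that no two employees share an email
CREATE UNIQUE INDEX idx_employees_email
ON employees (email);
```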
This index would ensure that no two Fannie Mae employees have the same email, which could be used as a unique identifier for each employee.
Here is an example of a non-unique index on a column of the same table:
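A minimal sketch, assuming a job_title column on the same employees table:

```sql
-- Hypothetical column name: duplicate values are allowed in job_title
CREATE INDEX idx_employees_job_title
ON employees (job_title);
```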
This index would not enforce uniqueness, but it could be used to improve the performance of queries that filter or sort the data based on the indexed column. For example, if you want to retrieve all employees who are Data Analysts, the database can use the index to quickly locate and retrieve the desired records without having to scan through all the employees.
Fannie Mae provides guarantees to mortgage lenders, so it's crucial for them to understand the pattern of loan amounts they have been guaranteeing. For this question, you are asked to write a SQL query that calculates the monthly average loan amount for each state.
Assume you have the following table called loans:
loan_id | loan_date | loan_amount | state |
---|---|---|---|
101 | 2022-01-01 | 250000 | NY |
102 | 2022-01-10 | 300000 | NY |
103 | 2022-02-01 | 280000 | CA |
104 | 2022-02-15 | 350000 | CA |
105 | 2022-03-01 | 450000 | NY |
106 | 2022-03-05 | 260000 | CA |
Your task is to calculate the monthly average loan amount for each state.
Expected output:
month | state | avg_loan_amount |
---|---|---|
2022-01 | NY | 275000 |
2022-02 | CA | 315000 |
2022-03 | NY | 450000 |
2022-03 | CA | 260000 |
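A sketch of a PostgreSQL query that produces this output, using the loans table as given above:

```sql
SELECT
  TO_CHAR(DATE_TRUNC('month', loan_date), 'YYYY-MM') AS month,  -- truncate to month, format as YYYY-MM
  state,
  AVG(loan_amount) AS avg_loan_amount
FROM loans
GROUP BY 1, 2
ORDER BY month, state;
```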
In this SQL query, we group the data by month (using the DATE_TRUNC function, which truncates the date to the specified precision, in this case 'month') and by state. For each group, we calculate the average loan amount using the AVG function. Finally, we order the result by month and state.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
For example, say you had website visitor data for Fannie Mae, exported from the company's Google Analytics account. In support of the web-dev team, you had to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to generate all pairs of URLs, but needed to avoid pairs where both the URLs were the same since that's not a valid pair.
The self-join query would look like the following:
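A minimal sketch, assuming a table named google_analytics with columns page_id, url, and referrer_id (these names are illustrative):

```sql
SELECT
  pages.url     AS page_url,      -- the page that was visited
  referrers.url AS referrer_url   -- the page that referred the visitor
FROM google_analytics AS pages
JOIN google_analytics AS referrers
  ON pages.referrer_id = referrers.page_id
WHERE pages.page_id <> referrers.page_id;  -- exclude pages that referred themselves
```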
This query returns the URL of each page (page_url) along with the URL of the page that referred to it (referrer_url). The self-join is performed using the referrer_id field, which stores the id of the page that referred the visitor to the current page, and the WHERE clause excludes any page that referred itself (aka data anomalies).
As a data analyst at Fannie Mae, you have been asked to filter out customer records from the 'Loans' database table to track those who have an 'Active' loan status for 'Single Family' residences. You're also asked to filter based on the loan amount, specifically looking for loans that exceed $200,000.
Below is an example of how the data might look in the 'Loans' table:
loan_id | customer_id | loan_status | property_type | loan_amount |
---|---|---|---|---|
001 | 564 | Active | Single Family | 250000 |
002 | 317 | Active | Multi Family | 180000 |
003 | 284 | Closed | Single Family | 220000 |
004 | 458 | Active | Single Family | 230000 |
005 | 186 | Active | Single Family | 190000 |
Your objective is to write a PostgreSQL query which filters out these specific customers.
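A sketch of one such query, using the Loans table and columns shown above:

```sql
SELECT *
FROM Loans
WHERE loan_status = 'Active'
  AND property_type = 'Single Family'
  AND loan_amount > 200000;
```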
This query will return all fields from the Loans table (SELECT *) where the loan status is 'Active', the property type is 'Single Family', and the loan amount exceeds $200,000 (loan_amount > 200000).
The COALESCE() function allows you to select the first non-NULL value from a list of values. It is often used to replace a NULL value with a default value so that aggregate functions like SUM() or AVG() can run on a column without being affected by NULL values.
For example, if you conducted a customer satisfaction survey for a company and asked customers to rate their level of agreement with a statement on a scale of 1 to 10, some customers may not have answered all the questions, resulting in NULL values in the survey data:
customer_id | question_id | agree_amount |
---|---|---|
101 | 1 | 9 |
101 | 2 | 10 |
202 | 1 | 6 |
202 | 2 | NULL |
303 | 1 | 4 |
303 | 2 | NULL |
To replace the NULLs in the agree_amount column with the default value of 5, which corresponds to neutral on the survey, you could run the following SQL query:
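A minimal sketch, assuming the survey data lives in a table named survey_responses:

```sql
SELECT
  customer_id,
  question_id,
  COALESCE(agree_amount, 5) AS agree_amount  -- replace NULL with the neutral rating of 5
FROM survey_responses;
```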
You'd get the following output:
customer_id | question_id | agree_amount |
---|---|---|
101 | 1 | 9 |
101 | 2 | 10 |
202 | 1 | 6 |
202 | 2 | 5 |
303 | 1 | 4 |
303 | 2 | 5 |
At Fannie Mae, you have access to a table that contains data on the mortgages extended by the company to its customers over the years. Write a SQL query to find the average mortgage amount that each customer has taken from Fannie Mae.
mortgage_id | customer_id | issue_date | amount |
---|---|---|---|
101 | 223 | 06/12/2015 | 250000 |
102 | 223 | 02/20/2018 | 300000 |
103 | 456 | 11/29/2011 | 125000 |
104 | 456 | 05/18/2017 | 150000 |
105 | 789 | 07/01/2021 | 400000 |
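A sketch of one possible query, assuming the table above is named mortgages:

```sql
SELECT
  customer_id,
  AVG(amount) AS average_amount  -- average mortgage amount per customer
FROM mortgages
GROUP BY customer_id
ORDER BY customer_id;
```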
This query groups records by the customer_id field and calculates the average for each group. The AVG() function computes the average value of the amount field over each group. The result is a list of customers and the average mortgage amount each has taken.
customer_id | average_amount |
---|---|
223 | 275000 |
456 | 137500 |
789 | 400000 |
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating highest values per group or this Stripe Repeated Payments Question which is similar for identifying repeated transactions by the same customer.
The marketing team at Fannie Mae wants to analyze the efficiency of their promotional campaigns, focusing on their digital products (like their mobile application). An important metric they are interested in is the Click-Through Rate (CTR), defined as the proportion of users who click on an ad link to the total number of users who view the ad.
More specifically, they would like to know the click-through rates of their ads on a monthly basis, with each ad promoting a specific digital product. By understanding these rates, they can pinpoint when and where their marketing strategies are most effective and take targeted actions to boost the popularity of their digital products.
We have two tables: one for ad impressions and one for ad clicks. The impressions table logs the times when users viewed an ad, with each row representing an instance of an ad being shown to a user. The clicks table logs the times when users clicked on one of these ads.
impression_id | user_id | time_impressed | product |
---|---|---|---|
1 | 123 | 07/01/2022 00:00:00 | App v1.0 |
2 | 265 | 07/01/2022 00:00:00 | App v1.0 |
3 | 362 | 07/01/2022 00:00:00 | App v1.0 |
4 | 192 | 07/01/2022 00:00:00 | App v1.0 |
5 | 981 | 07/01/2022 00:00:00 | App v1.0 |
click_id | user_id | time_clicked | product |
---|---|---|---|
1 | 123 | 07/01/2022 01:00:00 | App v1.0 |
2 | 265 | 07/01/2022 01:00:00 | App v1.0 |
3 | 362 | 07/01/2022 02:00:00 | App v1.0 |
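A sketch of the kind of query the explanation below describes, assuming the tables are named impressions and clicks:

```sql
SELECT
  i.month,
  i.product,
  ROUND(100.0 * COALESCE(c.total_clicks, 0) / i.total_impressions, 2) AS click_through_rate
FROM (
  SELECT
    DATE_TRUNC('month', time_impressed) AS month,
    product,
    COUNT(*) AS total_impressions
  FROM impressions
  GROUP BY 1, 2
) AS i
LEFT JOIN (  -- keep products with impressions even if they got zero clicks
  SELECT
    DATE_TRUNC('month', time_clicked) AS month,
    product,
    COUNT(*) AS total_clicks
  FROM clicks
  GROUP BY 1, 2
) AS c
  ON i.month = c.month
 AND i.product = c.product
ORDER BY i.month, i.product;
```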
This query returns the month, product, and click-through rate (as a percentage) by joining two subqueries: one that counts the total number of impressions per month and product, and another that counts the total number of clicks per month and product. The click-through rate is calculated by dividing the total number of clicks by the total number of impressions (and multiplying by 100 to get a percentage). Using a LEFT JOIN ensures that products that received impressions but no clicks are still included in the final result. The output is ordered by month and product for better readability.
To practice a related problem on DataLemur's free online SQL coding environment, solve this Meta SQL interview question:
A foreign key is a field in a database table that links to the primary key of another table, establishing a connection between the two tables. For example, consider a table of Fannie Mae's Google Ads data:
ad_id | campaign_id | keyword | click_count |
---|---|---|---|
1 | 201 | Fannie Mae reviews | 120 |
2 | 202 | Fannie Mae pricing | 150 |
3 | 101 | buy Fannie Mae | 65 |
4 | 101 | Fannie Mae alternatives | 135 |
Here, campaign_id is a foreign key that connects each ad to the primary key of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaign a specific ad belongs to.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, ad_group_id and account_id foreign keys could be used to link each ad to its ad group and to the Google Ads account that the campaign belongs to, respectively.
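A minimal sketch of how such relationships might be declared; all table and column names here are illustrative:

```sql
CREATE TABLE campaigns (
  campaign_id   INT PRIMARY KEY,
  campaign_name TEXT
);

CREATE TABLE google_ads (
  ad_id        INT PRIMARY KEY,
  campaign_id  INT REFERENCES campaigns (campaign_id),  -- foreign key to the campaigns table
  ad_group_id  INT,   -- could similarly reference an ad_groups table
  account_id   INT,   -- could similarly reference an accounts table
  keyword      TEXT,
  click_count  INT
);
```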
Fannie Mae, as a mortgage loan company, has a database that stores all the data regarding their loans, including the borrower, loan amount, property state, and date when the loan was acquired. They want to analyze their loans data by identifying the state with the highest total loan amounts each year.
Assume we have a table named 'loans' with columns such as loan_id, borrower, loan_amount, state, and loan_date.
Given this, provide a SQL query that returns the state with the highest total loan amount for each year.
We can use a PostgreSQL query like the one below:
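A sketch under the column-name assumptions above (loan_date, state, loan_amount):

```sql
WITH yearly_totals AS (
  SELECT
    EXTRACT(YEAR FROM loan_date) AS loan_year,
    state,
    SUM(loan_amount) AS total_loan_amount
  FROM loans
  GROUP BY 1, 2
)
SELECT DISTINCT ON (loan_year)  -- keep one row per year: the first after sorting
  loan_year,
  state,
  total_loan_amount
FROM yearly_totals
ORDER BY loan_year, total_loan_amount DESC;
```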
This SQL query first builds a CTE named 'yearly_totals' with the total loan amount for each state in each year. Then, for each year, it selects the state with the highest total loan amount.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the earlier Fannie Mae SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each DataLemur SQL question has multiple hints, step-by-step solutions and, crucially, there's an interactive SQL code editor so you can code up your SQL query answer right in the browser and have it executed.
To prep for the Fannie Mae SQL interview, it can also be a great idea to solve SQL problems from other mortgage & loan companies like:
However, if your SQL query skills are weak, don't worry about jumping right into solving questions – refresh your SQL knowledge with this DataLemur SQL tutorial.
This tutorial covers topics including joining a table to itself and filtering on multiple conditions using AND/OR/NOT – both of which pop up frequently in Fannie Mae SQL interviews.
In addition to SQL interview questions, the other topics covered in the Fannie Mae Data Science Interview are:
I'm a tad biased, but I believe the optimal way to prep for Fannie Mae Data Science interviews is to read my book Ace the Data Science Interview.
The book has 201 interview questions taken from Microsoft, Amazon & startups. It also has a crash course on Stats, ML, & Data Case Studies. And finally it's helped thousands of people land their dream job in data, which is why it's earned 1000+ reviews on Amazon and a 4.5-star rating.