At Compass, Data Scientists, Analysts, and Data Engineers rely on SQL to extract and analyze real estate data, including property listings and sales history, and to manage data in the property recommendation system that matches buyer preferences with available listings. For this reason, Compass includes SQL interview questions in its hiring process to assess candidates' skills.
To help you prepare for the Compass SQL interview, this blog covers 11 Compass SQL interview questions.
Imagine you are a data analyst at Compass, a real estate company. Your task is to analyze how the average listing price changes over time for the various types of properties listed by different agents on the platform.
The platform maintains a property listings database, with a structure as shown below:
listing_id | agent_id | listing_date | property_type | listing_price |
---|---|---|---|---|
101 | 201 | 2022-01-13 | Condo | 250000 |
102 | 207 | 2022-02-14 | Single Family | 350000 |
103 | 201 | 2022-03-22 | Townhouse | 300000 |
104 | 208 | 2022-01-15 | Single Family | 400000 |
105 | 210 | 2022-02-22 | Condo | 200000 |
Your task is to write a SQL query to identify the average listing price, per month, per property type.
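Here's one way to write that query, shown as a sketch that assumes the table is named listings:

```sql
SELECT
  listing_id,
  agent_id,
  DATE_TRUNC('month', listing_date) AS listing_month,
  property_type,
  -- average price across all listings in the same month and property type
  AVG(listing_price) OVER (
    PARTITION BY DATE_TRUNC('month', listing_date), property_type
  ) AS avg_listing_price
FROM listings;
```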
In this query, we use the AVG() window function. It traverses each row of every partition (here, a partition is formed by rows sharing the same month and property type) and computes the average listing price.
As a result, we'll get a list of average listing prices separated by month and property type.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
Suppose you had a table of Compass employee salary data. Write a SQL query to find all employees who earn more than their own manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis, who earns $7,000.
Solve this interview question interactively on DataLemur.
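Here's a sketch of one possible solution, assuming the table is named employee:

```sql
SELECT
  emp.employee_id,
  emp.name AS employee_name
FROM employee AS mgr           -- first copy: acts as the managers' table
INNER JOIN employee AS emp     -- second copy: acts as the employees' table
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary; -- keep only employees out-earning their manager
```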
First, we perform a self-join, treating the first copy of the table as the managers' table and the second copy as the employees' table. Then we use a WHERE clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the solution above is tough, you can find a step-by-step solution here: Well Paid Employees.
A correlated sub-query is one that is linked to the outer query and cannot be executed on its own. It uses the outer query to filter or transform data by referencing a column from the outer query, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
Correlated sub-queries are slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
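To make the difference concrete, here's a hypothetical pair of queries against a listings table (the table and column names are just for illustration):

```sql
-- Correlated sub-query: the inner query references l1.property_type,
-- so it must be re-evaluated for every row of the outer query.
SELECT l1.listing_id, l1.listing_price
FROM listings AS l1
WHERE l1.listing_price > (
  SELECT AVG(l2.listing_price)
  FROM listings AS l2
  WHERE l2.property_type = l1.property_type
);

-- Non-correlated sub-query: the inner query runs once on its own,
-- and its single result is reused for every row of the outer query.
SELECT listing_id, listing_price
FROM listings
WHERE listing_price > (SELECT AVG(listing_price) FROM listings);
```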
Compass is a real-estate technology company that provides a platform for buying, selling and renting properties. Given this, you are tasked with designing a database to hold listing information. The information required includes listing id, list date, price, address, number of bedrooms, number of bathrooms, and the agent id associated with the property.
In addition, create a table for agents which includes agent id, first name, last name, and total listings managed. Once you have created the database design, answer the following question: Which agents listed properties priced above $1 million in June 2022?
listing_id | list_date | price | address | bedrooms | bathrooms | agent_id |
---|---|---|---|---|---|---|
1 | 06/15/2022 | 1250000 | "123 Main St" | 3 | 2 | 1 |
2 | 06/18/2022 | 850000 | "456 Oak St" | 2 | 1 | 2 |
3 | 06/20/2022 | 1750000 | "789 Pine St" | 4 | 3 | 1 |
4 | 07/02/2022 | 950000 | "111 Elm St" | 3 | 2 | 3 |
agent_id | first_name | last_name | total_listings |
---|---|---|---|
1 | "John" | "Doe" | 50 |
2 | "Jane" | "Smith" | 45 |
3 | "Jim" | "Brown" | 60 |
The above query joins the listings and agents tables on the agent_id column. It then filters on the price being above $1 million and the listing date being in June, 2022. The result will be the first and last names of those agents who meet these criteria.
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For example, suppose you had a table of Compass orders and Compass customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an INNER JOIN between the Orders and Customers tables would retrieve rows where the customer_id in the Orders table matches the customer_id in the Customers table.
FULL OUTER JOIN: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values are returned for the columns of the non-matching table.
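For instance, with hypothetical orders and customers tables that share a customer_id column, the two joins would look like this:

```sql
-- INNER JOIN: only rows where an order's customer_id matches a customer
SELECT o.order_id, c.customer_name
FROM orders AS o
INNER JOIN customers AS c
  ON o.customer_id = c.customer_id;

-- FULL OUTER JOIN: every order and every customer; columns from the
-- side with no match come back as NULL
SELECT o.order_id, c.customer_name
FROM orders AS o
FULL OUTER JOIN customers AS c
  ON o.customer_id = c.customer_id;
```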
At Compass, we often need to filter customer records based on their residential status and their city. Using the customer records database, write one query to find all customers in New York whose status is Homeowner, and another to find customers who are neither located in New York nor Homeowners.
For this exercise, consider the customers table has the following fields: id, name, city, status, created_at.
Here's a sample of the type of data in the table:
id | name | city | status | created_at |
---|---|---|---|---|
1 | John Doe | New York | Homeowner | 07/01/2022 00:00:00 |
2 | Jane Smith | San Francisco | Renter | 07/01/2022 00:00:00 |
3 | Andrew Fernandez | Boston | Homeowner | 07/02/2022 00:00:00 |
4 | Diane Watkins | New York | Renter | 07/02/2022 00:00:00 |
5 | George Anderson | New York | Homeowner | 07/03/2022 00:00:00 |
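A minimal sketch of the first query, assuming the table is named customers:

```sql
-- Homeowners located in New York
SELECT *
FROM customers
WHERE city = 'New York'
  AND status = 'Homeowner';
```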
This query will return all rows from the customers table where the city is 'New York' and the status is 'Homeowner'.
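And a sketch of the second query, using the same assumed table name:

```sql
-- Customers who are neither in New York nor Homeowners
SELECT *
FROM customers
WHERE city <> 'New York'
  AND status <> 'Homeowner';
```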
This query will return all rows from the customers table where the city is not 'New York' and the status is not 'Homeowner'.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
A few reasons to denormalize a database:
Improved performance: Denormalization often reduces the number of costly join operations needed to retrieve data. This is helpful when the database is being used for OLAP (Online Analytical Processing) use cases at Compass, since joins can be expensive and slow.
Scalability: Denormalization can be like a turbo boost for your database's scalability! By requiring less data to be read and processed when executing a query, denormalization can help your database handle a ton of queries without breaking a sweat.
Simplification: One way to simplify the design of a database is by using denormalization to reduce the number of tables and relationships that need to be managed. This can make the database easier to understand and maintain.
Of course, don't over-do the database denormalization magic – too much redundancy can increase data storage costs, and it gets complicated to manage if there are frequent UPDATE commands that need to keep every redundant copy of the data in sync.
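As a purely illustrative example of the trade-off, imagine agent details are normally kept in a separate agents table; a denormalized design copies a frequently-read column such as agent_name onto listings (both table and column names here are assumed):

```sql
-- Normalized: agent details live only in agents, so reads need a join
SELECT l.listing_id, a.agent_name
FROM listings AS l
INNER JOIN agents AS a
  ON a.agent_id = l.agent_id;

-- Denormalized: agent_name is duplicated onto listings, so the read
-- skips the join, but every agent rename must update many listing rows
SELECT listing_id, agent_name
FROM listings;
```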
As a real estate company, Compass has lots of data about property prices. For this question, we are interested in the average price of different types of properties in each borough.
Given the table that contains detailed information about the properties listed on the platform, write a SQL query to determine the average price for each type of property (e.g., condos, single family homes) for each borough in the city.
Here is example data from the table:
listing_id | borough | property_type | price |
---|---|---|---|
1 | Brooklyn | condo | 600000 |
2 | Queens | single family home | 800000 |
3 | Manhattan | condo | 2000000 |
4 | Brooklyn | single family home | 700000 |
5 | Queens | condo | 500000 |
6 | Brooklyn | condo | 700000 |
7 | Manhattan | single family home | 2500000 |
8 | Queens | single family home | 850000 |
The result should be a table that groups by boroughs and property types and shows their average prices.
borough | property_type | avg_price |
---|---|---|
Brooklyn | condo | 650000 |
Brooklyn | single family home | 700000 |
Queens | condo | 500000 |
Queens | single family home | 825000 |
Manhattan | condo | 2000000 |
Manhattan | single family home | 2500000 |
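One way to produce that result, assuming the table is named listings:

```sql
SELECT
  borough,
  property_type,
  AVG(price) AS avg_price
FROM listings
GROUP BY borough, property_type;
```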
This query first groups the listings by both borough and property type, then calculates the average price of the listings in each group. The result is the average price for each type of property in each borough.
To practice a very similar question, try this interactive Amazon Highest-Grossing Items Question, which is similar because it requires grouping and aggregating items of a certain type, or this Facebook Average Post Hiatus (Part 1) Question, which is similar because it involves calculating and grouping by different criteria.
Compass, a real estate technology company, advertises its various property listings on their site. The marketing team is interested in knowing the click-through-conversion rate, i.e., the proportion of users who viewed an ad and subsequently added the property to their wishlist. Create a SQL statement to calculate the click-through-conversion rates for different properties.
view_id | user_id | view_date | property_id |
---|---|---|---|
101 | 123 | 09/10/2021 00:00:00 | 50001 |
102 | 234 | 09/10/2021 00:00:00 | 90001 |
103 | 123 | 09/11/2021 00:00:00 | 50001 |
104 | 456 | 09/12/2021 00:00:00 | 30001 |
add_id | user_id | add_date | property_id |
---|---|---|---|
201 | 123 | 09/10/2021 00:00:00 | 50001 |
202 | 234 | 09/11/2021 00:00:00 | 90001 |
203 | 678 | 09/12/2021 00:00:00 | 30001 |
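A possible PostgreSQL query for this, assuming the two tables are named ad_views and wishlist_adds:

```sql
SELECT
  v.property_id,
  COUNT(DISTINCT v.user_id) AS unique_views,
  COUNT(DISTINCT a.user_id) AS unique_adds,
  -- ratio of unique wishlist adds to unique ad views
  COUNT(DISTINCT a.user_id)::DECIMAL
    / COUNT(DISTINCT v.user_id) AS click_through_conversion_rate
FROM ad_views AS v
LEFT JOIN wishlist_adds AS a
  ON v.property_id = a.property_id
 AND v.user_id = a.user_id
GROUP BY v.property_id;
```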
In the above PostgreSQL query, we calculate the total number of unique views and unique wishlist adds for each property. The click-through-conversion rate is then the ratio of unique wishlist adds to unique views. The LEFT JOIN is used to ensure that we consider all views, even if there were no corresponding adds to the wishlist.
To solve a related SQL interview question on DataLemur's free interactive SQL code editor, attempt this SQL interview question asked by Facebook.
Unique indexes help ensure that there are no duplicate key values in a table, maintaining data integrity. They enforce uniqueness whenever keys are added or changed within the index.
Non-unique indexes on the other hand, are used to improve query performance by maintaining a sorted order of frequently used data values, but they do not enforce constraints on the associated table.
Lookups on a unique index are typically very fast, since each key value points to at most one row. Non-unique indexes can also improve query performance, but they can be somewhat slower because a single lookup may match many rows.
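For example, on a hypothetical listings table:

```sql
-- Unique index: any attempt to insert a second row with the same
-- listing_id is rejected, which enforces data integrity
CREATE UNIQUE INDEX idx_listings_listing_id
  ON listings (listing_id);

-- Non-unique index: speeds up lookups by city but allows duplicates
CREATE INDEX idx_listings_city
  ON listings (city);
```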
As a real estate brokerage firm, Compass frequently needs to analyze its inventory of property listings. Write a SQL query that calculates the average, minimum, and maximum listing price for each property type (e.g., house, apartment, condo) in each city.
listing_id | city | property_type | listing_price |
---|---|---|---|
101 | New York | Apartment | 850000 |
102 | New York | House | 2000000 |
103 | San Francisco | Condo | 1200000 |
104 | San Francisco | Apartment | 1100000 |
105 | New York | Apartment | 1200000 |
106 | San Francisco | House | 3000000 |
city | property_type | avg_price | min_price | max_price |
---|---|---|---|---|
New York | Apartment | 1025000 | 850000 | 1200000 |
New York | House | 2000000 | 2000000 | 2000000 |
San Francisco | Apartment | 1100000 | 1100000 | 1100000 |
San Francisco | Condo | 1200000 | 1200000 | 1200000 |
San Francisco | House | 3000000 | 3000000 | 3000000 |
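A query along these lines, assuming the table is named listings:

```sql
SELECT
  city,
  property_type,
  AVG(listing_price) AS avg_price,
  MIN(listing_price) AS min_price,
  MAX(listing_price) AS max_price
FROM listings
GROUP BY city, property_type
ORDER BY city, property_type;
```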
The output of this query shows the average, minimum, and maximum listing price for each type of property in each city. For example, in New York, the average price of an apartment is $1,025,000, ranging from $850,000 to $1,200,000.
The best way to prepare for a Compass SQL interview is to practice, practice, practice. In addition to solving the above Compass SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each exercise has multiple hints and full answers, and most importantly, there's an online SQL code editor so you can run your query right in the browser and have it executed.
To prep for the Compass SQL interview, it can also be a great idea to solve SQL problems from other tech companies, like:
In case your SQL query skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this free SQL for Data Analytics course.
This tutorial covers topics including filtering groups with HAVING and advantages of CTEs vs. subqueries – both of which pop up frequently during Compass interviews.
In addition to SQL interview questions, the other question categories covered in the Compass Data Science Interview include:
To prepare for Compass Data Science interviews read the book Ace the Data Science Interview because it's got:
Also focus on the behavioral interview – prep for that using this Behavioral Interview Guide for Data Scientists. You should also look into the latest trends and innovations from Compass.