At CoStar Group, SQL is used quite frequently for analyzing large real estate databases and powering their market analytics tool. Because of this, CoStar almost always evaluates jobseekers on SQL problems in interviews for Data Analyst, Data Science, and BI jobs.
As such, to help you ace the CoStar SQL interview, we've collected 9 CoStar Group SQL interview questions – how many can you solve?
As a real estate data company, CoStar collects information about property listings across various cities. Given a table 'Listings', write a SQL query to calculate the average listing price by city and month for the year 2020, and rank the cities by their average listing prices within each month.
Assume we are provided with the following 'Listings' table:
Note: Price here is in thousands.
listing_id | city | listing_date | price |
---|---|---|---|
101 | New York | 2020-01-15 | 1500 |
102 | Chicago | 2020-01-20 | 700 |
103 | Los Angeles | 2020-01-25 | 1300 |
104 | New York | 2020-02-10 | 1800 |
105 | Chicago | 2020-02-20 | 800 |
106 | Los Angeles | 2020-02-25 | 1500 |
107 | New York | 2020-03-15 | 2100 |
108 | Chicago | 2020-03-20 | 900 |
109 | Los Angeles | 2020-03-25 | 1700 |
The expected output is:
mth | city | avg_price | rank |
---|---|---|---|
1 | New York | 1500 | 1 |
1 | Chicago | 700 | 3 |
1 | Los Angeles | 1300 | 2 |
2 | New York | 1800 | 1 |
2 | Chicago | 800 | 3 |
2 | Los Angeles | 1500 | 2 |
3 | New York | 2100 | 1 |
3 | Chicago | 900 | 3 |
3 | Los Angeles | 1700 | 2 |
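One PostgreSQL approach, assuming the 'Listings' table above (column names as given; the output column names are illustrative):

```sql
SELECT
  EXTRACT(MONTH FROM listing_date) AS mth,
  city,
  AVG(price) AS avg_price,
  -- window function runs after aggregation, so it can rank the per-city averages
  RANK() OVER (
    PARTITION BY EXTRACT(MONTH FROM listing_date)
    ORDER BY AVG(price) DESC
  ) AS rank
FROM Listings
WHERE EXTRACT(YEAR FROM listing_date) = 2020
GROUP BY EXTRACT(MONTH FROM listing_date), city
ORDER BY mth, rank;
```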
In this solution, we use the RANK() window function to rank the cities by their average listing price within each month. PARTITION BY the listing month creates a separate ranking for each month, and ORDER BY the average price in descending order puts higher-priced cities first. The result set is then ordered by month and rank.
For more window function practice, solve this Uber SQL Interview Question within DataLemur's interactive coding environment:
CoStar has a database that records customer details, their subscription status and their interaction level with the application. CoStar would like you to write a SQL query to filter out active subscribers who have not interacted with the application in the last 30 days.
The 'customers' table is as follows:
customer_id | subscription_status | last_interaction_date |
---|---|---|
1123 | active | 2022-08-10 |
2451 | active | 2022-09-05 |
3872 | inactive | 2022-08-20 |
5478 | active | 2022-07-10 |
6349 | active | 2022-09-15 |
Assume today's date is 2022-09-20.
You need to write a SQL query to find all active customers who have not interacted with the application in the last 30 days.
Here is a PostgreSQL query to solve this question:
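A query along these lines works (assuming the 'customers' table above):

```sql
SELECT customer_id, subscription_status, last_interaction_date
FROM customers
WHERE subscription_status = 'active'
  -- keep only customers whose last interaction is older than 30 days
  AND last_interaction_date < CURRENT_DATE - INTERVAL '30 days';
```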
This PostgreSQL query filters the 'customers' table to the rows where 'subscription_status' is 'active' and 'last_interaction_date' is more than 30 days before today's date. The expression INTERVAL '30 days', subtracted from CURRENT_DATE, gives the cutoff date 30 days ago.
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four main types of NoSQL databases (key-value, document, column-family, and graph stores) is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at CoStar should at least refresh these concepts.
As an analyst at CoStar, a leading provider of commercial real estate information, analytics and online marketplace, you are tasked with understanding key metrics about the listed properties. Calculate the average price per square foot for the listed properties.
Sample input:
property_id | listing_price | square_foot |
---|---|---|
101 | 500000 | 2000 |
102 | 600000 | 2500 |
103 | 750000 | 3000 |
104 | 400000 | 1800 |
105 | 450000 | 1800 |
The expected output is:
avg_price_per_sqft |
---|
242.44 |
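One way to compute this, assuming the table is named 'properties':

```sql
SELECT
  -- cast to decimal so the division isn't integer division
  ROUND(AVG(listing_price::decimal / square_foot), 2) AS avg_price_per_sqft
FROM properties;
```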
The above SQL command first calculates the price per square foot for each property by dividing the listing price by the square footage. The ::decimal cast forces decimal (rather than integer) division, so the result isn't truncated. The AVG function is then used to calculate the average of these per-property values.
To practice a very similar question, try this interactive Amazon Maximize Prime Item Inventory Question, which is similar for calculating averages based on square footage, or this Alibaba Compressed Mean Question, which is similar for calculating the mean of a dataset.
Unique indexes help ensure that there are no duplicate key values in a table, maintaining data integrity. They enforce uniqueness whenever keys are added or changed within the index.
To define a unique index in PostgreSQL, you can use the following syntax:
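For example, to guarantee no two customers share an email address (the index, table, and column names here are illustrative):

```sql
-- inserts or updates that would duplicate an email will fail
CREATE UNIQUE INDEX idx_customers_email
ON customers (email);
```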
To define a non-unique index in PostgreSQL, you can use the following syntax:
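For instance, to speed up lookups by city (names again illustrative):

```sql
-- speeds up queries filtering on city; duplicates are allowed
CREATE INDEX idx_listings_city
ON listings (city);
```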
Non-unique indexes, on the other hand, are used to improve query performance by maintaining a sorted order of frequently queried data values, but they do not enforce any constraint on the associated table.
CoStar, a commercial real estate information company, is interested in understanding the click-through-to-conversion rates of the products they list on their website. For a given time period, they want to know the proportion of users who viewed a product and subsequently added it to their cart.
Two tables contain relevant information:
'clicks' table:
click_id | user_id | click_time | product_id |
---|---|---|---|
10001 | 101 | 06/08/2022 00:00:00 | 2001 |
10002 | 102 | 06/10/2022 00:00:00 | 2002 |
10003 | 101 | 06/18/2022 00:00:00 | 2001 |
10004 | 103 | 07/26/2022 00:00:00 | 2003 |
10005 | 104 | 07/05/2022 00:00:00 | 2004 |
'conversions' table:
conversion_id | user_id | conversion_time | product_id |
---|---|---|---|
21001 | 101 | 06/08/2022 00:05:00 | 2001 |
21002 | 105 | 06/10/2022 00:15:00 | 2005 |
21003 | 101 | 06/18/2022 00:07:00 | 2001 |
21004 | 106 | 07/26/2022 00:10:00 | 2006 |
21005 | 104 | 07/05/2022 00:06:00 | 2004 |
The expected output is:
product_id | click-through-conversion_rate |
---|---|
2001 | 100.00 |
2002 | 0.00 |
2003 | 0.00 |
2004 | 100.00 |
2005 | 0.00 |
2006 | 0.00 |
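A PostgreSQL sketch, assuming the two tables are named 'clicks' and 'conversions':

```sql
SELECT
  COALESCE(c.product_id, v.product_id) AS product_id,
  -- NULLIF avoids division by zero; COALESCE turns missing rates into 0.00
  COALESCE(
    ROUND(v.num_conversions * 100.0 / NULLIF(c.num_clicks, 0), 2),
    0.00
  ) AS click_through_conversion_rate
FROM
  (SELECT product_id, COUNT(*) AS num_clicks
   FROM clicks GROUP BY product_id) AS c
-- FULL OUTER JOIN keeps products that appear in only one of the two tables
FULL OUTER JOIN
  (SELECT product_id, COUNT(*) AS num_conversions
   FROM conversions GROUP BY product_id) AS v
  ON c.product_id = v.product_id
ORDER BY product_id;
```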
The SQL query will first count the number of clicks on each product_id and the number of conversions, if any. Then it will calculate the click-through-conversion_rate by dividing the conversions by the clicks and multiplying by 100 to express it as a percentage.
The COALESCE function is used to handle cases where there are zero conversions for a product_id, in which case the click-through-conversion_rate would be NULL. This function changes those NULL values to 0.
In the sample output, 2001 and 2004 have a 100% click through conversion rate because every click led to a conversion. The remaining product_ids have a 0% rate because no corresponding conversions were found for those product clicks.
To practice a similar problem on DataLemur's free interactive coding environment, attempt this Meta SQL interview question:
A foreign key is a column or set of columns in a table that references the primary key of another table. It links the two tables together and ensures that the data in the foreign key column is valid.
The foreign key constraint helps to ensure the integrity of the data in the database by preventing the insertion of rows in the referencing table that have no corresponding entry in the referenced table. It also enforces the relationship between the two tables, and can prevent data from being deleted from the primary key table while references to it still exist in the foreign key table.
For example, if you have a table of CoStar customers and an orders table, the customer_id column in the orders table could be a foreign key that references the id column (which is the primary key) in the CoStar customers table.
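Sketched in SQL, with illustrative table and column names:

```sql
CREATE TABLE costar_customers (
  id INT PRIMARY KEY,
  name TEXT
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  -- foreign key: every order must point at an existing customer
  customer_id INT NOT NULL REFERENCES costar_customers (id)
);
```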
Given two tables from a database, customers and transactions, write a SQL query to analyze customer metrics.
You need to find out the total number of transactions, total amount spent, and average amount spent per transaction for each customer. Join these tables on the 'customer_id'. Sort the result by total amount spent in descending order.
customer_id | first_name | last_name | email | signup_date |
---|---|---|---|---|
001 | John | Doe | john_doe@example.com | 2020-06-02 |
002 | Jane | Doe | jane_doe@example.com | 2021-07-15 |
003 | Bob | Smith | bob_smith@example.com | 2018-01-20 |
004 | Alice | Johnson | alice_johnson@example.com | 2019-11-11 |
005 | Charlie | Brown | charlie_brown@example.com | 2020-04-07 |
transaction_id | customer_id | date | amount | product_id |
---|---|---|---|---|
1001 | 001 | 2022-04-02 | 200.50 | ABC123 |
1002 | 002 | 2022-06-10 | 150.75 | XYZ456 |
1003 | 003 | 2022-07-20 | 50.00 | ABC123 |
1004 | 004 | 2022-08-30 | 75.00 | XYZ456 |
1005 | 001 | 2022-09-15 | 250.00 | ABC123 |
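A PostgreSQL query along these lines, assuming the tables are named 'customers' and 'transactions':

```sql
SELECT
  c.customer_id,
  COUNT(t.transaction_id) AS total_transactions,
  SUM(t.amount) AS total_amount,
  AVG(t.amount) AS avg_amount
FROM customers AS c
JOIN transactions AS t
  ON c.customer_id = t.customer_id
GROUP BY c.customer_id
ORDER BY total_amount DESC;
```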
This query first joins the customers and transactions tables on the 'customer_id' field. Then, for each customer, it counts the total number of transactions (total_transactions), sums the total amount spent (total_amount), and calculates the average amount per transaction (avg_amount). The results are grouped by customer and sorted in descending order of total amount spent.
Because joins come up frequently during SQL interviews, try an interactive Spotify JOIN SQL question:
CoStar is a large company that provides data on the real estate market. You are a data analyst for CoStar and have been given a task involving their Rent Report data. Each monthly report provides information about the average rent and total units available for rental properties in a given city for a specific month and year. The reports cover the past 3 years.
Sample data:
report_id | city_id | report_date | avg_rent | units_available |
---|---|---|---|---|
2091 | 5045 | 08/01/2020 | 1250 | 2050 |
2387 | 5045 | 09/01/2020 | 1325 | 1900 |
2753 | 5045 | 08/01/2021 | 1390 | 2150 |
3059 | 5045 | 09/01/2021 | 1450 | 2000 |
3402 | 6273 | 08/01/2020 | 1000 | 1850 |
3719 | 6273 | 09/01/2020 | 1025 | 1750 |
4282 | 6273 | 08/01/2021 | 1100 | 1900 |
4657 | 6273 | 09/01/2021 | 1150 | 1850 |
Your task is to write a SQL query to calculate the change in average rent and number of units available for each city from August 2020 to August 2021.
The expected output is:
city_id | rent_change | unit_change |
---|---|---|
5045 | 140 | 100 |
6273 | 100 | 50 |
To answer this question, we would need to run a query where we calculate the difference between the average rent and total units available for each city for the months of August in 2020 and 2021.
Here is the PostgreSQL query that would accomplish that:
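One way to write it, assuming the reports live in a table named 'rent_reports':

```sql
SELECT
  aug_2021.city_id,
  aug_2021.avg_rent - aug_2020.avg_rent AS rent_change,
  aug_2021.units_available - aug_2020.units_available AS unit_change
FROM
  -- subquery for each period's August report
  (SELECT city_id, avg_rent, units_available
   FROM rent_reports
   WHERE report_date = '2020-08-01') AS aug_2020
JOIN
  (SELECT city_id, avg_rent, units_available
   FROM rent_reports
   WHERE report_date = '2021-08-01') AS aug_2021
  -- inner join keeps only cities present in both periods
  ON aug_2020.city_id = aug_2021.city_id;
```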
In this example, the query builds two subqueries, one retrieving each city's August 2020 report and one retrieving its August 2021 report, with a WHERE clause on "report_date" selecting the right month in each. Joining the subqueries on "city_id" keeps only cities with data in both periods, and subtracting the 2020 values from the 2021 values yields the change in average rent and units available for each city.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the CoStar SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier CoStar SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Microsoft, Google, Amazon, and tech startups.
Each DataLemur SQL question has multiple hints, fully explained answers along with a discussion board to see how others solved it and most importantly, there is an interactive coding environment so you can instantly run your query and have it checked.
To prep for the CoStar SQL interview it is also helpful to practice SQL questions from other tech companies like:
But if your SQL query skills are weak, don't worry about diving straight into solving questions – refresh your SQL knowledge with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like different types of joins and Self-Joins – both of these show up routinely in SQL job interviews at CoStar.
Beyond writing SQL queries, the other topics to practice for the CoStar Data Science Interview are:
The best way to prepare for CoStar Data Science interviews is by reading Ace the Data Science Interview. The book's got: