At Zillow, MySQL is used across the company for analyzing housing data from databases and generating product analytics insights. For this reason, Zillow almost always evaluates jobseekers on SQL coding questions during interviews for Data Science, Analytics, and & Data Engineering jobs.
As such, to help you study for the Zillow SQL interview, we've curated 11 Zillow SQL interview questions – able to answer them all?
Given a Zillow dataset containing historical listing information, write a SQL query to calculate and compare the monthly average listing prices for properties based in Seattle and San Francisco over last 6 months.
Consider the data in the table to be formatted as below:
listing_id | city | list_date | price |
---|---|---|---|
1001 | Seattle | 2022-06-01 | 1500000 |
1002 | Seattle | 2022-07-02 | 1350000 |
1003 | Seattle | 2022-08-03 | 1250000 |
2001 | San Francisco | 2022-07-01 | 2250000 |
2002 | San Francisco | 2022-08-02 | 2350000 |
2003 | San Francisco | 2022-09-03 | 2300000 |
In the query above, the function is used to truncate the list_date to a monthly precision. This is important because we are calculating monthly averages.
The function is used to calculate the average price per city per month.
The clause is a window function which is partitioning the data by city and by each month to calculate the average price.
The clause restricts the data to only include listings from Seattle and San Francisco and only for the last 6 months.
The clause sorts the result by the month and city for easy interpretation of results.
Pro Tip: Window functions are a popular SQL interview topic, so practice every window function problem on DataLemur
As an employee for Zillow, the online real estate marketplace, it's crucial to understand the data around housing listed on the platform from different sellers. This helps in having transparency, accountability and decision making.
Given a database that has two tables and , with each listing being linked to a seller:
The table has the following fields:
The table has:
listing_id | seller_id | submission_date | listing_price | is_sold |
---|---|---|---|---|
1001 | 200 | 01/10/2022 | 500000 | No |
1002 | 201 | 01/15/2022 | 450000 | Yes |
1003 | 200 | 02/17/2022 | 550000 | No |
1004 | 202 | 02/25/2022 | 600000 | Yes |
1005 | 203 | 03/05/2022 | 700000 | Yes |
seller_id | seller_name |
---|---|
200 | John Doe |
201 | Jane Doe |
202 | Bill Smith |
203 | Sarah Johnson |
Write a SQL query to list the name of every seller and the sum of the prices for all of their unsold listings.
The SQL query would be framed as:
This query joins the and tables using the field. It then filters out all listings which are not sold and finally, groups the remaining entries by seller name and calculates the total unsold listing price for each seller. The output would provide us with each seller's name and their total unsold housing property prices on Zillow.
ACID refers to the four key properties that are essential to the reliable and correct execution of database transactions. These properties are:
Atomicity: ensures that a transaction is treated as a single operation, and either all of the changes are made or none of them are! Basically, the database version of a "-FULL SEND-"
Consistency: ensures that the data is in a consistent state before and after a transaction is completed. For example, if wiring money to a friendly Nigerian prince whose fallen on hard times, consistency ensures that the total value of funds lost in my account is the same amount that's gained in the prince's account!
Isolation: ensures that the intermediate state of a transaction is invisible to other transactions. Back to the wiring-the-prince-some-money example, isolation ensures that another transaction sees the transferred funds in my account OR the princes, but not in both accounts at the same time
Durability: ensures that once a transaction has been completed successfully, the changes made by the transaction are permanent and cannot be undone, even in the event of a system failure. Basically, no taksies backsies (even if your system has a meltdown!).
Suppose you are a data analyst at Zillow. Your task is to filter the customer data for customers who have listed properties in Boston and Seattle in the past month, and their listing price was over $500,000. Write a SQL query to find the customer_id, name, city, listing_date, and property_price of such customers. Ignore listings that are currently inactive.
Let's create some sample data for the problem:
customer_id | name |
---|---|
6171 | John Doe |
7802 | Jane Smith |
5293 | Emma Johnson |
listing_id | customer_id | city | listing_date | property_price | is_active |
---|---|---|---|---|---|
50001 | 6171 | Boston | 06/08/2022 00:00:00 | 550000 | 1 |
69852 | 7802 | Seattle | 06/10/2022 00:00:00 | 600000 | 1 |
50001 | 5293 | Boston | 06/18/2022 00:00:00 | 400000 | 0 |
69852 | 5293 | Seattle | 07/26/2022 00:00:00 | 500001 | 1 |
69852 | 7802 | Seattle | 07/05/2022 00:00:00 | 650000 | 1 |
With this query, we're joining the and table on . We are then filtering the records where the city is either 'Boston' or 'Seattle', the listing price is over $500,000, the listing date is in the past month, and the listing is currently active.
A primary key is a column (or set of columns) in a table that uniquely identifies each row in the table. It cannot contain null values and must be unique across all rows in the table.
A foreign key is a column (or set of columns) in a table that references the primary key of another table. It is used to establish a relationship between the two tables. A foreign key can contain null values, and multiple rows in the referencing table can reference the same row in the referenced table.
For example, consider a database with two tables: and . The Zillow customers table might have a primary key column called , while the Zillow orders table might have a foreign key column called that references the column in the table. This establishes a relationship between the two tables, such that each row in the orders table corresponds to a specific Zillow customer.
Given the table which contains data about each real estate listing, write a SQL query to find the average listing price per city.
This query groups the listings by city (city_id) and then computes the average (AVG) price for each group. The AS keyword is used to rename the columns in the output.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for requiring group by and aggregate functions or this LinkedIn Duplicate Job Listings Question which is similar for dealing with count of unique items in a category.
A is a field in a table that references the of another table. It creates a link between the two tables and ensures that the data in the field is valid.
For example, if you have a table of Zillow customers and an orders table, the customer_id column in the orders table could be a that references the id column (which is the primary key) in the Zillow customers table.
The constraint helps maintain the integrity of the data in the database by preventing the insertion of rows in the table that do not have corresponding entries in the table. It also enforces the relationship between the two tables and prevents data from being deleted from the table if it is still being referenced in the table.
Zillow.com often evaluates the effectiveness of their property listings by measuring the click-through conversion rates. These rates illustrate the progression from the initial stage of property viewing to the subsequent stage of property inquiry. Given property viewing log () and property inquiry log (), can you write a SQL query that computes the overall click-through conversion rate (from viewing a property to inquiring about it) for each property listed on the website?
Please consider the 'viewed_date' and 'inquiry_date' for each log. The click-through conversion rate can be calculated as the number of property inquiries per property view (in percentage form).
Here is an example of the logs.
viewed_id | user_id | viewed_date | property_id |
---|---|---|---|
1001 | 430 | 2022-07-15 | 1203 |
1002 | 444 | 2022-07-15 | 1203 |
1003 | 475 | 2022-07-16 | 1203 |
1004 | 435 | 2022-07-15 | 1298 |
1005 | 470 | 2022-07-16 | 1298 |
inquiry_id | user_id | inquiry_date | property_id |
---|---|---|---|
4001 | 430 | 2022-07-15 | 1203 |
4002 | 476 | 2022-07-17 | 1203 |
4003 | 444 | 2022-07-18 | 1203 |
4004 | 470 | 2022-07-16 | 1298 |
4005 | 440 | 2022-07-19 | 1298 |
In this query, we join the and tables on the matching property and user details. We also ensure that the is earlier than or equal to the , which reflects a logical user activity sequence. Finally, we cluster the data by listing () and calculate the desired click-through conversion rate.
To practice a related SQL problem on DataLemur's free online SQL code editor, try this Facebook SQL Interview question:
Zillow is a famous real estate platform where customers can search for properties to buy, sell, or rent. Let's say Zillow wants to track customers who have interests in two-bedroom houses for its marketing campaign. Zillow wants a list of customers whose search queries include the term '2-bedroom' during the month of August.
You are given a table called 'customer_searches'. Columns in the table include: (which is unique), , , and . Note that can include extra spaces, be in upper case, lower case, or a combination.
search_id | customer_id | search_query | search_date |
---|---|---|---|
1 | 145 | "2-bedroom apartment san francisco" | 2022-07-01 |
2 | 535 | "1-bedroom apartment NEW YORK" | 2022-08-05 |
3 | 145 | " 2-bedroom Homes in los angeles " | 2022-08-09 |
4 | 275 | "3 BHK apartments SAN JOSE" | 2022-08-16 |
5 | 535 | "2-Bedroom apartments CHICAGO" | 2022-08-25 |
customer_id |
---|
145 |
535 |
This SQL query selects distinct customers who had '2-bedroom' in their search query during the month of August. This is done by using the operator in PostgreSQL, which ignores case, i.e., '2-Bedroom' and '2-bedroom' would both match. Furthermore, it uses the function to extract the month of the search date.
While both and are used to rank rows, the key difference is in how they deal with ties.
RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the 2nd row in the tie, and a rank of 4 to the the 3rd tie.
DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
Suppose we had data on how many deals different salespeople at Zillow:
To rank these salespeople, we could execute the following query:
The result of this query would be:
name | deals_closed | rank | dense_rank |
---|---|---|---|
Akash | 50 | 1 | 1 |
Brittany | 50 | 2 | 1 |
Carlos | 40 | 3 | 2 |
Dave | 40 | 4 | 3 |
Eve | 30 | 5 | 3 |
Farhad | 10 | 6 | 4 |
As you can see, the function assigns a rank of 1 to the first row (Akash), a rank of 2 to the second row (Brittany), and so on. In contrast, the , gives both Akash and Brittany a rank of 1, since they are tied, then moves on to rank 2 for Carlos.
A manager at Zillow asks you to analyze customer data. Zillow has two tables - the table and the table. The table contains detailed information about each customer, including their ID and the state they live in, while the table contains detailed information about every property listed for each state. Assume there's a one-to-many relationship between customers and properties, i.e. one customer can list multiple properties.
Your task is to write an SQL query to join the table with the table to get a list of all customers who have listed properties, with corresponding details from both tables. Information from the table will include property_id, price, and listing_date.
This PostgreSQL query joins the table (aliased as c) with the table (aliased as p) on the . This will result in a tabular output that includes specific columns from both tables. In this case, we ordered the resulting data by to make it easier to see all properties listed by each customer. The INNER JOIN makes sure that only customers who have at least one property listed are included in the result table.
Since joins come up frequently during SQL interviews, try an interactive Spotify JOIN SQL question:
The key to acing a Zillow SQL interview is to practice, practice, and then practice some more! Besides solving the above Zillow SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each DataLemur SQL question has multiple hints, fully explained answers along with a discussion board to see how others solved it and crucially, there is an interactive SQL code editor so you can instantly run your SQL query and have it executed.
To prep for the Zillow SQL interview it is also wise to solve interview questions from other tech companies like:
But if your SQL query skills are weak, don't worry about going right into solving questions – strengthen your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers topics including filtering strings based on patterns and advantages of CTEs vs. subqueries – both of which show up routinely during SQL job interviews at Zillow.
Beyond writing SQL queries, the other topics to prepare for the Zillow Data Science Interview are:
The best way to prepare for Zillow Data Science interviews is by reading Ace the Data Science Interview. The book's got:
You should also familiarize yourself with different Zillow Data Science & ML use cases: