At Public Storage, SQL is used day-to-day for analyzing customer demographics and rental patterns, and optimizing the inventory management of storage units. That's why Public Storage almost always evaluates jobseekers on SQL questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you prep, here’s 11 Public Storage SQL interview questions – able to solve them?
Public Storage is a company providing self-storage units - spaces which customers can rent to store their belongings. In this context, power users could be defined as those customers who rent storage units most frequently.
Your task is to write a SQL query to identify these power users. For the sake of this question, let's assume any customer who rents storage units more than twice in a month is considered a power user.
customers
Example Input:customer_id | name |
---|---|
1 | Amy |
2 | Bob |
3 | Charlie |
4 | David |
5 | Eva |
rentals
Example Input:rental_id | customer_id | rent_date | unit_id |
---|---|---|---|
1 | 1 | 2022-08-01 | 101 |
2 | 2 | 2022-08-01 | 102 |
3 | 1 | 2022-08-02 | 103 |
4 | 3 | 2022-08-03 | 104 |
5 | 2 | 2022-08-04 | 105 |
6 | 1 | 2022-08-05 | 106 |
7 | 4 | 2022-08-05 | 107 |
8 | 5 | 2022-08-06 | 108 |
9 | 3 | 2022-08-07 | 109 |
10 | 1 | 2022-08-08 | 101 |
SELECT c.customer_id, c.name FROM customers c JOIN ( SELECT customer_id, EXTRACT(MONTH FROM rent_date) as month, COUNT(*) as rental_count FROM rentals GROUP BY customer_id, month HAVING COUNT(*) > 2 ) r ON c.customer_id = r.customer_id
In the subquery, we group by both customer and month, counting the number of rentals for each month. In the outer query, we join this result with the customers
table to include the customer names. The HAVING
clause filters to include only those rows where the rental_count
for the month is more than 2. This will give us the list of customers who have rented more than twice in any given month.
To practice a similar VIP customer analysis question on DataLemur's free online SQL coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Given a table of Public Storage employee salary data, write a SQL query to find employees who earn more money than their direct boss.
employees
Example Input:employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis who earns 7,800.
You can solve this interview question and run your code right in DataLemur's online SQL environment:
First, we perform a SELF-JOIN where we treat the first employee
table (mgr
) as the managers' table and the second employee
table (emp
) as the employees' table. Then we use a WHERE
clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
SELECT emp.employee_id AS employee_id, emp.name AS employee_name FROM employee AS mgr INNER JOIN employee AS emp ON mgr.employee_id = emp.manager_id WHERE emp.salary > mgr.salary;
If the solution above is hard to understand, you can find a step-by-step solution with hints here: Employee Salaries Higher Than Their Manager.
A join in SQL combines rows from two or more tables based on a shared column or set of columns.
Four types of JOINs exist in SQL. To demonstrate each one, say you had a table of Public Storage orders and Public Storage customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an INNER JOIN
between the Orders and Customers tables would retrieve rows where the customer_id
in the Orders table matches the customer_id
in the Customers table.
LEFT JOIN: A LEFT JOIN
retrieves all rows from the left table (in this case, the Orders table) and any matching rows from the right table (the Customers table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A RIGHT JOIN
retrieves all rows from the right table (in this case, the Customers table) and any matching rows from the left table (the Orders table). If there is no match in the left table, NULL values will be returned for the left table's columns.
FULL OUTER JOIN: A FULL OUTER JOIN
retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
As a Data Analyst at Public Storage, you are asked to analyze the company's rental income by calculating the average price per month for each type of storage unit. Consider the hypothetical scenario where the data of all transactions are stored in one table named rental_prices
. Each row in this table represents a single rental transaction. Each storage unit type has a unique identifier unit_id
.
Please write a PostgreSQL query that calculates the monthly average rental price for each type of storage unit. The table includes rental_price, transaction_date (in 'YYYY-MM-DD HH:MI:SS' format) and unit_id.
rental_prices
Example Input:transaction_id | transaction_date | unit_id | rental_price |
---|---|---|---|
1 | 2022-06-08 00:00:00 | A1 | 50 |
2 | 2022-06-10 00:00:00 | B1 | 60 |
3 | 2022-06-18 00:00:00 | B1 | 60 |
4 | 2022-07-26 00:00:00 | B1 | 60 |
5 | 2022-07-05 00:00:00 | A1 | 55 |
month | unit_id | average_price |
---|---|---|
6 | A1 | 50.00 |
6 | B1 | 60.00 |
7 | A1 | 55.00 |
7 | B1 | 60.00 |
SELECT EXTRACT(MONTH FROM transaction_date) as month, unit_id, AVG(rental_price) as average_price FROM rental_prices GROUP BY month, unit_id ORDER BY month, unit_id;
This query calculates the average price for each storage unit type by month from the rental_prices
table. The EXTRACT(MONTH FROM transaction_date)
function is used to extract the month from the date of each transaction.
To practice another window function question on DataLemur's free interactive coding environment, solve this Amazon BI Engineer interview question:
Read Public Storage's report on the three months ended March 31st.
Stored procedures are like functions in Python – they can accept input params and return values, and are used to encapsulate complex logic.
For example, if you worked as a Data Analyst in support of the Marketing Analytics team at Public Storage, a common task might be to find the conversion rate for your ads given a specific time-frame. Instead of having to write this query over-and-over again, you could write a stored procedure like the following:
CREATE FUNCTION get_conversion_rate(start_date DATE, end_date DATE, event_name TEXT) RETURNS NUMERIC AS $BODY$ BEGIN RETURN (SELECT COUNT(*) FROM events WHERE event_date BETWEEN start_date AND end_date AND event_name = 'conversion') / (SELECT COUNT(*) FROM events WHERE event_date BETWEEN start_date AND end_date AND event_name = 'impression'); END; $BODY$ LANGUAGE 'plpgsql';
To call this stored procedure, you'd execute the following query:
SELECT get_conversion_rate('2023-01-01', '2023-01-31', 'conversion');
You are given a Public Storage database with a table named customers
containing customer details and a rentals
table containing rental information. You need to list all customers who have been renting for more than six months and are up-to-date with their payments.
The customers
table is defined as follows:
customers
Example Input:customer_id | first_name | last_name | |
---|---|---|---|
101 | John | Doe | johndoe@example.com |
102 | Jane | Doe | janedoe@example.com |
103 | Sam | Smith | samsmith@example.com |
104 | Sara | Johnson | sarajohnson@example.com |
105 | James | Brown | jamesbrown@example.com |
The rentals
table is defined as follows:
rentals
Example Input:rental_id | customer_id | storage_unit_id | start_date | end_date | rental_status |
---|---|---|---|---|---|
1 | 101 | 1 | 01/01/2021 | NULL | Paid |
2 | 102 | 2 | 06/01/2022 | NULL | Due |
3 | 103 | 3 | 01/01/2022 | NULL | Paid |
4 | 104 | 4 | 01/01/2020 | 12/31/2020 | Paid |
5 | 105 | 1 | 12/01/2022 | NULL | Paid |
In this case, the start_date
and end_date
are in the format 'MM/DD/YYYY'. If the end_date
is NULL
, it means that the customer is still renting the storage unit. The rental_status
indicates whether the customer has paid their most recent bill. If the rental_status
is 'Paid', it means the customer is up to date with their payments.
Here is the PostgreSQL query which can be used to solve this problem:
SELECT c.customer_id, c.first_name, c.last_name, c.email FROM customers c JOIN rentals r ON c.customer_id = r.customer_id WHERE r.rental_status = 'Paid' AND r.start_date <= NOW() - INTERVAL '6 months' AND r.end_date IS NULL;
This query first joins customers
and rentals
on customer_id
. It then filters the joined data where rental_status
is 'Paid', start_date
is more than six months ago, and end_date
is NULL
, indicating that the rental is ongoing. The query then selects the customer_id
, first_name
, last_name
, and email
from the filtered data.
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There's a few reasons to denormalize a database:
Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at Public Storage. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.
Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.
As a data analyst at Public Storage, you're tasked with finding out the average volume (in cubic feet) of storage units rented each month. You have access to the following two tables:
rentals
Example Input:rental_id | customer_id | storage_id | start_date | end_date |
---|---|---|---|---|
20001 | 789 | 10001 | 2022-06-01 | 2022-06-30 |
20002 | 654 | 10002 | 2022-06-15 | 2022-07-15 |
20003 | 321 | 10003 | 2022-07-01 | 2022-07-28 |
20004 | 987 | 10001 | 2022-07-01 | 2022-07-15 |
20005 | 456 | 10002 | 2022-07-02 | NULL |
storage_units
Example Input:storage_id | volume_cft |
---|---|
10001 | 150 |
10002 | 200 |
10003 | 350 |
Create a SQL query to determine the average volume of storage units rented each month.
SELECT EXTRACT(MONTH FROM r.start_date) AS month, AVG(s.volume_cft) AS avg_volume_cft FROM rentals r INNER JOIN storage_units s ON r.storage_id = s.storage_id GROUP BY month ORDER BY month;
In this query, we first join the rentals
and storage_units
tables using the storage_id
shared between the tables. Then, we group by the month portion of the start_date
column in the rentals
table and calculate the average volume of the storage units rented during each month. The results are sorted by the month.
To practice a very similar question try this interactive Amazon Maximize Prime Item Inventory Question which is similar for requiring calculations on item volumes or this Google Average Review Ratings Question which is similar for needing average calculation per month.
Imagine you are a data analyst at Public Storage. Your manager has asked you to retrieve customer data for a specific branch. The branch is not identified by the branch_id, but rather by the city. Your task is to list all the customers whose records contain the name of the city, e.g., 'San Francisco'.
customer_info
Example Input:customer_id | first_name | last_name | branch_id | city |
---|---|---|---|---|
7546 | John | Doe | 123 | Los Angeles |
2045 | Jane | Smith | 456 | San Francisco |
6213 | Emily | Brown | 789 | San Diego |
9047 | Robert | White | 123 | San Francisco |
8673 | Mary | Green | 456 | Los Angeles |
branch
Example Input:branch_id | branch_name | city |
---|---|---|
123 | Central L.A. | Los Angeles |
456 | Downtown SF | San Francisco |
789 | Old Town SD | San Diego |
To return the results, you will need to write an SQL query.
SELECT ci.customer_id, ci.first_name, ci.last_name FROM customer_info as ci JOIN branch as b ON ci.branch_id = b.branch_id WHERE ci.city LIKE 'San Francisco'
This query performs a join operation on the customer information and branch tables using the branch_id
field. It then filters the results to return only those customer records where the city
field contains the string 'San Francisco'. The results will list all the customers who have a record at the San Francisco branch of Public Storage.
CHECK
SQL constraint?The CHECK
constraint is used to specify a condition that the data in a column must meet. If a row is inserted or updated and the data in the column doesn't meet the condition specified by the CHECK constraint, the operation will sadly fail.
For example, you might use a CHECK
constraint to ensure that a column contains only positive numbers, or that a date is within a certain range.
For example, if you had a table of Public Storage employees, here's an example of how to use the CHECK constraint in a CREATE TABLE statement:
CREATE TABLE public_storage_employees ( id INT PRIMARY KEY, salary INT CHECK (salary > 0), hire_date DATE CHECK (hire_date >= '1970-01-01') );
You are given two tables - customers
and rentals
. The customers
table has information about each customer like their id, name, and email. The rentals
table tracks all the storage unit rentals and includes the renter's id, the rental unit's id, the date of the rental, and the rental price.
Write a SQL query to find the average rental price paid by each customer, also return the customer's name and id for easy identification.
Below are the structures of the customers
and rentals
tables.
customers
Example Input:customer_id | customer_name | |
---|---|---|
1 | John Doe | johndoe@example.com |
2 | Jane Smith | janesmith@example.com |
3 | Alice Johnson | alicejohnson@example.com |
rentals
Example Input:renter_id | unit_id | rental_date | rental_price |
---|---|---|---|
1 | 5 | 06/08/2022 | 100 |
1 | 6 | 06/15/2022 | 120 |
2 | 7 | 06/18/2022 | 110 |
2 | 8 | 07/26/2022 | 100 |
3 | 9 | 07/05/2022 | 125 |
SELECT c.customer_id, c.customer_name, AVG(r.rental_price) AS average_rental_price FROM customers c JOIN rentals r ON c.customer_id = r.renter_id GROUP BY c.customer_id, c.customer_name
This PostgreSQL query works by joining the customer
and rentals
tables on the customer id (which is the renter id in the rentals table). Then it groups by customer id and name, and applies the AVG function to compute the average rental price for each customer.
Since join questions come up frequently during SQL interviews, take a stab at this Snapchat JOIN SQL interview question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews.
Besides solving the earlier Public Storage SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each DataLemur SQL question has hints to guide you, full answers and best of all, there's an online SQL code editor so you can right online code up your SQL query and have it executed.
To prep for the Public Storage SQL interview you can also be wise to practice SQL questions from other real estate and REIT companies like:
However, if your SQL query skills are weak, don't worry about diving straight into solving questions – go learn SQL with this SQL tutorial for Data Analytics.
This tutorial covers things like manipulating string/text data and GROUP BY – both of which pop up frequently during Public Storage interviews.
Besides SQL interview questions, the other types of questions tested in the Public Storage Data Science Interview include:
To prepare for Public Storage Data Science interviews read the book Ace the Data Science Interview because it's got: