At Public Storage, SQL is used day-to-day for analyzing customer demographics and rental patterns, and optimizing the inventory management of storage units. That's why Public Storage almost always evaluates jobseekers on SQL questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you prep, here’s 11 Public Storage SQL interview questions – able to solve them?
Public Storage is a company providing self-storage units - spaces which customers can rent to store their belongings. In this context, power users could be defined as those customers who rent storage units most frequently.
Your task is to write a SQL query to identify these power users. For the sake of this question, let's assume any customer who rents storage units more than twice in a month is considered a power user.
customer_id | name |
---|---|
1 | Amy |
2 | Bob |
3 | Charlie |
4 | David |
5 | Eva |
rental_id | customer_id | rent_date | unit_id |
---|---|---|---|
1 | 1 | 2022-08-01 | 101 |
2 | 2 | 2022-08-01 | 102 |
3 | 1 | 2022-08-02 | 103 |
4 | 3 | 2022-08-03 | 104 |
5 | 2 | 2022-08-04 | 105 |
6 | 1 | 2022-08-05 | 106 |
7 | 4 | 2022-08-05 | 107 |
8 | 5 | 2022-08-06 | 108 |
9 | 3 | 2022-08-07 | 109 |
10 | 1 | 2022-08-08 | 101 |
In the subquery, we group by both customer and month, counting the number of rentals for each month. In the outer query, we join this result with the table to include the customer names. The clause filters to include only those rows where the for the month is more than 2. This will give us the list of customers who have rented more than twice in any given month.
To practice a similar VIP customer analysis question on DataLemur's free online SQL coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Given a table of Public Storage employee salary data, write a SQL query to find employees who earn more money than their direct boss.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis who earns 7,800.
You can solve this interview question and run your code right in DataLemur's online SQL environment:
First, we perform a SELF-JOIN where we treat the first table () as the managers' table and the second table () as the employees' table. Then we use a clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the solution above is hard to understand, you can find a step-by-step solution with hints here: Employee Salaries Higher Than Their Manager.
A join in SQL combines rows from two or more tables based on a shared column or set of columns.
Four types of JOINs exist in SQL. To demonstrate each one, say you had a table of Public Storage orders and Public Storage customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an between the Orders and Customers tables would retrieve rows where the in the Orders table matches the in the Customers table.
LEFT JOIN: A retrieves all rows from the left table (in this case, the Orders table) and any matching rows from the right table (the Customers table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A retrieves all rows from the right table (in this case, the Customers table) and any matching rows from the left table (the Orders table). If there is no match in the left table, NULL values will be returned for the left table's columns.
FULL OUTER JOIN: A retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
As a Data Analyst at Public Storage, you are asked to analyze the company's rental income by calculating the average price per month for each type of storage unit. Consider the hypothetical scenario where the data of all transactions are stored in one table named . Each row in this table represents a single rental transaction. Each storage unit type has a unique identifier .
Please write a PostgreSQL query that calculates the monthly average rental price for each type of storage unit. The table includes rental_price, transaction_date (in 'YYYY-MM-DD HH:MI:SS' format) and unit_id.
transaction_id | transaction_date | unit_id | rental_price |
---|---|---|---|
1 | 2022-06-08 00:00:00 | A1 | 50 |
2 | 2022-06-10 00:00:00 | B1 | 60 |
3 | 2022-06-18 00:00:00 | B1 | 60 |
4 | 2022-07-26 00:00:00 | B1 | 60 |
5 | 2022-07-05 00:00:00 | A1 | 55 |
month | unit_id | average_price |
---|---|---|
6 | A1 | 50.00 |
6 | B1 | 60.00 |
7 | A1 | 55.00 |
7 | B1 | 60.00 |
This query calculates the average price for each storage unit type by month from the table. The function is used to extract the month from the date of each transaction.
To practice another window function question on DataLemur's free interactive coding environment, solve this Amazon BI Engineer interview question:
Read Public Storage's report on the three months ended March 31st.
Stored procedures are like functions in Python – they can accept input params and return values, and are used to encapsulate complex logic.
For example, if you worked as a Data Analyst in support of the Marketing Analytics team at Public Storage, a common task might be to find the conversion rate for your ads given a specific time-frame. Instead of having to write this query over-and-over again, you could write a stored procedure like the following:
To call this stored procedure, you'd execute the following query:
You are given a Public Storage database with a table named containing customer details and a table containing rental information. You need to list all customers who have been renting for more than six months and are up-to-date with their payments.
The table is defined as follows:
customer_id | first_name | last_name | |
---|---|---|---|
101 | John | Doe | johndoe@example.com |
102 | Jane | Doe | janedoe@example.com |
103 | Sam | Smith | samsmith@example.com |
104 | Sara | Johnson | sarajohnson@example.com |
105 | James | Brown | jamesbrown@example.com |
The table is defined as follows:
rental_id | customer_id | storage_unit_id | start_date | end_date | rental_status |
---|---|---|---|---|---|
1 | 101 | 1 | 01/01/2021 | NULL | Paid |
2 | 102 | 2 | 06/01/2022 | NULL | Due |
3 | 103 | 3 | 01/01/2022 | NULL | Paid |
4 | 104 | 4 | 01/01/2020 | 12/31/2020 | Paid |
5 | 105 | 1 | 12/01/2022 | NULL | Paid |
In this case, the and are in the format 'MM/DD/YYYY'. If the is , it means that the customer is still renting the storage unit. The indicates whether the customer has paid their most recent bill. If the is 'Paid', it means the customer is up to date with their payments.
Here is the PostgreSQL query which can be used to solve this problem:
This query first joins and on . It then filters the joined data where is 'Paid', is more than six months ago, and is , indicating that the rental is ongoing. The query then selects the , , , and from the filtered data.
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There's a few reasons to denormalize a database:
Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at Public Storage. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.
Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.
As a data analyst at Public Storage, you're tasked with finding out the average volume (in cubic feet) of storage units rented each month. You have access to the following two tables:
rental_id | customer_id | storage_id | start_date | end_date |
---|---|---|---|---|
20001 | 789 | 10001 | 2022-06-01 | 2022-06-30 |
20002 | 654 | 10002 | 2022-06-15 | 2022-07-15 |
20003 | 321 | 10003 | 2022-07-01 | 2022-07-28 |
20004 | 987 | 10001 | 2022-07-01 | 2022-07-15 |
20005 | 456 | 10002 | 2022-07-02 | NULL |
storage_id | volume_cft |
---|---|
10001 | 150 |
10002 | 200 |
10003 | 350 |
Create a SQL query to determine the average volume of storage units rented each month.
In this query, we first join the and tables using the shared between the tables. Then, we group by the month portion of the column in the table and calculate the average volume of the storage units rented during each month. The results are sorted by the month.
To practice a very similar question try this interactive Amazon Maximize Prime Item Inventory Question which is similar for requiring calculations on item volumes or this Google Average Review Ratings Question which is similar for needing average calculation per month.
Imagine you are a data analyst at Public Storage. Your manager has asked you to retrieve customer data for a specific branch. The branch is not identified by the branch_id, but rather by the city. Your task is to list all the customers whose records contain the name of the city, e.g., 'San Francisco'.
customer_id | first_name | last_name | branch_id | city |
---|---|---|---|---|
7546 | John | Doe | 123 | Los Angeles |
2045 | Jane | Smith | 456 | San Francisco |
6213 | Emily | Brown | 789 | San Diego |
9047 | Robert | White | 123 | San Francisco |
8673 | Mary | Green | 456 | Los Angeles |
branch_id | branch_name | city |
---|---|---|
123 | Central L.A. | Los Angeles |
456 | Downtown SF | San Francisco |
789 | Old Town SD | San Diego |
To return the results, you will need to write an SQL query.
This query performs a join operation on the customer information and branch tables using the field. It then filters the results to return only those customer records where the field contains the string 'San Francisco'. The results will list all the customers who have a record at the San Francisco branch of Public Storage.
The constraint is used to specify a condition that the data in a column must meet. If a row is inserted or updated and the data in the column doesn't meet the condition specified by the CHECK constraint, the operation will sadly fail.
For example, you might use a constraint to ensure that a column contains only positive numbers, or that a date is within a certain range.
For example, if you had a table of Public Storage employees, here's an example of how to use the CHECK constraint in a CREATE TABLE statement:
You are given two tables - and . The table has information about each customer like their id, name, and email. The table tracks all the storage unit rentals and includes the renter's id, the rental unit's id, the date of the rental, and the rental price.
Write a SQL query to find the average rental price paid by each customer, also return the customer's name and id for easy identification.
Below are the structures of the and tables.
customer_id | customer_name | |
---|---|---|
1 | John Doe | johndoe@example.com |
2 | Jane Smith | janesmith@example.com |
3 | Alice Johnson | alicejohnson@example.com |
renter_id | unit_id | rental_date | rental_price |
---|---|---|---|
1 | 5 | 06/08/2022 | 100 |
1 | 6 | 06/15/2022 | 120 |
2 | 7 | 06/18/2022 | 110 |
2 | 8 | 07/26/2022 | 100 |
3 | 9 | 07/05/2022 | 125 |
This PostgreSQL query works by joining the and tables on the customer id (which is the renter id in the rentals table). Then it groups by customer id and name, and applies the AVG function to compute the average rental price for each customer.
Since join questions come up frequently during SQL interviews, take a stab at this Snapchat JOIN SQL interview question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the earlier Public Storage SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each DataLemur SQL question has hints to guide you, full answers and best of all, there's an online SQL code editor so you can right online code up your SQL query and have it executed.
To prep for the Public Storage SQL interview you can also be wise to practice SQL questions from other real estate and REIT companies like:
However, if your SQL query skills are weak, don't worry about diving straight into solving questions – go learn SQL with this SQL tutorial for Data Analytics.
This tutorial covers things like manipulating string/text data and GROUP BY – both of which pop up frequently during Public Storage interviews.
Besides SQL interview questions, the other types of questions tested in the Public Storage Data Science Interview include:
To prepare for Public Storage Data Science interviews read the book Ace the Data Science Interview because it's got: