At PagerDuty, SQL is used across the company for analyzing incident management data to improve system reliability, and extracting insights from customer usage data to optimize product features. Because of this, PagerDuty often tests SQL questions during interviews for Data Science and Data Engineering positions.
To help you practice for the PagerDuty SQL interview, we'll cover 9 PagerDuty SQL interview questions – can you solve them?
As a DevOps or SRE engineer using PagerDuty, the effective and efficient resolution of incidents is an important activity. PagerDuty classifies its "Power Users" as users who frequently resolve incidents, and do so in an exceptionally quick time-frame.
Using the two given tables, incidents and users, write a SQL query to identify the "Power Users". You should consider a Power User as someone who resolves more than 50 incidents per month, with an average incident resolution time of less than 2 hours.
incident_id | resolver_id | start_date | end_date |
---|---|---|---|
1001 | 001 | 2022/06/01 02:00:00 | 2022/06/01 03:15:00 |
1002 | 002 | 2022/06/01 04:05:00 | 2022/06/01 05:10:00 |
1003 | 001 | 2022/06/02 01:10:00 | 2022/06/02 02:20:00 |
1004 | 003 | 2022/06/02 05:30:00 | 2022/06/02 06:00:00 |
1005 | 001 | 2022/06/03 01:00:00 | 2022/06/03 01:30:00 |
user_id | username |
---|---|
001 | john_doe |
002 | jane_doe |
003 | richard_roe |
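One possible PostgreSQL solution, assuming tables named incidents and users with the columns shown in the sample data above:

```sql
-- Assumed tables: incidents(incident_id, resolver_id, start_date, end_date)
--                 users(user_id, username)
SELECT
    u.username,
    COUNT(*) AS incidents_resolved,
    AVG(EXTRACT(EPOCH FROM (i.end_date - i.start_date)) / 3600) AS avg_resolution_hours
FROM incidents i
JOIN users u
    ON i.resolver_id = u.user_id
-- restrict to incidents from the current month
WHERE DATE_TRUNC('month', i.start_date) = DATE_TRUNC('month', CURRENT_DATE)
GROUP BY u.username
-- more than 50 incidents resolved, averaging under 2 hours each
HAVING COUNT(*) > 50
   AND AVG(EXTRACT(EPOCH FROM (i.end_date - i.start_date)) / 3600) < 2;
```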
This SQL query starts by joining the incidents and users tables on the resolver_id and user_id fields respectively. It then groups the results by username and filters them down to only results from the current month where an individual has resolved more than 50 incidents with an average incident resolution time under 2 hours. In Postgres, EXTRACT(EPOCH FROM ...) can be used to calculate the time difference in hours.
To practice a related super-user data analysis question on DataLemur's free online SQL code editor, try this Microsoft Teams Power User SQL Interview Question:
At PagerDuty, the service response time is a crucial metric for evaluating platform performance and customer satisfaction. An incident, once triggered, must be acknowledged and resolved quickly to keep systems running smoothly. Your task is to write a SQL query to find the average response time (time between an incident getting triggered and acknowledged) per technician per day. Assume the 'incidents' table records each incident with its 'triggered_at' and 'acknowledged_at' timestamps.
Below is the sample 'incidents' table:
incident_id | technician_id | triggered_at | acknowledged_at |
---|---|---|---|
1 | 101 | 2022-09-12 09:30:00 | 2022-09-12 09:35:00 |
2 | 102 | 2022-09-12 10:40:00 | 2022-09-12 11:00:00 |
3 | 101 | 2022-09-13 18:50:00 | 2022-09-13 18:52:00 |
4 | 102 | 2022-09-13 20:30:00 | 2022-09-13 20:45:00 |
5 | 101 | 2022-09-14 09:15:00 | 2022-09-14 09:25:00 |
Here is a PostgreSQL solution for the problem:
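Assuming an incidents table with the columns shown in the sample data above, the query could look like this:

```sql
-- Assumed table: incidents(incident_id, technician_id, triggered_at, acknowledged_at)
SELECT
    technician_id,
    DATE(triggered_at) AS incident_date,
    -- EPOCH gives the interval in seconds; divide by 60 for minutes
    AVG(EXTRACT(EPOCH FROM (acknowledged_at - triggered_at)) / 60) AS avg_response_minutes
FROM incidents
GROUP BY technician_id, DATE(triggered_at)
ORDER BY technician_id, incident_date;
```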
This query first calculates the difference between the 'acknowledged_at' and 'triggered_at' timestamps for each incident to get the response time in minutes. It then groups the incidents by technician and date to find the average response time per technician per day. The results will be ordered by technician_id and the date to facilitate easy consumption.
To solve a related window function SQL problem on DataLemur's free online SQL coding environment, solve this Amazon SQL Interview Question:
Normalizing a database involves dividing a large table into smaller and more specialized ones, and establishing relationships between them using foreign keys. This reduces repetition, resulting in a database that is more adaptable, scalable, and easy to maintain. Additionally, it helps to preserve the accuracy of the data by reducing the likelihood of inconsistencies and problems.
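As a sketch of what normalization looks like in practice, a denormalized table that repeats customer details on every order row can be split into two tables linked by a foreign key (hypothetical table and column names):

```sql
-- Hypothetical example: customer details live in one table,
-- and each order row references a customer via a foreign key.
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers (customer_id),
    order_date  DATE
);
```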
As a Data Analyst for PagerDuty, you have been tasked with working on customer segmentation of our database. Your goal is to filter down customer records to those who have enabled email notifications, are not currently on a premium plan, and have used our services within the last 30 days.
Use the following database tables:
customer_id | email_notification | plan_type | last_login_date |
---|---|---|---|
1 | true | premium | 2022-09-05 |
2 | true | free | 2022-09-15 |
3 | false | premium | 2022-09-12 |
4 | true | free | 2022-09-10 |
5 | false | free | 2022-09-01 |
login_log_id | customer_id | login_date |
---|---|---|
1 | 1 | 2022-09-05 |
2 | 2 | 2022-09-15 |
3 | 3 | 2022-09-12 |
4 | 4 | 2022-09-10 |
5 | 5 | 2022-09-01 |
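One way to express this in PostgreSQL, assuming tables named customers and login_log with the columns shown above:

```sql
-- Assumed tables: customers(customer_id, email_notification, plan_type, last_login_date)
--                 login_log(login_log_id, customer_id, login_date)
SELECT DISTINCT c.customer_id
FROM customers c
JOIN login_log l
    ON c.customer_id = l.customer_id
WHERE c.email_notification = TRUE
  AND c.plan_type <> 'premium'
  -- a login within the last 30 days
  AND l.login_date >= CURRENT_DATE - INTERVAL '30 days';
```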
This query will return all customer IDs satisfying the given conditions: they have enabled email notifications, are not on a premium plan, and have logged into our services within the last 30 days. It uses a JOIN to combine the customers and login_log tables, and a WHERE clause to filter rows on those conditions. Comparing the login date against the current date minus 30 days checks whether customers have used the service within the last 30 days.
In database schema design, a one-to-one relationship between two entities is characterized by each entity being related to a single instance of the other. An example of this is the relationship between a US citizen and their social-security number (SSN) - each citizen has one SSN, and each SSN belongs to one person.
On the other hand, a one-to-many relationship is when one entity can be associated with multiple instances of the other entity. A teacher's relationship with their classes is an example of this - a teacher can teach many classes, but each class is only associated with one teacher.
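The teacher example above can be sketched as DDL (hypothetical schema): many rows in classes can point at the same teacher, but each class row points at exactly one.

```sql
-- One-to-many: each class references exactly one teacher,
-- while a teacher may appear in many class rows.
CREATE TABLE teachers (
    teacher_id SERIAL PRIMARY KEY,
    name       TEXT NOT NULL
);

CREATE TABLE classes (
    class_id   SERIAL PRIMARY KEY,
    teacher_id INT NOT NULL REFERENCES teachers (teacher_id),
    subject    TEXT
);
```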
In PagerDuty, incidents are generated when there is an issue in the system that needs attention. Each incident has a start time and an end time. The response time for an incident is the length of time from the start time to the end time. As a database engineer at PagerDuty, can you write a SQL query to find the average response time for incidents, in minutes, for each service?
incident_id | service_id | start_time | end_time |
---|---|---|---|
101 | 201 | 2021-08-01 10:00:00 | 2021-08-01 14:00:00 |
102 | 202 | 2021-08-01 11:00:00 | 2021-08-01 12:30:00 |
103 | 201 | 2021-08-02 13:00:00 | 2021-08-02 14:15:00 |
104 | 201 | 2021-08-02 09:00:00 | 2021-08-02 12:00:00 |
105 | 202 | 2021-08-02 15:00:00 | 2021-08-02 18:00:00 |
service_id | avg_response_time (mins) |
---|---|
201 | 165.00 |
202 | 135.00 |
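Assuming an incidents table with the columns shown above, a PostgreSQL query for this could be:

```sql
-- Assumed table: incidents(incident_id, service_id, start_time, end_time)
SELECT
    service_id,
    -- EPOCH gives seconds; divide by 60 for minutes, round to 2 decimals
    ROUND(AVG(EXTRACT(EPOCH FROM (end_time - start_time)) / 60)::NUMERIC, 2)
        AS avg_response_time_mins
FROM incidents
GROUP BY service_id;
```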
The above query calculates the duration of each incident in minutes by subtracting the start time from the end time, then using the EXTRACT function to convert this interval to seconds (since EPOCH gives the result in seconds) and dividing by 60 to get minutes. The AVG function is then used to calculate the average response time for each service.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating durations or this Stripe Repeated Payments Question which is similar for identifying time-bound events.
To discover records in one table that are not present in another, you can utilize a LEFT JOIN and filter out any rows that found a match in the right-side table.
For example, say you had a table of PagerDuty customers and a 2nd table of all purchases made with PagerDuty. To find all customers who did not make a purchase, you'd use the following
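For example (hypothetical customers and purchases tables):

```sql
-- Hypothetical tables: customers(id, ...) and purchases(id, customer_id, ...)
SELECT c.*
FROM customers c
LEFT JOIN purchases p
    ON c.id = p.customer_id
WHERE p.id IS NULL;  -- keep only customers with no matching purchase
```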
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, NULL values are returned for all of the right table's columns. The WHERE clause checking that purchases.id IS NULL then filters out the matched rows, leaving only customers who have not made a purchase.
In a marketing campaign, PagerDuty collects data from users who click on a digital ad (click through) and visit their product pages. They also record when a user decides to add a product to their cart (conversion). They are interested in finding the click-through conversion rate for each product.
You are given two tables: clicks, which records each visit to a product page by a user, and conversions, which logs whenever a user adds a product to their cart. Your task is to calculate the click-through conversion rate for each product. The click-through conversion rate is defined as the number of conversions divided by the number of clicks for each product.
click_id | user_id | click_timestamp | product_id |
---|---|---|---|
1001 | 123 | 07/08/2022 12:00:00 | PD001 |
1002 | 265 | 07/10/2022 14:30:00 | PD002 |
1003 | 362 | 07/18/2022 09:45:00 | PD001 |
1004 | 192 | 08/26/2022 00:00:00 | PD002 |
1005 | 981 | 08/05/2022 23:59:00 | PD002 |
conversion_id | user_id | conversion_timestamp | product_id |
---|---|---|---|
2001 | 123 | 07/08/2022 12:05:00 | PD001 |
2002 | 265 | 07/11/2022 10:00:00 | PD002 |
2003 | 362 | 07/18/2022 10:00:00 | PD001 |
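Assuming clicks and conversions tables with the columns shown above, one possible PostgreSQL query:

```sql
-- Assumed tables: clicks(click_id, user_id, click_timestamp, product_id)
--                 conversions(conversion_id, user_id, conversion_timestamp, product_id)
SELECT
    c.product_id,
    -- cast to DECIMAL so the division isn't integer division
    COALESCE(conv.total_conversions, 0)::DECIMAL / c.total_clicks AS conversion_rate
FROM (
    SELECT product_id, COUNT(*) AS total_clicks
    FROM clicks
    GROUP BY product_id
) AS c
LEFT JOIN (
    SELECT product_id, COUNT(*) AS total_conversions
    FROM conversions
    GROUP BY product_id
) AS conv
    ON c.product_id = conv.product_id;
```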
This query first calculates the total number of clicks per product (subquery c) and the total number of conversions per product (subquery conv). It then left joins these two together on the product_id. The click-through conversion rate for each product is calculated as conversions divided by total_clicks. If there are no conversions for a product, COALESCE ensures that the rate is zero, not null.
To solve a similar problem about calculating rates, solve this TikTok SQL question on DataLemur's online SQL code editor:
As a PagerDuty engineer, you might sometimes want to know which customers have email addresses registered under a specific domain, say "pagerduty.com". For example, you may want to send out an internal company newsletter or host a special event for these customers. Using our customer records database, write an SQL query to find all customers that have email addresses ending with "pagerduty.com".
customer_id | first_name | last_name | email | sign_up_date |
---|---|---|---|---|
1001 | John | Doe | john.doe@pagerduty.com | 01/25/2022 |
1002 | Jane | Smith | jane.smith@gmail.com | 02/14/2022 |
1003 | Andy | Lee | andy.lee@pagerduty.com | 03/01/2022 |
1004 | Olivia | Rodriguez | olivia.rodriguez@yahoo.com | 04/10/2022 |
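One way to write this in PostgreSQL, assuming a customers table with an email column as shown above:

```sql
-- Assumed table: customers(customer_id, first_name, last_name, email, sign_up_date)
SELECT *
FROM customers
WHERE email LIKE '%@pagerduty.com';
```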
This query will return all rows from the customers table where the email column ends with "@pagerduty.com", effectively identifying all customers who have registered with an email address under the domain "pagerduty.com". The "%" in the LIKE pattern is a wildcard that matches any number of any characters, so '%@pagerduty.com' will match any email that ends with "@pagerduty.com".
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier PagerDuty SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Google, and Amazon.
Each problem on DataLemur has hints to guide you, full answers, and best of all, an online SQL code editor so you can write and run your query right in the browser and have it checked.
To prep for the PagerDuty SQL interview, it can also be helpful to solve interview questions from other tech companies like:
However, if your SQL coding skills are weak, forget about diving straight into solving questions – go learn SQL with this SQL interview tutorial.
This tutorial covers SQL topics like INTERSECT/EXCEPT and ordering data – both of which pop up routinely in PagerDuty SQL assessments.
Besides SQL interview questions, the other types of questions to prepare for the PagerDuty Data Science Interview are:
The best way to prepare for PagerDuty Data Science interviews is by reading Ace the Data Science Interview. The book's got: