At EngageSmart, SQL is used all the damn time for analyzing customer engagement data, pertinent to their smart engagement and donation solutions. That's why EngageSmart frequently asks SQL questions during interviews for Data Science and Data Engineering positions.
As such, to help you study for the EngageSmart SQL interview, we'll cover 10 EngageSmart SQL interview questions – can you answer each one?
EngageSmart is a leading provider of customer engagement solutions. Like many businesses, they want to track and analyze their key metrics. One key customer segment they care about is "Power Users". A "Power User" at EngageSmart is defined as a user who makes at least 10 transactions per month for the last six months. Your task is to write a SQL query to identify these power users from their transaction database.
Here is the Example Input:
transaction_id | customer_id | transaction_date | amount |
---|---|---|---|
1001 | 555 | 06/06/2022 | 120.50 |
1002 | 234 | 06/07/2022 | 45.00 |
1003 | 555 | 06/07/2022 | 75.00 |
1004 | 234 | 06/08/2022 | 110.00 |
1005 | 555 | 06/09/2022 | 200.00 |
... | ... | ... | ... |
2345 | 234 | 11/28/2022 | 230.00 |
2346 | 555 | 11/29/2022 | 300.00 |
This query first creates a temporary view to get the transaction count per customer per month. Then, it selects those users that have had at least 10 transactions in each of the last six months. These are our Power Users. The inner query groups the transaction records by customer id and month, counting the number of transactions for each grouping. The outer query then selects customer IDs with at least 6 months of 10 or more transactions.
To practice a similar customer analytics SQL question where you can solve it interactively and have your SQL code automatically checked, try this Walmart SQL Interview Question:
EngageSmart provides services to businesses and aims to track user interactions. The objective is to dive into customer interactions with business services, and see the trend over a span of months.
Given a table tracking different customer interactions with timestamps. We want a breakdown of the number of customer interactions with the business services per month, per service.
Define this as an interaction occurring when a customer submits a form on any number of EngageSmart's software applications. Remember, this is a month over month trend, so be sure to include interactions for each service for each month, even if no interaction occurred.
interaction_id | customer_id | timestamp | service_id |
---|---|---|---|
201 | 358 | 08-16-2022 00:00:00 | 1 |
202 | 123 | 08-16-2022 00:00:00 | 2 |
203 | 897 | 09-14-2022 00:00:00 | 1 |
204 | 123 | 10-28-2022 00:00:00 | 2 |
205 | 528 | 10-29-2022 00:00:00 | 3 |
month | service_id | num_interactions |
---|---|---|
8 | 1 | 1 |
8 | 2 | 1 |
8 | 3 | 0 |
9 | 1 | 1 |
9 | 2 | 0 |
9 | 3 | 0 |
10 | 1 | 0 |
10 | 2 | 1 |
10 | 3 | 1 |
In this PostgreSQL query, we use cross join to generate all possible combinations of month and service. We then use a left join to connect interactions if they exist. Date trunc is used to get the first day of each month and generate series is used to fill gaps between months. Finally, we count interactions for each service in each month, even if no interaction occurred. We order by month and service for a neat tabular view.
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
A join in SQL allows you to retrieve data from multiple tables and combine it into a single set of results.
In SQL, there are four different types of JOINs. To demonstrate each kind, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
: An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.
: A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.
: A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.
: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.
EngageSmart is a software company which provides solutions for a variety of industries, including public sector, utilities, healthcare, and others. They want to analyze the usage pattern and behavior of their customers to improve their services.
They have two main entities - Customers(Customer_ID, Name, Signup_date, Industry) and Events(Event_ID, Customer_ID, Event_Date, Event_Type).
The "Customers" table stores information about their customers and the "Events" table stores different kinds of events (For Ex: "Product Login", "Customer Support Query", "Service Upgrade", etc.) generated by a customer.
Your task is to create a query which returns the total number of events per customer for each event type in each month of the year 2022.
Customer_ID | Name | Signup_date | Industry |
---|---|---|---|
1 | Company A | 2021-11-01 | Healthcare |
2 | Company B | 2022-01-15 | Utilities |
3 | Company C | 2021-12-30 | Public Sector |
4 | Company D | 2022-03-05 | Healthcare |
Event_ID | Customer_ID | Event_Date | Event_Type |
---|---|---|---|
101 | 2 | 2022-02-01 | Product Login |
102 | 3 | 2022-02-05 | Service Upgrade |
103 | 1 | 2022-02-10 | Customer Support Query |
104 | 2 | 2022-03-01 | Product Login |
105 | 4 | 2022-03-15 | Service Upgrade |
The query extracts the month from the field and groups the data by , , and . The function is then used to count the number of events per customer for each event type in each month. The result is then ordered by , , and .
While knowing this answer is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at EngageSmart should be at least aware of SQL vs. NoSQL databases.
Relational databases and non-relational (NoSQL) databases have some key differences, particularly in terms of how data is stored. Whereas relational databases have tables, made up of rows and columns, NoSQL databases use various data models like:
This added flexibility makes NoSQL databases well-suited for handling non-tabular data or data with a constantly changing format. However, this flexibility comes at the cost of ACID compliance, which is a set of properties (atomic, consistent, isolated, and durable) that ensure the reliability and integrity of data in a database. While most relational databases are ACID-compliant, NoSQL databases may not provide the same level of guarantees.
EngageSmart is a customer engagement platform that provides businesses with the tools to strengthen their customer relationships. In EngageSmart, customers make payments for the services used. The aim of this question is to calculate the average amount of payments each customer has made per transaction over a specific period.
Assume that we have a table for tracking each customer's payment activities.
payment_id | customer_id | payment_date | amount |
---|---|---|---|
101 | 789 | 04/30/2021 00:00:00 | 87.50 |
102 | 234 | 04/30/2021 00:00:00 | 100.00 |
103 | 789 | 05/24/2021 00:00:00 | 75.00 |
104 | 234 | 05/26/2021 00:00:00 | 100.00 |
105 | 567 | 06/07/2021 00:00:00 | 50.00 |
customer_id | Avg_payment |
---|---|
789 | 81.25 |
234 | 100.00 |
567 | 50.00 |
The SQL query for this can be the following:
This query groups the data from the payments table by the customer_id and computes the average amount each customer has paid per transaction.
To practice a very similar question try this interactive Stripe Repeated Payments Question which is similar for dealing with payment transactions or this Amazon Average Review Ratings Question which is similar for calculating averages.
The clause is used to remove all duplicate records from a query.
For example, if you had a table of open jobs EngageSmart was hiring for, and wanted to see what are all the unique job titles that were currently available at the company, you could write the following query:
EngageSmart depends heavily on its online marketing to attract its customers. To measure its performance, EngageSmart tracks two key metrics: Click-Through-Rate (CTR) and Conversion Rate.
The CTR is the proportion of users who click on an advertisement to the total number who view it. On the other hand, Conversion Rate is the proportion of users who add a product to cart out of those who viewed the product page.
As a Data Analyst at EngageSmart, your task is to calculate the CTR and Conversion Rate for each advertisement served by EngageSmart in the past month.
For this purpose, you have access to two tables: and
The query above starts by creating two CTEs and that group by and count the respective actions. Then, it performs a left join of the advertisements table with the and tables using the . Finally, it calculates the CTR and Conversion Rate by dividing the count of clicks by the count of views, and dividing the count of add_to_cart actions by the count of clicks, respectively.
To solve another question about calculating rates, solve this SQL interview question from TikTok within DataLemur's online SQL coding environment:
You are provided with a customer's database for the company EngageSmart. The marketing team has a particular interest in customers whose first names start with "A" and have a sales representative by the name of "Johnson". They would like a list of such customers. Can you write a SQL query for this purpose?
Here is the initial customer data:
customer_id | First_name | Last_name | representative |
---|---|---|---|
123 | Alex | Smith | Johnson |
265 | Andrew | Olsen | Black |
362 | Albert | Pinto | Johnson |
192 | Adam | Taylor | Johnson |
981 | Aria | Johnson | Miller |
customer_id | First_name | Last_name | representative |
---|---|---|---|
123 | Alex | Smith | Johnson |
362 | Albert | Pinto | Johnson |
192 | Adam | Taylor | Johnson |
The corresponding SQL query for PostgreSQL would be:
In this query, filters the customers that start with the letter 'A' and ensures that the representative for these customers is 'Johnson'. The combination of these conditions gives us the customers who fit both criteria.
A cross-join, also known as a cartesian join, is a JOIN that produces the cross-product of two tables. In a cross-join, each row from the first table is matched with every row from the second table, resulting in a new table with a row for each possible combination of rows from the two input tables.
Let's say you were building a Machine Learning model that attempts to score the probability of a customer purchasing a EngageSmart product. Before working in Pandas and Tensorflow, you might want to do some Exploratory Data Analysis (EDA) in SQL, and generate all pairs of customers and EngageSmart products.
Here's a cross-join query you could run:
Cross-joins are useful for generating all possible combinations, but they can also create huge tables if you're not careful. For instance, if you had 10,000 potential customers and EngageSmart had 500 different product SKUs, the resulting cross-join would have 5 million rows!
The best way to prepare for a EngageSmart SQL interview is to practice, practice, practice. In addition to solving the earlier EngageSmart SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Google, Microsoft and Silicon Valley startups.
Each problem on DataLemur has multiple hints, fully explained answers along with a discussion board to see how others solved it and most importantly, there is an online SQL code editor so you can easily right in the browser your SQL query and have it graded.
To prep for the EngageSmart SQL interview you can also be a great idea to practice SQL questions from other tech companies like:
However, if your SQL coding skills are weak, forget about going right into solving questions – strengthen your SQL foundations with this SQL interview tutorial.
This tutorial covers things like manipulating string/text data and filtering data with WHERE – both of which pop up routinely in EngageSmart interviews.
Beyond writing SQL queries, the other question categories to practice for the EngageSmart Data Science Interview are:
The best way to prepare for EngageSmart Data Science interviews is by reading Ace the Data Science Interview. The book's got: