At ePlus, SQL is used constantly for extracting and analyzing complex business data from multiple databases, especially in their AI Ignite division. That's why ePlus asks SQL coding questions during interviews for Data Analyst, Data Science, and BI jobs.
To help you study for the ePlus SQL interview, we've collected 11 ePlus SQL interview questions – how many can you solve?
Imagine you're a data analyst at Plus, an online marketplace, and you've been tasked with the following:
"In order to stay competitive in the market, we need to better understand how our products are faring amongst our customers. Write a SQL query to calculate the monthly average user rating for each product from the 'reviews' table."
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
In PostgreSQL, the query for the above would look like:
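A query along these lines would do it – the table and column names here are taken from the sample data above:

```sql
SELECT
  DATE_TRUNC('month', submit_date) AS mth,
  product_id,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY mth, product_id
ORDER BY mth, product_id;
```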
The DATE_TRUNC function is used to truncate each submit_date to the specified precision ('month'). The GROUP BY clause is used here to calculate the average stars by both the month and the product_id, which gives us a monthly average of user star ratings for each product. The results are then sorted by month and product to present the average ratings per product in chronological order.
mth | product_id | avg_stars |
---|---|---|
2022-06-01 00:00:00 | 50001 | 3.50 |
2022-06-01 00:00:00 | 69852 | 4.00 |
2022-07-01 00:00:00 | 69852 | 2.50 |
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
As a car sharing service, Plus needs to keep track of its members, the vehicles, and each rental transaction. The database consists of three tables:
A members table with columns for member_id, first_name, last_name, and contact.
A vehicles table with columns for vehicle_id, model, year, and status (available, rented, or maintenance).
A rentals table with columns for rental_id, member_id, vehicle_id, rent_date, and return_date.
We are interested in the members who rented cars more than once in the last 30 days.
member_id | first_name | last_name | contact |
---|---|---|---|
1301 | Sam | Johnson | sam.johnson@gmail.com |
2502 | Sophia | Williams | sophia.williams@gmail.com |
3630 | Michael | Brown | michael.brown@gmail.com |
4920 | Emily | Taylor | emily.taylor@gmail.com |
vehicle_id | model | year | status |
---|---|---|---|
1001 | Toyota Corolla | 2018 | Available |
2002 | Honda Civic | 2017 | Rented |
3003 | Ford Escape | 2020 | Available |
4004 | Chevrolet Equinox | 2019 | Maintenance |
rental_id | member_id | vehicle_id | rent_date | return_date |
---|---|---|---|---|
9101 | 1301 | 1001 | 08/01/2022 | 08/05/2022 |
9202 | 2502 | 2002 | 08/03/2022 | 08/06/2022 |
9303 | 3630 | 3003 | 07/30/2022 | 08/01/2022 |
9404 | 1301 | 4004 | 08/10/2022 | 08/13/2022 |
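One way to write this in PostgreSQL, using the members and rentals tables from the schema above:

```sql
SELECT m.member_id, m.first_name, m.last_name
FROM members m
JOIN (
  -- members with more than one rental in the last 30 days
  SELECT member_id
  FROM rentals
  WHERE rent_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY member_id
  HAVING COUNT(rental_id) > 1
) frequent ON m.member_id = frequent.member_id;
```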
This query first creates a subquery that selects the member_id from the rentals table where the rent_date falls within the last 30 days, grouping by member_id and filtering for those whose count of rentals is more than 1. The main query then joins the members table to this subquery on member_id to get the names of these members.
There are several advantages to normalizing a database, including less redundancy, more flexibility, and better performance.
Less Redundancy: Normalization reduces redundancy by breaking down a larger, more general table into smaller, more specific tables. This reduces the amount of data that needs to be accessed for queries.
More Flexibility: Normalization makes it easier to modify the structure of the database: with less redundancy, you can make changes to one table without affecting others. This makes it easier to adapt the database to changing business needs (a frequent reality at Plus!).
Better Performance: Normalization can improve the performance of the database by reducing the amount of data that needs to be stored and accessed in a single table. This can result in faster query times and better overall performance.
Customers of Plus are purchasing products and leaving reviews. The 'reviews' table records the details of each transaction, including the review_id, user_id, the date the review was submitted (submit_date), product_id, and a star rating (stars). You need to aggregate this table to show the average star rating for each product, grouped by the month of review submission. The result should present the product_id labeled as 'product' and the month portion of the submit_date labeled as 'mth'.
For example, if you have two products, 50001 and 69852, you will provide a table that shows the average stars each product has gotten per month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
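One possible PostgreSQL query, with column names taken from the sample data:

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id AS product,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY mth, product
ORDER BY mth, product;
```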
This query works by grouping the reviews based on the month of submission and product_id. For each group, it calculates the average star rating (avg_stars). The EXTRACT function is used to fetch just the month from the submit_date column.
The EXCEPT / MINUS operator is used to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that EXCEPT is available in PostgreSQL and SQL Server, while Oracle uses MINUS (but don't worry about memorizing which DBMS supports which exact commands, since Plus interviewers aren't trying to trip you up on SQL syntax trivia).
For a tangible example of EXCEPT in PostgreSQL, suppose you were doing an HR Analytics project for Plus, and had access to Plus's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.
You could use the EXCEPT operator to find all contractors who were never an employee using this query:
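A sketch of that query, assuming both tables expose matching first_name and last_name columns (the exact column names aren't given, so treat these as placeholders):

```sql
SELECT first_name, last_name
FROM contractors
EXCEPT
SELECT first_name, last_name
FROM employees;
```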
At Plus, a financial service company, you are given a list of transactions performed by their customers. Your task is to find the average transaction amount for each customer.
Consider the transactions table:
transaction_id | customer_id | transaction_date | transaction_amount |
---|---|---|---|
1 | 341 | 01/01/2022 00:00:00 | 100 |
2 | 932 | 03/01/2022 00:00:00 | 200 |
3 | 341 | 04/25/2022 00:00:00 | 300 |
4 | 932 | 05/01/2022 00:00:00 | 400 |
5 | 341 | 08/08/2022 00:00:00 | 500 |
Your answer should return the following:
customer_id | avg_transaction_value |
---|---|
341 | 300 |
932 | 300 |
In PostgreSQL, you could use the AVG function grouped by the customer_id like so:
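For instance (assuming the table is named transactions, per the sample data):

```sql
SELECT
  customer_id,
  AVG(transaction_amount) AS avg_transaction_value
FROM transactions
GROUP BY customer_id;
```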
The query starts by selecting the customer_id from the transactions table. The AVG function computes the average transaction_amount by customer. The GROUP BY clause groups the transactions by each customer, allowing the AVG function to compute an average for each group. The AS keyword is used to rename the column created by the AVG function to avg_transaction_value.
This will provide a table with each customer's ID along with their average transaction value.
To practice a very similar question try this interactive Stripe Repeated Payments Question which is similar for transaction analysis or this Uber User's Third Transaction Question which is similar for customer transactions.
The SQL INTERSECT command merges the results of multiple SELECT statements and keeps only those rows that are present in all result sets.
For example, say you were doing an HR Analytics project for Plus, and had access to Plus's employees and contractors data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use the INTERSECT operator to find all contractors who also show up in the employees table:
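A minimal sketch, again assuming matching first_name and last_name columns in both tables (the actual column names aren't specified):

```sql
SELECT first_name, last_name
FROM contractors
INTERSECT
SELECT first_name, last_name
FROM employees;
```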
For the company Plus, we want to understand the effectiveness of our digital advertising and the conversion rates of our products. In this context, effectiveness is defined by the Click-Through Rate (CTR) of our digital ads, while conversion rate refers to the proportion of views of a product that leads to the product being added to a cart.
To measure these, we'll need to calculate the CTR for our ads as well as the conversion rate from viewing a product to adding it to a cart.
click_id | user_id | ad_id | click_time |
---|---|---|---|
101 | 554 | 3001 | 06/08/2022 00:00:00 |
102 | 667 | 3002 | 06/10/2022 00:00:00 |
103 | 554 | 3001 | 06/18/2022 00:00:00 |
104 | 992 | 3003 | 07/26/2022 00:00:00 |
105 | 667 | 3004 | 07/27/2022 00:00:00 |
imp_id | user_id | ad_id | imp_time |
---|---|---|---|
201 | 554 | 3001 | 06/08/2022 00:00:00 |
202 | 667 | 3003 | 06/10/2022 00:00:00 |
203 | 883 | 3001 | 06/18/2022 00:00:00 |
204 | 992 | 3004 | 07/26/2022 00:00:00 |
205 | 667 | 3002 | 07/27/2022 00:00:00 |
view_id | user_id | product_id | view_time |
---|---|---|---|
301 | 922 | 4001 | 06/08/2022 00:00:00 |
302 | 432 | 4001 | 06/10/2022 00:00:00 |
303 | 922 | 4002 | 06/18/2022 00:00:00 |
304 | 465 | 4003 | 07/26/2022 00:00:00 |
305 | 432 | 4004 | 07/27/2022 00:00:00 |
cart_id | user_id | product_id | add_time |
---|---|---|---|
401 | 922 | 4001 | 06/08/2022 00:00:00 |
402 | 432 | 4003 | 06/18/2022 00:00:00 |
403 | 922 | 4002 | 07/26/2022 00:00:00 |
404 | 465 | 4004 | 07/27/2022 00:00:00 |
405 | 432 | 4001 | 07/27/2022 00:00:00 |
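Here is one way to compute both metrics in PostgreSQL. The table names (ad_impressions, ad_clicks, product_views, cart_adds) are assumed from the sample data above:

```sql
-- Click-through rate: clicks per impression, for each ad
SELECT
  i.ad_id,
  COUNT(DISTINCT c.click_id)::DECIMAL / COUNT(DISTINCT i.imp_id) AS ctr
FROM ad_impressions i
LEFT JOIN ad_clicks c ON i.ad_id = c.ad_id
GROUP BY i.ad_id;

-- Conversion rate: cart additions per product view, for each product
SELECT
  v.product_id,
  COUNT(DISTINCT a.cart_id)::DECIMAL / COUNT(DISTINCT v.view_id) AS conversion_rate
FROM product_views v
LEFT JOIN cart_adds a ON v.product_id = a.product_id
GROUP BY v.product_id;
```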
These queries return the CTR and conversion rate for each ad and product respectively. The CTR or conversion rate is calculated as the ratio between the number of click-throughs or items added to the cart and the number of impressions or product views. The LEFT JOIN ensures we include all ads that were displayed and all products that were viewed, even if they didn't receive any clicks or get added to a cart.
To solve a similar problem about calculating rates, try this TikTok SQL question on DataLemur's online SQL code editor:
As a data analyst for Plus company, you are tasked with running a monthly analysis on each item's performance based on their ratings. The focus is on understanding how users perceive each product in different months - we want to know the average rating given to each product for each month.
Given a reviews table that tracks every review submitted for any product, with the columns review_id, user_id, submit_date, product_id, and stars:
Can you write a SQL query that shows the average rating (avg_stars) for each product (product_id) for each month of submit_date?
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 1 | 2022-06-08 | 50001 | 4 |
2 | 2 | 2022-06-10 | 69852 | 4 |
3 | 3 | 2022-06-18 | 50001 | 3 |
4 | 4 | 2022-07-26 | 69852 | 3 |
5 | 5 | 2022-07-05 | 69852 | 2 |
Using PostgreSQL, the SQL query could be:
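One way to write it, using the column names from the sample data:

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS month,
  product_id,
  AVG(stars) AS avg_stars
FROM reviews
GROUP BY month, product_id
ORDER BY month, product_id;
```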
This query groups the data by the month of submit_date and by product_id, and then calculates the average star rating for each group. The EXTRACT function is used to get the month from submit_date. Finally, the results are ordered by the month and product_id in ascending order.
month | product_id | avg_stars |
---|---|---|
6 | 50001 | 3.5 |
6 | 69852 | 4 |
7 | 69852 | 2.5 |
In this output, we can see the average rating of each product per month. For example, the product with id 50001 had an average rating of 3.5 in June (month 6), and the product with id 69852 had an average rating of 2.5 in July (month 7).
A cross-join, also known as a cartesian join, is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. This results in a table with a row for every possible combination of rows from the two input tables.
An example of when this might be useful is if you wanted to first make a dataset containing all possible pairs of customers and products data, in order to later build a Machine Learning model to predict the probability of a customer purchasing a particular product.
However, it is important to note that cross-joins can create very large tables, especially if the input tables have a large number of rows. For example, if you had 10,000 customers and 5,000 different product SKUs, the resulting cross-join would have 50 million rows.
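To make this concrete, here is a minimal sketch with hypothetical customers and products tables:

```sql
-- Every possible (customer, product) pair;
-- the row count is rows(customers) × rows(products)
SELECT c.customer_id, p.product_id
FROM customers c
CROSS JOIN products p;
```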
The question is about evaluating the average product ratings by month for Plus Company. Here, you are asked to write a SQL query that will join the 'reviews' table, which includes data about customer reviews, and the 'products' table, which includes data about specific products.
Your task is to create a SQL query that can determine the average star rating of each product for each month.
The results should present the month ('mth'), the product id ('product'), and the average star rating ('avg_stars').
Order the results by the month, and inside each month, by the product id.
Given the following sample data:
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 1 | 01/15/2022 | 1 | 5 |
2 | 2 | 02/16/2022 | 1 | 4 |
3 | 1 | 02/18/2022 | 2 | 3 |
4 | 2 | 02/19/2022 | 2 | 4 |
5 | 3 | 03/01/2022 | 1 | 5 |
product_id | product_name | product_category |
---|---|---|
1 | Product A | Electronics |
2 | Product B | Electronics |
3 | Product C | Home Appliance |
The PostgreSQL query to obtain this result is:
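A query along these lines would work, using the 'reviews' and 'products' tables shown above:

```sql
SELECT
  EXTRACT(MONTH FROM r.submit_date) AS mth,
  r.product_id AS product,
  AVG(r.stars) AS avg_stars
FROM reviews r
JOIN products p ON r.product_id = p.product_id
GROUP BY mth, product
ORDER BY mth, product;
```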
This PostgreSQL query first extracts the month from the 'submit_date' column in the 'reviews' table. Then it averages the 'stars' for each product for each month, by grouping by both 'mth' and 'product'. Finally, it sorts the resulting table by 'mth' and 'product'. The use of a JOIN in this query allows it to match rows from two tables, 'reviews' and 'products', when their 'product_id' columns match.
Since join questions come up routinely during SQL interviews, practice this SQL join question from Spotify:
The key to acing an ePlus SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier ePlus SQL interview questions, you should also solve the 200+ SQL questions on DataLemur, which come from FAANG companies (Facebook, Apple, Amazon, Netflix, Google).
Each problem on DataLemur has hints to guide you and step-by-step solutions; most importantly, there's an online SQL coding environment, so you can run your SQL query right in the browser and have it graded.
To prep for the Plus SQL interview it is also useful to practice interview questions from other tech companies like:
If your SQL coding skills are rusty, don't jump straight into solving questions – strengthen your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as AND/OR/NOT and window functions like RANK() and ROW_NUMBER() – both of these come up frequently during SQL job interviews at Plus.
Besides SQL interview questions, the other types of problems to practice for the Plus Data Science Interview are:
To prepare for Plus Data Science interviews read the book Ace the Data Science Interview because it's got: