At Schneider Electric, SQL is used across the company for analyzing energy consumption patterns and optimizing supply chain logistics for more efficient energy management. Because of this, Schneider Electric frequently asks SQL questions during interviews for Data Science and Data Engineering positions.
In case you want to ace the SQL assessment, here are 8 Schneider Electric SQL interview questions to practice, which are similar to commonly asked questions at Schneider Electric – can you answer them all?
Schneider Electric sells a range of products, with sales transactions recorded in the database. A "power user" is defined as a customer whose total purchase amount ranks among the top 5% of all customers over the last 6 months. We would like to understand who these power users are, what their total purchase amounts are, and how many purchases they've made within this period.
customer_id | last_name | first_name |
---|---|---|
1 | Smith | John |
2 | Doe | Jane |
3 | Johnson | Michael |
4 | Lee | Linda |
5 | Brown | George |
purchase_id | customer_id | purchase_date | purchase_amount |
---|---|---|---|
1 | 2 | 01/02/2022 00:00:00 | 1200 |
2 | 1 | 01/15/2022 00:00:00 | 500 |
3 | 3 | 01/18/2022 00:00:00 | 700 |
4 | 2 | 02/04/2022 00:00:00 | 2800 |
5 | 1 | 03/12/2022 00:00:00 | 1500 |
6 | 4 | 03/18/2022 00:00:00 | 900 |
7 | 5 | 04/25/2022 00:00:00 | 1600 |
8 | 1 | 05/10/2022 00:00:00 | 2000 |
9 | 3 | 06/15/2022 00:00:00 | 3000 |
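One way to write this query in PostgreSQL is sketched below; the customers and purchases table names follow the sample data above, and "the last 6 months" is measured from the current date:

```sql
-- Sketch only: table and column names are assumed from the sample data above.
WITH customer_totals AS (
  SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    SUM(p.purchase_amount) AS total_purchase_amount,
    COUNT(*) AS purchase_count
  FROM customers c
  JOIN purchases p
    ON p.customer_id = c.customer_id
  WHERE p.purchase_date >= CURRENT_DATE - INTERVAL '6 months'
  GROUP BY c.customer_id, c.first_name, c.last_name
),
ranked AS (
  SELECT
    *,
    PERCENT_RANK() OVER (ORDER BY total_purchase_amount DESC) AS pct_rank
  FROM customer_totals
)
SELECT
  customer_id,
  first_name,
  last_name,
  total_purchase_amount,
  purchase_count
FROM ranked
WHERE pct_rank <= 0.05   -- keep only the top 5% of customers by total spend
ORDER BY total_purchase_amount DESC;
```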
This query first calculates the total purchase amount and the number of purchases for each customer over the last six months. Then, it keeps only the customers whose total purchase amount falls into the top 5%: the totals are ranked in descending order and the percentile rank is used to cut off everyone below the top 5% of the customer count. Finally, it orders the result set by total purchase amount in descending order.
Suppose Schneider Electric, an energy management and automation solutions provider, would like to analyze its product sales to see what was purchased in each month. You are provided a purchases table that stores the log of every product purchase, where each row corresponds to one item in a specific purchase. For each month, you are required to find the total sales for each product, along with the running total of sales month over month.
Here is the table:
purchase_id | product | date | qty | price |
---|---|---|---|---|
101 | Circuit Breaker | 2021-09-03 | 2 | 100 |
102 | Solar Panel | 2021-09-05 | 1 | 300 |
103 | PLC | 2021-09-10 | 5 | 350 |
104 | Circuit Breaker | 2021-10-02 | 3 | 150 |
105 | Solar Panel | 2021-10-06 | 2 | 600 |
106 | Circuit Breaker | 2021-11-12 | 4 | 200 |
The expected output is:
month | product | monthly_sales | running_sales |
---|---|---|---|
9 | Circuit Breaker | 200 | 200 |
9 | PLC | 350 | 350 |
9 | Solar Panel | 300 | 300 |
10 | Circuit Breaker | 150 | 350 |
10 | Solar Panel | 600 | 900 |
11 | Circuit Breaker | 200 | 550 |
Here is one PostgreSQL query that produces the desired result (a sketch, assuming the table is named purchases and that price stores the total price for the line item):
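```sql
-- Sketch only: assumes a purchases table matching the sample data above,
-- with price holding the line-item total and all rows falling within one year.
WITH monthly AS (
  SELECT
    EXTRACT(MONTH FROM "date") AS month,
    product,
    SUM(price) AS monthly_sales
  FROM purchases
  GROUP BY EXTRACT(MONTH FROM "date"), product
)
SELECT
  month,
  product,
  monthly_sales,
  SUM(monthly_sales) OVER (
    PARTITION BY product
    ORDER BY month
  ) AS running_sales
FROM monthly
ORDER BY month, product;
```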
The EXTRACT function is used to get the month from the date field. The SUM(price) aggregate, grouped by month and product, calculates the total sales for each product in each month (monthly_sales), and SUM wrapped with the OVER clause calculates the running total (running_sales). The PARTITION BY clause keeps a separate sum for each product, and the ORDER BY clause inside the OVER clause specifies the order of rows, which is needed to calculate the running total. The GROUP BY and final ORDER BY clauses ensure that the results are grouped and displayed in the correct order.
Pro Tip: Window functions are a frequent SQL interview topic, so practice every window function problem on DataLemur.
For a tangible example of EXCEPT in PostgreSQL, suppose you were doing an HR Analytics project for Schneider Electric, and had access to Schneider Electric's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.
You could use the EXCEPT operator to find all contractors who were never an employee with a query along these lines:
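```sql
-- Sketch only: the table and column names below are assumptions for illustration.
SELECT first_name, last_name
FROM schneider_electric_contractors

EXCEPT

SELECT first_name, last_name
FROM schneider_electric_employees;
```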
Note that EXCEPT is available in PostgreSQL and SQL Server, while MINUS is the equivalent operator available in MySQL and Oracle (but don't worry about knowing which RDBMS supports which exact commands, since Schneider Electric interviewers aren't trying to trip you up on memorizing SQL syntax).
Schneider Electric holds customer data in their database. They would like to filter the data in the 'Customers' table to find all customers who have made a purchase within the last month (assuming the current date is 2022-07-30), but have not made any purchase within the last seven days.
For this, they specifically want to filter the records down to customers whose last purchase date falls within the last month, but not within the last seven days.
Assuming the relevant table in the database is 'Customers', structured as below:
customer_id | name | last_purchase_date |
---|---|---|
1 | Alice | 2022-07-10 |
2 | Bob | 2022-07-20 |
3 | Charlie | 2022-07-25 |
4 | David | 2022-06-30 |
5 | Eve | 2022-06-25 |
You can use the SQL WHERE clause for filtering and PostgreSQL's INTERVAL syntax to represent a whole month and seven days respectively. One query that performs the filtering (a sketch, assuming the table is named Customers) is as follows:
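```sql
-- Sketch only: assumes the Customers table above and a current date of 2022-07-30.
SELECT
  customer_id,
  name,
  last_purchase_date
FROM Customers
WHERE last_purchase_date >= DATE '2022-07-30' - INTERVAL '1 month'  -- purchased within the last month
  AND last_purchase_date <  DATE '2022-07-30' - INTERVAL '7 days';  -- but not within the last 7 days
```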
This SQL query works as follows:
It filters for customers who have made a purchase within the last month using `last_purchase_date >= DATE '2022-07-30' - INTERVAL '1 month'`.
It further filters that result set for customers who have not made any purchase in the last 7 days using `last_purchase_date < DATE '2022-07-30' - INTERVAL '7 days'`.
Therefore, the customers returned from the query have made a purchase at least once in the last month, but not in the last week.
Both clustered and non-clustered indexes are used to improve query performance, but they have different characteristics.
A clustered index determines the physical order of the data rows in a table, while a non-clustered index does not. This means that a table can have only one clustered index, but it can have multiple non-clustered indexes.
In terms of query performance, a clustered index is usually faster for searches that return a large number of records, while a non-clustered index is faster for searches that return a small number of records. However, updates to a clustered index are slower, as they require the data rows to be physically rearranged, while updates to a non-clustered index are faster, as they only require the index data structure to be updated.
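As a quick illustration, here is a sketch in SQL Server syntax (PostgreSQL doesn't maintain clustered indexes automatically, so the syntax there differs); the purchases table and index names are just assumptions:

```sql
-- Only one clustered index per table: it defines the physical order of the rows.
CREATE CLUSTERED INDEX ix_purchases_purchase_date
    ON purchases (purchase_date);

-- Many non-clustered indexes are allowed: each is a separate structure
-- that points back to the underlying rows.
CREATE NONCLUSTERED INDEX ix_purchases_customer_id
    ON purchases (customer_id);
```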
Schneider Electric provides energy and automation digital solutions for efficiency and sustainability. Imagine you are working with energy usage datasets from various clients.
For an audit, Schneider Electric wants to know the average monthly energy consumption per client. You are given the following table:
usage_id | client_id | measurement_date | kWh_consumed |
---|---|---|---|
101 | 201 | 2022-07-23 | 400 |
102 | 202 | 2022-07-24 | 550 |
103 | 203 | 2022-07-24 | 620 |
104 | 201 | 2022-08-23 | 430 |
105 | 202 | 2022-08-24 | 480 |
106 | 203 | 2022-08-24 | 590 |
107 | 201 | 2022-09-23 | 480 |
108 | 202 | 2022-09-24 | 510 |
109 | 203 | 2022-09-24 | 620 |
The expected output is:

month | client_id | avg_energy |
---|---|---|
7 | 201 | 400.00 |
7 | 202 | 550.00 |
7 | 203 | 620.00 |
8 | 201 | 430.00 |
8 | 202 | 480.00 |
8 | 203 | 590.00 |
9 | 201 | 480.00 |
9 | 202 | 510.00 |
9 | 203 | 620.00 |
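One possible PostgreSQL query is sketched below, assuming the usage data lives in a table named energy_usage:

```sql
-- Sketch only: the table name energy_usage is an assumption.
SELECT
  EXTRACT(MONTH FROM measurement_date) AS month,
  client_id,
  ROUND(AVG(kWh_consumed), 2) AS avg_energy
FROM energy_usage
GROUP BY EXTRACT(MONTH FROM measurement_date), client_id
ORDER BY month, client_id;
```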
The query extracts the month from the measurement_date, groups by the month and client_id, and averages kWh_consumed for each group. This will give us the average energy consumption per client for each month.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating average values grouped by month or this Alibaba Compressed Mean Question which is similar for working with consumption data.
The COALESCE() function returns the first non-NULL value from a list of values. This function is often used to replace a NULL with some default value, so that you can then take an AVG() or SUM() of some column without NULLs messing things up.
For example, suppose you ran a customer satisfaction survey for Schneider Electric and had statements like "I'd buy from Schneider Electric again". In the survey, customers would then answer how strongly they agreed with a statement on a scale of 1 to 5 (strongly disagree, disagree, neutral, agree, strongly agree).
Because attention spans are short, many customers skipped many of the questions, and thus our survey data might be filled with NULLs:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | NULL |
303 | 1 | 5 |
303 | 2 | NULL |
Before doing further analytics on this customer survey data, you could replace the NULLs in the agree_scale column with the value 3 (because that corresponds to the default 'neutral' answer) using the COALESCE() function, for example:
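```sql
-- Sketch only: assumes the survey data lives in a table named survey_responses.
SELECT
  customer_id,
  question_id,
  COALESCE(agree_scale, 3) AS agree_scale
FROM survey_responses;
```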
This would result in the following:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | 3 |
303 | 1 | 5 |
303 | 2 | 3 |
Suppose Schneider Electric wants to analyze the power consumption of some of its electric equipment. Each device sends hourly data records with consumed power in kilowatts. The goal is to calculate the average monthly power consumption for each device in watts, rounded to the nearest whole number. Given occasional signal loss, some hourly readings might be missing and are represented as NULL.
device_id | timestamp | power_kw |
---|---|---|
1 | 2022-03-01 00:00:00 | 0.50 |
2 | 2022-03-01 01:00:00 | 1.20 |
1 | 2022-03-01 02:00:00 | 0.30 |
2 | 2022-03-01 03:00:00 | NULL |
1 | 2022-04-01 00:00:00 | 0.80 |
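A PostgreSQL query along these lines would work (a sketch, assuming the readings are stored in a table named power_readings):

```sql
-- Sketch only: the table name power_readings is an assumption.
SELECT
  device_id,
  DATE_TRUNC('month', "timestamp") AS month_year,
  ROUND(AVG(power_kw * 1000)) AS avg_power_watts   -- convert kW to W, average, round
FROM power_readings
WHERE power_kw IS NOT NULL                         -- skip hours lost to signal dropouts
GROUP BY device_id, DATE_TRUNC('month', "timestamp")
ORDER BY device_id, month_year;
```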
In the above PostgreSQL query, the DATE_TRUNC function is used to get the month and year from each timestamp. Then, the average power consumption is calculated for each device every month by multiplying power_kw by 1000 (to convert it to watts), excluding records with NULL values. Finally, the result is rounded off to the nearest whole number using the ROUND function. The result is ordered by device_id and month_year for clear and logical output.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating total activity time for servers or this Google Odd and Even Measurements Question which is similar for handling IoT device measurements.
The key to acing a Schneider Electric SQL interview is to practice, practice, and then practice some more! In addition to solving the above Schneider Electric SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like FAANG and tech startups.
Each interview question has multiple hints, fully explained answers, and a discussion board to see how others solved it. Crucially, there is also an interactive SQL code editor so you can run your SQL query right in the browser and have it graded.
To prep for the Schneider Electric SQL interview it is also helpful to solve interview questions from other tech companies like:
However, if your SQL query skills are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers topics including handling date/timestamp data and 4 types of JOINS – both of which show up often in Schneider Electric SQL assessments.
For the Schneider Electric Data Science Interview, beyond writing SQL queries, the other types of questions to prepare for are:
The best way to prepare for Schneider Electric Data Science interviews is by reading Ace the Data Science Interview. The book's got: