At Lumentum, SQL is crucial for analyzing manufacturing data and optimizing supply chain efficiencies. That's why Lumentum LOVES to ask SQL query questions in interviews for Data Science, Data Engineering and Data Analytics jobs.
To help you prepare for the Lumentum SQL interview, we've collected 9 Lumentum SQL interview questions – able to solve them?
Lumentum makes optical and photonic products for corporate clients. One crucial customer activity is project initiation, where a customer client starts a new project using Lumentum products. The more projects a client starts, the more valuable they are to the company. A "VIP" client shall be defined as a client company who has started more than 3 projects in a month. Write a SQL query to identify these VIP client companies.
projects
Example Input:project_id | client_id | start_date | product_id |
---|---|---|---|
1001 | 425 | 06/02/2022 | 25 |
1002 | 425 | 06/05/2022 | 12 |
1003 | 425 | 06/20/2022 | 37 |
1004 | 725 | 06/05/2022 | 610 |
1005 | 425 | 06/25/2022 | 12 |
1006 | 725 | 07/02/2022 | 25 |
month | year | client_id |
---|---|---|
6 | 2022 | 425 |
SELECT EXTRACT(MONTH FROM start_date) AS month, EXTRACT(YEAR FROM start_date) AS year, client_id FROM projects GROUP BY month, year, client_id HAVING COUNT(project_id) > 3;
The above query works by first extracting the month and year from the project start date. It then groups the data by month, year, and client ID and counts the number of projects each client has started during each month of each year. The HAVING clause filters the data so that only clients who have started more than 3 projects in a month are included in the final result.
To solve a related customer analytics question on DataLemur's free online SQL coding environment, try this recently asked Microsoft SQL interview question:
Imagine you're a data analyst at Lumentum, and you're presented with the task of determining the average monthly sales volume per product. Thesales
table contains real-time sales data, including the product id, sales volume, and the time of each transaction. Lumentum wants to make sure that their product portfolio is balanced, and they need to make key decisions based on monthly performance of each product. Your task is to write a PostgreSQL query that calculates the total sales volume for each product per month using window functions.
sales
Example Inputtransaction_id | transaction_time | product_id | volume |
---|---|---|---|
001 | 2022-06-01 10:00:00 | A1 | 1000 |
002 | 2022-06-01 11:00:00 | A2 | 1500 |
003 | 2022-06-15 15:00:00 | A1 | 1000 |
004 | 2022-07-01 10:00:00 | A1 | 1000 |
005 | 2022-07-02 11:00:00 | A2 | 1500 |
006 | 2022-07-15 15:00:00 | A2 | 1500 |
We'll use the SUM()
function as a window function here in combination with the OVER()
clause and PARTITION BY
:
WITH monthly_sales AS ( SELECT product_id, DATE_TRUNC('month', transaction_time) AS month, volume FROM sales ) SELECT product_id, month, SUM(volume) OVER (PARTITION BY product_id, month) AS monthly_volume FROM monthly_sales ORDER BY product_id, month;
This query first generates a Common Table Expression (CTE) monthly_sales
where it extracts the month from the transaction time and selects the product id and volume. Then, it calculates the monthly sales volume per product using the window SUM()
function and sorts the result by product id and month. The PARTITION BY
clause categorizes the data into partitions based on product id and month, and the SUM()
function is applied to each of these partitions separately.
For more window function practice, try this Uber SQL Interview Question on DataLemur's online SQL coding environment:
UNION ALL
typically give the same results as a FULL OUTER JOIN
?For all practical purposes, UNION ALL
and FULL OUTER JOIN
do NOT produce the same results.
While both are similar in that they combine two tables, you can think of joins as increasing the width of the resulting table (you'll have more columns in the result set for a left/inner/right join), whereas a union is used to combine rows which increases the height of the result set but keeps the column count the same.
If your familiar with set operations, you can think of a UNION
as set addition, whereas a JOIN
is set multiplication.
Lumentum is a high-tech company producing optical and photonic products driving networking, telecommunications, and data transmission advancements. They rely on many different components to manufacture their products. Assume for a moment you have been hired as their Database Manager and they present you with the following business problem:
"We have a table of products (products
), a table for the components needed to make those products (components
), and an associative table (product_components
) detailing how many of each component go into each product. We want a way to quickly determine the total quantities of each component required to manufacture a given quantity of a product. For a start, we need an SQL query to solve this for product_id 50001 with a quantity of 50 units. Can you help us?"
The tables are structured as follows:
products
Sample Input:product_id | product_name | description |
---|---|---|
50001 | Optical Amplifier | An amplifier designed to amplify optical signals |
69852 | Fiber Laser | A laser in which the active gain medium is an optical fiber |
components
Sample Input:component_id | component_name | description |
---|---|---|
1 | Optical Diode | A crucial component for any optical device |
2 | Laser Pump | A component that excites the gain medium in a laser |
product_components
Sample Input:product_id | component_id | qty_required |
---|---|---|
50001 | 1 | 4 |
50001 | 2 | 3 |
69852 | 1 | 2 |
69852 | 2 | 6 |
Based on the problem statement and the provided tables, the following SQL query will provide us the required solution:
SELECT p.product_id, p.product_name, pc.component_id, c.component_name, (pc.qty_required * 50) AS total_qty_required FROM products p JOIN product_components pc ON p.product_id = pc.product_id JOIN components c ON pc.component_id = c.component_id WHERE p.product_id = 50001;
This query joins the three tables and multiplies the component quantity required by the production units for the specific product (in this case, 50 units of product_id 50001). It will provide a breakdown of all the components needed and the total quantity required of each to manufacture the desired quantity of the product.
UNION
and UNION ALL
?The UNION
operator combines two or more results from multiple SELECT queries into a single result. If it encounters duplicate rows, the multiple copies are removed (so there's only one instance of each would-be duplicate in the result set). Here's an example of a UNION
operator which combines all rows from table1
and table2
(making sure each row is unique):
SELECT columns FROM table1 UNION SELECT columns FROM table2;
The UNION ALL
operator is similar to the UNION
operator but it does NOT remove duplicate rows!
For Lumentum, a company that is a market-leading manufacturer of innovative optical and photonics products, you are required to write a query that computes the average sales value of each product category over each quarter of the year 2022.
Consider the following tables:
sales
Example Input:sales_id | product_id | sale_date | units_sold | unit_price |
---|---|---|---|---|
256 | 101 | 02/15/2022 | 20 | 100 |
789 | 102 | 03/20/2022 | 15 | 200 |
634 | 103 | 06/10/2022 | 10 | 300 |
987 | 101 | 08/14/2022 | 25 | 100 |
453 | 102 | 12/09/2022 | 30 | 200 |
products
Example Input:product_id | category |
---|---|
101 | Lasers |
102 | Optics |
103 | Lasers |
The query should return the average sales values (i.e., unit price multiplied by units sold) per quarter for each product category.
quarter | category | avg_sales_value |
---|---|---|
1 | Lasers | 2000.00 |
1 | Optics | 3000.00 |
2 | Lasers | 3000.00 |
4 | Optics | 6000.00 |
SELECT DATE_PART('quarter', sale_date) as quarter, p.category, AVG(s.unit_price * s.units_sold) as avg_sales_value FROM sales s JOIN products p on s.product_id = p.product_id GROUP BY DATE_PART('quarter', sale_date), p.category;
This query first joins the sales table with the products table on the product_id column. Then it uses PostgreSQL's DATE_PART
function to extract the quarter from the sale_date
field. The aggregation function AVG
is used to calculate the average sales value (unit_price * units_sold) for each group, where the groups are defined by the GROUP BY
clause on the quarter and category fields.
To find records in one table that aren't in another, you can use a LEFT JOIN
and check for NULL
values in the right-side table.
Here's an example using two tables, Lumentum employees and Lumentum managers:
SELECT * FROM lumentum_employees LEFT JOIN lumentum_managers ON lumentum_employees.id = lumentum_managers.id WHERE lumentum_managers.id IS NULL;
This query returns all rows from Lumentum employees where there is no matching row in managers based on the id
column.
You can also use the EXCEPT
operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
SELECT * FROM lumentum_employees EXCEPT SELECT * FROM lumentum_managers
This will return all rows from employees that are not in managers. The EXCEPT
operator works by returning the rows that are returned by the first query, but not by the second.
Note that EXCEPT
isn't supported by all DBMS systems, like in MySQL and Oracle (but have no fear, since you can use the MINUS
operator to achieve a similar result).
Lumentum, as a technology company, has numerous clients dealing with varying degrees of technology. But, for a new campaign, the company needs to target clients whose names explicitly include the word 'Tech'.
Write a SQL query that will allow Lumentum to fetch all clients from their database whose names consist of the word 'Tech'.
clients
Example Input:client_id | client_name | registration_date | country |
---|---|---|---|
4450 | TechSolutions | 01/01/2019 | USA |
3839 | HealthTech | 11/07/2018 | Canada |
1923 | Biotech Firm | 10/09/2018 | Germany |
5832 | Supreme Tech Ltd. | 12/12/2020 | UK |
0928 | Agri Products | 02/05/2018 | USA |
client_id | client_name | country |
---|---|---|
4450 | TechSolutions | USA |
3839 | HealthTech | Canada |
1923 | Biotech Firm | Germany |
5832 | Supreme Tech Ltd. | UK |
SELECT client_id, client_name, country FROM clients WHERE client_name LIKE '%Tech%';
This SQL query uses the LIKE
keyword as a filter inside a WHERE
condition. The '%' sign is a wild card that matches any number of characters. So, the query will return any client whose name contains 'Tech' anywhere in their name. The selection includes the client_id, client_name, and country, as per the request of the question.
You are given two tables customers
and orders
. Table customers
contains information about the customers and table orders
contains information about the orders made by these customers.
You are required to write a SQL query to find the total amount spent per customer and also identify the most expensive product purchased by each customer. Join these tables together and present the information in a single table.
customers
Example Input:customer_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Doe |
3 | Mary | Smith |
orders
Example Input:order_id | customer_id | product_name | price |
---|---|---|---|
1001 | 1 | iPhone X | 700 |
1002 | 1 | AirPods Pro | 200 |
1003 | 2 | Samsung Galaxy S21 | 800 |
1004 | 2 | Galaxy Buds Pro | 199 |
1005 | 3 | iPhone 12 Pro | 999 |
1006 | 3 | AirPods Pro | 200 |
WITH customer_expenditure AS ( SELECT customer_id, SUM(price) AS total_spent, MAX(price) AS most_expensive_purchase FROM orders GROUP BY customer_id ) SELECT c.first_name, c.last_name, o.product_name AS most_expensive_product, ce.total_spent FROM customers c JOIN customer_expenditure ce ON c.customer_id = ce.customer_id JOIN orders o ON c.customer_id = o.customer_id AND ce.most_expensive_purchase = o.price;
This query first creates a subquery customer_expenditure
that computes the total amount spent and the cost of the most expensive purchase for each customer from the orders
table. This subquery is then joined on the customers
table using the customer_id
field to get the customer details. Finally, it is joined again on the orders
table to retrieve the name of the most expensive product, ensuring we are matching on both the customer_id
and the most expensive purchase price. The final output will provide information about each customer's first name, last name, the most expensive product they've purchased, and the total amount they've spent.
Because joins come up routinely during SQL interviews, take a stab at this SQL join question from Spotify:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews.
Besides solving the earlier Lumentum SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Netflix, Airbnb, and Amazon.
Each interview question has multiple hints, full answers and crucially, there's an interactive coding environment so you can right in the browser run your SQL query and have it checked.
To prep for the Lumentum SQL interview it is also useful to solve SQL problems from other tech companies like:
However, if your SQL skills are weak, don't worry about jumping right into solving questions – improve your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers SQL topics like filtering data with WHERE and filtering on multiple conditions using AND/OR/NOT – both of these come up routinely during Lumentum SQL interviews.
In addition to SQL interview questions, the other types of problems to practice for the Lumentum Data Science Interview are:
The best way to prepare for Lumentum Data Science interviews is by reading Ace the Data Science Interview. The book's got: