Data Scientists, Analysts, and Data Engineers at Garrett Motion write SQL queries as a big part of their job. They use SQL for analyzing vast quantities of automotive design and performance data, and managing relational databases. Because of this, Garrett Motion evaluates jobseekers on SQL coding interview questions.
Thus, to help you prepare, here's 10 Garrett Motion SQL interview questions – can you solve them?
Garrett Motion, a leading differentiated technology provider in the automotive industry, has myriad products being reviewed constantly by customers on their online platforms. As a data analyst, you are asked to provide a monthly analysis of their products based on the average rating received from customers' reviews.
Using the reviews
table (formatted as below), write a SQL query to calculate the average rating per product per month.
reviews
Example Input:|**review_id**|**user_id**|**submit_date**|**product_id**|**stars**|
|:----|:----|:----|:----|:----|
|1|123|2019-01-15|1|4|
|2|567|2019-01-18|2|5|
|3|234|2019-02-03|1|3|
|4|234|2019-02-05|1|2|
|5|567|2019-03-09|2|4|
|6|123|2019-03-12|2|3|
|7|234|2019-03-15|1|5|
|8|123|2019-01-17|2|3|
|**month**|**product_id**|**avg_rating**|
|:----|:----|:----|
|01|1|4|
|01|2|4|
|02|1|2.5|
|03|1|5|
|03|2|3.5|
In PostgreSQL, the syntax for window functions follows the form function_name() OVER ([PARTITION BY <columns>] [ORDER BY <columns>])
.
Here is a possible answer for this interview question.
SELECT to_char(submit_date, 'MM') as month, product_id, avg(stars) OVER (PARTITION BY to_char(submit_date, 'MM'), product_id) as avg_rating FROM reviews ORDER BY month, product_id;
This query starts by converting the submit_date to month format using to_char
function. Then the avg
function is used to compute monthly average ratings per product. The OVER
clause followed by PARTITION BY
ensures that the average is calculated for each product separately per month.
The ORDER BY
at the end orders the result set by month and product_id, making the results easier to understand.
For more window function practice, solve this Uber SQL problem within DataLemur's online SQL code editor:
Given a table of Garrett Motion employee salary data, write a SQL query to find the 2nd highest salary amongst all the employees.
employees
Example Input:employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Code your solution to this question directly within the browser on DataLemur:
SELECT MAX(salary) AS second_highest_salary FROM employee WHERE salary < ( SELECT MAX(salary) FROM employee );
You can find a detailed solution here: 2nd Highest Salary.
A transaction is a one or more SQL commands which are executed as a singular unit if the transaction -commits- (or no execution hapens if the transaction -aborts-).
For transactions, a DBMS is supposed to enforce the follwing ACID properties: Atomicity, Consistency, Isolation, & Durability.
Here's what each one means:
As you can see, it's pretty important for the multiple databases where Garrett Motion store's it's data to be ACID-compliant!
In Garrett Motion Inc., the quality control department needs to analyze the performance of each type of turbocharger product based on customer feedback. A customer provides feedback by giving a rating after the purchase.
The 'customers' table contains customer information; 'products' table holds data for the different turbochargers; and 'reviews' table stores customer feedback. The task is to write a SQL query to calculate the average rating of each turbocharger product for the last quarter of 2022.
The tables are as follows:
customers
Example Input:customer_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Doe |
3 | Bob | Smith |
products
Example Input:product_id | product_name |
---|---|
1 | Turbocharger A |
2 | Turbocharger B |
3 | Turbocharger C |
reviews
Example Input:review_id | customer_id | product_id | rating | review_date |
---|---|---|---|---|
1 | 1 | 1 | 5 | 2022-10-30 |
2 | 2 | 2 | 4 | 2022-11-15 |
3 | 1 | 2 | 3 | 2022-12-20 |
4 | 3 | 3 | 3 | 2022-09-30 |
5 | 3 | 1 | 4 | 2022-12-25 |
Here is a PostgreSQL query that can solve this problem:
SELECT p.product_name, ROUND(AVG(r.rating),2) AS avg_rating FROM reviews r JOIN products p ON r.product_id = p.product_id WHERE r.review_date BETWEEN '2022-10-01' AND '2022-12-31' GROUP BY p.product_name ORDER BY avg_rating DESC;
The above SQL block joins 'reviews' and 'products' tables based on product_id
and calculates the average rating for each product for reviews written between October 1, 2022, and December 31, 2022. Results are displayed in descending order of average rating. The AVG function calculates the average rating, and the ROUND function is used to limit the result to two decimal places.
COALESCE()
do, and when would you use this function?The COALESCE() function returns the first non-NULL value from a list of values. This function is often used to replace a NULL with some default value, so that you can then take a SUM()
or AVG()
of some column without NULLs messing things up.
For example, suppose you ran a customer satisfaction survey for Garrett Motion and had statements like "I'd buy from Garrett Motion again". In the survey, customers would then answer how strongly they agreed with a statement on a scale of 1 to 5 (strongly disagree, disagree, neutral, agreee, strongly agree).
Because attention spans are short, many customers skipped many of the questions, and thus our survey data might be filled with NULLs:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | NULL |
303 | 1 | 5 |
303 | 2 | NULL |
Before doing further analytics on this customer survey data, you could replace the NULLs in the agree_level
column with the value of 3
(because that corresponds to the default 'neutral' answer) using the COALESCE
function:
SELECT customer_id, question_id, COALESCE(agree_scale, 3) as agree_scale FROM garrett_motion_customer_survey;
This would result in the following:
customer_id | question_id | agree_scale |
---|---|---|
101 | 1 | 4 |
101 | 2 | 5 |
202 | 1 | 4 |
202 | 2 | 3 |
303 | 1 | 5 |
303 | 2 | 3 |
Garrett Motion, a manufacturer of automotive turbochargers, wants to find the average time it takes to assemble each turbo model over the span of a month. This will help with project management, scheduling and streamlining the assembly process.
Assume that we have an assembly
table:
assembly
Example Input:assembly_id | turbo_model | assembly_date | assembly_time(min) |
---|---|---|---|
1001 | T123 | 06/01/2022 | 30 |
1002 | T456 | 06/02/2022 | 40 |
1003 | T123 | 06/02/2022 | 35 |
1004 | T789 | 06/03/2022 | 45 |
1005 | T123 | 06/03/2022 | 32 |
1006 | T456 | 06/05/2022 | 42 |
1007 | T789 | 06/10/2022 | 43 |
1008 | T123 | 07/01/2022 | 29 |
1009 | T456 | 07/02/2022 | 41 |
1010 | T789 | 07/02/2022 | 44 |
We want to find the average assembly time in minutes for each turbo_model
in the month of June.
mth | turbo_model | avg_assembly_time(min) |
---|---|---|
6 | T123 | 32.33 |
6 | T456 | 41.00 |
6 | T789 | 44.00 |
SELECT EXTRACT(MONTH FROM assembly_date) AS mth, turbo_model, AVG(assembly_time) AS avg_assembly_time FROM assembly WHERE EXTRACT(MONTH FROM assembly_date) = 6 GROUP BY mth, turbo_model;
In this case, we are taking the average of the assembly time (assembly_time
) for each turbo model (turbo_model
) where the month of the assembly date (assembly_date
) is 6 (representing June). The EXTRACT(MONTH FROM assembly_date)
function is used to get the month from the assembly_date
. Then grouping by the month and turbo model to calculate the average assembly time for each turbo model for the specified month.
To practice a very similar question try this interactive Tesla Unfinished Parts Question which is similar for analyzing assembly data or this Amazon Server Utilization Time Question which is similar for calculating time requirements.
Read Garrett Motion's story on how they got started and became the company they are known for today.
UNION
and UNION ALL
operators?UNION
is used to combine the output of multiple SELECT
statements into one big result!
Suppose you were a Data Analyst at Garrett Motion working on a Sales Analytics project, and had data on sales leads from both the company's Salesforce CRM and it's legacy Hubspot CRM. To write a query to analyze leads created before 2023 started, across both CRMs, you could use UNION
in the following way:
SELECT email, job_title, company_id FROM garrett_motion_sfdc_leads WHERE created_at < '2023-01-01'; UNION SELECT email, job_title, company_id FROM garrett_motion_hubspot_leads WHERE created_at < '2023-01-01'
UNION
filters out duplicates, so each email/job title/company only shows up once (even if it's in both Salesforce and HubSpot CRMs).
On the other hand, the UNION ALL
operator does NOT filter out duplicates, so if a lead shows up in both CRMs, it'll be listed twice in the output of UNION ALL
.
Garrett Motion is a global technology leader that designs, manufactures and sells Electric Boosting and Automotive Software. Imagine you are a data analyst in the company and you are tasked with the following:
Garrett Motion has recently started an online advertisement campaign for its premier auto parts, aiming to convert its ad viewer into purchasers on its e-commerce store. The ads are shown to users when they visit different web pages, and users can click the ads to view the products on the company website. They can then add these products to their cart and then purchase them.
Your task is to help the company understand the click-through conversion rates from viewing a product, to adding a product to the cart, and finally making a purchase.
ad_clicks
Example Input:|**click_id**|**user_id**|**ad_id**|**timestamp**|
|:----|:----|:----|:----|
|1|11|21|2022-08-01 00:00:00|
|2|12|22|2022-08-01 01:00:00|
|3|13|23|2022-08-01 02:00:00|
|4|14|24|2022-08-01 03:00:00|
cart_adds
Example Input:|**add_id**|**user_id**|**product_id**|**timestamp**|
|:----|:----|:----|:----|
|101|11|221|2022-08-01 00:10:00|
|102|12|222|2022-08-01 01:10:00|
|103|13|223|2022-08-01 03:10:00|
purchases
Example Input:|**purchase_id**|**user_id**|**product_id**|**timestamp**|
|:----|:----|:----|:----|
|1001|11|221|2022-08-01 00:20:00|
|1002|12|222|2022-08-01 02:20:00|
WITH clicked AS ( SELECT *, RANK() OVER (PARTITION BY user_id ORDER BY timestamp) as click_order_by_user FROM ad_clicks ), added AS ( SELECT *, RANK() OVER (PARTITION BY user_id ORDER BY timestamp) as add_order_by_user FROM cart_adds ), purchased AS ( SELECT *, RANK() OVER (PARTITION BY user_id ORDER BY timestamp) as purchase_order_by_user FROM purchases ) SELECT COUNT(DISTINCT clicked.user_id) as total_clicks, COUNT(DISTINCT added.user_id) as total_added_to_cart, COUNT(DISTINCT purchased.user_id) as total_purchases, COUNT(DISTINCT added.user_id)::float / COUNT(DISTINCT clicked.user_id) as click_to_cart_conversion_rate, COUNT(DISTINCT purchased.user_id)::float / COUNT(DISTINCT added.user_id) as cart_to_purchase_conversion_rate FROM clicked LEFT JOIN added ON clicked.user_id = added.user_id AND clicked.click_order_by_user = added.add_order_by_user LEFT JOIN purchased ON added.user_id = purchased.user_id AND added.add_order_by_user = purchased.purchase_order_by_user
We are counting the amount of unique users that reached each stage of the funnel. We then divide these counts to get the conversion rates between each stage. We take care to join on both user_id and the order of actions with ranking to ensure that these actions are actually connected and in correct chronological order.
To solve a similar problem about calculating rates, solve this SQL interview question from TikTok within DataLemur's online SQL code editor:
As an analyst for Garrett Motion which specializes in selling turbochargers and related products, you have access to the sales records for each product. Garrett Motion company wants to get a better understanding of their sales data in a monthly manner. For this specific task, you are asked to determine the average income per month for each product over the past year.
sales
Example Input:sale_id | product_id | sale_date | product_price |
---|---|---|---|
1251 | 100 | 01/03/2021 | 500 |
2302 | 200 | 02/15/2021 | 600 |
4521 | 100 | 03/28/2021 | 500 |
6453 | 300 | 04/12/2021 | 400 |
7810 | 200 | 05/25/2021 | 600 |
month | product | avg_income |
---|---|---|
1 | 100 | 500 |
2 | 200 | 600 |
3 | 100 | 500 |
4 | 300 | 400 |
5 | 200 | 600 |
SELECT EXTRACT(MONTH FROM sale_date) as month, product_id as product, AVG(product_price) as avg_income FROM sales GROUP BY month, product ORDER BY month, product;
This PostgreSQL query uses EXTRACT(MONTH FROM sale_date)
to get the month of each sale, groups the sales by both month and product_id, and then calculates the average income for each combination using AVG(product_price)
. The results are ordered first by month and then by product_id.
CHECK
constraint do, and when might you use it?The CHECK constraint is used to set a rule for the data in a column. If a row is inserted or updated and the data in the column does not follow the rule specified by the CHECK constraint, the operation will be unsuccessful.
For example, say you had Garrett Motion customer data. You could use a CHECK constraint to ensure that the email column contains only properly formatted email addresses, or that the age column contains only positive integer. Here's an example of that:
CREATE TABLE garrett_motion_customers ( id INT PRIMARY KEY, email VARCHAR(255) CHECK (email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'), age INT CHECK (age > 0) );
The key to acing a Garrett Motion SQL interview is to practice, practice, and then practice some more! Besides solving the earlier Garrett Motion SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each SQL question has hints to guide you, full answers and crucially, there is an online SQL coding environment so you can instantly run your SQL query and have it executed.
To prep for the Garrett Motion SQL interview you can also be a great idea to practice SQL questions from other automotive companies like:
In case your SQL skills are weak, forget about jumping right into solving questions – strengthen your SQL foundations with this DataLemur SQL Tutorial.
This tutorial covers SQL topics like Self-Joins and aggreage functions like MIN()/MAX() – both of these pop up often in SQL job interviews at Garrett Motion.
In addition to SQL interview questions, the other types of problems covered in the Garrett Motion Data Science Interview include:
I'm a tad biased, but I believe the best way to prep for Garrett Motion Data Science interviews is to read the book I wrote: Ace the Data Science Interview.
The book has 201 data interview questions sourced from tech companies like Google & Microsoft. It also has a refresher covering SQL, Product-Sense & ML. And finally it's vouched for by the data community, which is how it's gotten to be an Amazon best-seller with over 1000+ 5-star reviews.