At Harley-Davidson, SQL is used for analyzing sales data for regional trends, and managing inventory databases. For this reason Harley-Davidson almost always evaluates jobseekers on SQL questions during interviews for Data Science, Data Engineering and Data Analytics jobs.
Thus, to help you prepare, we've curated 11 Harley-Davidson SQL interview questions – can you solve them?
Harley-Davidson wants to identify their 'VIP users' for a special recognition event. A 'VIP user' in this context is defined as someone who frequently rents Harley-Davidson motorcycles and provides a high number of quality reviews on their rented motorcycles. In other words, it's not just about quantity but also quality.
Write a SQL query to identify the top 10 customers based on their frequency of rentals and their average review scores for the year 2022.
Suppose you have access to three tables with the following data:
rental_id | user_id | rent_date | motorcycle_id |
---|---|---|---|
1024 | 15 | 01/21/2022 | 225 |
1025 | 22 | 02/18/2022 | 330 |
1026 | 15 | 03/10/2022 | 445 |
1027 | 33 | 04/01/2022 | 445 |
1028 | 44 | 05/20/2022 | 620 |
review_id | user_id | submit_date | motorcycle_id | stars |
---|---|---|---|---|
1701 | 15 | 01/25/2022 | 225 | 5 |
1702 | 22 | 02/19/2022 | 330 | 3 |
1703 | 15 | 03/12/2022 | 445 | 4 |
1704 | 33 | 04/03/2022 | 445 | 2 |
1705 | 44 | 05/22/2022 | 620 | 1 |
motorcycle_id | model | year |
---|---|---|
225 | Street 750 | 2022 |
330 | Iron 883 | 2022 |
445 | Roadster | 2022 |
620 | Softail Slim | 2022 |
To solve this, we'll SELECT the user_id, count the number of rentals as 'total_rentals', and the average review score as 'avg_score'. We'll then ORDER the result by total_rentals and avg_score in descending order and LIMIT the result to 10.
The above query will give a list of top 10 users who have the highest total rentals and average review scores for the year 2022. These users can be considered as 'VIP users'.
To practice a related customer analytics question on DataLemur's free online SQL coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Imagine you had a table of Harley-Davidson employee salaries. Write a SQL query to find the employees who earn more than their direct manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis who earns 7,800.
Try this interview question directly within the browser on DataLemur:
First, we perform a SELF-JOIN where we treat the first table () as the managers' table and the second table () as the employees' table. Then we use a clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the code above is confusing, you can find a step-by-step solution with hints here: Employees Earning More Than Their Boss.
The EXCEPT operator is used to return all rows from the first SELECT statement that are not returned by the second SELECT statement. Note that is available in PostgreSQL and SQL Server, and it's equivalent operator is called and is available in MySQL and Oracle.
For a tangible example, suppose you were doing an HR Analytics project for Harley-Davidson, and had access to Harley-Davidson's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use operator to find all employees who never were a contractor using this query:
Harley-Davidson, the infamous motorcycle manufacturer, is keeping track of its monthly motorcycle sales. To understand the trend, they compiled following datasets, and .
The dataset has the following schema:
column | type |
---|---|
sale_id | int |
sale_date | datetime |
motorcycle_id | int |
dealership_id | int |
selling_price | float |
And dataset has the following schema:
column | type |
---|---|
motorcycle_id | int |
model | string |
year | int |
MSRP | float |
The task is to write a SQL query that provides monthly total sales units and sales value for each motorcycle model. The output should be sorted by model and date.
The following table shows an example input with sales data:
sale_id | sale_date | motorcycle_id | dealership_id | selling_price |
---|---|---|---|---|
123 | 2022-01-12 00:00:00 | 001 | 456 | 20000 |
124 | 2022-02-28 00:00:00 | 002 | 567 | 25000 |
125 | 2022-02-15 00:00:00 | 001 | 456 | 21000 |
126 | 2022-03-20 00:00:00 | 003 | 678 | 30000 |
127 | 2022-03-21 00:00:00 | 003 | 789 | 28000 |
The following table presents a set of motorcycle model data:
motorcycle_id | model | year | MSRP |
---|---|---|---|
001 | Iron 883 | 2022 | 9999 |
002 | Fat Boy | 2023 | 19999 |
003 | Roadster | 2022 | 11999 |
This query first joins the and tables on the column. It then groups the joined table by and to calculate the total number of units sold (COUNT) and total sales (SUM) for each month and model. The results are ordered by and for easy reading. The function is used to cut off the day and hour from the and keep only the year-month information for grouping.
To solve a related window function SQL problem on DataLemur's free interactive coding environment, try this Google SQL Interview Question:
To clarify the distinction between a primary key and a foreign key, let's examine employee data from Harley-Davidson's HR database:
:
+------------+------------+------------+------------+ | employee_id| first_name | last_name | manager_id | +------------+------------+------------+------------+ | 1 | Aubrey | Graham | 3 | | 2 | Marshal | Mathers | 3 | | 3 | Dwayne | Carter | 4 | | 4 | Shawn | Carter | | +------------+------------+------------+------------+
In this table, serves as the primary key. It uniquely identifies each employee and cannot be null.
functions as a foreign key, linking to the of the employee's manager. This establishes a relationship between Harley-Davidson employees and their managers, allowing for easy querying to find an employee's manager or see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, and foreign keys could be used to connect each employee to their respective department and location.
Harley-Davidson is interested in understanding the sales distribution of it's various motorcycle models across different regions. They are particularly interested in how sales trends align with the release years of the motorcycles.
You have the following tables:
sale_id | customer_id | motorcycle_id | sale_date | region_id | price |
---|---|---|---|---|---|
1 | 1001 | 301 | 02/08/2020 | 2 | $25000 |
2 | 1025 | 456 | 04/22/2020 | 1 | $18000 |
3 | 1042 | 301 | 05/18/2020 | 3 | $25000 |
4 | 1052 | 789 | 08/26/2020 | 2 | $30000 |
5 | 1081 | 456 | 09/05/2020 | 1 | $18000 |
motorcycle_id | model_name | release_year |
---|---|---|
301 | 'Softail' | 2019 |
456 | 'Sportster' | 2020 |
789 | 'Touring' | 2020 |
The task here is to write a sql query that shows the total revenue from sales and the number of motorcycles sold per model per region per year. We also want to know how many years it has been since the model was released up to the year of sale.
This query first joins the 'sales' and 'motorcycles' tables on the 'motorcycle_id' column, and the 'sales' and 'regions' tables based on the 'region_id' column. The SELECT statement then returns the required fields - region name, model name, sale year, number of motorcycles sold per model per region per year (achieved using the 'COUNT(*)' aggregate function), total revenue (sum of sale prices), and the difference between the sale year and the release year to indicate the years since the motorcycle model was released. The result set is ordered by the region name, model name and sale year.
A cross-join, also known as a cartesian join, is a type of join that returns the cross-product of two tables. In a cross-join, each row from the first table is paired with every row from the second table, resulting in a new table that has a row for each possible combination of rows from the two input tables.
For example, say you worked on the Marketing Analytics team at Harley-Davidson, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
You could this query to generate all possible combinations of ad copy and ad creative to help you create more effective ads for Harley-Davidson. Just be careful: if you had 1,000 rows of ad copy and 1,000 different image creatives, the resulting cross-join would have 1 million rows!
Check out this job description for a Product Quality Analyst at Harley-Davidson and see how they utilize SQL for their operations.
Given a database with information about motorcycles owned by Harley-Davidson's customers, write a SQL query to find the average mileage a bike model has traveled.
id | customer_id | model_id | mileage |
---|---|---|---|
1 | 101 | 601 | 12000 |
2 | 102 | 602 | 8500 |
3 | 103 | 601 | 15000 |
4 | 104 | 602 | 7000 |
5 | 105 | 603 | 5000 |
model_id | model_name |
---|---|
601 | 'Street 750' |
602 | 'Roadster' |
603 | 'Iron 883' |
model_name | avg_mileage |
---|---|
'Street 750' | 13500 |
'Roadster' | 7750 |
'Iron 883' | 5000 |
In the above SQL query, a JOIN operation is performed between the and tables using the as the matching attribute. Then, the average mileage () is calculated within each group of the same using the statement.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for requiring group-by and aggregation calculation or this Facebook Average Post Hiatus (Part 1) Question which is similar for requiring average calculation.
Harley-Davidson would like to understand their best selling motorcycle models for each month. Write a SQL query that will return the motorcycle model and the maximum number of units sold per month.
sale_id | sale_date | motorcycle_model | units_sold |
---|---|---|---|
1 | 01/06/2021 | M1 | 8 |
2 | 02/06/2021 | M2 | 6 |
3 | 01/06/2021 | M2 | 10 |
4 | 01/07/2021 | M1 | 12 |
5 | 15/07/2021 | M2 | 14 |
6 | 17/08/2021 | M1 | 9 |
7 | 25/08/2021 | M2 | 11 |
month | motorcycle_model | max_units_sold |
---|---|---|
6 | M1 | 8 |
6 | M2 | 10 |
7 | M1 | 12 |
7 | M2 | 14 |
8 | M1 | 9 |
8 | M2 | 11 |
The query works by first extracting the month from the sale_date column. It then groups the rows by month and motorcycle model. The MAX function is applied to the units_sold within each group (i.e., for each month-model combination), resulting in the maximum number of units sold per month for each motorcycle model. Finally, the result is sorted by month and motorcycle model for easy viewing.
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables. The table with the foreign key is called the child table, while the table with the candidate key is called the parent or referenced table.
For example, consider a database with two tables: and . The Customers table might have a primary key column called , while the Harley-Davidson orders table might have a foreign key column called that references the column in Harley-Davidson customers table.
Harley-Davidson offers a unique customization service to its customers. Some prefer standard motorcycle models, while others choose to have unique customizations on their vehicles. As a data analyst at Harley-Davidson, your task is to filter down the customers' records and find out the customers who opted for special customizations on their motorcycles.
Suppose the and tables in the company's database are as shown below:
customer_id | customer_name | |
---|---|---|
101 | John Doe | johndoe@example.com |
102 | Jane Doe | janedoe@example.com |
103 | Peter Parker | peterparker@example.com |
order_id | customer_id | product_name | customized |
---|---|---|---|
501 | 101 | Harley Davidson Street 750 | No |
502 | 102 | Harley Davidson Roadster | Yes |
503 | 103 | Harley Davidson Iron 883 | No |
504 | 101 | Harley Davidson Fat Boy | Yes |
505 | 103 | Harley Davidson Street Rod | Yes |
Please write a SQL query to identify all the customers who opted for special customizations on their motorcycles.
The following PostgreSQL query can be used to solve this problem:
This query joins the and tables on and then filters the rows where is 'Yes'. The result includes the customer ids, names, and email addresses of all the customers who opted for the special motorcycle customizations.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Harley-Davidson SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier Harley-Davidson SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Google, Microsoft and Silicon Valley startups.
Each SQL question has hints to guide you, detailed solutions and most importantly, there's an online SQL code editor so you can right online code up your SQL query answer and have it graded.
To prep for the Harley-Davidson SQL interview it is also wise to solve SQL problems from other automotive companies like:
However, if your SQL coding skills are weak, don't worry about diving straight into solving questions – strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers things like aggregate functions and cleaning text data – both of which come up routinely during Harley-Davidson interviews.
In addition to SQL query questions, the other question categories to practice for the Harley-Davidson Data Science Interview include:
I'm sort of biased, but I believe the best way to prep for Harley-Davidson Data Science interviews is to read my book Ace the Data Science Interview.
The book solves 201 interview questions taken from Microsoft, Amazon & startups. It also has a crash course on SQL, AB Testing & ML. And finally it's helped thousands of people land their dream job in data, which is why it's earned 1000+ reviews on Amazon and a 4.5-star rating.