Carvana employees write SQL queries daily for analyzing customer purchasing behaviors in the used car marketplace and optimizing vehicle pricing strategies based on real-time data. That's why Carvana almost always asks SQL problems in interviews for Data Science and Data Engineering positions.
Thus, to help you prep, here's 11 Carvana SQL interview questions – can you solve them?
Given the database of Carvana, assess the customer's database to find out the 'whale' customers. Whale customers are the ones who have bought the most number of cars and have spent the most in terms of purchase value in the last 12 months.
Consider the following tables:
purchase_id | user_id | purchase_date | car_id | purchase_price |
---|---|---|---|---|
3435 | 54 | 06/08/2021 | 3343 | $20,000 |
3876 | 298 | 09/14/2021 | 5625 | $25,000 |
2935 | 54 | 10/05/2021 | 4487 | $30,000 |
5832 | 432 | 11/23/2021 | 7812 | $18,000 |
4665 | 54 | 12/31/2021 | 3343 | $35,000 |
user_id | user_name | register_date |
---|---|---|
54 | John Doe | 01/10/2020 |
298 | Jane Smith | 03/12/2021 |
432 | Mary Johnson | 04/21/2020 |
In this query, we first create a CTE called "highvalue". Here, we select the user_id, the count of purchases (as num_purchases), and the sum of purchase_price (as total_spent) from the table where the purchase_date is within the last year. We group by user_id, order by both num_purchases (in descending order) and total_spent (also in descending order), and limit to the top 10 'whale' customers.
Finally, we join this "highvalue" CTE with the table on user_id and select all columns from "highvalue" and the user_name from the table. This gives us a list of the top 10 'whale' customers, including their names, along with the total amount they have spent and the number of purchases they have made in the last year.
To work on a similar customer analytics SQL question where you can solve it interactively and have your SQL solution automatically checked, try this Walmart Labs SQL Interview Question:
Given a table of Carvana employee salary data, write a SQL query to find the top 3 highest paid employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
You can solve this problem and run your code right in DataLemur's online SQL environment:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
If the code above is confusing, you can find a step-by-step solution here: Top 3 Department Salaries.
Read about how Carvana is able to provide same day delivery in Richmond, VA.
A join in SQL allows you to retrieve data from multiple tables and combine it into a single set of results.
In SQL, there are four different types of JOINs. To demonstrate each kind, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
: An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.
: A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.
: A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.
: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.
Carvana is a company that deals with buying and selling cars. Assume that you are given sales records each containing the id of the car sold, the model of the car, the selling price, and the day the sale was made.
sale_id | car_model | selling_price | sale_date |
---|---|---|---|
1 | Toyota Camry | 15000 | 2022-01-10 |
2 | Honda Accord | 20000 | 2022-01-15 |
3 | Nissan Altima | 18000 | 2022-01-20 |
4 | Toyota Camry | 16000 | 2022-02-05 |
5 | Honda Accord | 21000 | 2022-02-14 |
6 | Nissan Altima | 19000 | 2022-02-25 |
7 | Toyota Camry | 15500 | 2022-03-01 |
8 | Honda Accord | 20500 | 2022-03-12 |
9 | Nissan Altima | 18500 | 2022-03-15 |
10 | Toyota Camry | 15250 | 2022-04-07 |
Write to return the average selling price for each car model per month.
month | car_model | avg_selling_price |
---|---|---|
1 | Toyota Camry | 15000.00 |
1 | Honda Accord | 20000.00 |
1 | Nissan Altima | 18000.00 |
2 | Toyota Camry | 16000.00 |
2 | Honda Accord | 21000.00 |
2 | Nissan Altima | 19000.00 |
3 | Toyota Camry | 15500.00 |
3 | Honda Accord | 20500.00 |
3 | Nissan Altima | 18500.00 |
4 | Toyota Camry | 15250.00 |
This SQL query first extracts the month part from the sale_date using the EXTRACT function. Then, it groups the sales by both the extracted month and the car model. For each group (i.e., for each unique combination of month and car model), it calculates the average selling price using the AVG function. The resulting table shows, for each month, the average selling price for each car model, ordered by month and car model.
To practice another window function question on DataLemur's free online SQL coding environment, solve this Google SQL Interview Question:
The function is used to remove NULLs, and replace them with another value.
For example, say you were a Data Analyst at Carvana and were doing a customer analytics project to see who the most marketing-engaged customers were, and had access to the below table.
carvana_customers:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | NULL |
301 | NULL | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | NULL | NULL |
Before you could procede, you had to remove the NULLs, and replace them with the default value (not_active for email, and not_opted_in for SMS). To do this, you'd run the following query:
This would get you the following output:
customer_id | email_engagement | sms_engagement |
---|---|---|
101 | very_active | not_opted_in |
201 | un-subscribed | not_opted_in |
301 | not_active | not_opted_in |
401 | not_active | very_active |
501 | very_active | mildly_active |
303 | not_active | not_opted_in |
Assume Carvana is interested in analyzing the annual performance of their car sales. Besides the total number of sales, the company also wants to know the total sales value for each model and the favourite car model for each customer.
You're provided with two tables, and . The table contains all the car sales transactions, with columns: (unique identifier for the sale), (identifier for the car sold), (date of sale), and (identifier for the customer). The table contains information about each car, with columns: (unique identifier for the car), (car model), (sale value of the car).
sale_id | car_id | sale_date | customer_id |
---|---|---|---|
1 | 1 | 2022-01-01 | 1 |
2 | 2 | 2022-01-02 | 1 |
3 | 3 | 2022-01-02 | 2 |
4 | 1 | 2022-01-03 | 2 |
5 | 2 | 2022-01-04 | 3 |
car_id | model | value |
---|---|---|
1 | Model 1 | $25,000 |
2 | Model 2 | $30,000 |
3 | Model 3 | $15,000 |
4 | Model 4 | $35,000 |
5 | Model 5 | $20,000 |
This combined query fetches the total sales and total sales value for each model in the table and the most sold model for each customer.
This will return two sets of rows: the first will contain each car model, the total sales, and total sales value; the second will contain each customer's id and their most purchased car model.
combines the results from two or more SELECT statements, but only returns the rows that are exactly the same between the two sets.
For example, let's use to find all of Carvana's Facebook video ads with more than 10k views that are also being run on YouTube:
Carvana is an online used car retailer. We have a table containing the selling price and selling date of each car. The question is, could you write a SQL query to calculate the average selling price per year for each car model?
Consider the sample data below:
sales_id | sell_date | model | price |
---|---|---|---|
1001 | 01/10/2019 | Honda Accord | 20000 |
1002 | 05/20/2020 | Toyota Camry | 23000 |
1003 | 08/15/2021 | Honda Accord | 21000 |
1004 | 12/12/2020 | Toyota Camry | 22500 |
1005 | 03/26/2020 | Honda Accord | 19500 |
We want to calculate the average selling price per year for each car model.
year | model | avg_price |
---|---|---|
2019 | Honda Accord | 20000.00 |
2020 | Toyota Camry | 22750.00 |
2020 | Honda Accord | 19500.00 |
2021 | Honda Accord | 21000.00 |
Here's a PostgreSQL query to accomplish this:
This query uses the function to get the year from the of each sale. It then groups the sales by year and model, and calculates the average () selling price for each group. The clause orders the results by year and model.
To practice a very similar question try this interactive Wayfair Y-on-Y Growth Rate Question which is similar for requiring sales aggregation by year, or this Amazon Average Review Ratings Question which is similar for average calculations and date usage.
Carvana sells used cars online and they are interested in analyzing their data to understand their click-through to add-to-cart conversion rates. Given the tables and :
Table :
click_id | user_id | click_time | vehicle_id |
---|---|---|---|
1 | 123 | 2022-07-12 12:10:00 | 100 |
2 | 265 | 2022-07-12 12:11:00 | 200 |
3 | 362 | 2022-07-12 12:12:00 | 300 |
4 | 123 | 2022-07-12 12:14:00 | 200 |
5 | 265 | 2022-07-12 12:15:00 | 100 |
Table :
cart_id | user_id | add_to_cart_time | vehicle_id |
---|---|---|---|
1 | 123 | 2022-07-12 12:20:00 | 100 |
2 | 123 | 2022-07-12 12:24:00 | 200 |
3 | 362 | 2022-07-12 12:25:00 | 300 |
4 | 123 | 2022-07-12 12:16:00 | 400 |
5 | 265 | 2022-07-12 12:17:00 | 500 |
Calculate the daily click-through to add-to-cart conversion rate. Assume that the add-to-cart action has to occur on the same day but not necessarily immediately after the click.
This query first groups the data by day, since we want to calculate the daily conversion rate. The conversion rate is calculated by dividing the number of distinct add-to-cart actions by the number of distinct clicks. The left join ensures that all clicks are included in the calculation, even if they did not lead to an add-to-cart action.
To solve a similar problem about calculating rates, solve this SQL interview question from TikTok within DataLemur's online SQL code editor:
A correlated sub-query is one that relies on the outer query and cannot be run independently. It uses the outer query to filter or transform data by referencing a column from it, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is not linked to the outer query and can be run on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
Here is an example of a correlated sub-query:
This query selects the and total sales of all Carvana customers in the sales table whose are greater than the average of their own customer group. The sub-query in this case is correlated with the outer query, as it references the column from the outer query ().
Here is an example of a non-correlated sub-query:
The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.
Correlated sub-queries are slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
Carvana deals with purchasing, selling and financing used cars. Imagine if Carvana wants to identify the customers who have titles containing a specific word, say 'Manager', in order to offer them a special deal.
Your task is to write a SQL query that filters the customer records based on the 'Title' column and returns all records where the title contains the word 'Manager'.
customer_id | first_name | last_name | title | |
---|---|---|---|---|
543 | John | Doe | johndoe@email.com | General Manager |
768 | Jane | Smith | janesmith@email.com | Product Manager |
321 | Peter | Johnson | peterj@email.com | Software Developer |
997 | Jenny | Lee | jennyl@email.com | Marketing Manager |
745 | Andrew | Brown | andrewb@email.com | Sales Associate |
customer_id | first_name | last_name | title | |
---|---|---|---|---|
543 | John | Doe | johndoe@email.com | General Manager |
768 | Jane | Smith | janesmith@email.com | Product Manager |
997 | Jenny | Lee | jennyl@email.com | Marketing Manager |
This PostgreSQL query selects all columns of records in the 'customers' table where the customer's title contains the word 'Manager'. The character is a wildcard, so any title containing the word 'Manager' anywhere in the string will be included in the result.
The output contains only the records of customers having 'Manager' in their title.
The key to acing a Carvana SQL interview is to practice, practice, and then practice some more! Besides solving the above Carvana SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each problem on DataLemur has multiple hints, step-by-step solutions and crucially, there's an online SQL coding environment so you can easily right in the browser your SQL query answer and have it graded.
To prep for the Carvana SQL interview it is also a great idea to solve SQL questions from other automotive companies like:
In case your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers things like math functions and SUM/AVG window functions – both of which show up often in Carvana interviews.
In addition to SQL interview questions, the other question categories tested in the Carvana Data Science Interview include:
The best way to prepare for Carvana Data Science interviews is by reading Ace the Data Science Interview. The book's got: