At eBay, SQL is used all the time for analyzing customer behavior trends and managing product inventory databases. That's why eBay built an in-house SQL tool with over 5,000 employee users. Because of this, it shouldn't surprise you that eBay often asks SQL questions during interviews for Data Analyst, Data Science, and BI jobs.
So, if you're studying for a SQL interview, here are 11 eBay SQL interview questions to practice, similar to questions recently asked at eBay – can you answer them all?
Assume that you are given the table below containing information on various orders made by eBay customers. Write a query to obtain the user IDs and number of products purchased by the top 3 customers; these customers must have spent at least $1,000 in total.
Output the user ID and the number of products purchased, sorted by the number of products in descending order. To break ties (i.e., if 2 customers both bought 10 products), the user who spent more should take precedence.
Column Name | Type |
---|---|
transaction_id | integer |
product_id | integer |
user_id | integer |
spend | decimal |
Example input:
transaction_id | product_id | user_id | spend |
---|---|---|---|
131432 | 1324 | 128 | 699.78 |
131433 | 1313 | 128 | 501.00 |
153853 | 2134 | 102 | 1001.20 |
247826 | 8476 | 133 | 1051.00 |
247265 | 3255 | 133 | 1474.00 |
136495 | 3677 | 133 | 247.56 |
Example output:
user_id | product_num |
---|---|
133 | 3 |
128 | 2 |
102 | 1 |
This is the same question as problem #5 in the SQL Chapter of Ace the Data Science Interview!
To solve this question, try this eBay SQL question within DataLemur's interactive SQL code editor:
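If you want to check your approach first, here's a minimal sketch of one solution, assuming the table above is named transactions:

```sql
-- Sketch only: the table name "transactions" is an assumption
SELECT user_id,
       COUNT(product_id) AS product_num
FROM transactions
GROUP BY user_id
HAVING SUM(spend) >= 1000
ORDER BY product_num DESC, SUM(spend) DESC
LIMIT 3;
```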
The business case here is simulating the eBay listing and bidding system. We'd like to be able to keep track of users, the products they list for sale, and the bids that other users place on these products. Design the tables to model this system and write a SQL query to fetch the highest current bid for each active listing.
Let's start by identifying the tables we'll need: a users table, a listings table, and a bids table.
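Here's a minimal schema sketch for these three tables (the column types are assumptions; the columns mirror the example tables below):

```sql
-- One possible schema for the listing/bidding system
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);

CREATE TABLE listings (
    listing_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users (user_id),  -- the seller
    product_name TEXT NOT NULL,
    active       BOOLEAN NOT NULL
);

CREATE TABLE bids (
    bid_id     INTEGER PRIMARY KEY,
    listing_id INTEGER NOT NULL REFERENCES listings (listing_id),
    user_id    INTEGER NOT NULL REFERENCES users (user_id),    -- the bidder
    amount     NUMERIC NOT NULL
);
```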
Here are your example tables:
users table:
user_id | username |
---|---|
1 | user1 |
2 | user2 |
3 | user3 |
listings table:
listing_id | user_id | product_name | active |
---|---|---|---|
1 | 1 | Prod1 | TRUE |
2 | 1 | Prod2 | TRUE |
3 | 2 | Prod3 | FALSE |
4 | 3 | Prod4 | TRUE |
bids table:
bid_id | listing_id | user_id | amount |
---|---|---|---|
1 | 1 | 2 | 200 |
2 | 1 | 3 | 250 |
3 | 2 | 3 | 50 |
4 | 3 | 1 | 400 |
5 | 4 | 2 | 300 |
Here is one way to write the SQL query that fetches the highest bid for each active listing:
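```sql
SELECT l.listing_id,
       l.product_name,
       MAX(b.amount) AS highest_bid
FROM listings AS l
JOIN bids AS b
  ON l.listing_id = b.listing_id
WHERE l.active = TRUE
GROUP BY l.listing_id, l.product_name
ORDER BY highest_bid DESC;
```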
This query joins the listings and bids tables on listing_id and filters out the listings that are not active. The MAX() function is used to find the highest bid for each listing, and the rows are grouped by listing_id in order to display the highest bid per listing. The result set is ordered in descending order of the highest bid.
A self-join is a type of join in which a table is joined to itself. To perform a self-join, you need to specify the table name twice in the FROM clause, and give each instance of the table a different alias. You can then join the two instances of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
Self-joins are the go-to technique for any data analysis that involves pairs of the same thing, such as identifying pairs of products that are frequently purchased together, as in this Walmart SQL interview question.
For another example, say you were doing an HR analytics project and needed to analyze how much all eBay employees in the same department interact with each other. Here's a self-join query you could use to retrieve all pairs of eBay employees who work in the same department (assuming an employees table with employee_id, full_name, and department_id columns):
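```sql
-- The table and column names (employees, employee_id, full_name, department_id) are illustrative
SELECT e1.full_name AS employee_1,
       e2.full_name AS employee_2,
       e1.department_id
FROM employees AS e1
JOIN employees AS e2
  ON e1.department_id = e2.department_id
WHERE e1.employee_id <> e2.employee_id;  -- exclude an employee paired with themselves
```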
This query returns all pairs of eBay employees who work in the same department, and excludes pairs where the employee's id is the same (since this would represent the same eBay employee being paired with themselves).
In eBay's customer database table 'activities', the 'activity_type' column distinguishes between different types of customer activities such as "VIEW", "BID", and "BUY". The goal of this question is to find the customers who have viewed a product after bidding on it, but have not yet purchased it. You need to write a query that returns the user_id, product_id and the timestamp of their last "VIEW" activity. Assume that the 'timestamp' column is of the type 'timestamp', and it records the exact time when the customer made the activity.
Example input (activities table):
activity_id | user_id | timestamp | product_id | activity_type |
---|---|---|---|---|
6171 | 123 | 2022-03-18 10:00:00 | 1001 | "VIEW" |
7802 | 123 | 2022-03-18 09:00:00 | 1001 | "BID" |
5293 | 362 | 2022-03-19 14:00:00 | 1002 | "BUY" |
6352 | 123 | 2022-03-19 11:00:00 | 1001 | "VIEW" |
4517 | 123 | 2022-03-17 00:00:00 | 1001 | "BID" |
Example output:
user_id | product_id | last_view |
---|---|---|
123 | 1001 | 2022-03-19 11:00:00 |
One PostgreSQL query for this problem would look like this:
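```sql
SELECT a.user_id,
       a.product_id,
       MAX(a.timestamp) AS last_view
FROM activities AS a
WHERE a.activity_type = 'VIEW'
  -- the customer bid on this product before the view...
  AND EXISTS (
        SELECT 1
        FROM activities AS b
        WHERE b.user_id = a.user_id
          AND b.product_id = a.product_id
          AND b.activity_type = 'BID'
          AND b.timestamp < a.timestamp)
  -- ...and has never bought it
  AND NOT EXISTS (
        SELECT 1
        FROM activities AS p
        WHERE p.user_id = a.user_id
          AND p.product_id = a.product_id
          AND p.activity_type = 'BUY')
GROUP BY a.user_id, a.product_id;
```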
This query keeps only the "VIEW" activities from customers who previously placed a "BID" on the same product but have no "BUY" activity for it. It then groups the results by customer and product, and selects the latest "VIEW" timestamp for each group.
To solve a related problem on DataLemur's free online SQL coding environment, try this Facebook SQL Interview question:
A foreign key is a field in a table that references the primary key of another table. It creates a link between the two tables and ensures that the data in the foreign key field is valid.
Say for example you had sales analytics data from eBay's CRM (customer-relationship management) tool.
In this example, the opportunities table has a foreign key field called account_id that references the "account_id" field in the accounts table (the primary key). This helps to link the data about each opportunity to the corresponding account information in the accounts table.
This prevents the insertion of rows in the opportunities table that do not have corresponding entries in the accounts table. It also helps to enforce the relationship between the two tables and ensures that data is not deleted from the accounts table if there are still references to it in the opportunities table.
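Here's a minimal sketch of how these two tables might be defined (the non-key column names are illustrative):

```sql
CREATE TABLE accounts (
    account_id   INTEGER PRIMARY KEY,
    account_name TEXT NOT NULL
);

CREATE TABLE opportunities (
    opportunity_id INTEGER PRIMARY KEY,
    account_id     INTEGER NOT NULL REFERENCES accounts (account_id),  -- foreign key
    amount         NUMERIC
);
```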
Given the tables bids and items, write a SQL query to calculate the average bidding price for each category of items on eBay. The bids table stores all the bids made for an item, while the items table contains details about the item, including its category.
bids table:
bid_id | item_id | user_id | bid_price | bid_time |
---|---|---|---|---|
1 | 101 | 123 | 20.00 | 2022-10-01 |
2 | 101 | 456 | 25.00 | 2022-10-05 |
3 | 102 | 123 | 15.00 | 2022-10-01 |
4 | 103 | 789 | 30.00 | 2022-10-03 |
5 | 104 | 789 | 40.00 | 2022-10-04 |
items table:
item_id | category | description | start_price | start_time | end_time |
---|---|---|---|---|---|
101 | Books | "Harry Potter Book" | 15.00 | 2022-09-30 | 2022-10-07 |
102 | Toys | "Lego Set" | 10.00 | 2022-10-01 | 2022-10-08 |
103 | Electronics | "Bluetooth Headphones" | 25.00 | 2022-10-02 | 2022-10-09 |
104 | Electronics | "Smart TV" | 35.00 | 2022-10-03 | 2022-10-10 |
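Here is one way to write this query (a sketch, assuming the tables are named bids and items as above):

```sql
SELECT i.category,
       AVG(b.bid_price) AS avg_bid_price
FROM bids AS b
JOIN items AS i
  ON b.item_id = i.item_id
GROUP BY i.category;
```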
This SQL query first joins the bids table and the items table on item_id. Then, it groups the resulting rows by the category column of the items table. The AVG() function is used to calculate the average bid_price for each category. The result of this query would be a table that lists each category with its average bidding price.
To practice another question about calculating rates, try this TikTok SQL Interview Question within DataLemur's interactive SQL code editor:
Here's an example of creating a clustered index on a column of an eBay payments table (using SQL Server syntax; the payment_date column is just for illustration):
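```sql
-- SQL Server syntax; table and column names are illustrative
CREATE CLUSTERED INDEX ix_payments_payment_date
ON payments (payment_date);
```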
This creates a clustered index on that column, which will determine the physical order of the data rows in the table.
Here is an example of a non-clustered index on another column of the same table (again illustrative):
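```sql
-- SQL Server syntax; customer_id is an illustrative column name
CREATE NONCLUSTERED INDEX ix_payments_customer_id
ON payments (customer_id);
```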
This creates a non-clustered index on that column, which will not affect the physical order of the data rows in the table.
In terms of query performance, a clustered index is usually faster for searches that return a large number of records, while a non-clustered index is faster for searches that return a small number of records. However, updates to a clustered index are slower, as they require the data rows to be physically rearranged, while updates to a non-clustered index are faster, as they only require the index data structure to be updated.
You're a data analyst at eBay and are tasked to calculate the click-through-conversion rate for the previous month. Click-through conversion rate is defined as the number of users that viewed a product and then added that product to their cart, divided by the total number of users that viewed the product.
To start, let's assume we have two tables: one recording product views and one recording add-to-cart events, structured as follows.
Product views:
user_id | product_id | view_date |
---|---|---|
101 | 1101 | 06/18/2022 00:00:00 |
102 | 1102 | 06/18/2022 00:00:00 |
103 | 1101 | 06/19/2022 00:00:00 |
101 | 1101 | 06/19/2022 00:00:00 |
104 | 1103 | 07/22/2022 00:00:00 |
Add-to-cart events:
user_id | product_id | add_date |
---|---|---|
101 | 1101 | 06/18/2022 00:00:00 |
103 | 1101 | 06/19/2022 00:00:00 |
101 | 1101 | 06/19/2022 00:00:00 |
104 | 1103 | 07/23/2022 00:00:00 |
Since we need to calculate the click-through conversion rate on a per-product basis for the previous month, we can make use of sub-queries and a LEFT JOIN to get the desired result.
Here is one possible PostgreSQL statement (the table names product_views and cart_adds are assumed):
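```sql
-- Assumed table names: product_views(user_id, product_id, view_date)
--                      cart_adds(user_id, product_id, add_date)
SELECT v.product_id,
       COUNT(DISTINCT a.user_id)::DECIMAL
         / COUNT(DISTINCT v.user_id) AS click_through_conversion_rate
FROM (  -- views from the previous calendar month only
      SELECT user_id, product_id, view_date
      FROM product_views
      WHERE view_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
        AND view_date <  DATE_TRUNC('month', CURRENT_DATE)
     ) AS v
LEFT JOIN cart_adds AS a
  ON a.user_id = v.user_id
 AND a.product_id = v.product_id
 AND a.add_date >= v.view_date
GROUP BY v.product_id;
```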
The rate is calculated as the number of distinct users who added the product to their cart after a view, divided by the total number of distinct users that viewed the product. We use a LEFT JOIN to also include the products that were viewed but not added to the cart.
As a part of the eBay data team, you are often asked to retrieve specific information from seller records. You are required to retrieve all the sellers based in the city of 'San Francisco' whose name starts with the letter 'A' or 'B', along with their sales records. For the purpose of this exercise, we are only considering the sellers table, which has the columns seller_id, seller_name, and city, and the sales table, which consists of a record of the sales made by these sellers. The sales table has the columns sale_id, seller_id, sale_date, and amount.
sellers table:
seller_id | seller_name | city |
---|---|---|
2001 | Amy | San Francisco |
4566 | Benny | San Francisco |
3167 | Charlie | Los Angeles |
9248 | Andy | Los Angeles |
8243 | Betty | San Francisco |
sales table:
sale_id | seller_id | sale_date | amount |
---|---|---|---|
1022 | 2001 | 02/01/2022 | 120 |
3485 | 4566 | 02/01/2022 | 115 |
4761 | 3167 | 02/01/2022 | 180 |
9271 | 9248 | 02/01/2022 | 130 |
8251 | 8243 | 02/01/2022 | 110 |
Below is one PostgreSQL query that solves the problem:
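```sql
SELECT s.seller_id,
       s.seller_name,
       s.city,
       sa.sale_id,
       sa.sale_date,
       sa.amount
FROM sellers AS s
JOIN sales AS sa
  ON s.seller_id = sa.seller_id
WHERE s.city = 'San Francisco'
  AND (s.seller_name LIKE 'A%' OR s.seller_name LIKE 'B%');
```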
This query first joins the sellers and sales tables together to get the combined record of sales by each seller. The WHERE clause then keeps only the records where the city is 'San Francisco' and the seller's name begins with either 'A' or 'B'.
One way to find duplicates is to use a GROUP BY clause and then use HAVING COUNT(*) > 1 to find groups that appear more than once.
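For example, as a sketch using a hypothetical listings table with a product_name column:

```sql
-- Hypothetical table/column names, for illustration only
SELECT product_name, COUNT(*) AS occurrences
FROM listings
GROUP BY product_name
HAVING COUNT(*) > 1;
```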
You could also use the ROW_NUMBER() window function:
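Another sketch, with the same hypothetical table, that flags every row beyond the first occurrence of a product_name:

```sql
SELECT *
FROM (
    SELECT l.*,
           ROW_NUMBER() OVER (PARTITION BY product_name ORDER BY listing_id) AS rn
    FROM listings AS l
) AS ranked
WHERE rn > 1;  -- rows with rn > 1 are duplicates
```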
For this question, let's consider that eBay wants to find the total number of sales and the average sale value for each product. You will have to use both the sales and products tables for this analysis.
Here are the sales and products tables:
sales table:
sale_id | product_id | user_id | sale_date | sale_value |
---|---|---|---|---|
1001 | 781 | 101 | 2022-01-06 | 120.0 |
1002 | 910 | 202 | 2022-02-26 | 80.0 |
1003 | 543 | 303 | 2022-03-08 | 250.0 |
1004 | 781 | 101 | 2022-04-16 | 140.0 |
1005 | 910 | 303 | 2022-05-28 | 95.0 |
products table:
product_id | product_name | supplier_id |
---|---|---|
781 | Laptop | 5401 |
910 | Mobile Phone | 5402 |
543 | Television | 5403 |
Only consider sales for which sale_value is non-null.
Combine these two tables to provide your analysis.
Here is one PostgreSQL query that can be used to solve this problem:
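```sql
SELECT p.product_id,
       p.product_name,
       COUNT(s.sale_id)  AS total_sales,
       AVG(s.sale_value) AS avg_sale_value
FROM sales AS s
JOIN products AS p
  ON s.product_id = p.product_id
WHERE s.sale_value IS NOT NULL
GROUP BY p.product_id, p.product_name;
```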
This SQL query first joins the sales and products tables together based on their common field product_id. The WHERE clause then excludes sales that have a null sale_value. Finally, it uses the GROUP BY statement with the aggregate functions COUNT and AVG to find the total number of sales and average sale value per product.
The result of this query would be a table where each row represents a product from the products table, and the columns show the total number of sales and average sale value for that product.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the earlier eBay SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each SQL question has multiple hints, full answers and most importantly, there's an interactive SQL code editor so you can instantly run your SQL query and have it checked.
To prep for the eBay SQL interview, it can also be useful to solve interview questions from other tech companies like:
If your SQL query skills are weak, don't jump right into solving questions – first strengthen your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers things like RANK() vs. DENSE_RANK() and CASE/WHEN statements – both of these pop up often in eBay interviews.
For the eBay Data Science Interview, besides SQL questions, the other types of questions to prepare for are:
To prepare for eBay Data Science interviews read the book Ace the Data Science Interview because it's got: