At WiseTech Global, SQL is used quite frequently for analyzing logistics data to optimize transport networks, and for managing relational databases that power their logistics data solutions. Unsurprisingly this is why WiseTech often tests SQL problems in interviews for Data Science, Data Engineering and Data Analytics jobs.
As such, to help you ace the WiseTech SQL interview, here’s 10 WiseTech Global SQL interview questions – how many can you solve?
WiseTech is a fictional company that provides a technology service to customers. Their business success is highly influenced by their most active users, commonly known as power users. For the purpose of this interview question, a power user is defined as a customer who logs in multiple times a week, actively engages with services (i.e., performed more than 5 service interactions in the past 30 days), and has been a customer for more than 6 months.
Your task is to write a SQL query that identifies these power users from a table and a table. The table contains information about each customer (including when they became a customer), and the table records each interaction a customer has had with WiseTech services, including when the interaction happened.
Assume the two tables have the following structure:
customer_id | sign_up_date |
---|---|
1 | 01/22/2021 00:00:00 |
2 | 02/22/2022 00:00:00 |
3 | 05/01/2021 00:00:00 |
4 | 03/16/2022 00:00:00 |
5 | 10/05/2021 00:00:00 |
interaction_id | customer_id | interaction_date |
---|---|---|
5001 | 1 | 08/01/2022 00:00:00 |
5002 | 2 | 08/02/2022 00:00:00 |
5003 | 1 | 08/03/2022 00:00:00 |
5004 | 3 | 08/04/2022 00:00:00 |
5005 | 5 | 08/05/2022 00:00:00 |
5006 | 1 | 08/04/2022 00:00:00 |
5007 | 3 | 08/05/2022 00:00:00 |
5008 | 1 | 08/06/2022 00:00:00 |
5009 | 5 | 08/07/2022 00:00:00 |
5010 | 1 | 08/08/2022 00:00:00 |
This PostgreSQL query first creates a subquery that counts the total interactions within the last 30 days for each customer from the table; only customers with more than 5 interactions are included. The subquery's results are then joined with the table on . Finally, the query filters out those who have been customers for less than 6 months. The output would be a list of customer_id's who are considered power users for WiseTech.
To practice a super-customer analysis question on DataLemur's free interactive coding environment, try this Microsoft Azure Cloud SQL Interview Question:
You are given a table that contains data about the users' interactions with WiseTech's software. Each row represents an event when the user opens the software. You are being asked write a SQL query to find out the average numbers of daily active users for each month.
event_id | user_id | event_date |
---|---|---|
001 | 789 | 2022-01-01 |
002 | 456 | 2022-01-02 |
003 | 123 | 2022-01-02 |
004 | 789 | 2022-02-01 |
005 | 456 | 2022-02-01 |
006 | 123 | 2022-02-02 |
007 | 789 | 2022-03-01 |
008 | 456 | 2022-03-01 |
009 | 123 | 2022-03-02 |
mth | avg_daily_active_users |
---|---|
1 | 1.50 |
2 | 1.50 |
3 | 1.50 |
This query first calculates the number of distinct active users for each day, then it groups the data by month to find out the average number of daily active users for each month.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
The CHECK constraint is used to set a rule for the data in a column. If a row is inserted or updated and the data in the column does not follow the rule specified by the CHECK constraint, the operation will be unsuccessful.
For example, say you had WiseTech customer data. You could use a CHECK constraint to ensure that the email column contains only properly formatted email addresses, or that the age column contains only positive integer. Here's an example of that:
As a data analyst at WiseTech, a logistics software company, you are tasked with finding the average freight cost per shipment for each shipping route. For simplification, a shipping route is defined as a combination of departure location and destination location.
shipment_id | departure_location | destination_location | freight_cost |
---|---|---|---|
1 | Australia | China | 1200 |
2 | USA | China | 800 |
3 | Australia | China | 1400 |
4 | USA | Australia | 3000 |
5 | Australia | China | 1000 |
departure_location | destination_location | average_freight_cost |
---|---|---|
Australia | China | 1200 |
USA | China | 800 |
USA | Australia | 3000 |
The query starts by selecting the columns departure_location and destination_location that are going to be used as the criteria for the grouping. The AVG function is applied to the freight_cost column to calculate the average freight cost per each shipping route defined by the combination of departure and destination locations. The query results are grouped using the GROUP BY clause on the columns departure_location and destination_location. This ensures that the average freight cost is calculated for each unique shipping route.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for analyzing and grouping data by categories, or this Wayfair Y-on-Y Growth Rate Question which is similar for utilization of aggregate calculations.
A join in SQL combines rows from two or more tables based on a shared column or set of columns. To demonstrate the difference between a and , say you had a table of WiseTech orders and WiseTech customers.
LEFT JOIN: A retrieves all rows from the left table (in this case, the Orders table) and any matching rows from the right table (the Customers table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A retrieves all rows from the right table (in this case, the Customers table) and any matching rows from the left table (the Orders table). If there is no match in the left table, NULL values will be returned for the left table's columns.
Your task is to compute the click-through rate of product pages on the WiseTech platform. The click-through rate is defined as the total number of product views that led to the product being added to the cart divided by the total number of product views.
We have two tables: and .
view_id | user_id | view_date | product_id |
---|---|---|---|
5671 | 124 | 06/08/2022 00:00:00 | 50002 |
7902 | 266 | 06/10/2022 00:00:00 | 69855 |
5493 | 363 | 06/18/2022 00:00:00 | 50003 |
6452 | 193 | 07/26/2022 00:00:00 | 69854 |
4627 | 982 | 07/05/2022 00:00:00 | 69856 |
cart_id | user_id | add_date | product_id |
---|---|---|---|
5671 | 124 | 06/08/2022 00:00:00 | 50002 |
8002 | 268 | 06/12/2022 00:00:00 | 69855 |
5393 | 384 | 06/15/2022 00:00:00 | 50003 |
6352 | 183 | 07/30/2022 00:00:00 | 69854 |
This query joins the table with the on the view_id, user_id, and product_id. We then group by product_id and calculate the clickthrough rate as the count of where the product_id matches, divided by the count of total views. The cast to float is important to ensure we don't do integer division, which would round down to 0 for all click-through rates less than 1.
product_id | click_through_rate |
---|---|
50002 | 1.00 |
69855 | 0.50 |
50003 | 0.33 |
69854 | 0.50 |
69856 | 0.00 |
For example, product 50002 shows a click-through-rate of 1.00, meaning every view of this product led to the product being added to the cart. Conversely, product 69856 shows a click-through-rate of 0.00, meaning there weren't any views of this product that led to it being added to the cart.
To practice another question about calculating rates, solve this TikTok SQL question on DataLemur's online SQL coding environment:
To normalize a database, tables are divided into smaller, more specialized ones and relationships between them are defined via primary and foreign keys. This minimizes redundancy, making the database more flexible, scalable, and easier to maintain. Normalization also helps to ensure the accuracy of the data by reducing the likelihood of inconsistencies and errors.
As a data analyst at WiseTech, a global developer of cloud-based software solutions for the international and domestic logistics industries, you're asked to analyze the company's dataset of shipment records. The records contain shipment weights and carrier information. Write a SQL query to find the maximum and minimum shipment weight for each carrier from the given dataset.
shipment_id | carrier | shipment_weight_kg | shipment_date |
---|---|---|---|
1 | DHL | 500 | 2022-05-01 |
2 | UPS | 750 | 2022-05-02 |
3 | FedEx | 350 | 2022-06-20 |
4 | DHL | 2000 | 2022-07-15 |
5 | UPS | 1500 | 2022-07-20 |
6 | DHL | 1000 | 2022-08-11 |
carrier | max_weight | min_weight |
---|---|---|
DHL | 2000 | 500 |
UPS | 1500 | 750 |
FedEx | 350 | 350 |
This SQL query uses the GROUP BY clause to group the shipment records by the carrier. It then uses the MAX() and MIN() aggregate functions to find the maximum and minimum shipment weights for each carrier. The result is a table that gives the maximum and minimum shipment weights for each distinct carrier in the shipments table.
You are given two tables - and . The table contains a unique identifier for each customer, their name, and city. The table tracks each product bought by a customer, with their customer ID, the product ID, the date of purchase, and the total purchase amount.
Your task is to write a SQL query that will join these two tables together in order to find out the total purchase amount for each customer.
customer_id | name | city |
---|---|---|
1 | Alice | New York |
2 | Bob | Los Angeles |
3 | Charlie | Chicago |
4 | Donna | San Francisco |
purchase_id | customer_id | product_id | purchase_date | purchase_amount |
---|---|---|---|---|
1001 | 2 | 3001 | 2022-01-15 | 200.00 |
1002 | 1 | 2002 | 2022-03-20 | 150.00 |
1003 | 3 | 4003 | 2022-02-25 | 100.00 |
1004 | 4 | 5004 | 2022-06-10 | 250.00 |
1005 | 1 | 2002 | 2022-04-15 | 150.00 |
Here's a sample PostgreSQL query that achieves the required task:
customer_id | name | total_purchase_amount |
---|---|---|
1 | Alice | 300.00 |
2 | Bob | 200.00 |
3 | Charlie | 100.00 |
4 | Donna | 250.00 |
With the JOIN statement, we are able to link the Purchases table to the Customers table using their common field (customer_id). The GROUP BY statement groups the results by customer, and the SUM function adds up the purchase_amount for each customer.
Since joins come up so often during SQL interviews, try this interactive Snapchat JOIN SQL interview question:
One way to find duplicate records in a table is by using , and then seeing which groups have more than one occurence:
Another way is by using the operator:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the WiseTech SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the above WiseTech SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Facebook, Google, and VC-backed startups.
Each DataLemur SQL question has multiple hints, full answers and best of all, there's an online SQL code editor so you can easily right in the browser your SQL query answer and have it checked.
To prep for the WiseTech SQL interview it is also helpful to practice SQL problems from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this free SQL tutorial.
This tutorial covers SQL concepts such as Union vs. UNION ALL and LEAD/LAG – both of which come up routinely in SQL job interviews at WiseTech.
In addition to SQL interview questions, the other question categories to prepare for the WiseTech Data Science Interview are:
The best way to prepare for WiseTech Data Science interviews is by reading Ace the Data Science Interview. The book's got: