At Softcat, SQL does the heavy lifting for analyzing customer data to optimize tech product offerings and managing databases to ensure efficient data storage and retrieval. That's why Softcat frequently asks SQL questions in interviews for Data Science and Data Engineering positions.
So, to help you practice for the Softcat SQL interview, this blog covers 11 Softcat SQL interview questions – can you solve them?
Softcat is a company that provides software solutions to a wide range of organizations. Clients who consistently purchase large volumes of software licenses are especially important to the business, and Softcat calls them 'VIP' users. Can you write a SQL query to identify clients who have made more than 5 transactions in each month for the past 6 months?
Consider the following two tables:
The clients table:

client_id | client_name |
---|---|
101 | Abc Ltd. |
102 | Xyz Corp. |
103 | Pqr Inc. |
The transactions table:

transaction_id | client_id | transaction_date | license_count |
---|---|---|---|
2001 | 101 | 07/20/2022 00:00:00 | 50 |
2002 | 101 | 07/24/2022 00:00:00 | 70 |
2003 | 102 | 08/01/2022 00:00:00 | 100 |
2004 | 101 | 08/05/2022 00:00:00 | 80 |
2005 | 101 | 08/08/2022 00:00:00 | 60 |
2006 | 103 | 08/13/2022 00:00:00 | 90 |
2007 | 103 | 08/20/2022 00:00:00 | 70 |
2008 | 102 | 08/28/2022 00:00:00 | 120 |
2009 | 101 | 08/29/2022 00:00:00 | 55 |
This PostgreSQL query uses a subquery to count the transactions made by each client_id per month and year. It then checks for clients who made more than 5 transactions in each of the last 6 months. The inner join ensures we get the client_name from the clients table. The result is the list of 'VIP' clients who meet the criteria.
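The query itself did not survive formatting here. A minimal sketch of the approach described above, with the table and column names (clients, transactions) assumed from the example data, might look like this:

```sql
-- Sketch only: table/column names are assumed from the example input.
SELECT c.client_id,
       c.client_name
FROM clients c
JOIN (
    -- Count each client's transactions per calendar month,
    -- restricted to the last 6 full months
    SELECT client_id,
           DATE_TRUNC('month', transaction_date) AS txn_month,
           COUNT(*) AS txn_count
    FROM transactions
    WHERE transaction_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '6 months'
    GROUP BY client_id, DATE_TRUNC('month', transaction_date)
) m ON m.client_id = c.client_id
GROUP BY c.client_id, c.client_name
-- 'VIP' = more than 5 transactions in every one of the 6 months
HAVING COUNT(*) FILTER (WHERE m.txn_count > 5) = 6;
```

The `HAVING ... FILTER` condition requires all 6 months to clear the 5-transaction bar, so a client with even one quiet month is excluded.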
To practice a similar VIP customer analysis question on DataLemur's free interactive coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Given the sales details for Softcat products, write a SQL query to calculate the monthly sales total and the change in sales compared to the previous month for each product. Assume that Softcat is a technology product distributor company that sells various products to its customers.
The sales table:

sales_id | sales_date | product_id | sales_amount |
---|---|---|---|
1001 | 01/05/2021 | 3291 | 500 |
1002 | 01/05/2021 | 3291 | 600 |
1003 | 01/06/2021 | 3291 | 450 |
1004 | 01/06/2021 | 9221 | 650 |
1005 | 02/05/2021 | 9221 | 750 |
1006 | 02/05/2021 | 9221 | 850 |
1007 | 02/06/2021 | 3291 | 900 |
1008 | 02/06/2021 | 9221 | 700 |
Example output:

month | product_id | sales_total | sales_difference |
---|---|---|---|
01/2021 | 3291 | 1100 | null |
01/2021 | 9221 | 650 | null |
02/2021 | 3291 | 900 | -200 |
02/2021 | 9221 | 1550 | 900 |
In this query, we first aggregate the sales amounts by month and product_id in a common table expression (CTE). After that, we calculate each product's difference in sales from the previous month using the LAG() window function. The output is then sorted by month and product_id.
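A sketch of that query, assuming the input table is named sales as in the example above:

```sql
-- Sketch; the 'sales' table name is assumed from the example input.
WITH monthly_sales AS (
    SELECT DATE_TRUNC('month', sales_date) AS month,
           product_id,
           SUM(sales_amount) AS sales_total
    FROM sales
    GROUP BY DATE_TRUNC('month', sales_date), product_id
)
SELECT month,
       product_id,
       sales_total,
       -- LAG() returns NULL for each product's first month,
       -- matching the nulls in the expected output
       sales_total - LAG(sales_total) OVER (
           PARTITION BY product_id
           ORDER BY month
       ) AS sales_difference
FROM monthly_sales
ORDER BY month, product_id;
```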
To solve a similar window function interview problem which uses RANK() on DataLemur's free online SQL code editor, try this Amazon BI Engineer interview question:
To discover records in one table that are not present in another, you can use a LEFT JOIN and then filter for rows where the right-side table's columns are NULL.
For example, say you had a table of Softcat customers and a second table of all purchases made with Softcat. To find all customers who did not make a purchase, you'd LEFT JOIN customers to purchases and keep only the rows with no match.
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, NULL values are returned for all of the right table's columns. The WHERE clause then keeps only the rows where the purchases.id column is NULL, leaving just the customers who have not made a purchase.
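Concretely, assuming a hypothetical schema of customers(id, ...) and purchases(id, customer_id, ...):

```sql
-- Hypothetical schema: customers(id, ...), purchases(id, customer_id, ...)
SELECT customers.*
FROM customers
LEFT JOIN purchases
       ON customers.id = purchases.customer_id
WHERE purchases.id IS NULL;   -- keep only customers with no matching purchase
```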
You are a Data Analyst at Softcat, a leading provider of IT infrastructure to corporate and public sector organizations. Softcat has several product categories and sells numerous products within each category to different clients. You have two main tables: one that records all the sales transactions and one that records the product details.
Your task is to design a SQL query that shows the total number of each product sold by category in the last year (2022).
The sales table:

sale_id | product_id | client_id | sale_date | quantity |
---|---|---|---|---|
1111 | 21 | 3345 | 01/12/2022 | 5 |
1112 | 23 | 3446 | 02/26/2022 | 2 |
1113 | 34 | 3345 | 02/15/2022 | 3 |
1114 | 45 | 3547 | 07/08/2022 | 7 |
1115 | 56 | 3648 | 10/17/2022 | 1 |
The products table:

product_id | product_name | category_id | unit_price |
---|---|---|---|
21 | Product A | 1 | 100 |
23 | Product B | 2 | 200 |
34 | Product C | 1 | 150 |
45 | Product D | 3 | 250 |
56 | Product E | 4 | 300 |
The PostgreSQL query joins the sales table with the products table on the product_id column and filters for sales that happened in 2022 specifically. It then aggregates the total quantity of products sold by each category. The ORDER BY clause lists the categories by total quantity sold in descending order.
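One way to write that query, with the sales and products table names assumed from the example input:

```sql
-- Sketch; table names sales/products are assumed from the example input.
SELECT p.category_id,
       SUM(s.quantity) AS total_quantity_sold
FROM sales s
JOIN products p ON p.product_id = s.product_id
WHERE s.sale_date >= DATE '2022-01-01'
  AND s.sale_date <  DATE '2023-01-01'
GROUP BY p.category_id
ORDER BY total_quantity_sold DESC;
```

The half-open date range is used instead of EXTRACT(YEAR FROM ...) so the filter can use an index on sale_date; either form produces the same rows.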
Unique indexes help ensure that there are no duplicate key values in a table, maintaining data integrity. They enforce uniqueness whenever keys are added or changed within the index.
Non-unique indexes, on the other hand, are used to improve query performance by maintaining a sorted order of frequently queried values, but they do not enforce any constraint on the associated table.
Lookups on a unique index are fast because at most one row can match a given key, so the scan can stop at the first hit. Non-unique indexes also speed up queries, but an equivalent lookup may have to walk multiple matching entries, so it can be somewhat slower.
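The difference is easiest to see in DDL, using a hypothetical customers table for illustration:

```sql
-- Unique index: an INSERT or UPDATE that duplicates an email will fail,
-- so the index doubles as an integrity constraint.
CREATE UNIQUE INDEX idx_customers_email ON customers (email);

-- Non-unique index: speeds up lookups by city, enforces nothing.
CREATE INDEX idx_customers_city ON customers (city);
```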
As an analyst at Softcat, one of your tasks includes examining customer data. You received a task to identify clients from the software sector that have a contract value of over £50,000 and whose contracts expire in 2023.
Example input:

CustomerId | Sector | ContractDate | ContractValue (£) |
---|---|---|---|
C123 | Software | 2022-05-18 | 70000 |
C456 | Education | 2022-10-21 | 60000 |
C789 | Software | 2022-07-12 | 50000 |
C101 | Software | 2021-03-11 | 55000 |
C112 | Retail | 2023-12-31 | 52000 |
Example output:

CustomerId | Sector | ContractDate | ContractValue (£) |
---|---|---|---|
C123 | Software | 2022-05-18 | 70000 |
C101 | Software | 2021-03-11 | 55000 |
This SQL query first filters for customers from the software sector with a WHERE clause. An AND condition then keeps only those whose ContractValue is over £50,000. Finally, it retrieves the records whose ContractDate falls in 2023 by applying the EXTRACT function to pull the year out of the date field. Because the conditions are combined with AND, a row must satisfy all of them to appear in the result.
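Following that explanation literally, and assuming the table is named customers, the query might look like:

```sql
-- Sketch based on the explanation above; the 'customers' table name
-- is assumed, and ContractDate is treated as the relevant date field.
SELECT *
FROM customers
WHERE Sector = 'Software'
  AND ContractValue > 50000
  AND EXTRACT(YEAR FROM ContractDate) = 2023;
```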
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1NF, 2NF, 3NF, etc.).
This is typically done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with its own drawbacks. Adding redundant columns increases data storage costs, and update and delete operations become more complex, because data integrity must be maintained across all the duplicated data. It's therefore important to weigh these trade-offs carefully before denormalizing.
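As a small illustration, using a hypothetical orders/customers schema, a read-heavy reporting workload might copy the customer name onto each order to avoid a join:

```sql
-- Hypothetical schema: orders(customer_id, ...), customers(customer_id, customer_name).
-- Denormalize by copying customer_name onto orders; reporting queries
-- then skip the join, at the cost of keeping the copies in sync.
ALTER TABLE orders ADD COLUMN customer_name TEXT;

UPDATE orders o
SET    customer_name = c.customer_name
FROM   customers c
WHERE  c.customer_id = o.customer_id;
```

Every later change to a customer's name now has to touch both tables, which is exactly the maintenance cost described above.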
At Softcat, a technology solutions business, you're tasked with analyzing product sales data for the previous year. The goal is to understand which product categories contributed most to the company's revenue.
Given a table of sales order details and a table of product information, write a SQL query to find the total revenue generated by each product category in the last year.
The sales table:
sale_id | product_id | sale_date | unit_price | quantity |
---|---|---|---|---|
1 | 101 | 2021-12-01 00:00 | 1000 | 3 |
2 | 102 | 2021-12-01 00:00 | 150 | 5 |
3 | 103 | 2021-12-02 00:00 | 300 | 2 |
4 | 101 | 2021-12-03 00:00 | 1000 | 6 |
5 | 104 | 2021-12-04 00:00 | 100 | 10 |
The products table:
product_id | product_name | product_category |
---|---|---|
101 | Surface Pro 7 | Computing |
102 | HP ENVY Laptop | Computing |
103 | Samsung Galaxy S21 | Telecommunications |
104 | iPhone 12 | Telecommunications |
In the above SQL block, we first join the two tables on the product ID. Next, we filter for sales records that fall within the last year. We then group the result by product category and calculate the total revenue using the SUM aggregate function. This query returns the total revenue for each product category over the last year.
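A sketch of that query, with the sales and products table names assumed, and the year hard-coded to 2021 since that is when the sample data falls:

```sql
-- Sketch; sales/products table names assumed from the example input.
SELECT p.product_category,
       SUM(s.unit_price * s.quantity) AS total_revenue
FROM sales s
JOIN products p ON p.product_id = s.product_id
WHERE s.sale_date >= DATE '2021-01-01'
  AND s.sale_date <  DATE '2022-01-01'
GROUP BY p.product_category;
```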
As a data analyst in Softcat, you're given access to the company's customer records database. Your task is to find out all the customers whose names begin with "John". This is important for the Sales team who is planning to run a specific promotion targeted at this demographic.
The customers table:

customer_id | first_name | last_name | email_id | city | state |
---|---|---|---|---|---|
101 | John | Doe | john.doe@example.com | Los Angeles | CA |
102 | Johnny | Depp | johnny.depp@example.com | Paris | TX |
103 | Johnathan | Smith | johnathan.smith@example.com | New York | NY |
104 | Jennifer | Parker | jennifer.parker@example.com | Chicago | IL |
105 | Jack | Black | jack.black@example.com | Houston | TX |
When this SQL command is executed, it returns all customers in the database whose first names begin with "John". The key is the LIKE operator combined with the % wildcard: % matches any sequence of characters (including an empty sequence), so the pattern 'John%' matches any string that starts with 'John'.
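Assuming the table is named customers as in the example, the query is simply:

```sql
SELECT *
FROM customers
WHERE first_name LIKE 'John%';   -- matches John, Johnny, Johnathan, ...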
The three levels of database normalization (also known as normal forms) are:
First Normal Form (1NF): every column holds atomic (indivisible) values, and there are no repeating groups of columns.
Second Normal Form (2NF): the table is in 1NF, and every non-key column depends on the entire primary key rather than on just part of it.
Said another way, to achieve 2NF, besides following the 1NF criteria, the columns should also be dependent only on that table's primary key.
A table is in Third Normal Form (3NF) if it is in 2NF and contains no transitive dependencies.
A transitive dependency means the values in one column can be derived from the data in another column. For example, it wouldn't make sense to keep both a "vehicle's age" column and a "vehicle's manufacture date" column, because the age can be derived from the manufacture date.
Although 4NF and 5NF also exist, you probably don't need to know them for the Softcat interview.
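The vehicle example above can be sketched in DDL (hypothetical tables, for illustration only):

```sql
-- 3NF violation: vehicle_age is derivable from manufacture_date,
-- so it is a transitive dependency and will drift out of date.
CREATE TABLE vehicles_bad (
    vehicle_id       INT PRIMARY KEY,
    manufacture_date DATE,
    vehicle_age      INT   -- derived value: breaks 3NF
);

-- 3NF-compliant: store only the date and compute the age when queried.
CREATE TABLE vehicles (
    vehicle_id       INT PRIMARY KEY,
    manufacture_date DATE
);
```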
As a data analyst for Softcat, you are tasked with evaluating customer purchasing behavior. The company has two key tables, one holding data about registered customers and one holding their purchase history. Using these tables, write a query that returns each customer's information along with his or her most expensive purchase. If a customer has multiple purchases tied at the same maximum cost, display only one of those purchases, chosen at random.
The customers table:

customer_id | first_name | last_name | email | registration_date |
---|---|---|---|---|
101 | John | Doe | johndoe@mail.com | 01/01/2022 |
202 | Jane | Smith | janesmith@mail.com | 02/01/2022 |
303 | Sam | Chen | samchen@mail.com | 03/01/2022 |
404 | Emily | Garcia | emilygarcia@mail.com | 04/01/2022 |
The purchases table:

purchase_id | customer_id | purchase_date | product | price |
---|---|---|---|---|
1001 | 101 | 05/01/2022 | Laptop Dell | 900 |
2002 | 202 | 05/05/2022 | MS Office 365 | 150 |
3003 | 101 | 05/10/2022 | Laptop Lenovo | 1200 |
4004 | 303 | 05/15/2022 | Adobe Photoshop | 300 |
5005 | 101 | 05/20/2022 | Mouse Logitech | 50 |
6006 | 404 | 05/25/2022 | Laptop HP | 750 |
This query starts by isolating each customer's maximum purchase price in a common table expression (CTE). That CTE is then joined back to the customers and purchases tables, giving us each customer's most expensive purchase. Where a customer has more than one purchase at the same maximum price, a tie-breaker keeps only one row.
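One way to write it, with the customers and purchases table names assumed; here PostgreSQL's DISTINCT ON with a random ORDER BY tie-break stands in for the row-limiting step described above:

```sql
-- Sketch; customers/purchases table names assumed from the example input.
WITH max_prices AS (
    SELECT customer_id, MAX(price) AS max_price
    FROM purchases
    GROUP BY customer_id
)
SELECT DISTINCT ON (c.customer_id)       -- one row per customer
       c.customer_id, c.first_name, c.last_name,
       p.product, p.price
FROM customers c
JOIN max_prices m ON m.customer_id = c.customer_id
JOIN purchases  p ON p.customer_id = c.customer_id
                 AND p.price = m.max_price
ORDER BY c.customer_id, random();        -- random() breaks price ties
```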
Because joins come up frequently during SQL interviews, practice this Snapchat Join SQL question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the above Softcat SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, Amazon, and tech startups.
Each problem on DataLemur has hints to guide you, step-by-step solutions and crucially, there's an online SQL code editor so you can instantly run your SQL query and have it graded.
To prep for the Softcat SQL interview, it can also be a great idea to solve interview questions from other tech companies like:
But if your SQL query skills are weak, forget about going right into solving questions – strengthen your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers SQL topics like CASE/WHEN/ELSE statements and INTERSECT/EXCEPT – both of which pop up frequently during Softcat SQL assessments.
In addition to SQL query questions, the other types of questions to prepare for the Softcat Data Science Interview are:
To prepare for Softcat Data Science interviews read the book Ace the Data Science Interview because it's got: