At Topicus, SQL is crucial for analyzing vast healthcare datasets for insights and managing financial data workflows to improve customer service efficiency. Unsurprisingly, this is why Topicus LOVES to ask SQL query questions during interviews for Data Science and Data Engineering positions.
As such, to help you ace the Topicus SQL interview, we've collected 10 Topicus SQL interview questions – can you solve them?
Topicus is a large business solutions company that deals with a variety of customers, from individual users to larger corporations. They would like to identify their "power users", or the users who frequently use their system and generate the most business activity. Specifically, Topicus management is interested in finding users who have used their system for more than 20 hours in the past month.
Let's assume there is a `users` table and a `system_logs` table.
The `users` table includes: `user_id`, `name`, `email`.
The `system_logs` table includes: `log_id`, `user_id`, `session_start`, `session_end`.
user_id | name | email |
---|---|---|
123 | John Doe | johndoe@example.com |
256 | Samantha Smith | ssmith@example.com |
789 | Robert Johnson | rjohnson@example.com |
log_id | user_id | session_start | session_end |
---|---|---|---|
001 | 123 | 2022-09-01 10:00:00 | 2022-09-01 13:00:00 |
002 | 256 | 2022-09-01 09:00:00 | 2022-09-01 17:00:00 |
003 | 789 | 2022-09-01 08:00:00 | 2022-09-01 16:00:00 |
004 | 123 | 2022-09-02 10:00:00 | 2022-09-02 12:00:00 |
An example of such a PostgreSQL query, using the table names assumed above, could be:
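```sql
-- Sketch: the users and system_logs table names are assumed from the sample data above.
SELECT u.user_id,
       u.name,
       SUM(EXTRACT(EPOCH FROM (l.session_end - l.session_start)) / 3600) AS total_hours
FROM users u
JOIN system_logs l
  ON u.user_id = l.user_id
WHERE l.session_start >= NOW() - INTERVAL '1 month'
GROUP BY u.user_id, u.name
HAVING SUM(EXTRACT(EPOCH FROM (l.session_end - l.session_start)) / 3600) > 20;
```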
This query joins the `users` table with the `system_logs` table on the `user_id` column to associate each log entry with the corresponding user. It then restricts the logs to those entries where the session started within the past month.
The duration of each session is calculated in hours with the `EXTRACT(EPOCH FROM (session_end - session_start)) / 3600` expression. This value is summed for each user and returned only where the total accumulated time exceeds 20 hours, thus identifying the "power users".
To solve a related customer analytics question on DataLemur's free online SQL code editor, try this Microsoft Azure Cloud SQL Interview Question:
Given a database table holding customer reviews for different products, with each review having its `review_id`, `user_id`, `submit_date`, `product_id`, and a star rating (`stars`), write a SQL query to calculate the monthly average star rating for each product. The months should be represented numerically (1 for January, 2 for February, and so on). Assume that the `submit_date` uses the format 'MM/DD/YYYY HH:MI:SS'.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 01/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 01/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 02/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 02/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 02/05/2022 00:00:00 | 69852 | 2 |
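One possible PostgreSQL query (a sketch; the table name `reviews` is assumed from the sample data):

```sql
-- Sketch: assumes submit_date is a TIMESTAMP column; if it is stored as text,
-- convert it first with TO_TIMESTAMP(submit_date, 'MM/DD/YYYY HH24:MI:SS').
SELECT EXTRACT(MONTH FROM submit_date) AS month,
       product_id,
       AVG(stars) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY month, product_id;
```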
This SQL query uses the `AVG()` aggregate function to compute the average star rating of each product for each month. The `EXTRACT()` function is used to pull out the month number from each `submit_date`. The `GROUP BY` statement groups the data by month and `product_id`, and the `ORDER BY` statement sorts the output first by `month` and then by `product_id`.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Topicus, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross join between a table of ad_copy and a table of ad_creative (the `text` and `image_url` columns below are illustrative):
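```sql
-- Sketch: the text and image_url column names are assumed for illustration.
SELECT ad_copy.text AS copy_text,
       ad_creative.image_url AS creative_image
FROM ad_copy
CROSS JOIN ad_creative;
```

This pairs every piece of ad copy with every creative, so each possible combination can be tested.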
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example of each one, say you had sales data exported from Topicus's Salesforce CRM stored in a data warehouse, which had two tables: `sales` and `customers`.
An `INNER JOIN` (which is a type of natural join) combines the two tables on the common `customer_id` column:
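```sql
-- Sketch: the sales and customers table/column names follow the example above.
SELECT *
FROM sales
INNER JOIN customers
  ON sales.customer_id = customers.customer_id;
```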
This query will return rows from the `sales` and `customers` tables that have matching `customer_id` values; only rows with matching values will be included in the results.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
Topicus is a tech company that manages different projects. In these projects, they have been noticing a pattern of delayed deliveries and they want to get some insights into it. Specifically, they are interested in finding the average number of days it takes for a project to be delivered post its due date. They want this information for each department in each month of the year.
Assume a `projects` table and a `departments` table link each project to its department. Each project has a start date, a due date, and an actual end date. Note: if a project is completed before the due date, the number of days past due will be negative.
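One possible PostgreSQL query (a sketch; the column names and the `project_id` join key are assumptions based on the explanation below):

```sql
-- Sketch: assumes projects(project_id, due_date, end_date, ...) and
-- departments(project_id, department_name), with DATE-typed date columns.
SELECT d.department_name,
       EXTRACT(MONTH FROM p.end_date) AS delivery_month,
       AVG(p.end_date - p.due_date) AS avg_days_past_due
FROM projects p
JOIN departments d
  ON p.project_id = d.project_id
GROUP BY d.department_name, EXTRACT(MONTH FROM p.end_date)
ORDER BY d.department_name, delivery_month;
```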
This query joins the `projects` and `departments` tables on `project_id`. It then groups the projects by the month in which they were delivered and by department. For each group, it calculates the average number of days past the due date, computed by subtracting `due_date` from `end_date` and averaging the result over the group. Please note that we've assumed all dates are in the 'YYYY-MM-DD' format.
A DBMS (database management system), in order to ensure transactions are reliable and correct, tries to maintain the following ACID properties: Atomicity, Consistency, Isolation, and Durability.
Here is what each of the ACID properties stands for:
Atomicity: ensures that a transaction is either completed in its entirety, or not completed at all. If a transaction fails halfway, the database rolls back any partial changes.
Consistency: ensures that a transaction will only be completed if it adheres to the constraints defined in the database.
Isolation: ensures that concurrent transactions are isolated from one another, so that the changes made by one transaction cannot be seen by a second transaction until the first transaction is done.
Durability: ensures that once a transaction has been committed, the database permanently stores the results.
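For example, here's what atomicity looks like in practice. A minimal PostgreSQL sketch, using a hypothetical `accounts` table, where a money transfer either fully succeeds or fully fails:

```sql
-- Hypothetical accounts table: both updates commit together or not at all.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;  -- if anything fails before this point, a ROLLBACK undoes both updates
```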
As you can see, it's pretty important for Topicus's data systems to be ACID compliant, else there will be big problems for their customers!
Topicus operates as a software company and maintains a database of its customers. The database contains details about the customers, their account activation date, and their subscription level.
Write a SQL query to filter for customers who have an 'active' status, whose account activation date is no more than 3 years ago, and who are subscribed to the 'Premium' or 'Gold' plan.
customer_id | full_name | account_status | activation_date | subscription_level |
---|---|---|---|---|
A101 | John Doe | active | 2019-08-16 | Premium |
A102 | Jane Smith | inactive | 2021-06-10 | Basic |
A103 | Jim Brown | active | 2020-07-17 | Gold |
A104 | Janet Johnson | active | 2015-11-01 | Gold |
A105 | Joe Davis | active | 2021-05-22 | Premium |
customer_id | full_name | account_status | activation_date | subscription_level |
---|---|---|---|---|
A101 | John Doe | active | 2019-08-16 | Premium |
A103 | Jim Brown | active | 2020-07-17 | Gold |
A105 | Joe Davis | active | 2021-05-22 | Premium |
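A PostgreSQL query along these lines produces the output above (a sketch; the table name `customers` is assumed from the sample data):

```sql
-- Sketch: the customers table name is assumed from the sample data.
SELECT *
FROM customers
WHERE account_status = 'active'
  AND activation_date >= CURRENT_DATE - INTERVAL '3 years'
  AND subscription_level IN ('Premium', 'Gold');
```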
This query works by using the WHERE clause to filter the customers based on the conditions mentioned in the question. The AND operator combines the three filtering conditions:

* `account_status` must be 'active'
* `activation_date` must fall within the last 3 years
* `subscription_level` must be either 'Premium' or 'Gold'

As a result, we get a list of customers who meet all of these conditions.
For a tangible example of `EXCEPT` in PostgreSQL, suppose you were doing an HR Analytics project for Topicus, and had access to Topicus's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.
You could use the `EXCEPT` operator to find all contractors who were never an employee using this query:
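```sql
-- Sketch: the first_name/last_name columns are assumed for illustration.
SELECT first_name, last_name
FROM contractors
EXCEPT
SELECT first_name, last_name
FROM employees;
```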
Note that `EXCEPT` is available in PostgreSQL and SQL Server, while `MINUS` is the equivalent operator available in Oracle (but don't worry about memorizing which RDBMS supports which exact commands, since Topicus interviewers aren't trying to trip you up on SQL syntax trivia).
As part of Topicus, a leading financial services company, you are asked to analyze customers' transaction data. Write a SQL query to find the average daily transactions executed by each customer over the course of a month.
transaction_id | customer_id | transaction_date | transaction_amount |
---|---|---|---|
1 | 100 | 2021-10-01 | 500.00 |
2 | 101 | 2021-10-01 | 200.00 |
3 | 100 | 2021-10-02 | 200.00 |
4 | 100 | 2021-10-03 | 100.00 |
5 | 101 | 2021-10-03 | 500.00 |
6 | 100 | 2021-10-03 | 300.00 |
7 | 100 | 2021-10-04 | 200.00 |
8 | 101 | 2021-10-04 | 300.00 |
customer_id | average_daily_transactions |
---|---|
100 | 260.00 |
101 | 333.33 |
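One possible PostgreSQL query (a sketch; the table name `transactions` is assumed from the sample data):

```sql
-- Sketch: computes each customer's average transaction amount for October 2021.
SELECT customer_id,
       ROUND(AVG(transaction_amount), 2) AS average_daily_transactions
FROM transactions
WHERE transaction_date BETWEEN '2021-10-01' AND '2021-10-31'
GROUP BY customer_id;
```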
The SQL query groups transactions by the customer id and then calculates the average transaction amount. In this scenario, customer 100 has 5 transactions totaling 1300.00 over the month, so the average is calculated as 260.00 (1300/5). Similarly, customer 101 has 3 transactions totaling 1000.00, giving an average of 333.33.
To practice a very similar question, try this interactive Uber User's Third Transaction Question, which also involves analyzing transaction data.
As part of the financial services provided by Topicus, you need to manage loan transactions for various customers. Each transaction includes a customer_id, a unique loan_id, the loan amount, and the date on which the loan was sanctioned. The goal is to write a SQL query that calculates the average loan amount per year.
loan_id | customer_id | loan_date | amount |
---|---|---|---|
1001 | 614 | 01/02/2019 | 50000 |
1020 | 976 | 05/10/2020 | 75000 |
1089 | 704 | 09/15/2019 | 85000 |
1122 | 614 | 03/25/2020 | 70000 |
1175 | 450 | 08/16/2021 | 120000 |
year | avg_loan_amount |
---|---|
2019 | 67500.00 |
2020 | 72500.00 |
2021 | 120000.00 |
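One possible PostgreSQL query (a sketch; the table name `loans` is assumed, and `loan_date` is assumed to be a DATE column):

```sql
-- Sketch: if loan_date is stored as 'MM/DD/YYYY' text instead of a DATE,
-- convert it first with TO_DATE(loan_date, 'MM/DD/YYYY').
SELECT EXTRACT(YEAR FROM loan_date) AS year,
       AVG(amount) AS avg_loan_amount
FROM loans
GROUP BY EXTRACT(YEAR FROM loan_date)
ORDER BY year;
```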
The query begins by extracting the year from `loan_date` using the `EXTRACT()` function. It then groups the rows by this extracted year and calculates the average loan amount for each year using the `AVG()` function. The final query includes an `ORDER BY` clause to sort the rows in ascending order by year.
Similar to the `UNION` and `EXCEPT`/`MINUS` operators, the PostgreSQL `INTERSECT` operator combines the result sets of two or more `SELECT` statements into a single result set. However, `INTERSECT` only returns the rows that are in BOTH select statements.
For a concrete example, say you were on the Sales Analytics team at Topicus, and had data on sales leads exported from both HubSpot and Salesforce CRMs in two different tables. To write a query to analyze leads created after 2023 started, that show up in both CRMs, you would use the `INTERSECT` command:
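```sql
-- Sketch: the CRM table and column names are assumed for illustration.
SELECT lead_name, created_at
FROM hubspot_leads
WHERE created_at >= '2023-01-01'
INTERSECT
SELECT lead_name, created_at
FROM salesforce_leads
WHERE created_at >= '2023-01-01';
```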
The key to acing a Topicus SQL interview is to practice, practice, and then practice some more! Besides solving the earlier Topicus SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Facebook, Google, and VC-backed startups.
Each DataLemur SQL question has multiple hints, step-by-step solutions, and most importantly, there's an online SQL coding environment so you can write your query right in the browser and have it checked.
To prep for the Topicus SQL interview, it can also be helpful to solve interview questions from other tech companies like:
However, if your SQL foundations are weak, don't jump right into solving questions – first refresh your SQL knowledge with this free SQL tutorial.
This tutorial covers SQL concepts such as aggregate window functions and aggregate functions like MIN()/MAX() – both of these pop up frequently during Topicus SQL assessments.
Besides SQL interview questions, the other question categories to practice for the Topicus Data Science Interview are:
To prepare for Topicus Data Science interviews read the book Ace the Data Science Interview because it's got: