SQL is crucial to IBM's business - it's how database admins and Data Engineers interact with IBM's popular db2 database. That's why IBM uses HackerRank for SQL query questions in interviews for Data Science, Solutions Architect, and & Data Engineering jobs.
So, to help you prepare, we've collected 10 IBM SQL interview questions to practice, which are similar to recently asked HackerRank questions at IBM – can you solve them?
IBM is analyzing how their employees are utilizing the Db2 database by tracking the SQL queries executed by their employees. The objective is to generate data to populate a histogram that shows the number of unique queries run by employees during the third quarter of 2023 (July to September). Additionally, it should count the number of employees who did not run any queries during this period.
Display the number of unique queries as histogram categories, along with the count of employees who executed that number of unique queries.
Column Name | Type | Description |
---|---|---|
employee_id | integer | The ID of the employee who executed the query. |
query_id | integer | The unique identifier for each query (Primary Key). |
query_starttime | datetime | The timestamp when the query started. |
execution_time | integer | The duration of the query execution in seconds. |
Assume that the table below displays all queries made from July 1, 2023 to 31 July, 2023:
employee_id | query_id | query_starttime | execution_time |
---|---|---|---|
226 | 856987 | 07/01/2023 01:04:43 | 2698 |
132 | 286115 | 07/01/2023 03:25:12 | 2705 |
221 | 33683 | 07/01/2023 04:34:38 | 91 |
240 | 17745 | 07/01/2023 14:33:47 | 2093 |
110 | 413477 | 07/02/2023 10:55:14 | 470 |
Assume that the table below displays all employees in the table:
Column Name | Type | Description |
---|---|---|
employee_id | integer | The ID of the employee who executed the query. |
full_name | string | The full name of the employee. |
gender | string | The gender of the employee. |
employee_id | full_name | gender |
---|---|---|
1 | Judas Beardon | Male |
2 | Lainey Franciotti | Female |
3 | Ashbey Strahan | Male |
unique_queries | employee_count |
---|---|
0 | 191 |
1 | 46 |
2 | 12 |
3 | 1 |
To this question about product analytics, solve this IBM SQL question within DataLemur's online SQL coding environment.
IBM sells a variety of technological products, which are regularly reviewed by users on its platform. Determine the average monthly rating for each product, rounded to 2 decimal places.
For this task, consider the following two tables:
product_id | product_name |
---|---|
50001 | "IBM Quantum Computer" |
69852 | "IBM Thinkpad X1" |
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
Write a SQL query to determine the average monthly rating for each product.
This query performs a join operation between the and tables using the field. It then groups the result by month (using the function in PostgreSQL to isolate the month from the ), and product. Finally, it calculates the average product rating (rounded to 2 decimal places with the function) for each product per month.
To practice a similar problem about calculating rates, solve this TikTok SQL question within DataLemur's online SQL coding environment:
Cross joins and natural joins are two types of JOIN operations in SQL that are used to combine data from multiple tables. A cross join creates a new table by combining each row from the first table with every row from the second table, and is also known as a cartesian join. On the other hand, a natural join combines rows from two or more tables based on their common columns, forming a new table. One key difference between these types of JOINs is that cross joins do not require common columns between the tables being joined, while natural joins do.
Here's an example of a cross join:
If you have 20 products and 10 colors, that's 200 rows right there!
Here's a natural join example using two tables, IBM employees and IBM managers:
This natural join returns all rows from IBM employees where there is no matching row in managers based on the column.
Given the data on IBM employees, can you find the average duration of service for employees across different departments? The Duration of service is represented as . If is NULL, consider it as the current date.
employee_id | name | start_date | end_date | department |
---|---|---|---|---|
101 | John | 2015-01-15 | 2020-06-30 | Technology |
102 | Emma | 2016-08-01 | NULL | Management |
103 | Ava | 2017-05-30 | 2019-08-01 | Strategy |
104 | Oliver | 2018-11-11 | NULL | Technology |
105 | Sophia | 2020-01-17 | NULL | Management |
In this query, we first calculate the duration of service for each employee. If is NULL, we consider the current date () as the end of service. We then average these durations over each department using the AVG() function.
This will provide an overview of how long employees typically serve in each department at IBM.
To practice a related SQL problem on DataLemur's free online SQL code editor, solve this Facebook SQL Interview question:
Both and are used to combine the results of two or more SELECT statements into a single result set.
However, only includes one instance of a duplicate, whereas includes duplicates.
Imagine that you are working for IBM and the marketing team wants to analyze the click-through conversion rates for their digital products. This involves the number of times a product is viewed, added to the cart and finally purchased.
For this scenario, let's assume that you have access to three tables: , and , and the tables have the following schema:
Example Input:
view_id | user_id | view_date | product_id |
---|---|---|---|
211 | 4895 | 06/10/2022 00:00:00 | 1325 |
364 | 2375 | 06/08/2022 00:00:00 | 3452 |
897 | 4403 | 06/14/2022 00:00:00 | 1325 |
999 | 3318 | 07/04/2022 00:00:00 | 9648 |
Example Input:
cart_id | user_id | add_to_cart_date | product_id |
---|---|---|---|
113 | 4895 | 06/11/2022 00:00:00 | 1325 |
570 | 2375 | 06/09/2022 00:00:00 | 3452 |
953 | 4403 | 06/15/2022 00:00:00 | 1325 |
Example Input:
purchase_id | user_id | purchase_date | product_id |
---|---|---|---|
118 | 4895 | 06/12/2022 00:00:00 | 1325 |
596 | 2375 | 06/10/2022 00:00:00 | 3452 |
The question is: What is the click-through conversion rate, defined as the number of purchases per view, for each product?
Here is a PostgreSQL query to solve this problem:
This query uses common table expressions ( statement) to first calculate the number of views and cart adds for each product (view_to_cart), and then the number of cart adds and purchases for each product (cart_to_purchase). Then it joins these two derived tables on the and calculates the conversion rate as the ratio of purchases to views. Finally, it orders the result by the conversion rate in descending order.
To solve a similar problem on DataLemur's free interactive SQL code editor, solve this Meta SQL interview question:
To find records in one table that aren't in another, you can use a and check for values in the right-side table.
Here's an example using two tables, IBM employees and IBM managers:
This query returns all rows from IBM employees where there is no matching row in managers based on the column.
You can also use the operator in PostgreSQL and Microsoft SQL Server to return the records that are in the first table but not in the second. Here is an example:
This will return all rows from employees that are not in managers. The operator works by returning the rows that are returned by the first query, but not by the second.
Note that isn't supported by all DBMS systems, like in MySQL and Oracle (but have no fear, since you can use the operator to achieve a similar result).
As a data analyst at IBM, you have been asked to analyze sales data from the IBM e-commerce platform. Given a table named with columns , , , and , write a SQL query to find out the average sales quantity for each product on a monthly basis.
sale_id | product_id | customer_id | sale_date | quantity |
---|---|---|---|---|
1 | 1001 | 123 | 03/01/2022 | 10 |
2 | 1002 | 265 | 03/04/2022 | 5 |
3 | 1001 | 478 | 03/15/2022 | 7 |
4 | 1003 | 192 | 04/01/2022 | 3 |
5 | 1002 | 123 | 04/05/2022 | 8 |
month | product | avg_quantity |
---|---|---|
3 | 1001 | 8.50 |
3 | 1002 | 5.00 |
4 | 1002 | 8.00 |
4 | 1003 | 3.00 |
The above SQL query first extracts the month from the sale_date column. It then groups by the extracted month and the product_id to compute the average sales quantity for each product in each month. The result is then ordered by month and product to make it easily readable. The AVG function in PostgreSQL calculates the average of a set of values by adding all the values and then dividing by the count of the values.
To practice a similar problem about calculating rates, try this TikTok SQL question within DataLemur's online SQL code editor:
As a data analyst at IBM, you are required to analyze data from two given tables; and . Write a SQL query in PostgreSQL to find out the total amount spent by each customer and the total number of orders placed by each customer. Join and tables using appropriate keys.
Sample data from the table is as follows:
Sample data from the table is as follows:
Here is the SQL query using and clause to find the total amount spent and total number of orders placed by each customer:
This query joins the and tables using the . Then, it groups the results by and , which results in aggregating the data per customer. It uses to calculate the total number of orders and to calculate the total amount spent by each customer.
To practice a similar problem on DataLemur's free interactive SQL code editor, solve this SQL interview question asked by Facebook:
The / operator is used to remove to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that is available in PostgreSQL and SQL Server, while MINUS is available in MySQL and Oracle (but don't worry about knowing which DBMS supports which exact commands since IBM interviewers aren't trying to trip you up on memorizing SQL syntax).
For a tangible example of in PostgreSQL, suppose you were doing an HR Analytics project for IBM, and had access to IBM's contractors and employees data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables.
You could use operator to find all contractors who never were a employee using this query:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the IBM SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the earlier IBM SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each exercise has hints to guide you, full answers and best of all, there is an interactive SQL code editor so you can instantly run your query and have it graded.
To prep for the IBM SQL interview it is also helpful to solve SQL problems from other tech companies like:
You can also checkout the db2 SQL reference documentation.
However, if your SQL query skills are weak, don't worry about going right into solving questions – refresh your SQL knowledge with this DataLemur SQL Tutorial.
This tutorial covers SQL concepts such as functions like SUM()/COUNT()/AVG() and CASE/WHEN/ELSE statements – both of which pop up frequently in SQL interviews at IBM.
For the IBM Data Science Interview, beyond writing SQL queries, the other types of questions to prepare for are:
To prepare for IBM Data Science interviews read the book Ace the Data Science Interview because it's got: