Back to questions
IBM is analyzing how their employees are utilizing the Db2 database by tracking the SQL queries executed by their employees. The objective is to generate data to populate a histogram that shows the number of unique queries run by employees during the third quarter of 2023 (July to September). Additionally, it should count the number of employees who did not run any queries during this period.
Display the number of unique queries as histogram categories, along with the count of employees who executed that number of unique queries.
Column Name | Type | Description |
---|---|---|
employee_id | integer | The ID of the employee who executed the query. |
query_id | integer | The unique identifier for each query (Primary Key). |
query_starttime | datetime | The timestamp when the query started. |
execution_time | integer | The duration of the query execution in seconds. |
Assume that the table below displays all queries made from July 1, 2023 to 31 July, 2023:
employee_id | query_id | query_starttime | execution_time |
---|---|---|---|
226 | 856987 | 07/01/2023 01:04:43 | 2698 |
132 | 286115 | 07/01/2023 03:25:12 | 2705 |
221 | 33683 | 07/01/2023 04:34:38 | 91 |
240 | 17745 | 07/01/2023 14:33:47 | 2093 |
110 | 413477 | 07/02/2023 10:55:14 | 470 |
Assume that the table below displays all employees in the table:
Column Name | Type | Description |
---|---|---|
employee_id | integer | The ID of the employee who executed the query. |
full_name | string | The full name of the employee. |
gender | string | The gender of the employee. |
employee_id | full_name | gender |
---|---|---|
1 | Judas Beardon | Male |
2 | Lainey Franciotti | Female |
3 | Ashbey Strahan | Male |
unique_queries | employee_count |
---|---|
0 | 191 |
1 | 46 |
2 | 12 |
3 | 1 |
The output indicates that 191 employees did not run any queries, 46 employees ran exactly 1 unique queries, 12 employees ran 2 unique queries, and so on.
The dataset you are querying against may have different input & output - this is just an example!
First, we need to filter the table to include only those queries executed between July 1, 2023, and September 30, 2023. We use these dates because they correspond to the third quarter of the year.
Using the function, we also count the number of unique queries per employee within this time frame.
💡Why Filter Up to October 1st?
The date range specified in the filter () ensures we include all queries executed from the start of July to the end of September. By using , we capture all timestamps up to, but not including October 1st. This is standard practice to include the entire last day of September without accidentally including any part of October.
Next, we need to ensure that all employees are included, even if they did not run any queries during the third quarter. Using the same query in Step 1, we'll use a to achieve this, combining the table with the results from Step 1.
We'll also use function to set the query count to 0 for employees with no queries.
We'll wrap this query in a CTE called .
Finally, in the main query, we group the results by the number of unique queries and count how many employees fall into each group. We also order the results by the number of unique queries to generate the histogram.