At NEXTDC, SQL is typically used for analyzing data center operations. That's why NEXTDC frequently asks SQL query questions during interviews for Data Science, Data Engineering and Data Analytics jobs.
To help you ace the NEXTDC SQL interview, we've curated 9 NEXTDC SQL interview questions – can you answer each one?
Suppose NEXTDC wants you to analyze their data center's power usage statistics. Given the table , write a SQL query to find the average power usage of each data center for each month in the year 2021. Also, find out the percentage change in the average power usage as compared to the previous month.
The table has the following columns:
id | data_center | usage_date | power_used |
---|---|---|---|
1 | DC1 | 2021-01-01 | 500 |
2 | DC1 | 2021-01-02 | 520 |
3 | DC1 | 2021-02-01 | 600 |
4 | DC2 | 2021-01-01 | 750 |
5 | DC2 | 2021-02-01 | 800 |
6 | DC1 | 2021-02-02 | 610 |
7 | DC2 | 2021-01-02 | 770 |
8 | DC2 | 2021-02-02 | 820 |
month | data_center | average_power_used | percentage_change |
---|---|---|---|
1 | DC1 | 510 | NULL |
2 | DC1 | 605 | 18.6 |
1 | DC2 | 760 | NULL |
2 | DC2 | 810 | 6.6 |
In this query, we start by grouping power usage by month and data center name. We use the AVG function to calculate the average power usage for each month.
Next, we use a window function to calculate the percentage change in power usage compared to the previous month. We use the LAG function to get the previous month's usage and subtract it from the current month's usage. To convert this difference into a percentage, we divide it by the previous month's usage and multiply the result by 100.
The COALESCE and NULLIF functions are used to handle cases where there is no previous month's power usage (i.e., for the first month). If there's no previous month's data, the percentage change is set as NULL. This query should return a row for each data center for each month, with columns for month, average power use, and percentage change from the previous month.
To solve a similar window function question on DataLemur's free interactive coding environment, solve this Google SQL Interview Question:
Given a table containing customer information of NEXTDC. The information provided includes , , , , and .
Write a SQL query that returns the names of customers who live in 'QLD', have the service type 'Premium' and have been on board since 2018 or later. Also, these customers should have a contract length of more than 2 years.
Consider the tables below as example input and output.
customer_id | name | service_type | state | onboard_year | contract_length |
---|---|---|---|---|---|
101 | John | Standard | QLD | 2019 | 2 |
102 | Mary | Premium | VIC | 2020 | 3 |
103 | Steve | Premium | QLD | 2018 | 3 |
104 | Jane | Premium | QLD | 2017 | 3 |
105 | Mark | Premium | QLD | 2021 | 3 |
name |
---|
Steve |
Mark |
In this query, we use the clause to filter the customers' data based on multiple boolean conditions. We use to ensure all conditions are met. This query will return the names of customers who live in 'QLD', signed up for the 'Premium' service, joined in 2018 or later, and have a contract length of more than 2 years.
{#Question-3}
is used to combine the results of multiple statements into a single result set.
Suppose you were doing an HR Analytics project for NEXTDC, and needed to analyze both NEXTDC's contractors and employees. You could use in the following way:
This statement would return a combined result set of NEXTDC contractors and employees who were hired after the start of the year 2023.
As a data center services provider, NEXTDC may have multiple clients. Each client may have different servers that use different amounts of power. Your task is to figure out the average power usage per client.
usage_id | client_id | server_id | power_usage (in kWh) |
---|---|---|---|
1 | 101 | 201 | 350 |
2 | 101 | 202 | 450 |
3 | 102 | 203 | 500 |
4 | 102 | 204 | 600 |
5 | 103 | 205 | 700 |
client_id | average_power_usage |
---|---|
101 | 400 |
102 | 550 |
103 | 700 |
This PostgreSQL query groups the data by and calculates the average for each group. The function in SQL computes the average of the values in a group. The statement groups the rows that have the same values in specified columns into aggregated data. The result of the query gives us the average power usage for each client.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for requiring aggregation of entities based on usage metrics or this Microsoft Supercloud Customer Question which is similar for getting information related grouped by client.
The clause in SQL allows you to select records that are unique, eliminating duplicates.
For example, if you had a table of NEXTDC employees, and wanted to see what are all the unique job titles that currently worked at the company, you could write the following query:
NEXTDC is an Australian-based technology company that delivers data center outsourcing solutions, connectivity services, and infrastructure management software. As a part of the day-to-day operations, NEXTDC needs to keep track of the power usage of each of its data centers. In particular, they are interested in calculating the average power usage per data center for each month.
log_id | data_center_id | log_date | power_usage_in_kw |
---|---|---|---|
101 | DC1 | 02/08/2022 00:00:00 | 800 |
102 | DC1 | 02/09/2022 00:00:00 | 850 |
103 | DC2 | 02/10/2022 00:00:00 | 900 |
104 | DC2 | 03/11/2022 00:00:00 | 950 |
105 | DC1 | 03/12/2022 00:00:00 | 1000 |
month | data_center | avg_power_usage_in_kw |
---|---|---|
2 | DC1 | 825 |
2 | DC2 | 900 |
3 | DC2 | 950 |
3 | DC1 | 1000 |
The above PostgreSQL query first extracts the month from the column. It then groups the data by both the extracted month and the , effectively grouping the data by both month and data center. Finally, it calculates the average for each group. This provides the required output, which is the average power usage for each data center, per month.
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are few different types of indexes that can be used in a database:
For a concrete example, say you had a table of NEXTDC customer payments with the following columns:
Here's what a clustered index on the column would look like:
A clustered index on the column would determine the physical order of the records in the table based on the . This means that the records with the earliest values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the , as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
Write a SQL query to find out the total orders per customer and the details about the product they ordered most. You will need to join the table with the and tables.
Assuming we have the following tables:
order_id | customer_id | product_id | order_date |
---|---|---|---|
45 | 1 | 203 | 01/06/2022 |
46 | 2 | 105 | 01/15/2022 |
47 | 1 | 203 | 01/18/2022 |
48 | 3 | 187 | 01/20/2022 |
49 | 2 | 105 | 01/24/2022 |
customer_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Doe |
3 | Mark | Smith |
product_id | product_name | category |
---|---|---|
203 | Product A | Electronics |
105 | Product B | Appliances |
187 | Product C | Furniture |
customer_name | total_orders | most_ordered_product |
---|---|---|
John Doe | 2 | Product A |
Jane Doe | 2 | Product B |
Mark Smith | 1 | Product C |
In this query, we join the , , and tables based on their respective IDs. We use the count function to find the total number of orders placed by each customer. We then display the customer's full name (concatenated first and last name), total number of orders, and the name of the most ordered product. The result is grouped by and and ordered in descending order of the total number of orders.
Because joins come up so often during SQL interviews, practice an interactive Spotify JOIN SQL question:
Suppose you are working for NEXTDC and you want to examine the company's power usage for its various data centers. Assume each data center consumes power based on the number of running servers and the power draw of each server. Specifically, keep in mind that older servers use more power than new ones.
You are given two tables: and .
The table has three columns: (an integer that uniquely identifies each center), (the number of servers at the center), and (a string for the city where the data center resides).
center_id | server_count | location |
---|---|---|
1 | 400 | Sydney |
2 | 200 | Melbourne |
3 | 100 | Perth |
The table has three columns: (unique identifier for each server), (number of years the server has been operating), and (the center that server resides in).
server_id | age | center_id |
---|---|---|
1 | 2 | 1 |
2 | 5 | 1 |
3 | 3 | 2 |
4 | 1 | 2 |
5 | 4 | 3 |
Calculate the total power consumption for each data center, assuming the power draw of each server is watts per server. Then, round the total power for each center to the nearest whole watt.
center_id | location | total_power |
---|---|---|
1 | Sydney | 3600 |
2 | Melbourne | 2600 |
3 | Perth | 1600 |
In the above query, we join two tables on the column. Then, for each data center, we calculate the total power consumption by raising the server's age to power 2 and multiplying it by server count for that center. We use the function to round this total power to nearest whole watt. We use the and functions to get the total power for each data center.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating server usage time or this Microsoft Supercloud Customer Question which is similar for working with multivariate data.
The key to acing a NEXTDC SQL interview is to practice, practice, and then practice some more! Besides solving the above NEXTDC SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like FAANG (FB, Apple, Amazon, Netflix, Google).
Each problem on DataLemur has multiple hints, detailed solutions and best of all, there's an online SQL coding environment so you can right in the browser run your SQL query and have it checked.
To prep for the NEXTDC SQL interview it is also a great idea to solve SQL questions from other tech companies like:
In case your SQL skills are weak, forget about jumping right into solving questions – go learn SQL with this free SQL for Data Analytics course.
This tutorial covers SQL topics like math functions in SQL and Self-Joins – both of which come up often during NEXTDC SQL interviews.
In addition to SQL query questions, the other question categories tested in the NEXTDC Data Science Interview are:
The best way to prepare for NEXTDC Data Science interviews is by reading Ace the Data Science Interview. The book's got: