11 Chindata SQL Interview Questions (Updated 2024)

Updated on

March 3, 2024

At Chindata, SQL is crucial for extracting and manipulating customer data for insights and designing databases to house server-related information. That's why Chindata frequently asks SQL query questions during interviews for Data Analyst, Data Science, and BI jobs.

As such, to help you ace the Chindata SQL interview, here’s 11 Chindata SQL interview questions – how many can you solve?

11 Chindata SQL Interview Questions

SQL Question 1: Identifying Power Users

Identify the top 5 power users in "Chindata" company based on the total number of transactions made within the last 30 days. Power Users are defined as users who have made at least 10 transactions within the last 30 days.

- Example Input:
transaction_iduser_idtransaction_dateamount
102456710/01/2022 00:00:0050.00
204812310/02/2022 00:00:0075.00
409645610/05/2022 00:00:00100.00
819212310/10/2022 00:00:00150.00
1638456710/20/2022 00:00:0050.00
Example Output:
user_idtotal_transactionstotal_amount
1232225.00
5672100.00
4561100.00

This query identifies and returns the top 5 VIP users within the last 30 days. It does this by grouping transactions by , counting the number of transactions, and summing up the total transaction amount per user only for transactions made within the past 30 days. The query then filters to keep only the users with ten or more transactions within the time frame. The result is ordered first by the total number of transactions in descending order, and then by the total transaction amount in descending order (for situations where two users have an equal number of transactions).

To work on another SQL customer analytics question where you can solve it right in the browser and have your SQL code instantly graded, try this Walmart SQL Interview Question:

SQL Question 2: Analyze Server Usage Data

You're given a table representing the usage data of Chindata's servers. Each row of the table represents a single day of usage for a given server. The table contains the following columns:

• : The unique identifier of a server
• : The date of data collection
• : The CPU usage in percentage
• : The RAM usage in percentage

With this data, write a SQL query to find the average CPU usage and RAM usage of each server in the previous seven days for each given day.

Example Input:
server_idusage_datecpu_usageram_usage
S1012022-05-014563
S1022022-05-014660
S1012022-05-023550
S1022022-05-025575
S1012022-05-034065
S1022022-05-035070
S1012022-05-045575
S1022022-05-044568
Example Output:
server_idusage_dateavg_cpu_usageavg_ram_usage
S1012022-05-024056.5
S1022022-05-0250.567.5
S1012022-05-034059.3
S1022022-05-0350.368.3
S1012022-05-0443.763.7
S1022022-05-044968.3

The SQL query above calculates the rolling 7 days average CPU and RAM usages. We must have to use a window function because we're calculating a running average that depends on the current row and the 6 preceding rows. This is accomplished by using the clause in the function. The clause ensures that each server's usage is restarted when calculating the average, to prevent the overflow from one server to another.

p.s. Window functions show up pretty frequently during SQL interviews, so practice the 27+ window function questions on DataLemur

SQL Question 3: What are SQL constraints, and can you give some examples?

Think of SQL constraints like the rules of a game. Just like a game needs rules to keep things fair and fun, a database needs constraints to keep things organized and accurate.

There are several types of SQL constraints like:

NOT NULL: This constraint is like a bouncer at a nightclub - it won't let anything NULL through the door. UNIQUE: This constraint is like a VIP list - only special, one-of-a-kind values get in. PRIMARY KEY: This constraint is like an elected official - it's made up of NOT NULL and UNIQUE values and helps identify each row in the table. FOREIGN KEY: This constraint is like a diplomatic ambassador - it helps establish relationships between tables. CHECK: This constraint is like a referee - it makes sure everything follows the rules. DEFAULT: This constraint is like a backup plan - it provides a default value if no other value is specified.

So, whether you're playing a game or organizing a database, constraints are an important part of the process!

SQL Question 4: Chindata Usage Analysis

Chindata is a company that provides cloud storage services to its clients. As their cloud storage utilization is growing, they need to understand the usage patterns to offer better and efficient services. The company wants an SQL query that can determine the average usage and the maximum usage each month for every client, so they can plan for future upgrades and efficiently assign resources.

They maintain two main tables, and , representing the information about their clients and their storage usage respectively.

Example Input:
client_idnameclient_since
1Client A01/01/2022
2Client B03/25/2022
3Client C05/15/2022
Example Input:
usage_idclient_idusage_dateusage_gb
1106/01/2022200
2106/15/2022215
3107/01/2022220
4206/25/2022150
5207/01/2022155
6305/15/2022100
7306/01/2022120
8307/01/2022130
9307/15/2022140

This query joins the and tables on the field and extracts from the . It further groups data by and and computes the average and maximum usage for each group. The function truncates the date to month, and and functions are used to calculate the average and maximum usage respectively.

SQL Question 5: What does database normalization mean?

Normalization involves breaking up your tables into smaller, more specialized ones and using primary and foreign keys to define relationships between them. Not only does this make your database more flexible and scalable, it also makes it easier to maintain. Plus, normalization helps to keep your data accurate by reducing the chance of inconsistencies and errors.

The only downside is now is that your queries will involve more joins, which are slow AF and often a DB performance botteleneck.

SQL Question 6: Find the Average CPU Usage

As the data analyst of Chindata, your task is to track the performance of company servers. Given a table named that contains server data logged every minute, write an SQL query to find the average CPU usage by server, sorted by the server with the highest average usage.

Example Input:
log_idserver_idlog_timecpu_usage
101A101/10/2022 00:01:005%
102A101/10/2022 00:02:0010%
103A201/10/2022 00:02:0015%
104B101/10/2022 00:03:0020%
105A101/10/2022 00:03:0010%
106B101/10/2022 00:04:0015%

Note: CPU usage is a percentage string.

Example Output:
server_idavg_cpu_usage
B117.5%
A215.0%
A18.33%

The AVG function is used here to find the average CPU usage per server. The REPLACE and CAST functions are used to transform the cpu_usage field into a numeric field because it is originally a string field containing a percentage sign (%). The TO_CHAR function is used to convert the resulting average back into a percentage string format in the result set. The server_id results are then ordered in descending order according to their average CPU usage.

To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating server time usage or this Google Odd and Even Measurements Question which is similar for working with different types of server data.

SQL Question 7: How is the constraint used in a database?

A is a column or set of columns in a table that references the primary key of another table. It is used to link the two tables together, and to ensure that the data in the foreign key column is valid.

The constraint helps to ensure the integrity of the data in the database by preventing the insertion of rows in the foreign key table that do not have corresponding entries in the primary key table. It also helps to enforce the relationship between the two tables, and can be used to ensure that data is not deleted from the primary key table if there are still references to it in the foreign key table.

For example, if you have a table of Chindata customers and an orders table, the customer_id column in the orders table could be a that references the id column (which is the primary key) in the Chindata customers table.

SQL Question 8: Click-Through-Rate for Chindata

Recently, Chindata has launched several digital ads and they want to analyze their click-through rates (CTR). The CTR is calculated as the number of users who clicked on the ad divided by the number of total impressions (the number of times the ad was shown). Assume you have access to two tables.

The first one, named , contains data for every time an ad was shown to a user. The second one, named , contains data for every time a user clicked on an ad.

Example Input:
675182306/08/2021 00:00:00403
875243206/10/2021 00:00:00403
454367806/18/2021 00:00:00612
587299507/26/2021 00:00:00612
643790007/05/2021 00:00:00522
Example Input:
493282306/08/2021 00:00:01403
265854306/10/2021 00:00:01612
753299507/26/2021 00:00:01612
431779707/05/2021 00:00:01403
321472107/10/2021 00:00:01522

Using the above two tables, write a SQL query that provides the CTR per advertisement.

This query uses a LEFT JOIN to combine the impressions and clicks tables on ad_id and user_id. As we're interested in the CTR, it's necessary to count the number of unique users who clicked on the ad and divide this by the total number of unique impressions . The result is the CTR for each ad.

To practice a related SQL interview question on DataLemur's free online SQL coding environment, solve this Meta SQL interview question:

SQL Question 9: Calculate the Monthly Bandwidth Usage

Assume Chindata is a data center providing hosting services. At the end of each day, the system automatically records the amount of data transferred by each hosted server. Your goal is to write a SQL query that calculates the total amount of data transferred for each server on a monthly basis.

The table has the following columns:

• -- a unique identifier for each usage record (int)
• -- the identifier of the server (int)
• -- the date when the data usage was recorded (datetime)
• -- the amount of data transferred in gigabytes (int)
Example Input
usage_idserver_iddatedata_gb
101106/01/2022 00:00:005
102106/02/2022 00:00:007
103206/02/2022 00:00:002
104107/01/2022 00:00:0010
105207/01/2022 00:00:0015
Example Output
monthserver_idtotal_data_gb
6112
622
7110
7215

This above query first extracts the month part from the date using the DATE_PART function. It then groups the rows by the month and server_id and calculates the sum of data_gb for each group. The output is ordered by the month and server_id for readability.

SQL Question 10: When it comes to database normalization, what's the difference between 1NF, 2NF, and 3NF?

There are several normal forms that define the rules for normalizing a database:

A database is in first normal form (1NF) if it meets the following criteria:

• Each column in a table contains a single value (no lists or containers of data)
• Each column should contain the same type of data (no mixing strings vs. integers)
• Each row in the table is unique

A database is in second normal form (2NF) if it meets the following criteria:

• It is in first normal form.
• All non-key attributes in a table are fully dependent on the primary key.

Said another way, to achieve 2NF, besides following all the rules from 1NF all the columns in a given table should be dependent only on that table's primary key.

A database is in third normal form (3NF) if it meets the following criteria:

• It is in second normal form.
• There are no transitive dependencies in the table.

A transitive dependency means that a piece of data in one column is derived from another column. For example, it wouldn't make sense to keep a column called "user's age" and "user's birthdate" (because age can be derived from birthdate.

While there's also a 4th and 5th normal form, it's too pedantic and hence out-of-scope to know for the Chindata SQL interview.

SQL Question 11: Calculate the Average Power Consumption per Server Rack by Region

Chindata has several data centers distributed across various regions. Each data center consists of several server racks. Each rack contains several servers, and the power consumption of each varies. Chindata wants to calculate the average power consumption per rack for each region.

Below are the data structures for the , , and .

dc_idregion
101East
102West
103North
104South
rack_iddc_id
201101
202101
203102
204103
205104
Input:
server_idrack_idpower_consumption
301201150
302201200
303202250
304203350
305204120
306205360
Example Output:
regionavg_power_per_rack
East200
West350
North120
South360

This query joins the , , and tables on their respective foreign keys. It then groups the result by , and generates an average power consumption for each rack in each region using PostgreSQL's and functions. The function is used to round the result to the nearest whole number.

The two most similar questions to your query are:

1. "Server Utilization Time" by Amazon This question's similarity lies in the requirement to calculate a specific metric (total time) for server utilization.

2. "Supercloud Customer" by Microsoft This question is similar as it also involves working with different server racks, only in this question, it's about identifying companies instead of calculating average power consumption.

So the markdown-friendly output you requested is:

To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating server related metrics or this Microsoft Supercloud Customer Question which is similar for dealing with server racks.

Chindata SQL Interview Tips

The best way to prepare for a Chindata SQL interview is to practice, practice, practice. Beyond just solving the above Chindata SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Netflix, Google, and Amazon.

Each SQL question has hints to guide you, step-by-step solutions and best of all, there is an interactive coding environment so you can right in the browser run your query and have it executed.

To prep for the Chindata SQL interview it is also wise to practice interview questions from other tech companies like:

But if your SQL skills are weak, don't worry about going right into solving questions – go learn SQL with this DataLemur SQL Tutorial.

This tutorial covers SQL concepts such as window functions and CASE/WHEN statements – both of these come up routinely in SQL job interviews at Chindata.

Chindata Data Science Interview Tips

What Do Chindata Data Science Interviews Cover?

In addition to SQL query questions, the other question categories to prepare for the Chindata Data Science Interview are:

How To Prepare for Chindata Data Science Interviews?

The best way to prepare for Chindata Data Science interviews is by reading Ace the Data Science Interview. The book's got:

• 201 Interview Questions from tech companies like Netflix, Google, & Airbnb
• A Crash Course covering Product Analytics, SQL & ML
• Great Reviews (900+ 5-star reviews on Amazon)