At OVHcloud, SQL does the heavy lifting for analyzing server infrastructure data and optimizing customer database queries across its cloud computing services. So, it shouldn't surprise you that OVHcloud almost always evaluates jobseekers on SQL questions in interviews for Data Science and Data Engineering positions.
So, to help you ace the OVHcloud SQL interview, here are 9 OVH Groupe SQL interview questions – can you solve them?
At OVHcloud, we have a comprehensive product listing on which users can leave reviews and ratings. These reviews heavily influence our product ranking algorithm, so we want to monitor them closely. Write a SQL query that provides the average rating for each product every month, using SQL window functions.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
In PostgreSQL, you can use the DATE_TRUNC function to group by months.
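Here's a minimal sketch of one such query, assuming the reviews live in a table called `reviews` with the columns shown in the sample data above:

```sql
SELECT DISTINCT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id AS product,
  -- window average of stars over each (month, product) partition
  ROUND(AVG(stars) OVER (
    PARTITION BY DATE_TRUNC('month', submit_date), product_id
  ), 2) AS avg_stars
FROM reviews
ORDER BY mth, product;
```

Because window functions don't collapse rows the way GROUP BY does, the DISTINCT is needed to return one row per (month, product) pair.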
This will calculate the average rating per product for each month. The PARTITION BY clause specifies the columns that define each group the average is calculated over. Because we want the average per product per month, we include the product_id column and the month of the submit_date in the PARTITION BY clause.
Note: This expects that submit_date is a timestamp or date data type. If it is stored as a string, it first needs to be converted to a date or timestamp using, for example, the TO_TIMESTAMP function.
Pro Tip: Window functions are a frequent SQL interview topic, so practice every window function problem on DataLemur
As a database designer for OVHcloud, a large-scale cloud infrastructure company, you're required to manage the resources associated with various data centers across the world. Your task is to design a system that keeps track of the servers present in these data centers and their respective utilization. You're required to model the tables, their relationships, and their columns.
Consider two tables, datacenters and servers. The datacenters table contains information about each data center, including its location and maximum capacity (measured in number of servers). The servers table includes information about each server, such as its current utilization (measured as a percentage), the data center it's located in, and the type of service it's supporting.
datacenter_id | location | max_capacity |
---|---|---|
100 | New York | 500 |
101 | Los Angeles | 600 |
102 | Paris | 700 |
server_id | datacenter_id | utilization | service_type |
---|---|---|---|
11001 | 100 | 50 | webhosting |
11002 | 100 | 60 | |
11003 | 101 | 70 | storage |
11004 | 102 | 60 | webhosting |
11005 | 102 | 80 | |
Suppose the company introduces a new service type and wants to know which data center has the most space (potential for additional servers) and the least average server utilization. Write a PostgreSQL query to find this out.
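Here's one way to sketch it, assuming the tables are named `datacenters` and `servers` as above (the CTE name `datacenter_stats` and the derived column names are illustrative):

```sql
WITH datacenter_stats AS (
  -- per-data-center server count and average utilization
  SELECT
    d.datacenter_id,
    d.location,
    d.max_capacity,
    COUNT(s.server_id) AS server_count,
    AVG(s.utilization) AS avg_utilization
  FROM datacenters d
  LEFT JOIN servers s ON s.datacenter_id = d.datacenter_id
  GROUP BY d.datacenter_id, d.location, d.max_capacity
)
SELECT
  datacenter_id,
  location,
  max_capacity - server_count AS free_capacity,
  avg_utilization
FROM datacenter_stats
ORDER BY free_capacity DESC, avg_utilization ASC
LIMIT 1;
```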
This query first creates a CTE (Common Table Expression), here called datacenter_stats, which calculates the number of servers (server_count) and average utilization (avg_utilization) for each data center. It then selects the data center with the most space for additional servers, using average utilization as a tiebreaker and picking the one with the lowest value. This shows which data center is best suited for hosting new servers for a new service.
A correlated sub-query is a sub-query that depends on the outer query and cannot be executed separately. It uses the outer query to filter or transform data by referencing a column from it, and the outer query uses the results of the inner query. In contrast, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data needed by the outer query.
Here is an example of a correlated sub-query:
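This is a minimal sketch, assuming a hypothetical sales table with customer_id, customer_group, and total_sales columns:

```sql
SELECT s1.customer_id, s1.total_sales
FROM sales s1
WHERE s1.total_sales > (
  -- re-evaluated for every row of the outer query
  SELECT AVG(s2.total_sales)
  FROM sales s2
  WHERE s2.customer_group = s1.customer_group
);
```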
This query selects the customer_id and total_sales of all OVHcloud customers in the sales table whose total_sales are greater than the average for their own customer group. The sub-query in this case is correlated with the outer query, as it references the customer_group column through the outer query's alias (s1).
Here is an example of a non-correlated sub-query:
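Using the same hypothetical sales table:

```sql
SELECT customer_id, total_sales
FROM sales
WHERE total_sales > (
  -- evaluated once, independently of the outer query
  SELECT AVG(total_sales)
  FROM sales
);
```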
The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.
Performance-wise, correlated sub-queries are generally slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
For OVHcloud, a global hyperscale cloud provider, it's essential to know how efficiently product views turn into add-to-cart events. We want to understand how often users who view the details of a server, for instance, end up adding it to their cart.
Assume OVHcloud maintains two PostgreSQL tables, product_views and add_to_cart_events, with the following schemas and example data.
view_id | user_id | product_id | view_date |
---|---|---|---|
4839 | 165 | SVR01 | 11/10/2022 00:00:00 |
8346 | 192 | SVR02 | 11/15/2022 00:00:00 |
6732 | 716 | SVR03 | 11/18/2022 00:00:00 |
3839 | 432 | SVR02 | 11/12/2022 00:00:00 |
4732 | 395 | SVR01 | 11/11/2022 00:00:00 |
cart_event_id | user_id | product_id | add_to_cart_date |
---|---|---|---|
4738 | 165 | SVR01 | 11/10/2022 00:10:00 |
8974 | 716 | SVR03 | 11/18/2022 01:12:00 |
4592 | 432 | SVR02 | 11/13/2022 00:00:00 |
3829 | 498 | SVR02 | 11/12/2022 00:20:00 |
Calculate the daily clickthrough conversion rate of each product from a product view to an add-to-cart event. The conversion rate should be calculated as the number of add-to-cart events divided by the number of product views for that product on a given day.
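One possible solution is sketched below (the output column names are illustrative):

```sql
SELECT
  DATE_TRUNC('day', pv.view_date) AS view_day,
  pv.product_id,
  -- add-to-cart events divided by product views, per product per day
  ROUND(
    COUNT(DISTINCT ac.cart_event_id)::DECIMAL
      / COUNT(DISTINCT pv.view_id),
    2
  ) AS conversion_rate
FROM product_views pv
LEFT JOIN add_to_cart_events ac
  ON ac.user_id = pv.user_id
 AND ac.product_id = pv.product_id
 AND DATE_TRUNC('day', ac.add_to_cart_date) = DATE_TRUNC('day', pv.view_date)
GROUP BY view_day, pv.product_id
ORDER BY view_day, pv.product_id;
```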
This query calculates the daily clickthrough conversion rate for all products. By joining the product_views and add_to_cart_events tables on user_id and product_id, we can aggregate counts and compute the required conversion rate. The date truncation ensures we compare events on the same day. Because we use a LEFT JOIN, all product views are included even when there are no matching add-to-cart events, giving us a complete picture of conversion rates.
To practice a related SQL problem on DataLemur's free interactive coding environment, try this Meta SQL interview question:
If you have two tables and want to retrieve only the rows that are present in both tables, just use the INTERSECT operator!
For example, let's use INTERSECT to find all of OVHcloud's Facebook video ads that are also being run on YouTube:
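A sketch, assuming hypothetical facebook_video_ads and youtube_ads tables that each have an ad_creative_id column:

```sql
-- rows returned by both queries, i.e. ads running on both platforms
SELECT ad_creative_id
FROM facebook_video_ads

INTERSECT

SELECT ad_creative_id
FROM youtube_ads;
```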
As a cloud service provider, OVHcloud offers different types of services, such as Cloud Compute, Web Hosting, and Dedicated Servers. Each service uses a certain amount of resources, like CPU, RAM, and storage. Your task is to calculate the average resource usage of each service type for the last month.
usage_id | service_type | date | cpu_usage | ram_usage | storage_usage |
---|---|---|---|---|---|
7291 | Cloud Compute | 06/15/2022 | 0.6 | 5.2 | 25 |
8392 | Web Hosting | 06/17/2022 | 0.3 | 2.1 | 10 |
9633 | Dedicated Servers | 06/18/2022 | 1.2 | 24 | 500 |
6754 | Cloud Compute | 06/20/2022 | 0.7 | 6 | 30 |
7085 | Dedicated Servers | 06/20/2022 | 1.4 | 32 | 600 |
service_type | avg_cpu_usage | avg_ram_usage | avg_storage_usage |
---|---|---|---|
Cloud Compute | 0.65 | 5.6 | 27.5 |
Web Hosting | 0.3 | 2.1 | 10 |
Dedicated Servers | 1.3 | 28 | 550 |
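A sketch of such a query, assuming the usage data lives in a table called resource_usage with the columns shown above:

```sql
SELECT
  service_type,
  AVG(cpu_usage) AS avg_cpu_usage,
  AVG(ram_usage) AS avg_ram_usage,
  AVG(storage_usage) AS avg_storage_usage
FROM resource_usage
-- restrict to rows from the previous calendar month
WHERE date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY service_type;
```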
This query calculates the average CPU, RAM, and storage usage of each service type for the last month. It groups by the service_type column and uses the AVG function to calculate the mean usage of each resource per service. The date filter restricts the results to data from the past month.
The COALESCE function can take in multiple parameters, and returns the first input parameter that is not NULL. If all arguments are NULL, the COALESCE function will return NULL too.
Suppose you have a table of OVHcloud salespeople and the number of deals they closed. If a salesperson didn't close any deals, the data from the 3rd-party CRM system exports a NULL value.
sales_person | closed_deals |
---|---|
Jason Wright | NULL |
Drew Jackson | 3 |
Chris Ho | NULL |
Adam Cohen | 2 |
Samantha Perez | 4 |
To change these NULLs to zeros, you can use the COALESCE function in the following way:
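Assuming the table is named salespeople (a hypothetical name):

```sql
SELECT
  sales_person,
  -- replace NULL closed_deals with 0
  COALESCE(closed_deals, 0) AS closed_deals
FROM salespeople;
```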
This would result in the following data:
sales_person | closed_deals |
---|---|
Jason Wright | 0 |
Drew Jackson | 3 |
Chris Ho | 0 |
Adam Cohen | 2 |
Samantha Perez | 4 |
Given a database table that stores information about OVHcloud customers, including customer_id, name, email, and country, could you write a SQL query to find all customers whose email address domain is 'ovhcloud.com' and who are located in France?
Here's a sample of the input table:
customer_id | name | email | country |
---|---|---|---|
1 | John Doe | john.doe@ovhcloud.com | France |
2 | Jane Smith | jane.smith@yahoo.com | USA |
3 | Alice Johnson | alice.johnson@ovhcloud.com | USA |
4 | Bob Wilson | bob.wilson@ovhcloud.com | France |
5 | Charlie Brown | charlie.brown@gmail.com | UK |
Here's the PostgreSQL query to solve this case:
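This is a sketch, assuming the table is named customers:

```sql
SELECT name, email, country
FROM customers
-- match any email ending with the OVHcloud domain
WHERE email LIKE '%@ovhcloud.com'
  AND country = 'France';
```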
This query uses the LIKE keyword to filter for customers whose email address ends with '@ovhcloud.com', meaning they use OVHcloud's email domain. It then further filters to show only those customers who are located in France.
The output of this query would be:
name | email | country |
---|---|---|
John Doe | john.doe@ovhcloud.com | France |
Bob Wilson | bob.wilson@ovhcloud.com | France |
These results show the customers who have an 'ovhcloud.com' email domain and are located in France.
OVHcloud would like to calculate the utilization percentage of its servers over time. They define server utilization as the total storage used divided by the total storage capacity. Given the two tables below, write a SQL query to calculate the monthly utilization percentage of each server: at the end of each month, calculate the average of the daily utilization percentages, round it to 3 decimal places, and output the value as a percentage, such that 0.500 is displayed as 50.000%.
server_id | server_name | capacity_GB |
---|---|---|
101 | Apollo | 500 |
102 | Zeus | 1000 |
103 | Hermes | 800 |
usage_id | server_id | usage_GB | usage_date |
---|---|---|---|
1 | 101 | 200 | 2022-06-01 |
2 | 101 | 250 | 2022-06-02 |
3 | 101 | 300 | 2022-06-03 |
4 | 102 | 500 | 2022-06-01 |
5 | 102 | 600 | 2022-06-02 |
6 | 102 | 700 | 2022-06-03 |
7 | 103 | 300 | 2022-06-01 |
8 | 103 | 350 | 2022-06-02 |
9 | 103 | 400 | 2022-06-03 |
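A sketch of one solution, assuming the tables are named servers and server_usage:

```sql
SELECT
  s.server_id,
  s.server_name,
  DATE_TRUNC('month', u.usage_date) AS month,
  -- average of daily utilization (usage / capacity), shown as a percentage
  ROUND(AVG(u.usage_GB::DECIMAL / s.capacity_GB) * 100, 3) || '%' AS utilization_pct
FROM servers s
JOIN server_usage u
  ON u.server_id = s.server_id
GROUP BY s.server_id, s.server_name, DATE_TRUNC('month', u.usage_date)
ORDER BY s.server_id, month;
```

The cast to DECIMAL avoids integer division, which would otherwise truncate the daily ratios to zero.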
The above query joins the servers and server_usage tables using a JOIN clause on server_id. The DATE_TRUNC function truncates usage_date to monthly precision. The AVG function calculates the average daily utilization by dividing usage_GB by capacity_GB, which is then multiplied by 100 to convert it to a percentage. The ROUND function rounds this value to three decimal places. The result is grouped by server and month and sorted by the same.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating server utilization time or this Facebook Average Review Ratings Question which is similar for calculating average percentages over time.
The key to acing an OVHcloud SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier OVHcloud SQL interview questions, you should also solve the 200+ SQL questions on DataLemur, which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each interview question has hints to guide you, fully explained answers, and a discussion board to see how others solved it. Best of all, there's an interactive coding environment, so you can run your SQL query right in the browser and have it checked.
To prep for the OVHcloud SQL interview, it can also be useful to solve interview questions from other tech companies like:
In case your SQL coding skills are weak, don't jump right into solving questions – first go learn SQL with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like CASE/WHEN/ELSE statements and math functions like CEIL()/FLOOR() – both of which come up often in OVHcloud SQL assessments.
Beyond writing SQL queries, the other types of problems tested in the OVHcloud Data Science Interview are:
The best way to prepare for OVHcloud Data Science interviews is by reading Ace the Data Science Interview. The book's got: