At Nemetschek, SQL is used across the company for analyzing complex construction, real estate, and geospatial data, as part of their work in the AEC/O industry. That's why Nemetschek almost always asks SQL problems in interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you ace the Nemetschek SQL interview, we've collected 11 Nemetschek SQL interview questions – how many can you solve?
Nemetschek is a software company that sells a wide variety of software licenses to clients. In order to keep track of who their most important users are, they would like to identify users that have the most licenses. Transactions from the last financial quarter need to be analyzed to identify these important users. License usage is determined by the number of unique active software license keys a user has purchased and is presently using.
For this purpose, you need to write a SQL query that will yield information on the top 10 users based on the total number of unique active licenses they own.
The table is as follows:
license_key | user_id | purchase_date | expiry_date |
---|---|---|---|
LIC0001 | 123 | 2022-04-01 | 2023-04-01 |
LIC0023 | 456 | 2022-06-05 | 2023-06-05 |
LIC0045 | 123 | 2022-05-11 | 2023-05-11 |
LIC0067 | 265 | 2022-07-20 | 2023-07-20 |
LIC0089 | 456 | 2022-04-30 | 2023-04-30 |
The table is as follows:
user_id | user_name |
---|---|
123 | Alice |
456 | Bob |
265 | Charlie |
In this query, the COUNT() function is used to count the number of unique license_key associated with each user_id for the given time period. The results are ordered in descending order of and limited to the top 10 users. Please note that only active licenses are considered (those whose is later than the current date).
To solve a similar power-user data analysis problem question on DataLemur's free interactive SQL code editor, try this recently asked Microsoft SQL interview question:
In Nemetschek, one of your tasks could be analyzing product ratings to help track and improve product performance. Given a dataset of review information, please write a SQL query to calculate the average star rating for each product per month. Consider the submit_date is in format.
The table is defined as follows:
review_id (INTEGER) | user_id (INTEGER) | submit_date (TEXT) | product_id (INTEGER) | stars (INTEGER) |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
You should return a table in the following format:
mth (INTEGER) | product (INTEGER) | avg_stars (FLOAT) |
---|
This query first creates a Common Table Expression (CTE) where it extracts the month from the submit_date and selects the required columns. Then it calculates the average ratings for each product per month using a window function. This window is partitioned by month and product, and ordered by month. This will give the average ratings for each product for every month. The results are finally ordered by product.
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
In SQL, a join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data.
There are four distinct types of JOINs: , , , and .
(INNER) JOIN: Retrieves records that have matching values in both tables involved in the join.
LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.
RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.
FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.
Nemetschek is a software company that specializes in design and management solutions for the AEC (Architecture, Engineering, Construction) industry. For this question, let's assume that the company wants to analyze their software sales. Your task is designing tables and writing a PostgreSQL query that will retrieve the total number of licenses sold per software product, the total sales value and the average sales price.
Scenario: The company operates on a licensing model where customers can buy multiple licenses of various software products. They would like to track how many licenses of each product have been sold, the total sales generated, and also the average price of these sold licenses.
Here are the two tables you would have:
software_id | software_name |
---|---|
101 | 'Architect Pro' |
102 | 'Engineer X' |
103 | 'Construct Master' |
sale_id | software_id | customer_id | num_licenses | sale_price |
---|---|---|---|---|
1 | 101 | 1001 | 3 | 1500 |
2 | 102 | 1002 | 2 | 2000 |
3 | 101 | 1003 | 1 | 500 |
4 | 103 | 1001 | 5 | 2500 |
5 | 102 | 1003 | 4 | 4000 |
6 | 103 | 1002 | 2 | 1000 |
The table records each sale, with the software product sold, the customer to whom it was sold, how many licenses were purchased in that sale, and the total price of the sale.
Your task is to write a query that would produce the following report:
software_name | total_licenses_sold | total_sales_value | average_sale_price |
---|---|---|---|
'Architect Pro' | 4 | 2000 | 500 |
'Engineer X' | 6 | 6000 | 666.67 |
'Construct Master' | 7 | 3500 | 500 |
This query first joins the and tables on the field. The function is then used to calculate the total number of licenses sold per software product and their total sales value. The function calculates the average price of a license for each software product. The results are grouped by the software name.
A foreign key is a field in a database table that links to the primary key of another table, establishing a connection between the two tables. For example, let's analyze Nemetschek's Google Ads campaigns data:
:
+------------+------------+------------+------------+ | ad_id | campaign_id| keyword | click_count| +------------+------------+------------+------------+ | 1 | 201 | Nemetschek reviews | 120 | | 2 | 202 | Nemetschek pricing | 150 | | 3 | 101 | buy Nemetschek | 65 | | 4 | 101 | Nemetschek alternatives | 135 | +------------+------------+------------+------------+
is a foreign key that connects to the of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaigns a specific ad belongs to.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, and foreign keys could be used to link each ad to its ad group and the Google Ads account that the campaigns belong to, respectively.
Nemetschek, a software company, maintains a record of their customers and their purchases. You are tasked to filter their customer database to identify all customers from 'Germany' who have purchased 'Software' more than once in the year '2022'.
For this task, you are given two tables - and .
customer_id | first_name | last_name | country |
---|---|---|---|
101 | John | Doe | Germany |
102 | Jane | Smith | USA |
103 | Tim | Brown | Germany |
104 | Anna | Johnson | Canada |
purchase_id | customer_id | product | purchase_date |
---|---|---|---|
5001 | 101 | Software | 01/12/2022 |
5002 | 102 | Software | 01/14/2022 |
5003 | 103 | Hardware | 02/17/2022 |
5004 | 101 | Software | 03/12/2022 |
5005 | 103 | Software | 04/22/2022 |
This PostgreSQL query first uses a subquery that selects all customers from the table that have more than one Software purchase in 2022. The command is used as it allows conditional filtering post-aggregation. The result is then joined with the table to attribute customer information. The final filtered table consists of German customers who have purchased more than one software product in the year 2022.
A clustered index is an index where the order of the rows in the database corresponds to the order of the rows in the index. Because of this, a table can only have one clustered index, but it can have multiple non-clustered indexes.
The main difference between the two is that the database tries to keep the data in the same order as the corresponding keys in the clustered index. This can improve the performance of most query operations, as it provides a linear-access path to the data stored in the database.
Nemetschek, a software company that offers solutions in the realms of construction, media, and design, uses digital marketing to reach its target customers. The marketing department conducts several ad campaigns, and they maintain a detailed log of every ad clicked by users. They would like to analyze the click-through rates (CTR) of their campaigns.
The CTR is calculated as the total number of clicks that an ad receives divided by the total number of impressions (times the ad was shown) it gets, displayed as a percentage.
Using the available data in the - which stores data about each click including the ad campaign id, date, and user id - and tables - which stores data about each ad impression including the ad campaign id, date, and user id - calculate the CTR for each ad campaign.
click_id | user_id | click_date | ad_campaign_id |
---|---|---|---|
101 | 123 | 06/08/2022 00:00:00 | 50001 |
102 | 265 | 06/10/2022 00:00:00 | 69852 |
103 | 362 | 06/18/2022 00:00:00 | 50001 |
104 | 192 | 07/26/2022 00:00:00 | 69852 |
105 | 981 | 07/05/2022 00:00:00 | 69852 |
impression_id | user_id | impression_date | ad_campaign_id |
---|---|---|---|
201 | 123 | 06/08/2022 00:00:00 | 50001 |
202 | 265 | 06/09/2022 00:00:00 | 69852 |
203 | 375 | 06/10/2022 00:00:00 | 50001 |
204 | 198 | 07/24/2022 00:00:00 | 69852 |
205 | 972 | 07/27/2022 00:00:00 | 69852 |
In this SQL query, we first join the and tables on . For each ad campaign, we count the distinct click ids and distinct impression ids. The CTR is then calculated by dividing the number of unique clicks by the number of unique impressions and multiplying by 100 to get the rate in percentage. The result is grouped by , giving us the CTR for each campaign.
To solve a related SQL problem on DataLemur's free interactive coding environment, attempt this Meta SQL interview question:
As a software company, Nemetschek maintains a database of its customers. Due to their international operations, they want to filter down the customer base to find all customers whose last names start with 'Sch' to target them for their new software launch in German-speaking countries. Can you write a SQL query that will return a list of customers whose last name starts with 'Sch' from their table?
For instance, if the last_name is 'Schmidt', it should be returned by your query.
Consider the table with the following data:
customer_id | first_name | last_name | country | |
---|---|---|---|---|
1 | John | Schmidt | john.schmidt@example.com | Germany |
2 | Maria | Schwarz | maria.schwarz@example.com | Switzerland |
3 | Tim | Michaels | tim.michaels@example.com | USA |
4 | Anna | Schroeder | anna.schroeder@example.com | Austria |
5 | Bob | Brown | bob.brown@example.com | USA |
Here is a PostgreSQL query that will filter all customers whose last name starts with 'Sch':
This SQL query uses the LIKE keyword in combination with a wildcard character (%) to select all customers whose last name starts with 'Sch'. The '%' is used to match any sequence of characters in SQL, so 'Sch%' will match any last name that starts with 'Sch'.
This should output:
customer_id | first_name | last_name | country | |
---|---|---|---|---|
1 | John | Schmidt | john.schmidt@example.com | Germany |
2 | Maria | Schwarz | maria.schwarz@example.com | Switzerland |
4 | Anna | Schroeder | anna.schroeder@example.com | Austria |
To ensure the reliability and integrity of data, a database management system (DBMS) strives to maintain the ACID properties: Atomicity, Consistency, Isolation, and Durability. To illustrate these concepts, consider the following examples of how ACID properties apply to banking transactions:
As a database manager for the company Nemetschek, you are asked by the marketing team to provide an analysis for customer subscription and login frequency.
You are given two tables:
subscription_id | user_id | subscription_date | subscription_type |
---|---|---|---|
5641 | 475 | 05/08/2022 | monthly |
2802 | 888 | 06/10/2022 | yearly |
7293 | 492 | 06/28/2022 | monthly |
1452 | 849 | 07/26/2022 | monthly |
8617 | 961 | 07/05/2022 | yearly |
login_id | user_id | login_date |
---|---|---|
1281 | 475 | 06/08/2022 |
7803 | 888 | 06/12/2022 |
5298 | 492 | 06/30/2022 |
6358 | 849 | 08/01/2022 |
4514 | 961 | 07/15/2022 |
From the subscriptions table, they want to know the type of subscription each user has. From the logins table, they want to know how many times each user has logged in. They need you to join these details together based on the user_id.
The resulting table should have a row for every user in the 'subscriptions' table, along with their subscription type and login count (even if the count is zero).
user_id | subscription_type | login_count |
---|---|---|
475 | monthly | 1 |
888 | yearly | 1 |
492 | monthly | 1 |
849 | monthly | 1 |
961 | yearly | 1 |
In this query, we start by using a LEFT JOIN on the logins table using the user_id as the common factor. This ensures that we retrieve a row for each user in the 'subscriptions' table irrespective of whether they have logged in or not. The COUNT function is applied to count the number of logins and the output is grouped by user_id and subscription type.
Since joins come up so often during SQL interviews, try this interactive Snapchat JOIN SQL interview question:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Nemetschek SQL interview is to solve as many practice SQL interview questions as you can! Besides solving the above Nemetschek SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each SQL question has hints to guide you, detailed solutions and crucially, there is an interactive coding environment so you can easily right in the browser your SQL query answer and have it executed.
To prep for the Nemetschek SQL interview it is also wise to solve SQL problems from other tech companies like:
However, if your SQL skills are weak, don't worry about jumping right into solving questions – refresh your SQL knowledge with this DataLemur SQL tutorial.
This tutorial covers topics including manipulating string/text data and filtering with LIKE – both of which show up routinely in Nemetschek interviews.
In addition to SQL query questions, the other types of problems to practice for the Nemetschek Data Science Interview are:
The best way to prepare for Nemetschek Data Science interviews is by reading Ace the Data Science Interview. The book's got: