At Ansys, SQL does the heavy lifting for querying and analyzing simulation data, and managing the relational databases of product design and engineering data. Because of this, Ansys asks SQL query questions in interviews for Data Science and Data Engineering positions.
So, to help you ace the Ansys SQL interview, this blog covers 9 Ansys SQL interview questions – can you solve them?
Ansys is a company that specializes in engineering simulation software. As a Data Analyst at Ansys, your task is to identify the 'whale users' - the users who frequently use the application and perform significant activities. You have access to a table which records the user ID, the date, and description of the activity performed.
A 'whale user' is identified as a user who has performed 'Simulation Run' activity at least 100 times in the last year.
Design a SQL query to identify these 'whale users' from the table. List their user IDs and the total number of 'Simulation Run' activities they have performed.
activity_id | user_id | activity_date | activity |
---|---|---|---|
7819 | 101 | 12/01/2021 | Simulation Run |
5256 | 502 | 12/05/2021 | Model Building |
6109 | 101 | 12/07/2021 | Simulation Run |
5945 | 308 | 12/10/2021 | Result Analysis |
8819 | 101 | 12/10/2021 | Simulation Run |
The SQL query will use GROUP BY to group records by user_id, and WHERE to filter for 'Simulation Run' activities. Additionally, we will use HAVING to impose the condition that a user has performed at least 100 'Simulation Run' activities. A PostgreSQL query along these lines is sketched below.
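This sketch assumes the activity log table is named user_activity, since the table name isn't given in the prompt:

```sql
-- Assumed table name: user_activity
SELECT
  user_id,
  COUNT(*) AS total_simulation_runs
FROM user_activity
WHERE activity = 'Simulation Run'
  AND activity_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY user_id
HAVING COUNT(*) >= 100;
```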
This query returns a list of the users who are designated 'whale users' as per the given conditions. For each user, the query lists the user_id and the total number of 'Simulation Run' activities conducted by the user in the last year. If a user's id appears in the output of this query, that user is considered a 'whale user'.
To practice a similar VIP customer analysis question on DataLemur's free interactive coding environment, try this Microsoft Azure Cloud SQL Interview Question:
Given a table containing Ansys product submissions by various users, write a SQL query to find the average submission time across products for each user, leaving out their single longest submission time. Assume that submission_time is in seconds.
The table is structured as:
submission_id | user_id | product_id | submission_time |
---|---|---|---|
1 | 123 | 1 | 1000 |
2 | 123 | 2 | 1100 |
3 | 123 | 1 | 1300 |
4 | 265 | 3 | 800 |
5 | 265 | 2 | 900 |
6 | 362 | 1 | 1200 |
7 | 362 | 1 | 1100 |
8 | 192 | 3 | 2000 |
9 | 192 | 2 | 1800 |
10 | 981 | 3 | 2700 |
Where submission_id uniquely identifies each submission, user_id and product_id identify the submitting user and the product, and submission_time is the submission duration in seconds.
Write a query that returns the following result:
user_id | average_submission_time |
---|---|
123 | 1050 |
265 | 800 |
362 | 1100 |
192 | 1800 |
981 | NULL |
Note that user 981 has only one submission, so there is no longest submission time to disregard, and the query returns NULL for them.
The following PostgreSQL statement uses the window function ROW_NUMBER, which orders each user's submissions from longest to shortest.
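Here is a sketch of that approach; the table name submissions is an assumption, since it isn't given in the prompt:

```sql
-- Assumed table name: submissions
WITH ranked AS (
  SELECT
    user_id,
    submission_time,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY submission_time DESC
    ) AS rk
  FROM submissions
)
SELECT
  user_id,
  -- exclude each user's single longest submission (rk = 1) from the average;
  -- users with only one submission end up with no rows in the filter, so AVG returns NULL
  AVG(submission_time) FILTER (WHERE rk > 1) AS average_submission_time
FROM ranked
GROUP BY user_id;
```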
In this query, we first partition the data by user_id and order each user's submissions by submission_time in descending order. This gives each user's longest submission_time a rk of 1, while the other submissions get subsequent ranks. We then use a FILTER clause on the aggregate function to exclude the longest submission (where rk = 1) while calculating the average for each user.
p.s. Window functions show up super frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
In SQL, zeros are numerical values that can be used in calculations and comparisons just like any other number. A blank space, also known as an empty string, is a character value and can be used in string functions and comparisons.
NULLs aren't the same as zeros or blank spaces. NULLs represent unknown, missing, or not applicable values. They are not included in calculations, and comparisons involving NULL values always result in NULL.
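A quick illustration of the difference in PostgreSQL:

```sql
SELECT
  0 = 0       AS zero_equals_zero,    -- true: zero behaves like any other number
  '' = ''     AS blank_equals_blank,  -- true: empty strings compare as equal text
  NULL = NULL AS null_equals_null;    -- NULL: any comparison involving NULL is unknown
```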
Ansys is a multinational corporation that produces simulation software. Imagine that you are working on their database which tracks the usage of their software across different products, users and subscription types.
The database has three tables: a products table, a users table, and a sessions table. The products table contains information about each product, including its product_id and product_name. The users table tracks user data, including a user_id, name, and subscription_type. The sessions table logs each unique software session by a user, with an activity_id, user_id, product_id, session_start, and session_end.
Your task is to write a PostgreSQL query to find the total time each user spent on each product for the last month.
product_id | product_name |
---|---|
1 | Simulation X |
2 | Simulation Y |
3 | Simulation Z |
user_id | name | subscription_type |
---|---|---|
101 | John Doe | monthly |
102 | Jane Doe | yearly |
103 | Mike Smith | monthly |
activity_id | user_id | product_id | session_start | session_end |
---|---|---|---|---|
1001 | 101 | 1 | 2022-09-01 09:00:00 | 2022-09-01 10:00:00 |
1002 | 102 | 2 | 2022-09-02 14:00:00 | 2022-09-02 16:00:00 |
1003 | 103 | 3 | 2022-09-03 11:00:00 | 2022-09-03 12:30:00 |
1004 | 101 | 1 | 2022-09-04 13:00:00 | 2022-09-04 14:15:00 |
1005 | 102 | 2 | 2022-09-05 14:30:00 | 2022-09-05 15:00:00 |
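One possible query, assuming the three tables are named users, products, and sessions, and treating "the last month" as the previous calendar month:

```sql
-- Assumed table names: users, products, sessions
SELECT
  u.user_id,
  p.product_name,
  -- total session time per user and product, in seconds
  SUM(EXTRACT(EPOCH FROM (s.session_end - s.session_start))) AS total_usage_seconds
FROM sessions s
JOIN users u    ON u.user_id = s.user_id
JOIN products p ON p.product_id = s.product_id
WHERE s.session_start >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
  AND s.session_start <  date_trunc('month', CURRENT_DATE)
GROUP BY u.user_id, p.product_name
ORDER BY u.user_id, p.product_name;
```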
In this query, we join the users, products, and sessions tables on their keys. We limit the selection to sessions from the last month and compute the total usage time for each user and product combination using SUM and EXTRACT(EPOCH FROM ...) to get each session's duration in seconds. The results are sorted by user and product.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
Denormalization is done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
For these OLAP use cases, the bottleneck is frequently the cost of joining multiple tables, and denormalizing your database cuts those joins out.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with drawbacks too. By adding redundant columns, you incur more data storage costs, and you need more complex update and delete operations to maintain data integrity across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
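As a rough sketch with hypothetical orders and customers tables, denormalization could mean copying a frequently joined column onto the table that gets read most often:

```sql
-- Normalized: every read of an order's customer name pays for a join
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

-- Denormalized: customer_name is duplicated onto orders, so reads skip the join,
-- at the cost of extra storage and keeping the copies in sync on updates
ALTER TABLE orders ADD COLUMN customer_name TEXT;

UPDATE orders o
SET customer_name = c.customer_name
FROM customers c
WHERE c.customer_id = o.customer_id;

SELECT order_id, customer_name FROM orders;
```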
Ansys has a large customer base with diverse user groups across various sectors, and each group has different average purchase amounts. The task is to write a PostgreSQL query to calculate the average purchase amount for each user group. Please filter customers who made at least 3 purchases over the last three months.
Consider the users, user groups, and purchases tables structured as follows:
user_id | user_group_id |
---|---|
732 | 1 |
145 | 2 |
356 | 1 |
789 | 3 |
926 | 2 |
user_group_id | group_name |
---|---|
1 | "Education" |
2 | "Healthcare" |
3 | "Automotive" |
purchase_id | user_id | amount | purchase_date |
---|---|---|---|
501 | 732 | 220.50 | 06/15/2022 |
982 | 145 | 320.00 | 06/12/2022 |
323 | 356 | 180.00 | 06/10/2022 |
204 | 789 | 190.75 | 06/20/2022 |
418 | 732 | 250.00 | 06/22/2022 |
729 | 145 | 300.00 | 08/22/2022 |
190 | 732 | 240.00 | 07/10/2022 |
682 | 926 | 150.00 | 06/10/2022 |
459 | 145 | 125.00 | 08/20/2022 |
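One way to express this in PostgreSQL, assuming the tables are named users, user_groups, and purchases, and measuring the three-month window back from the current date:

```sql
-- Assumed table names: users, user_groups, purchases
WITH qualified_users AS (
  -- users with at least 3 purchases in the last three months
  SELECT u.user_id, u.user_group_id
  FROM users u
  JOIN purchases p ON p.user_id = u.user_id
  WHERE p.purchase_date >= CURRENT_DATE - INTERVAL '3 months'
  GROUP BY u.user_id, u.user_group_id
  HAVING COUNT(*) >= 3
)
SELECT
  g.group_name,
  AVG(p.amount) AS average_purchase_amount
FROM qualified_users qu
JOIN user_groups g ON g.user_group_id = qu.user_group_id
JOIN purchases p
  ON p.user_id = qu.user_id
 AND p.purchase_date >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY g.group_name;
```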
This SQL query calculates the average purchase amount per user group for users who have made at least 3 purchases over the last three months. It first joins the tables on their foreign keys (user_id and user_group_id) and filters the purchase data to the three-month window. A HAVING clause keeps only the users who made at least 3 purchases in that window, and the qualifying users' purchases are then averaged per user group.
A database index is a data structure that provides a quick lookup of data in a column or columns of a table.
There are several types of indexes that can be used in a database, including clustered indexes, non-clustered (secondary) indexes, unique indexes, and composite indexes that span multiple columns.
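For instance, a couple of hypothetical indexes on the (assumed) user_activity table from the first question would speed up lookups that filter on user_id:

```sql
-- Hypothetical index: quick lookups of activities by user
CREATE INDEX idx_user_activity_user_id
    ON user_activity (user_id);

-- Composite index covering two columns that are often filtered together
CREATE INDEX idx_user_activity_user_date
    ON user_activity (user_id, activity_date);
```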
As an analyst at Ansys, your task is to filter the customer records and find out those customers who have their emails registered with 'gmail.com'. Specifically, your job is to select the customers' names and their emails from the records.
Here is the sample customer data we will use for this problem:
customer_id | first_name | last_name | email_address |
---|---|---|---|
231 | John | Doe | johndoe@gmail.com |
342 | Adam | Smith | adamsmith@yahoo.com |
473 | Lucy | Williams | lucywilliams@gmail.com |
584 | Michael | Brown | michaelbrown@outlook.com |
695 | Emma | Johnson | emmajohnson@gmail.com |
To answer this question, you can use the SQL LIKE keyword to filter the email_address column for customers with 'gmail.com' addresses. Here is a PostgreSQL query you can use.
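This version assumes the table is named customers, since the table name isn't shown in the prompt:

```sql
-- Assumed table name: customers
SELECT first_name, last_name, email_address
FROM customers
WHERE email_address LIKE '%@gmail.com';
```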
This query selects the first_name, last_name, and email_address columns from the customer table where the email_address ends with '@gmail.com'.
The '%' symbol is a wildcard character that matches any sequence of characters. In the context of the LIKE keyword, the '%@gmail.com' pattern is used to match any email_address that ends with '@gmail.com'. The returned result would be:
first_name | last_name | email_address |
---|---|---|
John | Doe | johndoe@gmail.com |
Lucy | Williams | lucywilliams@gmail.com |
Emma | Johnson | emmajohnson@gmail.com |
It indicates these customers have their emails registered with 'gmail.com'.
Ansys is a leader in simulation software. They would like to understand how customers use their software tools so they can derive insights and make operations strategy decisions. They are particularly interested in how many unique users are using their software by month of the year.
You have a software usage table that has the following data format:
activity_id | user_id | usage_date | software_id | duration_in_hours |
---|---|---|---|---|
1 | 1000 | 01/02/2022 00:00:00 | 10 | 2.5 |
2 | 1001 | 01/06/2022 00:00:00 | 10 | 1.8 |
3 | 1000 | 02/08/2022 00:00:00 | 11 | 3.7 |
4 | 1002 | 02/14/2022 00:00:00 | 10 | 4.5 |
5 | 1001 | 02/15/2022 00:00:00 | 11 | 2.3 |
Write a SQL query that will output a table containing the count of unique users per month, for each of their software products.
mth | software | unique_users |
---|---|---|
1 | 10 | 2 |
2 | 10 | 1 |
2 | 11 | 2 |
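A sketch of the query, assuming the usage table is named software_usage:

```sql
-- Assumed table name: software_usage
SELECT
  EXTRACT(MONTH FROM usage_date) AS mth,
  software_id AS software,
  COUNT(DISTINCT user_id) AS unique_users
FROM software_usage
GROUP BY EXTRACT(MONTH FROM usage_date), software_id
ORDER BY mth, software;
```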
This query first extracts the month from the usage_date using the EXTRACT function. Then, it groups by the extracted month and software_id, and applies COUNT(DISTINCT user_id) to count the unique users in each group. The ORDER BY clause sorts the output by month and software.
The best way to prepare for an Ansys SQL interview is to practice, practice, practice. In addition to solving the earlier Ansys SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Microsoft, Google, Amazon, and tech startups.
Each problem on DataLemur has multiple hints, step-by-step solutions and, best of all, there's an online SQL coding environment so you can write your query right in the browser and have it checked.
To prep for the Ansys SQL interview it is also helpful to solve interview questions from other industrial-tech companies like:
However, if your SQL coding skills are weak, forget about diving straight into solving questions – go learn SQL with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as RANK vs. DENSE RANK and LEFT vs. RIGHT JOIN – both of these pop up often during Ansys interviews.
In addition to SQL interview questions, the other types of questions to prepare for the Ansys Data Science Interview are:
The best way to prepare for Ansys Data Science interviews is by reading Ace the Data Science Interview. The book's got: