At UiPath, SQL is used quite frequently for analyzing and manipulating robotic process automation (RPA) data. The company even supports querying SQL databases as part of its automation tools. Because of how important SQL is to the company, UiPath asks SQL coding questions during interviews for Data Science and Data Engineering positions.
As such, to help you study for the UiPath SQL interview, here are 9 UiPath SQL interview questions – can you answer them all?
You are reviewing our database, which includes data about users' task executions in UiPath's RPA software. You are given a dataset that includes the user_id, task_id, and the start_time when each task was run. Assuming each distinct task_id corresponds to a totally separate task, write a SQL query to find the total number of tasks run per user for each month.
Additionally, for each user_id and month, calculate the running total of tasks run since 2021 (including the current month's count).
The dataset is formatted as follows:
user_id | task_id | start_time |
---|---|---|
1 | 100 | 2021-06-01 10:00:00 |
1 | 200 | 2021-06-15 15:30:00 |
1 | 312 | 2021-07-22 09:00:00 |
2 | 435 | 2021-08-11 17:30:00 |
2 | 531 | 2021-08-12 14:00:00 |
2 | 612 | 2021-08-13 07:00:00 |
3 | 714 | 2021-09-21 08:30:00 |
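One way to write this query is sketched below (the table name task_executions is an assumption, since the dataset's name isn't given):

```sql
SELECT
  user_id,
  DATE_TRUNC('month', start_time) AS month,
  COUNT(task_id) AS monthly_tasks,
  -- running total of tasks per user, accumulated month by month
  SUM(COUNT(task_id)) OVER (
    PARTITION BY user_id
    ORDER BY DATE_TRUNC('month', start_time)
  ) AS running_total
FROM task_executions  -- assumed table name
WHERE start_time >= '2021-01-01'
GROUP BY user_id, DATE_TRUNC('month', start_time)
ORDER BY user_id, month;
```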
This SQL solution uses the SUM() window function to calculate the running total of task executions for each user. The window function is partitioned by user_id, meaning the count restarts for each new user. The window's ordering is configured with the ORDER BY clause, specifically on the task execution year and month. This ensures the running total is accumulated in chronological order for each user from 2021 onwards.
To practice a similar window function interview problem which uses RANK() on DataLemur's free interactive coding environment, solve this Amazon SQL Interview Question:
UiPath utilizes software robots to automate repetitive tasks for businesses. Let's assume that the key journal table in the UiPath database tracks the activity of each robot for given tasks.
The business problem at hand is to understand the percentage of time that each robot spends executing tasks in a given month.
With this information, the team can analyze how effectively the robots are being utilized and find areas for potential improvement.
The table has the following attributes: activity_id (unique identifier), robot_id (the robot performing the task), and start_time & end_time (the start and end timestamps of each task).
Let's consider the following sample data:
activity_id | robot_id | start_time | end_time |
---|---|---|---|
1001 | 1 | 2022-06-08T10:15:00 | 2022-06-08T10:35:00 |
1002 | 1 | 2022-06-08T11:00:00 | 2022-06-08T11:20:00 |
1003 | 2 | 2022-06-08T15:30:00 | 2022-06-08T16:00:00 |
1004 | 1 | 2022-06-09T09:00:00 | 2022-06-09T09:45:00 |
1005 | 2 | 2022-06-09T14:15:00 | 2022-06-09T14:30:00 |
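A query along the lines described below could look like this (the table name robot_activity is an assumption, and a 30-day month is used as an approximation for the number of seconds in a month):

```sql
SELECT
  robot_id,
  DATE_TRUNC('month', start_time) AS month,
  -- total task seconds divided by (approximate) seconds in a month
  SUM(EXTRACT(EPOCH FROM (end_time - start_time)))
    / (30 * 24 * 60 * 60.0) AS utilization_ratio
FROM robot_activity  -- assumed table name
GROUP BY robot_id, DATE_TRUNC('month', start_time)
ORDER BY robot_id, month;
```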
The answer is a SQL query that uses PostgreSQL functions to calculate the ratio of time that each robot spends on tasks in each month. The EXTRACT(EPOCH FROM ...) function is used to calculate the duration of each task in seconds. These durations are summed for each robot and each month, then divided by the number of seconds in a month to get the utilization ratio.
The DATE_TRUNC function truncates the timestamp to the month level, which allows tasks to be grouped by month.
The result of this query would be a table that shows each robot's utilization ratio for each month. For example, a utilization ratio of 0.5 would mean that the robot was executing tasks for 50% of the time in that month. The business team can use this data to analyze robot utilization and make more informed decisions.
The DISTINCT keyword added to a SELECT statement can be used to get records without duplicates.
For example, say you had a table of UiPath customers:
name | city |
---|---|
Akash | SF |
Brittany | NYC |
Carlos | NYC |
Diego | Seattle |
Eva | SF |
Faye | Seattle |
Suppose you wanted to figure out which cities the customers lived in, without duplicate results. You could write a query like this:
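A minimal version (the table name uipath_customers is an assumption, since it isn't given above):

```sql
-- DISTINCT collapses duplicate city values into one row each
SELECT DISTINCT city
FROM uipath_customers;
```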
Your result would be:
city |
---|
SF |
NYC |
Seattle |
At UiPath Inc., we record all transactions made by our customers in our 'transactions' database table. Right now, we are trying to analyze our customers' behavior, and we'd like to extract all records made by customers who have either a 'pending' or a 'rejected' transaction. Please write a SQL query that filters the 'transactions' table and returns all transactions which were carried out by customers with at least one 'pending' or 'rejected' status.
The 'transactions' table has the following structure:
transaction_id | customer_id | transaction_date | amount | status |
---|---|---|---|---|
1 | 101 | 2022-07-01 | 1200.00 | completed |
2 | 106 | 2022-07-01 | 500.50 | completed |
3 | 101 | 2022-07-02 | 200.75 | pending |
4 | 102 | 2022-07-02 | 75.25 | completed |
5 | 103 | 2022-07-03 | 120.00 | rejected |
6 | 104 | 2022-07-03 | 300.00 | completed |
7 | 105 | 2022-07-05 | 900.00 | completed |
8 | 101 | 2022-07-05 | 600.00 | completed |
9 | 102 | 2022-07-05 | 150.00 | pending |
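One straightforward approach uses a subquery to find the flagged customers first:

```sql
-- return every transaction for any customer who has at least one
-- 'pending' or 'rejected' transaction
SELECT *
FROM transactions
WHERE customer_id IN (
  SELECT DISTINCT customer_id
  FROM transactions
  WHERE status IN ('pending', 'rejected')
);
```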
This query first selects all the unique customer_ids from the 'transactions' table where the transaction status is either 'pending' or 'rejected'. Then it uses these customer_ids to select all the transactions made by those customers from the 'transactions' table.
Database views are created to provide customized, read-only versions of your data that you can query just like a regular table. So why even use one if they're just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
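For example, a view could expose only non-sensitive customer columns (the table and column names here are hypothetical):

```sql
-- expose only name and city; other columns in the underlying
-- table stay hidden from users querying the view
CREATE VIEW customer_contact AS
SELECT name, city
FROM uipath_customers;
```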
UiPath, being a Robotic Process Automation (RPA) company, has multiple automation processes that run either daily, weekly, or monthly. Assume we have a log of when each process started and when it ended. Find the average time taken to complete each process.
In this scenario, the log data is stored in a table as shown below:
process_id | process_name | start_time | end_time |
---|---|---|---|
1 | 'Invoice Process' | '2022-07-01 09:00:00' | '2022-07-01 09:11:00' |
1 | 'Invoice Process' | '2022-07-02 09:05:00' | '2022-07-02 09:20:00' |
2 | 'Data Extraction' | '2022-07-15 13:15:00' | '2022-07-15 13:40:00' |
2 | 'Data Extraction' | '2022-07-16 13:30:00' | '2022-07-16 13:50:00' |
3 | 'Report Generation' | '2022-07-25 15:00:00' | '2022-07-25 15:30:00' |
3 | 'Report Generation' | '2022-07-26 15:05:00' | '2022-07-26 15:40:00' |
Here is a PostgreSQL query to solve this problem:
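The following is a sketch of that query (the table name process_log is an assumption, since it isn't given above):

```sql
SELECT
  process_id,
  process_name,
  -- duration in seconds, divided by 60 to get minutes
  AVG(EXTRACT(EPOCH FROM (end_time - start_time)) / 60) AS avg_minutes
FROM process_log  -- assumed table name
GROUP BY process_id, process_name;
```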
In the above query, I used PostgreSQL's built-in EXTRACT(EPOCH FROM ...) function to convert the time interval between start_time and end_time into seconds, then divided it by 60 to get the time in minutes.
It groups by both process_id and process_name to calculate the average time taken to complete each process. The output will be the process_id and process_name along with the average completion time.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating total running time or this Stripe Repeated Payments Question which is similar for log processing.
A cross-join, also known as a cartesian join, is a type of join that returns the cross-product of two tables. In a cross-join, each row from the first table is paired with every row from the second table, resulting in a new table that has a row for each possible combination of rows from the two input tables.
For example, say you worked on the Marketing Analytics team at UiPath, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
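Assuming the two tables are named ad_copy and ad_creative as described, the cross-join could look like:

```sql
-- pair every row of ad_copy with every row of ad_creative
SELECT *
FROM ad_copy
CROSS JOIN ad_creative;
```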
You could use this query to generate all possible combinations of ad copy and ad creative to help you create more effective ads for UiPath. Just be careful: if you had 1,000 rows of ad copy and 1,000 different image creatives, the resulting cross-join would have 1 million rows!
UiPath, a company that specializes in robotic process automation (RPA), is interested in understanding the click-through rate (CTR) from viewing a product to adding a product to the cart. In the context of UiPath, let's say a "product" refers to the licenses to use their various automation software.
They define "Click-Through-Rate (CTR)" as the number of users who viewed a product and subsequently added it to their cart, divided by the total number of users who viewed that product.
Given a table of product views and a table of add-to-cart events, write a SQL query to calculate the CTR for each product for the month of August 2022.
view_id | user_id | view_date | product_id |
---|---|---|---|
101 | 123 | 08/01/2022 00:00:00 | 1 |
102 | 456 | 08/01/2022 00:00:00 | 2 |
103 | 789 | 08/02/2022 00:00:00 | 1 |
104 | 123 | 08/02/2022 00:00:00 | 1 |
105 | 456 | 08/03/2022 00:00:00 | 2 |
cart_id | user_id | add_to_cart_date | product_id |
---|---|---|---|
201 | 123 | 08/01/2022 00:01:00 | 1 |
202 | 456 | 08/01/2022 00:01:00 | 2 |
203 | 789 | 08/02/2022 00:01:00 | 1 |
204 | 123 | 08/03/2022 00:01:00 | 1 |
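A query matching the approach described below could look like this (the table names product_views and cart_adds are assumptions, since they aren't given):

```sql
SELECT
  v.product_id,
  -- users who added to cart / users who viewed, per product
  COUNT(DISTINCT c.user_id)::DECIMAL
    / COUNT(DISTINCT v.user_id) AS ctr
FROM product_views v  -- assumed table name
LEFT JOIN cart_adds c  -- assumed table name
  ON v.user_id = c.user_id
  AND v.product_id = c.product_id
  AND c.add_to_cart_date >= v.view_date  -- cart action on/after the view
WHERE v.view_date >= '2022-08-01'
  AND v.view_date < '2022-09-01'
GROUP BY v.product_id;
```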
In this query, we join the views and cart tables on both the user_id and the product_id to ensure that we match the correct view and add-to-cart action. We also ensure that the add-to-cart date falls on or after the date of viewing the product. Aggregation is done on product_id to calculate the CTR for each product, and COUNT(DISTINCT user_id) ensures that each unique user is only counted once.
To practice a similar problem about calculating rates, try this TikTok SQL question within DataLemur's online SQL coding environment:
Given a table of customers and a table of purchases: the customers table stores the information of each customer with a unique customer_id, and the purchases table stores all purchase transactions with the relevant customer_id, product_id, quantity, and the date on which the purchase was made.
Write a SQL query to find out the top 5 purchased products in 2021 by total quantity for each customer.
customer_id | first_name | last_name | date_of_birth |
---|---|---|---|
101 | John | Doe | 01/01/1970 |
102 | Jane | Smith | 02/02/1980 |
103 | Alan | Johnson | 03/03/1990 |
104 | Mary | Brown | 04/04/2000 |
purchase_id | customer_id | date | product_id | quantity |
---|---|---|---|---|
1 | 101 | 03/15/2021 | 001 | 2 |
2 | 101 | 04/20/2021 | 002 | 1 |
3 | 102 | 05/22/2021 | 001 | 3 |
4 | 104 | 06/07/2021 | 003 | 2 |
5 | 103 | 07/12/2021 | 003 | 1 |
6 | 104 | 08/01/2021 | 002 | 2 |
7 | 103 | 09/15/2021 | 001 | 2 |
customer_id | product_id | total_quantity |
---|---|---|
101 | 001 | 2 |
101 | 002 | 1 |
102 | 001 | 3 |
103 | 001 | 2 |
103 | 003 | 1 |
104 | 002 | 2 |
104 | 003 | 2 |
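One way to keep only each customer's top 5 products is to rank products per customer with a window function (the table name purchases is an assumption):

```sql
WITH ranked AS (
  SELECT
    customer_id,
    product_id,
    SUM(quantity) AS total_quantity,
    -- rank each customer's products by quantity purchased
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY SUM(quantity) DESC
    ) AS rn
  FROM purchases  -- assumed table name
  WHERE EXTRACT(YEAR FROM date) = 2021
  GROUP BY customer_id, product_id
)
SELECT customer_id, product_id, total_quantity
FROM ranked
WHERE rn <= 5
ORDER BY customer_id, total_quantity DESC;
```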
This query selects the customer_id, product_id, and the total quantity of each product purchased by that customer in 2021. The result is grouped by customer_id and product_id and sorted in descending order of total quantity, and the output is restricted to each customer's top 5 products.
Because joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the UiPath SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier UiPath SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Google, Microsoft and Silicon Valley startups.
Each problem on DataLemur has multiple hints, fully explained answers, and a discussion board to see how others solved it. Crucially, there's also an interactive SQL code editor so you can code up your SQL query right in the browser and have it executed.
To prep for the UiPath SQL interview, it can also be useful to practice interview questions from other tech companies like:
But if your SQL query skills are weak, don't worry about diving straight into solving questions – improve your SQL foundations with this free SQL tutorial.
This tutorial covers SQL concepts such as creating pairs via SELF-JOINs and CASE/WHEN statements – both of these pop up routinely during SQL job interviews at UiPath.
In addition to SQL interview questions, the other question categories to practice for the UiPath Data Science Interview are:
The best way to prepare for UiPath Data Science interviews is by reading Ace the Data Science Interview. The book's got: