At Appian, SQL is used frequently for analyzing customer usage patterns and to extract insights from Appian's business process management data. That's why Appian frequently asks SQL query questions in interviews for Data Science and Data Engineering positions.
To help you study for the Appian SQL interview, we've curated 9 Appian SQL interview questions – can you answer each one?
In Appian, a software company that specializes in low-code development platforms, a power user may be defined as someone who not only often uses the software but also contributes significantly, either through frequent bug reports or feature usage or creating reusable components like interfaces, processes etc. Design a SQL query to identify such users from two tables: one contains user activity data, and the other contains the users' contribution data.
We can consider the columns:
activity_id | user_id | activity_date | activity_type | activity_count |
---|---|---|---|---|
101 | 256 | 06/08/2022 09:23:00 | Bug Report | 1 |
102 | 378 | 06/10/2022 10:20:00 | Feature Usage | 7 |
103 | 256 | 07/18/2022 10:15:00 | Bug Report | 1 |
104 | 310 | 07/26/2022 13:45:00 | Bug Report | 1 |
105 | 378 | 08/05/2022 14:30:00 | Feature Usage | 6 |
contribution_id | user_id | contribution_date | contribution_type | contribution_count |
---|---|---|---|---|
1001 | 378 | 06/08/2022 13:23:00 | Interface | 1 |
1002 | 256 | 07/10/2022 14:50:00 | Process | 1 |
1003 | 378 | 08/18/2022 15:35:00 | Interface | 1 |
Write a SQL query to find power users. A power user is defined as a user who has an more than 5 in the table and has created any reusable component in table in the last month.
The PostgreSQL query would look something like this:
In the above PostgreSQL query, we first create a CTE (Common Table Expressions) called which contains user_id's from table having more than 5. In the main query, we JOIN this output with table and select those users who have created reusable components in last month. This will provide us the list of all 'Power Users'.
To solve a similar VIP customer analysis question on DataLemur's free online SQL code editor, try this Microsoft SQL Interview problem:
You are given a table consisting of user reviews for different products in the appian database. Each row represents a user review, with columns for (the unique identifier of the review), (the unique identifier of the user), (the date and time the review was submitted), (the unique identifier of the product), and (the rating given by the user out of 5).
Your task is to write a SQL query to calculate the average rating () for each product for each month. The result should be ordered by the month and then the product_id, both in ascending order.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
This query groups the reviews by the month of the and the , and calculates the average rating for each group. Note that we use the EXTRACT function to obtain the month from the . The result is a decimal number rounded to 2 decimal places for more readable results. The ORDER BY clause subsequently orders the results by the month and product_id, both in ascending order.
To solve a related window function SQL problem on DataLemur's free online SQL coding environment, try this Amazon SQL Interview Question:
A full outer join returns all rows from both tables, including any unmatched rows, whereas an inner join only returns rows that match the join condition between the two tables.
For a tangible example, suppose you had a table of Appian orders and Appian customers.
Here's a SQL inner join using the orders and customers tables:
This query will return rows from the orders and customers tables that have matching values. Only rows with matching values will be included in the results.
Here is an example of a using the orders and customers tables:
This query will return all rows from both the orders and customers tables, including any rows that do not have matching values. Any rows with null values for either table will be included in the results.
You are given the customer records of Appian users. Write down a SQL query that filters out customers who have an active subscription (status='Active') and have used the product in the last month (last_login > CURRENT_DATE - INTERVAL '1 month').
customer_id | last_login | subscription_status |
---|---|---|
100 | 2022-10-06 00:00:00 | Active |
101 | 2022-08-12 00:00:00 | Active |
102 | 2022-09-30 00:00:00 | Inactive |
103 | 2022-10-10 00:00:00 | Active |
subscription_id | customer_id | status |
---|---|---|
500 | 100 | Active |
501 | 101 | Inactive |
502 | 102 | Active |
503 | 103 | Active |
customer_id | last_login | status |
---|---|---|
100 | 2022-10-06 00:00:00 | Active |
103 | 2022-10-10 00:00:00 | Active |
Filtering the data with above SQL query will return the customers who have an active subscription and have used the product in the last month. The JOIN keyword is used to combine rows from both the customers and subscriptions tables based on the common attribute- customer_id; WHERE clause filters out the required customers; AND operator allows to use multiple conditions, all of which must be met.
The primary key of a table is a column or combination of columns that serves to uniquely identify each row in the table. To define a primary key in a SQL database, you can use the constraint.
For instance, consider a table of :
In this example, the column is the primary key of the Appian employees table.
Primary keys are important in databases for several reasons:
As a data analyst at Appian, you are asked to evaluate the performance of various processes running in the company's application. Appian, as a Business Process Management and Low-code Automation software company, has numerous workflows or processes that are created and executed by their customers for business functionalities. You have been given a table that records the start and end time of each instance of a process. Your task is to calculate the average runtime of each process. The runtime of a process is calculated by finding the difference between the end time and start time.
process_id | start_time | end_time |
---|---|---|
p1 | 01/01/2022 08:00:00 | 01/01/2022 08:30:00 |
p1 | 01/01/2022 09:00:00 | 01/01/2022 09:15:00 |
p1 | 01/01/2022 10:00:00 | 01/01/2022 10:30:00 |
p2 | 01/01/2022 08:00:00 | 01/01/2022 08:45:00 |
p2 | 01/01/2022 09:00:00 | 01/01/2022 09:20:00 |
process_id | avg_runtime_minutes |
---|---|
p1 | 25.0 |
p2 | 32.5 |
Here, the SQL query first calculates the difference between the end time and start time of each process instance. To convert this into minutes we use the functionality. The average function is then used on these runtimes to calculate the average time taken for each different process denoted by . The resulting table includes process IDs along with their corresponding average runtime in minutes.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for dealing with time-based calculations or this Twitter Tweets' Rolling Averages Question which is similar for calculating averages over a time period.
{#Question-7}
In SQL, a value is NOT the same as a zero or a blank space. A NULL value represents the absence of a value or a missing value, whereas both zero and blank space are actually values.
To handle nulls, the function is often used to check for a null, and if there is a null, you can assign a default value.
As an Appian business analyst, you're tasked with assessing the performance of the company's various software products based on customer reviews. For each product, find out the average rating for each month. We have a 'reviews' table containing the following columns: 'review_id' (primary key), 'user_id', 'submit_date' (in MM/DD/YYYY format ), 'product_id', 'stars' (number of stars for product review given by a user).
Provide the result as a table with product_id, month, and average stars.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1001 | 123 | 06/08/2022 00:00:00 | 10001 | 4 |
1002 | 265 | 06/10/2022 00:00:00 | 10002 | 4 |
1003 | 362 | 06/18/2022 00:00:00 | 10001 | 3 |
1004 | 192 | 07/26/2022 00:00:00 | 10002 | 5 |
1005 | 981 | 07/05/2022 00:00:00 | 10001 | 2 |
month | product_id | avg_stars |
---|---|---|
6 | 10001 | 3.5 |
6 | 10002 | 4.0 |
7 | 10001 | 2.0 |
7 | 10002 | 5.0 |
Here is a PostgreSQL query to solve this question:
This query first extracts the month from the 'submit_date' column. It then groups the table by 'month' and 'product_id' and calculates the average value of 'star' for all reviews per product per month. The result is ordered by month and then by product_id.
As a data analyst in Appian, your job is to analyze the trend and sales pattern of the sold products. Given two tables - and , you are required to write a SQL query that calculates the monthly sales of each product and then calculates the month over month growth in percentage (rounded to 2 decimal places). You also need to find the absolute difference between the maximum and minimum sales of each product across all months (use MOD() if required). Lastly, find the square root of the total sales of each product, round it to the closest integer (use SQRT() and ROUND()).
sale_id | product_id | sale_date | quantity |
---|---|---|---|
1001 | 25 | 06/01/2022 | 10 |
1002 | 30 | 06/05/2022 | 20 |
1003 | 25 | 07/01/2022 | 30 |
1004 | 25 | 07/20/2022 | 40 |
1005 | 30 | 07/25/2022 | 25 |
product_id | product_name |
---|---|
25 | Appian Essentials |
30 | Appian Advanced |
This query calculates the monthly sales of each product, then finds the month over month growth, uses MOD() to find the absolute difference in sales of each product across all months, and finds the square root of the total sales of each product. We use Window functions, CTEs, and aggregate functions to handle the complex calculations and data grouping required in this question.
To practice a very similar question try this interactive Wayfair Y-on-Y Growth Rate Question which is similar for requiring trend analysis using SQL functions or this Amazon Average Review Ratings Question which is similar for requiring monthly grouping and average calculations.
The best way to prepare for a Appian SQL interview is to practice, practice, practice. In addition to solving the above Appian SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, and Facebook.
Each DataLemur SQL question has multiple hints, detailed solutions and best of all, there's an online SQL code editor so you can right in the browser run your SQL query and have it checked.
To prep for the Appian SQL interview you can also be useful to solve interview questions from other tech companies like:
However, if your SQL query skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL topics like CTE vs. Subquery and WHERE vs. HAVING – both of which pop up routinely in SQL job interviews at Appian.
In addition to SQL query questions, the other topics to prepare for the Appian Data Science Interview are:
To prepare for Appian Data Science interviews read the book Ace the Data Science Interview because it's got: