At Jack Henry, SQL is often used to analyze transactional data to detect fraud patterns and to query databases for customer behavior insights that power personalized financial product recommendations. Unsurprisingly, this is why Jack Henry frequently asks SQL questions in interviews for Data Science, Analytics, and Data Engineering jobs.
To help you ace the Jack Henry SQL interview, we've curated 8 Jack Henry & Associates SQL interview questions – can you solve them?
As a data analyst at Jack Henry, a company that provides technology solutions and payment processing services, one of your tasks involves analyzing transaction data.
Write a SQL query to calculate the total number of transactions and the average transaction amount for each customer.
Consider a scenario where these calculations need to be made only for a given time window, say the past 6 months. Assume you have access to a table with the following information:
transaction_id | customer_id | transaction_date | amount |
---|---|---|---|
1 | 1001 | 01/01/2022 | 200 |
2 | 1002 | 01/03/2022 | 500 |
3 | 1001 | 02/01/2022 | 300 |
4 | 1003 | 01/07/2022 | 100 |
5 | 1001 | 04/01/2022 | 400 |
6 | 1002 | 06/01/2022 | 600 |
7 | 1001 | 07/01/2022 | 200 |
8 | 1003 | 08/01/2022 | 100 |
9 | 1001 | 09/01/2022 | 100 |
10 | 1002 | 10/01/2022 | 400 |
Based on the given table, you may write your SQL query using the window function as follows:
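One possible answer is sketched below; it assumes the table is named transactions and aliases the two calculated columns as total_transactions and avg_transaction_amount:

```sql
SELECT
  transaction_id,
  customer_id,
  transaction_date,
  amount,
  -- total number of transactions per customer within the window
  COUNT(transaction_id) OVER (PARTITION BY customer_id) AS total_transactions,
  -- average transaction amount per customer within the window
  AVG(amount) OVER (PARTITION BY customer_id) AS avg_transaction_amount
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '6 months';
```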
This query uses the window functions COUNT() and AVG() with an OVER (PARTITION BY customer_id) clause to calculate the total count and the average transaction value for each customer_id. The WHERE clause ensures that only data from the past 6 months is selected. The query outputs a table where each row corresponds to a transaction, and the columns include the customer_id, the total count of transactions for that customer (total_transactions), and the average transaction amount for that customer (avg_transaction_amount).
To practice another window function question on DataLemur's free interactive coding environment, try this Amazon SQL question asked in a BI Engineer interview:
For Jack Henry, a financial services company that provides software solutions, let's calculate the click-through conversion rate for their software demonstrations. Customers land on a product webpage, click to view a demonstration, and then decide to download the software (conversion).
Two tables are provided, one of page views and one of downloads, structured as follows:
view_id | user_id | view_date | product_id |
---|---|---|---|
101 | A123 | 2022/09/01 | Prod001 |
102 | B234 | 2022/09/01 | Prod002 |
103 | A123 | 2022/09/02 | Prod001 |
104 | C345 | 2022/09/02 | Prod002 |
105 | D456 | 2022/09/02 | Prod002 |
download_id | user_id | download_date | product_id |
---|---|---|---|
201 | A123 | 2022/09/02 | Prod001 |
202 | B234 | 2022/09/02 | Prod002 |
203 | D456 | 2022/09/03 | Prod002 |
The question is to calculate the click-through conversion rate, i.e., the percentage of page views that led to a product download, for each product.
The PostgreSQL query to solve this would look like this:
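A sketch of that query, assuming the two tables are named page_views and downloads:

```sql
SELECT
  v.product_id,
  -- percentage of unique viewers who went on to download the product
  ROUND(
    100.0 * COUNT(DISTINCT d.user_id) / COUNT(DISTINCT v.user_id),
    2
  ) AS click_through_conversion_rate
FROM page_views AS v
LEFT JOIN downloads AS d
  ON v.user_id = d.user_id
 AND v.product_id = d.product_id
GROUP BY v.product_id;
```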
This query first joins the page views and downloads tables on user_id and product_id. It then groups the data by product_id to separate it by product. Finally, it calculates the conversion rate as the ratio of distinct downloaders to distinct viewers, converted to a percentage for easier interpretation. Using distinct counts ensures that each unique user's views and downloads are counted only once, which avoids overcounting when a user views a page or downloads the software more than once.
To solve a similar problem on DataLemur's free online SQL coding environment, solve this Meta SQL interview question:
A full outer join returns all rows from both tables, including any unmatched rows, whereas an inner join only returns rows that match the join condition between the two tables.
For a tangible example, suppose you had a table of Jack Henry orders and Jack Henry customers.
Here's a SQL inner join using the orders and customers tables:
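For illustration, assume orders has order_id, customer_id, and order_amount columns, and customers has customer_id and customer_name (hypothetical columns):

```sql
SELECT
  customers.customer_name,
  orders.order_id,
  orders.order_amount
FROM orders
INNER JOIN customers
  ON orders.customer_id = customers.customer_id;
```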
This query will return only the rows from the orders and customers tables that satisfy the join condition; any order without a matching customer (or vice versa) is left out of the results.
Here is an example of a full outer join using the orders and customers tables:
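Using the same assumed columns:

```sql
SELECT
  customers.customer_name,
  orders.order_id,
  orders.order_amount
FROM orders
FULL OUTER JOIN customers
  ON orders.customer_id = customers.customer_id;
```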
This query will return all rows from both the orders and customers tables, including rows that have no match in the other table. For those unmatched rows, the columns from the other table are filled with NULLs.
Jack Henry's product team would like to know the average transaction amount for each product in the previous year. This insight is necessary to understand the revenue contribution of each product.
As the data engineer, your task is to write a SQL query to solve this. Specifically, the query should return each product and its average transaction amount for the previous year.
Assume you have access to a table with the following schema:
transaction_id | customer_id | purchase_date | product_id | amount |
---|---|---|---|---|
2315 | 543 | 01/08/2021 00:00:00 | 78654 | 5000 |
3562 | 673 | 01/10/2021 00:00:00 | 87623 | 2000 |
5623 | 342 | 01/18/2021 00:00:00 | 78654 | 3000 |
6475 | 109 | 02/26/2021 00:00:00 | 78654 | 4000 |
7813 | 603 | 02/05/2021 00:00:00 | 87623 | 7000 |
The amount column is the transaction amount in dollars. The product_id column is the unique identifier for each product. purchase_date is the date and time the transaction was made.
Below is the solution to the problem.
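A sketch, assuming the table is named transactions:

```sql
SELECT
  product_id,
  AVG(amount) AS average_transaction_amount
FROM transactions
-- keep only transactions from the previous year
WHERE EXTRACT(YEAR FROM purchase_date) = 2021
GROUP BY product_id;
```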
This query filters the transactions table to the previous year (2021), groups the rows by product_id, and calculates the average transaction amount for each group. It gives the product team the information they need.
product_id | average_transaction_amount |
---|---|
78654 | 4000 |
87623 | 4500 |
This shows that for the previous year, the average transaction amount for product 78654 was 4,000 while product 87623 averaged 4,500. This could potentially indicate that product 87623 generates more revenue per transaction than product 78654, although other factors would need to be considered for a more thorough analysis.
A NULL value represents a missing or unknown value. It is different from a zero or a blank space, which are actual values.
It's important to handle NULL values properly in SQL because they can cause unexpected results if not treated correctly. For example, comparisons involving NULL always evaluate to NULL rather than true or false. NULL values are also skipped by aggregate functions, so something like AVG(column) will ignore NULLs in that column.
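As a quick illustration (the employees table and salary column here are hypothetical):

```sql
-- salary = NULL evaluates to NULL (not TRUE), so this returns zero rows
SELECT * FROM employees WHERE salary = NULL;

-- use IS NULL to find the missing values instead
SELECT * FROM employees WHERE salary IS NULL;

-- AVG() skips NULL salaries; COALESCE() can treat them as 0 if that's intended
SELECT AVG(salary) AS avg_known, AVG(COALESCE(salary, 0)) AS avg_with_zero
FROM employees;
```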
Jack Henry & Associates has a large database of customer records. The data contains customer details including first name, last name, address, phone number, etc. There is a requirement to filter customer records where the customer's first_name or last_name contains the string "Henry". Write a SQL query to fulfill this requirement.
customer_id | first_name | last_name | address | phone_number |
---|---|---|---|---|
1234 | John | Doe | 123 Elm St. | (123)456-7890 |
2345 | Jack | Henry | 456 Pine St. | (234)567-8910 |
3456 | Henry | David | 789 Oak St. | (345)678-9123 |
4567 | Jane | Doe | 321 Maple St. | (456)789-0123 |
5678 | William | Wallace | 654 Willow St. | (567)890-1234 |
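One way to write this, assuming the table is named customers:

```sql
SELECT *
FROM customers
WHERE first_name LIKE '%Henry%'
   OR last_name LIKE '%Henry%';
```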
This query scans the table and returns rows where either first_name or last_name contains the string "Henry". The '%' is a wildcard character in SQL that matches any sequence of characters, so '%Henry%' matches any value that contains "Henry".
Imagine you've got a giant AF jigsaw puzzle with thousands of pieces, and each piece represents a piece of data in your database. You've spent hours organizing the puzzle into neat little piles, with each pile representing a table in your database. This process is called normalization, and it's a great way to make your database efficient, flexible, and easy to maintain.
But what if you want to solve the puzzle faster (aka make your queries run faster)?
That's where denormalization comes in – Denormalization is like the puzzle-solving equivalent of taking a shortcut!
Instead of putting all the pieces in separate piles, you might decide to clone some of the pieces, and then have that one puzzle piece be put into multiple piles. Clearly, we are breaking the rules of physics, but that's just like de-normalization because it breaks the normal rules of normalization (1st, 2nd, 3rd normal forms).
By adding redundant puzzle pieces, it can be easier to find the pieces you need, but it also means that you have to be extra careful when you're moving pieces around or adding new ones (aka INSERT/UPDATE commands become more complex).
On the plus side, denormalization can improve the performance of your database and make it easier to use. On the downside, it can make your database more prone to errors and inconsistencies, and it can be harder to update and maintain. In short, denormalization can be a helpful tool, but it's important to use it wisely!
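As a small, hypothetical illustration (table and column names are made up): in a normalized schema a report has to join orders to customers to get the customer's name, while a denormalized orders table copies the name onto every order row so the read is simpler, at the cost of keeping the duplicate in sync.

```sql
-- normalized: the customer's name lives only in customers, so reports must join
SELECT o.order_id, c.customer_name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id;

-- denormalized: customer_name is duplicated onto each order row,
-- so reads skip the join but name changes must be written in both places
SELECT order_id, customer_name
FROM denormalized_orders;
```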
Jack Henry, a software company, has multiple branch locations around the world and would like to know the Euclidean distance between these branches. Assume that each calculation involves two branches: branch A and branch B. The coordinates for each branch are recorded in a database table.
In this question, you're asked to calculate the Euclidean distance between branch A and branch B.
branch_id | branch_name | x_coordinate | y_coordinate |
---|---|---|---|
A | Dallas | 7.8 | 3.5 |
B | San Francisco | 4.2 | 8.9 |
Assume the Euclidean distance is calculated with the following formula: sqrt((xA - xB)² + (yA - yB)²).
branch_A | branch_B | euclidean_distance |
---|---|---|
Dallas | San Francisco | 6.5 |
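A sketch of the query, assuming the table is named branches:

```sql
SELECT
  a.branch_name AS branch_A,
  b.branch_name AS branch_B,
  ROUND(
    SQRT(
      POWER(a.x_coordinate - b.x_coordinate, 2) +
      POWER(a.y_coordinate - b.y_coordinate, 2)
    )::numeric,
    1
  ) AS euclidean_distance
FROM branches AS a
CROSS JOIN branches AS b
-- exclude pairs where both sides are the same branch;
-- using a.branch_id < b.branch_id instead would list each pair only once
WHERE a.branch_id != b.branch_id;
```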
The CROSS JOIN generates a Cartesian product of the rows from the tables involved in the join; the result has as many rows as the product of the two tables' row counts, giving us all combinations of branches. We then eliminate rows where the two branches are the same by using WHERE A.branch_id != B.branch_id.
Finally we calculate the Euclidean distance using the provided mathematical formula incorporating POWER() and SQRT() functions. The result is rounded to one decimal place.
To practice a very similar question, try this interactive Google Odd and Even Measurements Question, which similarly involves calculations over given data, or this Amazon Server Utilization Time Question, which similarly involves calculating a total value from given parameters.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the earlier Jack Henry SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur, which come from companies like Facebook, Apple, Amazon, Netflix, and Google.
Each problem on DataLemur has hints to guide you, full answers and, crucially, an online SQL coding environment so you can write your SQL query right in the browser and have it graded.
To prep for the Jack Henry SQL interview, it can also be helpful to practice SQL problems from other tech companies like:
In case your SQL skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this interactive SQL tutorial.
This tutorial covers SQL topics like rank window functions and AND/OR/NOT – both of these show up routinely in Jack Henry SQL assessments.
In addition to SQL interview questions, the other types of problems tested in the Jack Henry Data Science Interview are:
To prepare for Jack Henry Data Science interviews read the book Ace the Data Science Interview because it's got: