At Informatica, SQL is used across the entire company to support the wide variety of Data Integration & Cloud Data solutions they provide. For this reason, Informatica almost always evaluates jobseekers on SQL problems during interviews for Data Analytics, Data Science, and Data Engineering jobs.
To help you practice for the Informatica SQL interview, we've curated 11 Informatica SQL interview questions – can you solve them?
Informatica has a 'reviews' table, where every row indicates a review for a product on a certain day rated by a user. The star rating ranges from 1 to 5, where 1 is the worst and 5 is the best. Please write a SQL query to compute the average rating for each product per month.
When you execute the query, you should return a table with the month of the review, the product id, and the corresponding average star rating for that product for that given month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
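One way to write this in PostgreSQL – a sketch using the `reviews` table and columns from the question:

```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id AS product,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY mth, product
ORDER BY mth, product;
```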
In this solution, the SQL query extracts the month from the `submit_date` column using PostgreSQL's `EXTRACT()` function. The query groups ratings by both the month and the `product_id`, and calculates the average rating for each product in each month with `AVG()`. The `ROUND()` function is applied with two-decimal precision to make the final result more readable. The result is then ordered by `mth` and `product` in ascending order to give a clear view of each product's monthly average star rating.
To practice another window function question on DataLemur's free online SQL code editor, solve this Amazon BI Engineer interview question:
As an SQL developer at Informatica, you are tasked with analyzing the sales data of different product categories over several years. Your analysis will help the Sales team understand product performance and plan future strategy.
Given two tables: `sales`, detailing the sales of different products in a specific year, and `products`, describing each product and its category, write a SQL query to return the product category along with the maximum sales amount for that category in each year. The result should be ordered by year in ascending order.
sale_id | product_id | year | sales_amount |
---|---|---|---|
1 | 101 | 2020 | 25000 |
2 | 102 | 2020 | 30000 |
3 | 103 | 2021 | 40000 |
4 | 104 | 2022 | 20000 |
5 | 105 | 2023 | 28000 |
product_id | product_name | category |
---|---|---|
101 | Product1 | Category1 |
102 | Product2 | Category1 |
103 | Product3 | Category2 |
104 | Product4 | Category3 |
105 | Product5 | Category4 |
year | category | max_sales |
---|---|---|
2020 | Category1 | 30000 |
2021 | Category2 | 40000 |
2022 | Category3 | 20000 |
2023 | Category4 | 28000 |
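A possible answer – the table names `sales` and `products` are assumed from the samples above:

```sql
SELECT
  s.year,
  p.category,
  MAX(s.sales_amount) AS max_sales
FROM sales AS s
JOIN products AS p
  ON s.product_id = p.product_id
GROUP BY s.year, p.category
ORDER BY s.year ASC;
```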
The SQL query joins the `sales` and `products` tables using the `product_id` column common between them. It then groups the data by year and category. The `MAX()` function finds the maximum sales amount for each category in each year, and the `ORDER BY` clause sorts the result in ascending order by year.
A foreign key constraint is used to establish a relationship between two tables in a database. This ensures the referential integrity of the data in the database.
For example, if you have a table of Informatica customers and an orders table, the `customer_id` column in the orders table could be a foreign key that references the `id` column (which is the primary key) in the Informatica customers table.
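As a quick sketch, that relationship might be declared like this (a hypothetical schema for illustration, not Informatica's actual tables):

```sql
CREATE TABLE customers (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE orders (
  order_id    INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers (id),  -- the foreign key
  order_total NUMERIC
);
```

With this constraint in place, the database rejects any order whose `customer_id` doesn't match an existing customer.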
You are given a table named `customers` with columns `customer_id`, `first_name`, `last_name`, `date_registered`, `state`, `email`, and `lifetime_value`. Please write a SQL query to get all records for customers who registered in 2022, live in the states 'New York' or 'California', and have a lifetime value of more than 1000. Also, they shouldn't have 'test' in their email addresses.
customer_id | first_name | last_name | date_registered | state | email | lifetime_value |
---|---|---|---|---|---|---|
1 | John | Doe | 01/15/2022 | New York | john_doe@gmail.com | 2000 |
2 | Jane | Doe | 03/05/2021 | California | jane_doe@gmail.com | 1500 |
3 | Jim | Beam | 05/22/2022 | Colorado | jim_beam_test@gmail.com | 2500 |
4 | Jack | Daniels | 06/12/2022 | California | jack_daniels@gmail.com | 1200 |
5 | Johnny | Walker | 07/03/2022 | New York | johnny_walker@gmail.com | 800 |
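One way to write the query – assuming the table is named `customers` and `date_registered` is a `DATE` column:

```sql
SELECT *
FROM customers
WHERE EXTRACT(YEAR FROM date_registered) = 2022
  AND state IN ('New York', 'California')
  AND lifetime_value > 1000
  AND email NOT LIKE '%test%';
```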
This query filters the customers based on the given conditions. The `WHERE` clause filters the records, and the `AND` operator applies multiple conditions. The `IN` keyword specifies multiple possible values for the `state` column, and the `NOT` operator together with the `LIKE` operator excludes customers who have 'test' in their email addresses.
Both the `LEAD()` and `LAG()` window functions are used to access a row at a specific offset from the current row.
However, the `LEAD()` function retrieves a value from a row that follows the current row, whereas the `LAG()` function retrieves a value from a row that precedes the current row.
Often, the offset for both functions is 1, which gives access to the immediately following/preceding row. Here's an example query, using the `reviews` table from the first question for illustration:
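```sql
-- Illustrative sketch using the reviews table from the first question:
-- show each review's stars next to the previous and next review
-- for the same product, ordered by submission date.
SELECT
  product_id,
  submit_date,
  stars,
  LAG(stars)  OVER (PARTITION BY product_id ORDER BY submit_date) AS prev_stars,
  LEAD(stars) OVER (PARTITION BY product_id ORDER BY submit_date) AS next_stars
FROM reviews;
```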
For a company like Informatica, a question that would test business-relevant knowledge with AVG could be "What is the average processing time of dataflows over a given time period?"
Informatica works with data integration, data quality, data security, and more. Data engineers often need to work with dataflows, which move data from one system to another. Understanding the processing time (start-to-finish time) of these dataflows lets data engineers optimize processes and improve performance.
Let's consider a Dataflows table where each dataflow has an ID, Start_Time, and End_Time.
ID | Start_Time | End_Time |
---|---|---|
1 | 2022-09-25 13:00:00 | 2022-09-25 13:30:00 |
2 | 2022-09-25 13:15:00 | 2022-09-25 14:20:00 |
3 | 2022-09-25 14:00:00 | 2022-09-25 15:05:00 |
4 | 2022-09-26 09:00:00 | 2022-09-26 09:45:00 |
5 | 2022-09-26 10:00:00 | 2022-09-26 11:40:00 |
Assuming `Start_Time` and `End_Time` are `TIMESTAMP` columns, the SQL command to find this average would be:
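```sql
SELECT
  AVG(EXTRACT(EPOCH FROM (End_Time - Start_Time))) AS avg_processing_seconds
FROM Dataflows;
```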
This query works by taking the `Start_Time` and `End_Time` of each dataflow, calculating the time difference (which PostgreSQL returns as an `INTERVAL`), converting that interval into seconds using `EXTRACT(EPOCH FROM ...)`, and finally applying `AVG()` to find the average processing time.
This will return the average processing time in seconds. You can convert it to minutes or other suitable units as needed.
Database views are virtual tables defined by a query; they provide customized (typically read-only) versions of your data that you can query just like a regular table. So why even use one if they're just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
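For example, a view could expose the customer columns most users need while omitting sensitive ones (a hypothetical sketch – the table and column names are illustrative):

```sql
-- Hide email and lifetime_value from everyday reporting queries
CREATE VIEW customer_directory AS
SELECT customer_id, first_name, last_name, state
FROM customers;

-- The view is then queried like a regular table:
SELECT * FROM customer_directory WHERE state = 'California';
```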
Informatica has been running several digital marketing campaigns to increase their product visibility and user base. For each ad shown, the company tracks whether it was clicked. They also keep a record of how many users added the product to their shopping carts and finally made a purchase. Given the `ads` and `transactions` tables below, write a SQL query that calculates the click-through rate (CTR), i.e., total clicks / total impressions, and the conversion rate, i.e., total purchases / total clicks, for each ad campaign.
ad_id | campaign_id | impressions | clicks |
---|---|---|---|
1001 | 1 | 50000 | 1250 |
1002 | 2 | 40000 | 800 |
1003 | 3 | 60000 | 1000 |
1004 | 4 | 35000 | 700 |
1005 | 5 | 70000 | 1800 |
transaction_id | user_id | ad_id | purchased |
---|---|---|---|
1111 | 5623 | 1001 | Yes |
2222 | 1343 | 1002 | No |
3333 | 5317 | 1003 | Yes |
4444 | 6771 | 1004 | Yes |
5555 | 9958 | 1005 | No |
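A possible query – the table names `ads` and `transactions` are assumed from the samples above. Purchases are counted per ad in a subquery first, so that ads with multiple transactions don't have their clicks double-counted:

```sql
SELECT
  a.campaign_id,
  SUM(a.clicks)::FLOAT / SUM(a.impressions) AS ctr,
  SUM(COALESCE(p.purchases, 0))::FLOAT / SUM(a.clicks) AS conversion_rate
FROM ads AS a
LEFT JOIN (
  SELECT ad_id, COUNT(*) AS purchases
  FROM transactions
  WHERE purchased = 'Yes'
  GROUP BY ad_id
) AS p ON p.ad_id = a.ad_id
GROUP BY a.campaign_id
ORDER BY a.campaign_id;
```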
This query joins the `ads` and `transactions` tables on `ad_id`. It calculates the CTR by dividing total clicks by total impressions for each campaign, and the conversion rate by dividing total purchases (rows where `purchased` equals 'Yes') by total clicks. Both ratios are cast to floats for readability.
To solve a similar SQL interview question on DataLemur's free online SQL code editor, attempt this SQL interview question asked by Facebook:
Given the `reviews` table, which contains reviews submitted by users for different products, calculate the average rating (stars) for each product per month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
mth | year | product | avg_stars |
---|---|---|---|
6 | 2022 | 50001 | 3.5 |
6 | 2022 | 69852 | 4.0 |
7 | 2022 | 69852 | 2.5 |
To solve this problem, we use PostgreSQL's `EXTRACT()` function to get the month and year from `submit_date`. The `GROUP BY` clause groups the result by month, year, and product, and the `AVG()` aggregation function computes the average rating for each product per month:
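```sql
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  EXTRACT(YEAR FROM submit_date) AS year,
  product_id AS product,
  ROUND(AVG(stars), 1) AS avg_stars  -- rounded to match the one-decimal output above
FROM reviews
GROUP BY mth, year, product
ORDER BY year, mth, product;
```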
This query returns a table with each month of each year, the product id, and the average number of stars that product received in reviews that month. The output table is ordered by year and month for readability.
The `DISTINCT` keyword added to a `SELECT` statement can be used to get records without duplicates.
For example, say you had a table of Informatica customers:
name | city |
---|---|
Akash | SF |
Brittany | NYC |
Carlos | NYC |
Diego | Seattle |
Eva | SF |
Faye | Seattle |
Suppose you wanted to figure out which cities the customers lived in, without duplicate results. You could write a query like this:
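```sql
SELECT DISTINCT city
FROM customers;  -- assuming the table is named customers
```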
Your result would be:
city |
---|
SF |
NYC |
Seattle |
At Informatica, we often need to apply specific filters to our customer records database. Write a SQL query that returns all the customers whose first name starts with 'A'.
Here's a small sample of the `customers` table:
customer_id | first_name | last_name | email | membership_date |
---|---|---|---|---|
1 | Alex | Johnson | alex.johnson@example.com | 06/08/2019 |
2 | Michael | Smith | michael.smith@example.com | 10/11/2020 |
3 | Amanda | Brown | amanda.brown@example.com | 06/18/2021 |
4 | John | Doe | john.doe@example.com | 07/26/2020 |
5 | Alice | Williams | alice.williams@example.com | 10/05/2018 |
You can use the SQL `LIKE` keyword to accomplish this:
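```sql
SELECT *
FROM customers  -- table name assumed from the question
WHERE first_name LIKE 'A%';
```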
This SQL query returns all records from the `customers` table where `first_name` starts with 'A'. The `%` in the `LIKE` pattern is a wildcard that matches any sequence of characters, so 'A%' matches any string that starts with 'A'.
The best way to prepare for an Informatica SQL interview is to practice, practice, practice. Besides solving the earlier Informatica SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur, which come from companies like Netflix, Airbnb, and Amazon.
Each SQL question has multiple hints and fully explained answers, along with a discussion board to see how others solved it. Most importantly, there's an interactive SQL code editor so you can run your SQL query right in the browser and have it graded.
To prep for the Informatica SQL interview, it can also be wise to solve SQL problems from other tech companies like:
If your SQL coding skills are weak, don't dive straight into solving questions – first refresh your SQL knowledge with this DataLemur SQL Tutorial.
This tutorial covers SQL concepts such as aggregate functions like SUM()/COUNT()/AVG() and math functions in SQL – both of which come up frequently during Informatica interviews.
In addition to SQL interview questions, the other question categories to practice for the Informatica Data Science Interview are:
To prepare for Informatica Data Science interviews, read the book Ace the Data Science Interview because it's got: