At IQVIA, SQL is used day-to-day to analyze real-world health data and to manage master data management (MDM) processes that keep data quality and consistency in check. That's why IQVIA frequently asks SQL coding questions in interviews for Data Analyst, Data Science, and BI jobs.
As such, to help you practice for the IQVIA SQL interview, this blog covers 8 IQVIA SQL interview questions – how many can you solve?
IQVIA is a company that operates at the intersection of healthcare and analytics. For this company, a 'VIP Customer' could be defined as a client who frequently purchases most or all of their healthcare-related services. Your task is to write a SQL query to identify these power users. Assume the database has two tables: `purchases` and `services`.
The `purchases` table logs all purchases made by clients, including the client_id, service_id, and the date of purchase. The `services` table lists the services provided by the company, including their service_id, service_name, and cost.
**`purchases` Example Input:**

| purchase_id | client_id | service_id | purchase_date |
|---|---|---|---|
| 1001 | 20 | 780 | 10/01/2022 |
| 1002 | 25 | 675 | 10/03/2022 |
| 1003 | 20 | 780 | 10/21/2022 |
| 1004 | 30 | 675 | 11/01/2022 |
| 1005 | 20 | 675 | 11/03/2022 |
**`services` Example Input:**

| service_id | service_name | cost |
|---|---|---|
| 675 | Analytics Audit | 5000 |
| 780 | Healthcare Consulting | 8000 |
Your output should contain the client_id, the total number of purchases made by the client, and total cost of those purchases.
**Example Output:**

| client_id | total_purchases | total_cost |
|---|---|---|
| 20 | 3 | 21000 |
| 25 | 1 | 5000 |
| 30 | 1 | 5000 |
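The original solution isn't shown here, but a query along these lines (table and column names taken from the problem statement) produces the output above:

```sql
SELECT
  p.client_id,
  COUNT(*) AS total_purchases,       -- number of purchases per client
  SUM(s.cost) AS total_cost          -- total spend per client
FROM purchases p
JOIN services s ON p.service_id = s.service_id
GROUP BY p.client_id
ORDER BY total_cost DESC, total_purchases DESC;
```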
This query first joins the purchases and services tables on the service_id field. Then it groups the results by client_id, and for each client, it calculates the total number of purchases (using COUNT(*)) and the total cost of those purchases (using SUM(s.cost)). The results are ordered by total_cost and total_purchases in descending order, so the clients who have made the most purchases and spent the most are at the top.
To solve a related super-user data analysis question on DataLemur's free interactive coding environment, try this recently asked Microsoft SQL interview question:
Consider you're given two tables at IQVIA, a global provider of information, innovative technology solutions, and contract research services focused on using data and science to help healthcare clients find better solutions for their patients. The first table contains drugs and their details and the second table contains sales data for the drugs.
**`drugs` Example Input:**

| drug_id | drug_name | drug_type |
|---|---|---|
| 1 | DrugA | Type1 |
| 2 | DrugB | Type2 |
| 3 | DrugC | Type3 |
| 4 | DrugD | Type1 |
| 5 | DrugE | Type2 |
**`sales` Example Input:**

| sale_id | drug_id | sale_date | units_sold |
|---|---|---|---|
| 101 | 1 | 06/08/2022 00:00:00 | 100 |
| 102 | 1 | 06/10/2022 00:00:00 | 150 |
| 103 | 2 | 06/18/2022 00:00:00 | 200 |
| 104 | 2 | 07/26/2022 00:00:00 | 250 |
| 105 | 3 | 07/05/2022 00:00:00 | 150 |
The question is: Can you write a SQL query to find out monthly sales and cumulative sales over time for each drug?
The result needs to contain the following columns: drug_name, sale_month, monthly_sales (total units sold in the month), and cumulative_sales (total units sold up to and including that month), grouped by drug_name and sale_month and ordered by drug_name and sale_month.
In PostgreSQL, `DATE_TRUNC('month', sale_date)` can be used to truncate sale_date to its month, and a window function such as `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` can be used to calculate the cumulative sales.
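One possible query takes this shape (a sketch; the `drugs` and `sales` table names are assumed from the example input):

```sql
WITH monthly AS (
  -- first part: total units sold per drug per month
  SELECT
    d.drug_name,
    DATE_TRUNC('month', s.sale_date) AS sale_month,
    SUM(s.units_sold) AS monthly_sales
  FROM sales s
  JOIN drugs d ON d.drug_id = s.drug_id
  GROUP BY d.drug_name, DATE_TRUNC('month', s.sale_date)
)
-- second part: running total of the monthly figures per drug
SELECT
  drug_name,
  sale_month,
  monthly_sales,
  SUM(monthly_sales) OVER (
    PARTITION BY drug_name
    ORDER BY sale_month
  ) AS cumulative_sales
FROM monthly
ORDER BY drug_name, sale_month;
```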
The first part of the query calculates monthly sales for each drug. The second part derives cumulative sales for each drug from those monthly totals, and the final SELECT statement returns both together to give the final result.
For more window function practice, solve this Uber SQL Interview Question on DataLemur's interactive SQL code editor:
A foreign key is a field in a table that references the primary key of another table. It creates a link between the two tables and ensures that the data in the foreign key field is valid.
Say for example you had sales analytics data from IQVIA's CRM (customer-relationship management) tool.
In this example, the `opportunities` table has a foreign key field called `account_id` that references the `account_id` field in the `accounts` table (its primary key). This links the data about each opportunity to the corresponding account information in the `accounts` table.
This prevents the insertion of rows into the `opportunities` table that have no corresponding entry in the `accounts` table. It also enforces the relationship between the two tables and ensures that an account is not deleted from the `accounts` table while opportunities still reference it.
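As an illustrative sketch (the table and column names here are assumptions for the example, not IQVIA's actual schema), the relationship could be declared like this:

```sql
CREATE TABLE accounts (
  account_id   INT PRIMARY KEY,
  account_name TEXT
);

CREATE TABLE opportunities (
  opportunity_id INT PRIMARY KEY,
  -- foreign key: every opportunity must point at an existing account
  account_id     INT NOT NULL REFERENCES accounts (account_id)
);
```

With this constraint in place, inserting an opportunity with an unknown `account_id`, or deleting an account that still has opportunities, raises an error.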
In your role at IQVIA, a company that provides technology solutions and contract research services to the life sciences industry, you might be asked to find the average duration of clinical trials.
The data you have available includes information on each clinical trial, such as the trial id, the disease area it is targeting, the drug being tested, and the start and end dates.
**`trials` Example Input:**

| trial_id | disease_area | start_date | end_date | drug_being_tested |
|---|---|---|---|---|
| 1255 | Oncology | 01/01/2021 00:00:00 | 07/10/2021 00:00:00 | Drug A |
| 2842 | Cardiology | 04/15/2021 00:00:00 | 10/25/2021 00:00:00 | Drug B |
| 5947 | Neurology | 06/20/2021 00:00:00 | 01/29/2022 00:00:00 | Drug C |
| 1729 | Oncology | 10/13/2021 00:00:00 | 04/22/2022 00:00:00 | Drug D |
| 8482 | Immunology | 12/01/2021 00:00:00 | 06/15/2022 00:00:00 | Drug E |
**Example Output:**

| disease_area | avg_trial_duration_days |
|---|---|
| Oncology | 190.5 |
| Cardiology | 193 |
| Neurology | 223 |
| Immunology | 196 |
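A query along these lines (the `trials` table name is assumed from the example input) produces this output:

```sql
SELECT
  disease_area,
  -- subtracting two timestamps yields an interval; EXTRACT(DAY ...) takes its days component
  AVG(EXTRACT(DAY FROM (end_date - start_date))) AS avg_trial_duration_days
FROM trials
GROUP BY disease_area;
```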
This PostgreSQL query calculates the average duration of clinical trials for each disease area. The `EXTRACT(DAY FROM (end_date - start_date))` expression is used to calculate the duration of each clinical trial in days. These durations are then averaged for each disease area using the `AVG()` function. The data is grouped by the disease_area column so that the average is calculated separately for trials related to each disease area. The results could be used to understand the time efficiency of trials in different disease areas, which could affect planning and resource allocation for future trials.
To practice a very similar question try this interactive Facebook Average Post Hiatus (Part 1) Question which is similar for calculating duration between events or this Amazon Server Utilization Time Question which is similar for calculating duration of usage.
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at IQVIA, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
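A minimal sketch (assuming both tables exist with those names):

```sql
SELECT *
FROM ad_copy
CROSS JOIN ad_creative;
```

Every ad copy is paired with every creative, so the result has (rows in `ad_copy`) × (rows in `ad_creative`) rows.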
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example of each one, say you had sales data exported from IQVIA's Salesforce CRM stored in a data warehouse which had two tables: `sales` and `customers`.
An inner join (which behaves like a natural join when it uses the tables' common columns) combines the two tables on the common `customer_id` column:
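A sketch (the shared `customer_id` column is assumed from the example):

```sql
SELECT *
FROM sales
INNER JOIN customers
  ON sales.customer_id = customers.customer_id;
```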
This query will return only the rows from the `sales` and `customers` tables that have matching `customer_id` values; rows without a match are excluded from the results.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
As part of the sales team at IQVIA, you are tasked with identifying the average purchase amount for customers who bought "Pharmaceutical Products" and "Hospital and Healthcare Products". You are provided with two tables, `sales` and `products`.
The `sales` table has details of all sales to a customer, including the customer_id, product_id, and purchase_amount for each purchase.
The `products` table includes product_id, product_type, and product_name.
**`sales` Example Input:**

| customer_id | product_id | purchase_amount |
|---|---|---|
| 1 | 200 | 3500 |
| 1 | 300 | 2500 |
| 2 | 200 | 4000 |
| 2 | 100 | 1500 |
| 3 | 300 | 2000 |
| 4 | 100 | 1000 |
| 5 | 200 | 4000 |
**`products` Example Input:**

| product_id | product_type | product_name |
|---|---|---|
| 100 | Hospital and Healthcare Products | Healthcare Equipment |
| 200 | Pharmaceutical Products | Analgesics |
| 300 | Pharmaceutical Products | Antipyretics |
You need to write a SQL query to find the average purchase amount for each type of product ("Pharmaceutical Products", "Hospital and Healthcare Products") bought by each customer.
**Example Output:**

| customer_id | product_type | avg_purchase_amount |
|---|---|---|
| 1 | Pharmaceutical Products | 3000 |
| 2 | Pharmaceutical Products | 4000 |
| 2 | Hospital and Healthcare Products | 1500 |
| 3 | Pharmaceutical Products | 2000 |
| 4 | Hospital and Healthcare Products | 1000 |
| 5 | Pharmaceutical Products | 4000 |
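A query along these lines (using the `sales` and `products` names from the problem statement) produces this output:

```sql
SELECT
  s.customer_id,
  p.product_type,
  AVG(s.purchase_amount) AS avg_purchase_amount
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE p.product_type IN ('Pharmaceutical Products',
                         'Hospital and Healthcare Products')
GROUP BY s.customer_id, p.product_type
ORDER BY s.customer_id;
```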
This SQL query first joins the `sales` and `products` tables on their shared column, `product_id`. It only considers rows where the product_type is either "Pharmaceutical Products" or "Hospital and Healthcare Products". Finally, it groups by customer_id and product_type, calculating the average purchase_amount for each group.
Because joins come up so often during SQL interviews, try this interactive Snapchat JOIN SQL interview question:
The `INTERSECT` SQL command merges the results of multiple `SELECT` statements and keeps only those rows that are present in every result set.
For example, say you were doing an HR Analytics project for IQVIA, and had access to IQVIA's employees and contractors data. Assume that some employees were previously contractors, and vice versa, and thus would show up in both tables. You could use the `INTERSECT` operator to find all contractors who also show up in the employees table:
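A sketch (the `first_name` and `last_name` columns are assumed for illustration):

```sql
-- rows returned appear in BOTH tables
SELECT first_name, last_name FROM contractors
INTERSECT
SELECT first_name, last_name FROM employees;
```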
IQVIA helps healthcare and pharmaceutical companies understand market trends. One such use case could involve tracking the overall sale ratio of a specific medication in each region, the average sale price per unit of that medication by month, and the square root of the total income received from each region during each month.
Consider the following `sales` table, where `region_id` is the ID of the region where the sale was made, `product_id` is the ID of the product sold, `quantity` is the number of units sold, `price_per_unit` is the price per unit, and `sale_date` is the date the sale was made:
**`sales` Example Input:**

| sale_id | region_id | product_id | quantity | price_per_unit | sale_date |
|---|---|---|---|---|---|
| 105 | 1 | 9823 | 7 | 19.5 | 2021-08-01 |
| 204 | 1 | 6985 | 2 | 38.0 | 2021-08-02 |
| 221 | 2 | 6985 | 10 | 32.0 | 2021-08-02 |
| 354 | 1 | 9823 | 4 | 20.0 | 2021-08-02 |
| 408 | 2 | 9823 | 5 | 18.5 | 2021-08-03 |
Given that, your task is to write a PostgreSQL query to calculate the overall sale ratio for each product in each region, average sale price per unit of that product and the square root of the total income (product of quantity and price per unit) from each region for each month.
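One possible PostgreSQL query (a sketch against the `sales` table above):

```sql
SELECT
  DATE_TRUNC('month', sale_date) AS sale_month,
  region_id,
  product_id,
  -- this product's share of the region's total units for the month
  ROUND(SUM(quantity)::NUMERIC
        / SUM(SUM(quantity)) OVER (PARTITION BY DATE_TRUNC('month', sale_date),
                                   region_id), 2) AS sale_ratio,
  ROUND(AVG(price_per_unit)::NUMERIC, 2) AS avg_price_per_unit,
  -- square root of total income (quantity * price_per_unit, summed)
  ROUND(SQRT(SUM(quantity * price_per_unit))::NUMERIC, 2) AS sqrt_total_income
FROM sales
GROUP BY DATE_TRUNC('month', sale_date), region_id, product_id
ORDER BY sale_month, region_id, product_id;
```

Note the window function `SUM(SUM(quantity)) OVER (...)`: the inner `SUM` is the per-group aggregate and the outer `SUM ... OVER` totals those aggregates across the region and month.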
The query above works by first isolating the month from sale_date with `DATE_TRUNC('month', sale_date)`. The sale ratio is then computed by dividing the quantity sold for each product by the total quantity of products sold in each region each month (calculated with a window function over the grouped totals). This is followed by calculating the average sale price per unit using `AVG()`, which establishes the mean value per unit sold. Lastly, the `SQRT()` function is used for calculating the square root of the total income generated from a region, which is the sum of quantity times price_per_unit, and `ROUND()` is used to keep the results looking clean and tidy. All results are then grouped by sale_month, region_id, and product_id in the `GROUP BY` clause.
To practice a very similar question try this interactive CVS Health Pharmacy Analytics (Part 1) Question which is similar for calculating profitability of sold items or this Amazon Average Review Ratings Question which is similar for calculating averages and grouping by month.
The best way to prepare for an IQVIA SQL interview is to practice, practice, practice. In addition to solving the above IQVIA SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews at FAANG tech companies and tech startups.
Each interview question has multiple hints and a detailed solution, and crucially, there's an online SQL coding environment so you can write your SQL query right in the browser and have it executed.
To prep for the IQVIA SQL interview, it can also be wise to solve interview questions from other tech companies like:
If your SQL query skills are weak, don't dive straight into solving questions – first improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL concepts such as WHERE vs. HAVING and joins – both of these show up frequently in SQL job interviews at IQVIA.
Beyond writing SQL queries, the other topics to prepare for the IQVIA Data Science Interview are:
The best way to prepare for IQVIA Data Science interviews is by reading Ace the Data Science Interview. The book's got: