Eli Lilly employees rely on SQL queries to analyze and interpret complex pharmaceutical datasets, helping them track the effectiveness of new drugs and monitor patient outcomes. They also use SQL to manage the relational databases that store vital clinical trial data, ensuring that researchers have access to accurate and up-to-date information. this is the reason why Eli Lilly asks SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you prepare, we've collected 10 Eli Lilly SQL interview questions – can you solve them?
Eli Lilly is a global pharmaceutical company. Let's say they want to evaluate the sales performance of their products over time across different regions. You are given the following tables: and . The table logs every sale, including the product id, the sale date and the region of sale. The table has information about each product, including the product id and name.
sale_id | sale_date | product_id | region | units_sold |
---|---|---|---|---|
1 | 2022-01-05 00:00:00 | 101 | North America | 20 |
2 | 2022-01-21 00:00:00 | 102 | Europe | 15 |
3 | 2022-02-10 00:00:00 | 101 | Asia | 30 |
4 | 2022-02-15 00:00:00 | 103 | North America | 12 |
5 | 2022-03-01 00:00:00 | 102 | Europe | 25 |
product_id | product_name |
---|---|
101 | Drug A |
102 | Drug B |
103 | Drug C |
Create a PostgreSQL query to find the total units sold for each product for every month, alongside the average monthly units sold for each product across all regions. Order the result by product_name and the month of sale_date.
product | month | total_units | avg_units |
---|---|---|---|
Drug A | 1 | 20 | 17.5 |
Drug A | 2 | 30 | 17.5 |
Drug B | 1 | 15 | 20 |
Drug B | 3 | 25 | 20 |
Drug C | 2 | 12 | 12 |
This PostgreSQL query works by joining the and tables on the .
It then groups the results by name and the month of .
The function is used to calculate the total units sold, and the function, combined with the clause (a window function), is used to calculate the average monthly units sold for each product across all regions. The result is ordered by and the month of .
To practice a similar window function interview problem which uses RANK() on DataLemur's free online SQL code editor, try this Amazon SQL question asked in a BI Engineer interview:
Imagine there was a table of Eli Lilly employee salary data. Write a SQL query to find the 2nd highest salary among all employees.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Code your solution to this interview question and run your code right in the browser:
You can find a step-by-step solution here: 2nd Highest Salary.
A full outer join returns all rows from both tables, including any unmatched rows, whereas an inner join only returns rows that match the join condition between the two tables.
For an example of each one, say you had sales data exported from Eli Lilly's Salesforce CRM stored in a data warehouse which had two tables: and .
: retrieves rows from both tables where there is a match in the shared key or keys.
This query will return rows from the sales and tables that have matching customer id values. Only rows with matching values will be included in the results.
: retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
Here is an example of a SQL full outer join using the sales and tables:
Eli Lilly is a pharmaceutical company that often conducts clinical trials for their new drugs. The following tables describe a simplified view of the trial data. A can participate in multiple and a can be part of multiple trials also. However, a entry is unique and represents a particular drug in a specific trial. After trials, patients provide based on their experience.
Design tables using the following information and write a SQL query to find the average rating for each drug across all trials.
patient_id | name | age | gender |
---|---|---|---|
1 | John Doe | 55 | M |
2 | Jane Smith | 30 | F |
3 | Mary Johnson | 45 | F |
drug_id | name |
---|---|
1 | Drug A |
2 | Drug B |
3 | Drug C |
trial_id | start_date | end_date |
---|---|---|
1 | 01/01/2020 | 01/31/2020 |
2 | 02/01/2020 | 02/29/2020 |
3 | 03/01/2020 | 03/31/2020 |
trial_id | drug_id | dosage |
---|---|---|
1 | 1 | 50 |
1 | 2 | 60 |
2 | 2 | 60 |
3 | 1 | 70 |
3 | 2 | 80 |
3 | 3 | 90 |
patient_id | trial_drug_id | rating |
---|---|---|
1 | 1 | 4 |
1 | 2 | 3 |
2 | 2 | 5 |
2 | 3 | 4 |
3 | 4 | 4 |
3 | 5 | 2 |
This query first joins the table with the table based on the . It then joins the resulted table with the table based on the . Finally, it calculates the average rating for each drug by grouping the result based on .
A DBMS (database management system), in order to ensure transactions are relaible and don't ruin the integrity of the data, tries to mantain the following ACID properties: Atomicity, Consistency, Isolation, and Durability.
To make this concept more concrete, here is what each of the ACID properties would mean in the context of banking transactions:
Eli Lilly often is required to filter on its patient database for different studies. The database includes columns like , , , , , and .
Given below is the Patients' database.
patient_id | age | gender | stage_of_illness | medication | region |
---|---|---|---|---|---|
1001 | 67 | Male | 4 | Medication A | East |
1002 | 58 | Female | 3 | Medication B | West |
1003 | 70 | Male | 2 | Medication C | North |
1004 | 62 | Female | 3 | Medication D | East |
1005 | 55 | Male | 1 | Medication A | South |
Write a PostgreSQL query to filter the Patients' database which should fetch the records of patients who are older than 60 years and are in the 3rd or 4th stage of the illness and have been prescribed with either Medication A or B. Also, the selected patients should not be from the East region.
patient_id | age | gender | stage_of_illness | medication | region |
---|---|---|---|---|---|
1001 | 67 | Male | 4 | Medication A | South |
1002 | 58 | Female | 3 | Medication B | West |
In PostgreSQL, you can use a combination of , and clauses to filter the records. Here's the SQL query that meets the defined conditions:
This command will first select all columns in the table (). It will then filter the database () such that it only includes patients of age over 60 (), in stages 3 or 4 of their illness (), prescribed either with or (), and they are not from the region ().
Please note that the output table is fictional as none of the patients in the given table meets all these conditions.
In SQL, zero's are numerical values which can be used in calculations and comparisons just like any other number. A blank space, also known as an empty string, is a character value and can be used in character manipulation functions and comparisons.
NULLs aren't the same as zero's or blank spaces. NULLs represent unkonwn, missing, or not applicable values. They are not included in calculations and comparisons involving NULL values always result in NULL.
Eli Lilly is a global pharmaceutical company. For the business, it might be beneficial to monitor the average purchase price of their medications. For this scenario, assume Eli Lilly has a table where each row represents a transaction for a specific medication.
Please write a PostgreSQL query to find the average purchase price of each medication on a monthly basis.
Here's the sample data:
sale_id | medication_id | purchase_date | price |
---|---|---|---|
3171 | 123 | 02/20/2022 00:00:00 | 50 |
7802 | 123 | 02/21/2022 00:00:00 | 55 |
5293 | 123 | 02/24/2022 00:00:00 | 52 |
3962 | 265 | 03/01/2022 00:00:00 | 100 |
4517 | 265 | 03/05/2022 00:00:00 | 105 |
month | medication | avg_price |
---|---|---|
2 | 123 | 52.33 |
3 | 265 | 102.50 |
This query uses the function to calculate the average price for each medication month by month. The function is used to get the month from the . The clause groups the data by month and medication to calculate the average price for each medication in each month. The clause is used to sort the result by month and medication.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for requiring a monthly grouping and calculation on transaction data or this CVS Health Pharmacy Analytics (Part 1) Question which is similar for calculating profitability of products on a per-item basis.
Eli Lilly is a company in the pharmaceutical industry. As part of their business strategy, they want to better understand their sales pattern. They want to know the average sales volume for each of their products on a monthly basis.
sales_id | product_id | sales_date | quantity |
---|---|---|---|
1 | 101 | 2022-04-05 | 120 |
2 | 102 | 2022-04-10 | 200 |
3 | 101 | 2022-04-20 | 150 |
4 | 103 | 2022-04-28 | 300 |
5 | 101 | 2022-05-05 | 100 |
6 | 102 | 2022-05-15 | 150 |
7 | 103 | 2022-05-25 | 200 |
8 | 101 | 2022-06-05 | 50 |
9 | 102 | 2022-06-15 | 180 |
10 | 103 | 2022-06-25 | 240 |
month | product_id | avg_quantity |
---|---|---|
4 | 101 | 135.0 |
4 | 102 | 200.0 |
4 | 103 | 300.0 |
5 | 101 | 100.0 |
5 | 102 | 150.0 |
5 | 103 | 200.0 |
6 | 101 | 50.0 |
6 | 102 | 180.0 |
6 | 103 | 240.0 |
This SQL statement extracts the month part from the column and for each month and , it calculates the average quantity sold (using aggregate function). The results are ordered based on the month and . This helps Eli Lilly to understand how product sales fluctuate over the months.
To discover records in one table that are not present in another, you can utilize a and filter out any values in the right-side table.
For example, say you had a table of Eli Lilly customers and a 2nd table of all purchases made with Eli Lilly. To find all customers who did not make a purchase, you'd use the following
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, values will be returned for all of the right table's columns. The clause then filters out any rows where the purchases.id column is , leaving only customers who have not made a purchase.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Eli Lilly SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the above Eli Lilly SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, and Meta.
Each exercise has multiple hints, fully explained answers along with a discussion board to see how others solved it and most importantly, there's an online SQL coding environment so you can right online code up your SQL query and have it graded.
To prep for the Eli Lilly SQL interview you can also be helpful to practice SQL problems from other healthcare and pharmaceutical companies like:
In case your SQL coding skills are weak, don't worry about going right into solving questions – strengthen your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL topics like RANK vs. DENSE RANK and UNION – both of which show up routinely during SQL job interviews at Eli Lilly.
In addition to SQL interview questions, the other types of questions to practice for the Eli Lilly Data Science Interview are:
To prepare for Eli Lilly Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't ignore the behavioral interview – prepare for it using this Behavioral Interview Guide for Data Scientists.