At Amgen, SQL is used to analyze complex biomedical data from clinical trials, allowing researchers to evaluate treatment efficacy and safety across diverse patient populations. It also helps manage the relational databases of patient records, ensuring that all data is organized and compliant with regulations, which is why Amgen asks SQL coding questions during interviews for Data Analyst, Data Science, and BI jobs.
Thus, to help you prep, here's 9 Amgen SQL interview questions – can you solve them?
Given a table containing clinical trial data for various drugs, write a SQL query to calculate, for each drug, the average change in patient health score over all trials where the drug was used. Your output should rank the drugs in descending order of this average change, and include both drug name and average change.
The table has the following schema:
trial_id | patient_id | start_date | end_date | drug_name | start_health_score | end_health_score |
---|---|---|---|---|---|---|
4201 | 1001 | 06/01/2022 | 07/01/2022 | DrugA | 50 | 70 |
6510 | 1065 | 06/10/2022 | 07/10/2022 | DrugB | 65 | 85 |
6372 | 1578 | 06/18/2022 | 07/18/2022 | DrugA | 60 | 70 |
7694 | 1122 | 07/26/2022 | 08/26/2022 | DrugB | 80 | 90 |
9172 | 1581 | 07/05/2022 | 08/05/2022 | DrugC | 70 | 75 |
We define the health score change as .
This PostgreSQL query will group rows by the drug name, then for each group calculate the average change in health score from the start to the end of trials (as defined by the provided formula). It then orders the resulting rows in descending order of this average change.
For more window function practice, try this Uber SQL Interview Question within DataLemur's interactive SQL code editor:
You're given a table of Amgen employee and department salary information. Write a SQL query to find departments where the departments average salary is higher than the company's average salary.
Code your solution to this problem directly within the browser on DataLemur:
The answer is LONG – 30+ lines of SQL. You can find a detailed solution here: Department vs. Company Salary.
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes:
Amgen is a major biopharmaceutical company that specializes in a wide range of both patented and generic drugs. They have a complex distribution network that includes multiple warehouses from where drugs are supplied to various pharmacies across the nation. Suppose that the company wants to keep track of its drug stock for the purpose of inventory management. The company wants an SQL script that can determine the quantity of each drug available in each warehouse at the end of each month.
The overall database design includes two tables: and .
warehouse_id | warehouse_location |
---|---|
'W101' | 'New York' |
'W102' | 'Los Angeles' |
'W103' | 'Chicago' |
drug_id | warehouse_id | stock | transaction_date |
---|---|---|---|
'D001' | 'W101' | 500 | '2022-09-30' |
'D002' | 'W102' | 400 | '2022-09-30' |
'D001' | 'W103' | 200 | '2022-09-30' |
'D002' | 'W101' | 150 | '2022-10-31' |
'D001' | 'W102' | 450 | '2022-10-31' |
'D002' | 'W103' | 350 | '2022-10-31' |
'D001' | 'W102' | 650 | '2022-11-30' |
'D002' | 'W103' | 250 | '2022-11-30' |
'D001' | 'W101' | 750 | '2022-11-30' |
'D002' | 'W102' | 550 | '2022-11-30' |
The task is to write an SQL query that would give us the quantity of each drug available in each warehouse at the end of each month.
This PostgreSQL query uses the function to get the first day of each month from the column. It then groups the records by month, , and , and calculates the total stock for each group. The clause ensures that the result is sorted by month, warehouse, and drug. Please note that this solution assumes that every transaction records the final stock of a drug in a warehouse at the end of the month. There will need to be a different formulation if this assumption does not hold.
Stored procedures in SQL are like recipes in a cookbook. Just like a recipe tells you the ingredients and instructions for making a particular dish, a stored procedure tells the DBMS the logic/statements needed to perform a specific task. Just like you can use a recipe to make the same dish over and over again, you can use a stored procedure to repeat the same task multiple times with different input parameters (which is why stored procedures are so damn useful!).
Say you were a Data Analyst working on a HR analytics project. A common sub-task you might have to do is calculate the average salary for a given department at Amgen, which would be perfect for a stored procedure:
To call this stored procedure and find the average salary for the Data Science department you'd execute a query like this:
Amgen is a multinational biopharmaceutical company. Let's assume the company has carried out several digital marketing campaigns for different drugs and wants to evaluate their performance.
Here’s the corrected version with appropriate use of backticks:
Use the two tables and . provides information about which ads have been clicked by users, while tells which ads have been viewed by the users. Our task is to calculate the click-through rate (CTR) for each ad. CTR is calculated as (total clicks on an ad/ total views of an ad)*100.
click_id | user_id | clicked_date | ad_id |
---|---|---|---|
1001 | 345 | 07/10/2022 00:00:00 | 123 |
2002 | 657 | 07/10/2022 00:30:00 | 456 |
3003 | 878 | 07/11/2022 00:00:00 | 123 |
4004 | 897 | 07/11/2022 00:30:00 | 789 |
view_id | user_id | viewed_date | ad_id |
---|---|---|---|
5001 | 345 | 07/10/2022 00:00:00 | 123 |
6002 | 657 | 07/10/2022 00:30:00 | 456 |
8003 | 654 | 07/11/2022 00:00:00 | 123 |
9004 | 897 | 07/11/2022 00:30:00 | 789 |
8005 | 238 | 07/11/2022 01:30:00 | 789 |
This query first calculates the total number of views per ad and total clicks per ad in two separate derived tables and , respectively. Then it performs a on these tables using the . In the clause, we calculate the CTR for each ad. If an ad has no clicks, the function is used to ensure the CTR is zero, instead of .
To practice a related problem on DataLemur's free online SQL code editor, try this Meta SQL interview question:
To discover records in one table that are not present in another, you can utilize a and filter out any values in the right-side table.
For example, say you had a table of Amgen customers and a 2nd table of all purchases made with Amgen. To find all customers who did not make a purchase, you'd use the following
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, values will be returned for all of the right table's columns. The clause then filters out any rows where the purchases.id column is , leaving only customers who have not made a purchase.
Considering that Amgen is a multinational biopharmaceutical company, let's formulate a question related to the average sales of their drugs.
You are given two tables, and . The table has columns (the product's unique identifier), (the name of the drug), and (the category which drug belongs to). The table has columns (the unique identifiers for each sale), (the identifier of the drug sold), (the date of sale), and (the revenue from the sale).
Can you write a SQL query to find the average revenue per month for each drug category in 2021?
Let's use following data as sample:
product_id | product_name | category |
---|---|---|
101 | EPO | Biotech |
102 | Neupogen | Oncology |
103 | Enbrel | Immunology |
104 | Prolia | Bone |
sale_id | product_id | sale_date | revenue |
---|---|---|---|
551 | 101 | 01/05/2021 | 500.00 |
552 | 102 | 01/15/2021 | 300.00 |
553 | 102 | 02/20/2021 | 400.00 |
554 | 103 | 01/22/2021 | 700.00 |
The following SQL query can be used to answer the given question:
This query joins the sales and products tables together using the product_id. After the join is made, the query focuses on sales within the year 2021 (from January 1st to December 31st). The results are grouped by month and drug category, making it possible to calculate the average revenue for each category per month. The results are then ordered by month and category for easier reading and data interpretation.
Amgen is a biopharmaceutical company that develops and distributes pharmaceuticals to customers globally. You have been given a database table named . This table includes Full Name of the customers, along with other details. Your task is to write an SQL query to retrieve the records of all customers whose names start with the letter 'A'.
Below are the sample input and output.
customer_id | full_name | address | contact_number |
---|---|---|---|
1 | Jack Taylor | Beverly Hills, CA | 906-382-2837 |
2 | Anna Smith | Sunnyvale, CA | 217-850-6451 |
3 | Robert Johnson | Cupertino, CA | 785-618-3654 |
4 | Alice Johnson | San Francisco, CA | 973-372-0857 |
5 | Aiden Morales | Sacramento, CA | 859-283-9384 |
customer_id | full_name | address | contact_number |
---|---|---|---|
2 | Anna Smith | Sunnyvale, CA | 217-850-6451 |
4 | Alice Johnson | San Francisco, CA | 973-372-0857 |
5 | Aiden Morales | Sacramento, CA | 859-283-9384 |
Here’s the corrected version with appropriate use of backticks:
This SQL query functions by selecting all records from the table where the starts with the letter 'A'. The keyword is used in the clause to search for a specified pattern in a column. Here, the pattern 'A%' signifies any string that starts with 'A'.
The best way to prepare for a Amgen SQL interview is to practice, practice, practice. Beyond just solving the above Amgen SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Microsoft, Google, and Meta.
Each interview question has hints to guide you, full answers and best of all, there is an interactive coding environment so you can right online code up your SQL query answer and have it checked.
To prep for the Amgen SQL interview you can also be wise to practice SQL problems from other healthcare and pharmaceutical companies like:
In case your SQL skills are weak, forget about jumping right into solving questions – refresh your SQL knowledge with this SQL interview tutorial.
This tutorial covers things like transforming strings with CONCAT()/LOWER()/TRIM() and grouping by multiple columns – both of these come up frequently during Amgen SQL interviews.
In addition to SQL query questions, the other topics to prepare for the Amgen Data Science Interview include:
I think the optimal way to prepare for Amgen Data Science interviews is to read the book Ace the Data Science Interview.
It covers 201 data interview questions taken from Facebook, Google, & Amazon. The book's also got a refresher covering Python, SQL & ML. And finally it's helped thousands of people land their dream job in data, which is why it's earned 1000+ reviews on Amazon and a 4.5-star rating.
While the book is more technical, it's also crucial to prepare for the Amgen behavioral interview. A good place to start is by reading the company's values and company principles.