Data Scientists, Analysts, and Data Engineers at Zoetis write SQL to analyze pet health databases and track trends in animal wellness and treatment outcomes. They also use SQL to optimize supply chain data for animal pharmaceuticals, ensuring that products are delivered efficiently and meet the needs of veterinarians and pet owners. That is why Zoetis asks job seekers SQL interview questions.
So, to help you prep for the Zoetis SQL interview, we've curated 10 Zoetis SQL interview questions in this blog.
Zoetis is a major global animal health company that supplies a diverse array of veterinary medicines and vaccines. They would like to identify their top purchasing customers, the ones who have made the most purchases over the past year. These are their "VIP" customers and understanding who they are is crucial for the business strategy.
To make it more realistic, we have:
Let's see a sample transactions table:
transaction_id | customer_id | purchase_date | product_id | amount_spent |
---|---|---|---|---|
3501 | 789 | 2021-09-23 | 3002 | $150 |
2312 | 456 | 2021-08-14 | 2006 | $200 |
5634 | 123 | 2021-07-19 | 3002 | $250 |
4435 | 456 | 2021-06-22 | 1001 | $350 |
8896 | 789 | 2021-08-09 | 3002 | $400 |
The SQL query should be able to identify the customers who have purchased the most in terms of spending amount over the past year.
The following PostgreSQL query would identify the top purchasing customers for Zoetis:
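A minimal sketch of such a query, assuming the table is named transactions, that purchase_date is a DATE column, and that amount_spent is stored as a numeric value (the $ sign in the sample being display formatting only):

```sql
-- Assumed table: transactions(transaction_id, customer_id, purchase_date,
--                             product_id, amount_spent)
SELECT
  customer_id,
  SUM(amount_spent) AS total_spent      -- total spend per customer
FROM transactions
WHERE purchase_date >= DATE '2021-01-01'
  AND purchase_date <  DATE '2022-01-01'  -- half-open range covering all of 2021
GROUP BY customer_id
ORDER BY total_spent DESC                 -- biggest spenders first
LIMIT 5;
```

With the sample rows above, only three distinct customers exist, so fewer than five rows would come back.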
This query returns a list of the top five customers (their IDs), ranked by the total amount they've spent on Zoetis products across all the transactions they made in 2021. We sum the amount spent by each customer, order those totals in descending order, and limit the result to the top five.
To practice a related customer analytics question on DataLemur's free online SQL coding environment, try this Microsoft SQL Interview problem:
Given a table of Zoetis employee salary information, write a SQL query to find all employees who make more money than their direct manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns $8,000, surpassing her manager, William Davis, who earns $7,000.
Try this interview question interactively on DataLemur:
First, we perform a self-join, treating the first copy of the table as the managers' table and the second copy as the employees' table. Then we use a WHERE clause to filter the results, keeping only employees whose salary is higher than their manager's salary.
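The self-join described above can be sketched as follows. The table name employee and the aliases mgr and emp are assumptions made for illustration; the column names come from the sample table.

```sql
-- Assumed table: employee(employee_id, name, salary, department_id, manager_id)
SELECT
  emp.employee_id,
  emp.name AS employee_name
FROM employee AS mgr                     -- first copy: the managers
INNER JOIN employee AS emp               -- second copy: the employees
  ON mgr.employee_id = emp.manager_id    -- pair each employee with their manager
WHERE emp.salary > mgr.salary;           -- keep employees who out-earn the manager
```

Employees without a manager (a NULL manager_id) never match the join condition, so they are dropped from the result automatically.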
If the code above is hard to understand, you can find a detailed solution with hints here: Employees Earning More Than Managers.
A cross-join, also known as a cartesian join, is a JOIN that produces the cross-product of two tables. In a cross-join, each row from the first table is matched with every row from the second table, resulting in a new table with a row for each possible combination of rows from the two input tables.
Suppose you were building a neural network model that tried to score the probability of a customer buying a Zoetis product. Before you started working in Python and TensorFlow, you might want to do some Exploratory Data Analysis (EDA) in SQL and generate all pairs of customers and Zoetis products.
Here's a cross-join query you could use to find all the combos:
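As a sketch, with customers and zoetis_products as hypothetical table names (neither schema is given here, so the id columns are assumptions too):

```sql
-- Hypothetical tables: customers(id, ...) and zoetis_products(id, ...)
SELECT
  customers.id       AS customer_id,
  zoetis_products.id AS product_id
FROM customers
CROSS JOIN zoetis_products;   -- no ON clause: every customer is paired with every product
```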
Cross-joins are great for generating all possible combinations, but they can also create really big tables if you're not careful. For example, if you had 10,000 potential customers, and Zoetis had 500 different product SKUs, you'd get 5 million rows as a result!!
Working as a data scientist at Zoetis, you are asked to analyze the sales data.
Write a query to calculate the month-over-month growth in average sales for each product, sorted by the products with the greatest improvement in average sales from the previous month to the current month. We define the month-over-month growth as the difference in average sales between two consecutive months (current month - previous month).
The sales data is stored in a Postgres table with the following columns: sale_id, product_id, sale_date, and quantity.
Assume that months are numbered as full integers from some arbitrary starting point (e.g., month 1, month 2, etc.), and that you will never have to deal with fractional-month differences.
The dataset contains the following data:
sale_id | product_id | sale_date | quantity |
---|---|---|---|
1051 | 310 | 01/15/2022 | 7 |
1082 | 240 | 01/21/2022 | 5 |
1193 | 310 | 02/06/2022 | 9 |
1152 | 240 | 02/22/2022 | 11 |
1307 | 240 | 03/05/2022 | 15 |
1482 | 310 | 03/18/2022 | 4 |
The expected output:
product_id | month | avg_sales_change |
---|---|---|
240 | 2 | 6 |
240 | 3 | 4 |
310 | 2 | 2 |
310 | 3 | -5 |
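One way to produce this output is sketched below. It assumes the table is named sales, that sale_date is a DATE column, and that all rows fall within a single calendar year, so the calendar month number can stand in for the month index.

```sql
-- Assumed table: sales(sale_id, product_id, sale_date, quantity)
WITH monthly_avg AS (
  -- Step 1: average quantity sold per product per month
  SELECT
    product_id,
    EXTRACT(MONTH FROM sale_date) AS month,
    AVG(quantity)                 AS avg_sales
  FROM sales
  GROUP BY product_id, EXTRACT(MONTH FROM sale_date)
),
monthly_change AS (
  -- Step 2: current month's average minus the previous month's average
  SELECT
    product_id,
    month,
    avg_sales - LAG(avg_sales) OVER (
      PARTITION BY product_id
      ORDER BY month
    ) AS avg_sales_change
  FROM monthly_avg
)
SELECT product_id, month, avg_sales_change
FROM monthly_change
WHERE avg_sales_change IS NOT NULL       -- first month has no previous month
ORDER BY avg_sales_change DESC, product_id, month;
```

If the data spanned several years, grouping on DATE_TRUNC('month', sale_date) instead of the bare month number would keep January 2022 and January 2023 apart.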
This query first computes the average sales quantity for each product for each month. It then uses the LAG window function to calculate the difference in average sales between the current month and the preceding month for each product. The overall result is ordered by the greatest improvement in average sales from the previous month to the current month, with ties broken by product_id and month.
To solve another window function question on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question:
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
Let's examine employee data from Zoetis's HR database:
employee_id | first_name | last_name | manager_id |
---|---|---|---|
1 | Aubrey | Graham | 3 |
2 | Marshal | Mathers | 3 |
3 | Dwayne | Carter | 4 |
4 | Shawn | Carter |
In this table, employee_id serves as the primary key and manager_id functions as a foreign key because it links to the employee_id of the employee's manager. This establishes a relationship between Zoetis employees and their managers, allowing for easy querying to find an employee's manager or see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, columns such as department_id and location_id could serve as foreign keys that connect each employee to their respective department and location.
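To make the self-referencing relationship concrete, here is a sketch of how such a table could be declared in PostgreSQL. The table name employees and the column types are assumptions; only the column names and sample rows come from the table above.

```sql
-- Hypothetical definition; manager_id points back at employee_id in the same table
CREATE TABLE employees (
  employee_id INTEGER PRIMARY KEY,
  first_name  VARCHAR(50) NOT NULL,
  last_name   VARCHAR(50) NOT NULL,
  manager_id  INTEGER REFERENCES employees (employee_id)  -- NULL for the top of the hierarchy
);

INSERT INTO employees (employee_id, first_name, last_name, manager_id) VALUES
  (4, 'Shawn',   'Carter',  NULL),
  (3, 'Dwayne',  'Carter',  4),
  (1, 'Aubrey',  'Graham',  3),
  (2, 'Marshal', 'Mathers', 3);

-- This insert would be rejected, because no employee with id 99 exists:
-- INSERT INTO employees VALUES (5, 'New', 'Hire', 99);

-- Follow the foreign key to list each employee next to their manager
SELECT
  e.first_name AS employee,
  m.first_name AS manager
FROM employees AS e
LEFT JOIN employees AS m
  ON m.employee_id = e.manager_id;
```

The LEFT JOIN keeps the employee without a manager in the result, with a NULL in the manager column.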
Zoetis is a global animal health company that provides medicines, vaccines and diagnostic products. They have several categories of animal health products and regularly sell these to different stores. For a particular sales period, Zoetis would like to know the average sale price per category of products.
Suppose you have a table with the following structure and data:
sale_id | product_id | store_id | sale_date | price | category |
---|---|---|---|---|---|
1 | 8572 | 5 | 2022-07-10 | 20.00 | Vaccines |
2 | 7261 | 9 | 2022-07-12 | 15.00 | Medicines |
3 | 8572 | 3 | 2022-07-15 | 12.00 | Vaccines |
4 | 9123 | 7 | 2022-07-18 | 25.00 | Diagnostic Products |
5 | 7261 | 4 | 2022-07-20 | 14.00 | Medicines |
Your task is to write a PostgreSQL query to find the average sale price per product category for the sales period.
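A possible answer, assuming the table is named sales (a hypothetical name) and that every row in it belongs to the sales period of interest:

```sql
-- Assumed table: sales(sale_id, product_id, store_id, sale_date, price, category)
SELECT
  category,
  AVG(price) AS average_price   -- mean sale price within each category
FROM sales
GROUP BY category;
```

If the table held more than one sales period, a WHERE clause on sale_date would be needed to restrict the rows before grouping.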
With this query, we're grouping the table by the category column and computing the average price for each category. This way, Zoetis can understand the average sale price per category and make better business decisions.
To practice a very similar question, try this interactive Amazon Highest-Grossing Items Question, which also requires grouping by category and calculating a metric, or this CVS Health Pharmacy Analytics (Part 1) Question, which also requires calculating an aggregate measure per product.
The primary key of a table is a column or set of columns that serves as a unique identifier for each row. It ensures that all rows are distinct and does not allow null values.
For example, say you had stored some Facebook ad campaign data that Zoetis ran:
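A hypothetical table definition might look like the following. The table name ad_campaigns and every column name are illustrative assumptions, not an actual Zoetis schema.

```sql
-- Hypothetical table for Facebook ad campaign data
CREATE TABLE ad_campaigns (
  campaign_id   INTEGER PRIMARY KEY,      -- unique, non-null identifier for each campaign
  campaign_name VARCHAR(255) NOT NULL,
  start_date    DATE NOT NULL,
  end_date      DATE NOT NULL,
  budget        DECIMAL(10, 2) NOT NULL
);

INSERT INTO ad_campaigns VALUES
  (1, 'Spring Vaccine Push', '2022-03-01', '2022-03-31', 5000.00);

-- This insert would fail, because campaign_id 1 is already taken:
-- INSERT INTO ad_campaigns VALUES
--   (1, 'Duplicate Campaign', '2022-04-01', '2022-04-30', 2500.00);
```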
The primary key column uniquely identifies each row in the table, and the PRIMARY KEY constraint ensures that no two rows share the same value in that column. This helps to maintain the integrity of the data in the table by preventing duplicate rows.
The primary key is also an important part of the table because it allows you to easily identify and reference specific campaigns in your Facebook Ad data. You can use it to join to other tables in the database, such as a table containing data on the results of the campaigns.
As a company that provides medicines, vaccines and diagnostic products for pets and livestock, Zoetis may be interested in data analysis involving per-country sales. The question could be:
"Can you write an SQL query that would find the maximum purchase amount for each country where Zoetis' products are sold?"
In this scenario, the company may have a table that would look something like this:
sale_id | product_id | country | purchase_date | purchase_amount |
---|---|---|---|---|
2100 | 4300 | USA | 06/20/2021 | 45.50 |
2701 | 6280 | Canada | 06/25/2021 | 30.00 |
3002 | 4300 | USA | 06/26/2021 | 50.00 |
4003 | 6280 | USA | 06/30/2021 | 25.00 |
5004 | 4300 | Canada | 07/01/2021 | 48.00 |
The SQL query to find the maximum purchase amount per country would be:
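One possible version, assuming the table above is named sales (the name is not given in the question):

```sql
-- Assumed table: sales(sale_id, product_id, country, purchase_date, purchase_amount)
SELECT
  country,
  MAX(purchase_amount) AS max_purchase_amount  -- largest single purchase per country
FROM sales
GROUP BY country;
```

The row order is not guaranteed without an ORDER BY; adding ORDER BY max_purchase_amount DESC would list the country with the largest purchase first.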
If run on the above table, this query will create a table that lists each country and the maximum amount spent in a single purchase in that country.
country | max_purchase_amount |
---|---|
USA | 50.00 |
Canada | 48.00 |
You work for the company Zoetis, a global animal health company. You are asked to filter the customer purchase records and find all records where the purchased product name contains the word "Dog".
Assume that the tables and data are as follows:
purchase_id | customer_id | purchase_date | product_name | quantity |
---|---|---|---|---|
112 | 6543 | 02/15/2021 00:00:00 | Dog Collar | 3 |
324 | 1659 | 04/20/2021 00:00:00 | Cat Food | 5 |
297 | 7604 | 06/10/2021 00:00:00 | Dog Leash | 1 |
455 | 9872 | 07/28/2021 00:00:00 | Fish Food | 2 |
570 | 3219 | 10/14/2021 00:00:00 | Dog Bed | 2 |
Your query results should display every purchase record where the product name contains the word "Dog".
Here is the PostgreSQL query you would use:
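A minimal version, assuming the table is named purchases:

```sql
-- Assumed table: purchases(purchase_id, customer_id, purchase_date,
--                          product_name, quantity)
SELECT *
FROM purchases
WHERE product_name LIKE '%Dog%';   -- "Dog" anywhere in the product name
```

Note that LIKE is case-sensitive in PostgreSQL, so a product named "dog treats" would not match; ILIKE can be used when a case-insensitive match is wanted.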
This query fetches all records from the purchases table where the product_name field contains the string "Dog" anywhere within it. The % characters are wildcards that stand in for any sequence of characters, including an empty one. Placing a wildcard on each side of 'Dog' ensures that the query matches product names that begin with, end with, or merely contain "Dog".
Some similarities between unique and non-unique indexes include:
Some differences between unique and non-unique indexes include:
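As a rough illustration of the core difference, the sketch below creates one index of each kind on a hypothetical customers table (the table, its columns, and the index names are all assumptions). Both indexes can speed up lookups on their column, but only the unique index rejects duplicate values.

```sql
-- Hypothetical table used only to contrast the two index types
CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  email       VARCHAR(255) NOT NULL,
  country     VARCHAR(50)  NOT NULL
);

-- Unique index: speeds up lookups by email AND forbids duplicate emails
CREATE UNIQUE INDEX idx_customers_email ON customers (email);

-- Non-unique index: speeds up lookups by country, duplicates are allowed
CREATE INDEX idx_customers_country ON customers (country);

INSERT INTO customers VALUES (1, 'vet@example.com',    'USA');
INSERT INTO customers VALUES (2, 'clinic@example.com', 'USA');  -- fine: repeated country

-- This insert would fail, because the email already exists:
-- INSERT INTO customers VALUES (3, 'vet@example.com', 'Canada');
```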
The best way to prepare for a Zoetis SQL interview is to practice, practice, practice. Besides solving the above Zoetis SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Google, Facebook, Microsoft and Amazon.
Each DataLemur SQL question has hints to guide you, full answers and most importantly, there is an interactive coding environment so you can instantly run your SQL query answer and have it checked.
To prep for the Zoetis SQL interview it is also a great idea to practice interview questions from other healthcare and pharmaceutical companies like:
But if your SQL skills are weak, forget about jumping right into solving questions – go learn SQL with this DataLemur SQL tutorial.
This tutorial covers topics including UNION vs. UNION ALL and aggregate window functions, both of which come up routinely in SQL interviews at Zoetis.
In addition to SQL interview questions, the other question categories covered in the Zoetis Data Science Interview include:
To prepare for Zoetis Data Science interviews read the book Ace the Data Science Interview because it's got:
Also focus on the behavioral interview – prep for it with this guide on behavioral interview questions.