At ADP, SQL Server is used across the company for extracting and analyzing payroll data associated with 90 million workers. Unsurprisingly, this is why ADP often tests SQL query questions during interviews for Data Analyst, Data Science, and BI jobs.
So, if you're stressed about an upcoming SQL interview, here are 11 ADP SQL interview questions to practice – can you solve them?
ADP (Automatic Data Processing) offers various human resources management software and services. The goal is to identify the most important users for ADP's services, i.e., the users who use their HR tools most frequently.
Using SQL, write a query to identify the top 10 users with the highest tool usage in the last month. To gauge frequency, count the number of logins to any of ADP's HR tools or platforms.
For ADP's database schema, consider two tables: a 'users' table representing user data, and a 'logins' table representing login activities.
user_id | user_name | company_id | role |
---|---|---|---|
1 | John Doe | 101 | Employee |
2 | Jane Doe | 101 | Manager |
3 | Alice Smith | 102 | Admin |
4 | Bob Johnson | 103 | Employee |
login_id | user_id | tool_id | login_timestamp |
---|---|---|---|
321 | 2 | 501 | 06/09/2022 12:30:00 |
145 | 1 | 402 | 06/10/2022 09:45:00 |
789 | 1 | 501 | 06/10/2022 15:15:00 |
111 | 2 | 501 | 06/11/2022 14:30:00 |
890 | 3 | 402 | 06/12/2022 10:45:00 |
570 | 4 | 301 | 06/12/2022 12:35:00 |
650 | 3 | 501 | 06/13/2022 14:30:00 |
261 | 2 | 301 | 06/14/2022 09:30:00 |
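Here's one way to write this in PostgreSQL (a sketch – it interprets "the last month" as the rolling month ending today):

```sql
SELECT
  u.user_id,
  u.user_name,
  COUNT(l.login_id) AS login_frequency
FROM users u
JOIN logins l
  ON u.user_id = l.user_id
WHERE l.login_timestamp >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY u.user_id, u.user_name
ORDER BY login_frequency DESC
LIMIT 10;
```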
This SQL query joins the 'users' table and the 'logins' table on 'user_id'. It then filters for login activities that happened in the last month. The query groups by 'user_id' and 'user_name' to count the number of logins per user. The final output is sorted in descending order by login frequency, so the top users appear first, and limited to the top 10 users.
To work on a similar customer analytics SQL question that you can solve right in the browser, with your SQL solution instantly executed, try this Walmart SQL Interview Question:
In ADP, a payroll company, you are given a dataset containing a record of all employee salaries. Each row in the dataset represents the salary of an employee for a specific day. Some employees might have salary changes over different days due to changes in their role, bonus, or other contributing factors.
Your task is to write an SQL query that will calculate the average daily salary of each department for the last 7 days, using a window function.
employee_id | department_id | date | salary |
---|---|---|---|
1 | 100 | 2022-09-01 | $7,500 |
2 | 200 | 2022-09-01 | $8,000 |
3 | 100 | 2022-09-02 | $7,700 |
1 | 100 | 2022-09-02 | $7,600 |
2 | 200 | 2022-08-27 | $8,200 |
1 | 100 | 2022-08-26 | $7,400 |
department_id | avg_salary |
---|---|
100 | $7,600 |
200 | $8,000 |
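One way to satisfy the window-function requirement in PostgreSQL (a sketch – the table name salaries is assumed, and salary is assumed to be stored as a plain numeric column):

```sql
SELECT DISTINCT
  department_id,
  AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM salaries
-- keep only rows from the last 7 days, relative to the latest date in the table
WHERE date > (SELECT MAX(date) FROM salaries) - INTERVAL '6 days'
ORDER BY department_id;
```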
In this query, we use the AVG() window function, partitioned by department_id, to calculate the average salary for each department over the last 7 days. We restrict the analysis to the last 7 days by only selecting rows where the date is greater than the maximum date in the table minus 6 days. Finally, SELECT DISTINCT collapses the window output to one row per department_id, and we order the results by department_id.
For more window function practice, try this Uber SQL problem on DataLemur's interactive SQL code editor:
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are a few different types of indexes that can be used in a database:
For a concrete example, say you had a table of ADP customer payments that includes a payment_date column.
Here's what creating a clustered index on the payment_date column might look like (SQL Server syntax – the table and index names are assumed):
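```sql
-- SQL Server syntax; the payments table and index names are assumed.
-- The clustered index makes payment_date determine the physical
-- order of the rows stored in the table.
CREATE CLUSTERED INDEX ix_payments_payment_date
ON payments (payment_date);
```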
A clustered index on the payment_date column would determine the physical order of the records in the table based on the payment_date. This means that the records with the earliest payment_date values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the payment_date, as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
You are a Data Analyst at ADP, a payroll processing company. One of the key metrics that the company focuses on for efficiency is the average time taken to process a payroll. Given the 'payroll_data' table, can you write a SQL query to compute the average payroll processing time for each client? Assume that the time taken to process a payroll is the difference in days between 'date_received' and 'date_processed'.
Use the following sample 'payroll_data' table:
payroll_id | client_id | date_received | date_processed |
---|---|---|---|
101 | 599 | 2022-06-01 | 2022-06-04 |
102 | 445 | 2022-06-02 | 2022-06-05 |
103 | 445 | 2022-06-06 | 2022-06-10 |
104 | 312 | 2022-06-05 | 2022-06-07 |
105 | 312 | 2022-06-01 | 2022-06-02 |
client_id | average_processing_time |
---|---|
599 | 3.0 |
445 | 3.5 |
312 | 1.5 |
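In PostgreSQL, subtracting one date from another yields the number of days between them, so the query stays short:

```sql
SELECT
  client_id,
  AVG(date_processed - date_received) AS average_processing_time
FROM payroll_data
GROUP BY client_id;
```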
This PostgreSQL query calculates the average processing time for each client by taking the difference between 'date_processed' and 'date_received' (which gives the processing time for each payroll in days), then averaging these across all payrolls for each client. The 'GROUP BY' clause groups the results by 'client_id'. The result is a table that shows the average processing time in days for each client.
To practice a very similar question, try this interactive Amazon Average Review Ratings Question, which is similar for requiring computation of averages and handling of date fields, or this Facebook Average Post Hiatus (Part 1) Question, which is similar for the same reasons.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
A few reasons to denormalize a database:
Improved performance: Denormalization often reduces the number of costly join operations needed to retrieve data. This is helpful when the database is being used for OLAP (Online Analytical Processing) use cases at ADP, as joins can be expensive and slow.
Scalability: Denormalization can be like a turbo boost for your database's scalability! By requiring less data to be read and processed when executing a query, denormalization can help your database handle a ton of queries without breaking a sweat.
Simplification: One way to simplify the design of a database is by using denormalization to reduce the number of tables and relationships that need to be managed. This can make the database easier to understand and maintain.
Of course, don't overdo the database denormalization magic – too much redundancy can increase data storage costs, and the duplicated data gets complicated to keep consistent if there are frequent UPDATE commands that need to be run.
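As a small illustration of the trade-off (the clients and payroll_runs tables here are hypothetical), copying a client's name onto each payroll run removes a join from read-heavy reporting queries, at the cost of keeping the copies in sync:

```sql
-- Denormalize: store client_name redundantly on payroll_runs so that
-- reports no longer need to join back to the clients table.
ALTER TABLE payroll_runs ADD COLUMN client_name TEXT;

UPDATE payroll_runs pr
SET    client_name = c.client_name
FROM   clients c
WHERE  c.client_id = pr.client_id;
```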
ADP, a company that provides human resource management software, runs multiple online advertising campaigns. They collect data on ad impressions and clicks.
The question here is for you to calculate the click-through rate (CTR). The CTR is the ratio of users who clicked on an ad to the number of total users who viewed the ad (impressions).
We have two tables, impressions and clicks.
The impressions table has one row for each ad impression, and the clicks table has one row for each click on the ad.
Assume ad_id, timestamp, and user_id as columns for both tables, with user_id being the unique identifier for each user. We want to calculate the CTR for each unique ad_id in June 2022.
Here is the sample data for our problem. First, the impressions table:
ad_id | timestamp | user_id |
---|---|---|
100 | 06/01/2022 00:00:00 | 1 |
100 | 06/02/2022 00:00:00 | 2 |
101 | 06/03/2022 00:00:00 | 3 |
101 | 06/04/2022 00:00:00 | 4 |
100 | 06/05/2022 00:00:00 | 5 |
And the clicks table:
ad_id | timestamp | user_id |
---|---|---|
100 | 06/01/2022 00:00:00 | 1 |
101 | 06/04/2022 00:00:00 | 4 |
In PostgreSQL, we could use a query along these lines (a sketch: COALESCE fills in zero for ads with no clicks, and NULLIF guards against division by zero):
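```sql
SELECT
  i.ad_id,
  COALESCE(c.click_count, 0)::DECIMAL
    / NULLIF(i.impression_count, 0) AS ctr
FROM (
  -- impressions per ad in June 2022
  SELECT ad_id, COUNT(*) AS impression_count
  FROM impressions
  WHERE timestamp >= '2022-06-01' AND timestamp < '2022-07-01'
  GROUP BY ad_id
) i
LEFT JOIN (
  -- clicks per ad in June 2022
  SELECT ad_id, COUNT(*) AS click_count
  FROM clicks
  WHERE timestamp >= '2022-06-01' AND timestamp < '2022-07-01'
  GROUP BY ad_id
) c ON i.ad_id = c.ad_id;
```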
This SQL query calculates the CTR by first counting the number of clicks per ad (click_count) and the total number of impressions per ad (impression_count), and then dividing the click count by the impression count to get the click-through rate. We also avoid division-by-zero errors by using the NULLIF() function. The date filters restrict the data to June 2022 only.
The output would have each unique ad_id from June 2022 and its corresponding CTR.
To solve a similar SQL interview question on DataLemur's free online SQL code editor, solve this SQL interview question asked by Facebook:
An inner join only includes rows from both tables that satisfy the join condition, whereas a full outer join includes all rows from both tables, regardless of whether they match the join condition or not.
For a concrete example, imagine you had two database tables: an advertising campaigns table, which had information on Google Ads keywords and how much was bid for each keyword, and an ADP sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
INNER JOIN: retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the advertising campaigns table and the sales table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the campaigns table matches the keyword in the sales table.
FULL OUTER JOIN: retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
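A quick sketch of the difference (the ad_campaigns and sales tables and their columns are hypothetical names):

```sql
-- INNER JOIN: only keywords that appear in both tables
SELECT c.keyword, c.bid_amount, s.products_sold
FROM ad_campaigns c
INNER JOIN sales s ON c.keyword = s.keyword;

-- FULL OUTER JOIN: every keyword from either table; columns from
-- the non-matching side come back as NULL
SELECT c.keyword, c.bid_amount, s.products_sold
FROM ad_campaigns c
FULL OUTER JOIN sales s ON c.keyword = s.keyword;
```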
ADP provides human resources management software and services, including payroll management. Suppose you are given a table containing the monthly payroll details for each client. Each row represents the total payroll that a client uses ADP to manage in a given month.
Write a SQL query to calculate the average monthly payroll managed per client.
client_id | month_year | total_payroll |
---|---|---|
126 | 2022-06 | 10000 |
128 | 2022-06 | 7500 |
126 | 2022-07 | 10500 |
128 | 2022-07 | 8000 |
129 | 2022-07 | 7500 |
month_year | avg_payroll_per_client |
---|---|
2022-06 | 8750 |
2022-07 | 8666.67 |
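In PostgreSQL, this is a straightforward aggregation (a sketch – the table name monthly_payrolls is assumed):

```sql
SELECT
  month_year,
  ROUND(AVG(total_payroll), 2) AS avg_payroll_per_client
FROM monthly_payrolls
GROUP BY month_year
ORDER BY month_year;
```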
To answer this question, we use the GROUP BY statement in PostgreSQL to group the data by month_year. Within each group, we calculate the average total_payroll using the AVG() function, and we round the average to two decimal places using the ROUND() function. The result shows the average monthly payroll managed per client for each month.
ADP stores the information of its customers, including their names, emails, and the products they use. You are tasked with finding all the customers who have a Gmail address. We are only interested in the customer_id, Name, and email columns.
The table is structured as follows:
customer_id | Name | email | products_id |
---|---|---|---|
7632 | John Doe | john_doe@gmail.com | 20001 |
9802 | Bella Miles | bella_miles@yahoo.com | 80052 |
1077 | Stewart Little | stewart_little@gmail.com | 45002 |
4231 | Katie Bell | katie_bell@hotmail.com | 67023 |
5561 | Jane Smith | jane_smith@gmail.com | 12500 |
Write a SQL query to retrieve the required information.
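A pattern match with LIKE does the job (a sketch – the table name customers is assumed):

```sql
SELECT customer_id, Name, email
FROM customers
WHERE email LIKE '%@gmail.com';
```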
This query will return all the customers that have a Gmail address. The LIKE operator along with the % wildcard allows us to define a pattern that matches any email ending in @gmail.com.
From the sample input table, the output will be:
customer_id | Name | email |
---|---|---|
7632 | John Doe | john_doe@gmail.com |
1077 | Stewart Little | stewart_little@gmail.com |
5561 | Jane Smith | jane_smith@gmail.com |
The HAVING clause is used to filter the groups created by the GROUP BY clause. It's similar to the WHERE clause, but it is used to specify conditions on the groups created by the GROUP BY clause, rather than on the individual rows of the table.
For example, say you were analyzing salaries for analytics employees at ADP (a sketch – the employees table and its columns are hypothetical):
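```sql
-- Hypothetical employees table; sums salaries per analytics team,
-- then keeps only the teams whose total exceeds $1 million.
SELECT department, SUM(salary) AS total_salary
FROM employees
WHERE department LIKE '%Analytics%'
GROUP BY department
HAVING SUM(salary) > 1000000;
```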
This query retrieves the total salary for each Analytics department at ADP, grouping the rows by the specific department (i.e. the "Marketing Analytics", "Business Analytics", and "Sales Analytics" teams).
The HAVING clause then filters the groups to include only ADP departments where the total salary is greater than $1 million.
ADP has two tables, customers and contracts. The customers table keeps track of all customer information, including customer_id, first_name, last_name, and city.
The contracts table keeps track of all contracts between ADP and its customers, including contract_id, customer_id, product_id, and contract_date.
Write a SQL query to return all customer information along with any product they have a contract for, including those customers who don't have any contracts yet.
customer_id | first_name | last_name | city |
---|---|---|---|
101 | John | Doe | San Francisco |
102 | Jane | Smith | Los Angeles |
103 | Bob | Johnson | San Francisco |
104 | Alice | Williams | New York |
105 | Charlie | Brown | Chicago |
contract_id | customer_id | product_id | contract_date |
---|---|---|---|
5001 | 101 | P1 | 2022-05-01 |
5002 | 101 | P2 | 2022-06-01 |
5003 | 102 | P1 | 2022-07-01 |
5004 | 103 | P2 | 2022-08-01 |
5005 | 103 | P3 | 2022-09-01 |
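A LEFT JOIN keeps every customer even when no contract row matches:

```sql
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  c.city,
  ct.product_id,
  ct.contract_date
FROM customers c
LEFT JOIN contracts ct
  ON c.customer_id = ct.customer_id;
```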
This query uses a LEFT JOIN to combine the customers and contracts tables. A LEFT JOIN returns all the records from the left table (customers) and the matched records from the right table (contracts). If there is no match, the result is NULL on the right side. In this case, customers without any contracts will still be included in the result, with NULL as the product_id and contract_date.
Because joins come up frequently during SQL interviews, try this interactive Snapchat Join SQL question:
The best way to prepare for an ADP SQL interview is to practice, practice, practice. Beyond just solving the above ADP SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Netflix, Airbnb, and Amazon.
Each interview question has hints to guide you and step-by-step solutions, and crucially, there is an online SQL coding environment so you can instantly run your SQL query answer and have it executed.
To prep for the ADP SQL interview, it can also be useful to practice SQL questions from other payroll companies like:
However, if your SQL query skills are weak, don't jump right into solving questions – first strengthen your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers SQL concepts such as how window functions work and grouping by multiple columns – both of which come up frequently in ADP SQL assessments.
For the ADP Data Science Interview, besides SQL questions, the other types of questions to prepare for are:
The best way to prepare for ADP Data Science interviews is by reading Ace the Data Science Interview. The book's got: