At PayPal, SQL is used all the damn time: for analyzing financial transactions to detect fraud, and for extracting customer behavior data to sharpen product strategy. The PayPal engineering team even built a system called HERA to support the hundreds of billions of SQL queries that PayPal runs per day. So, it shouldn't surprise you that PayPal often tests SQL interview questions for Data Science, Data Engineering, and Data Analytics jobs.
So, to help you prepare for an upcoming PayPal SQL Assessment, this blog covers 11 PayPal SQL interview questions to practice, which are similar to recently asked questions at PayPal.
Given a table containing information about bank deposits and withdrawals made using PayPal, write a query to retrieve the final account balance for each account, taking into account all the transactions recorded in the table. Assume there are no missing transactions.
Schema:

Column Name | Type |
---|---|
transaction_id | integer |
account_id | integer |
amount | decimal |
transaction_type | varchar |
Example Input:

transaction_id | account_id | amount | transaction_type |
---|---|---|---|
123 | 101 | 10.00 | Deposit |
124 | 101 | 20.00 | Deposit |
125 | 101 | 5.00 | Withdrawal |
126 | 201 | 20.00 | Deposit |
128 | 201 | 10.00 | Withdrawal |
Example Output:

account_id | final_balance |
---|---|
101 | 25.00 |
201 | 10.00 |
Using account ID 101 as an example, $30.00 was deposited into this account, while $5.00 was withdrawn. Therefore, the final account balance can be calculated as the difference between the total deposits and withdrawals which is $30.00 - $5.00, resulting in a final balance of $25.00.
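One way to solve this, assuming the table is named transactions (the name isn't shown above), is to flip withdrawals to negative amounts so a single SUM yields the net balance per account:

```sql
SELECT
  account_id,
  SUM(CASE WHEN transaction_type = 'Deposit' THEN amount
           ELSE -amount END) AS final_balance
FROM transactions
GROUP BY account_id;
```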
To solve this question on DataLemur's free interactive coding environment, try this PayPal SQL Interview Question:
As a data scientist at PayPal, you have been asked to write a SQL query to analyze the transaction history of PayPal users. Specifically, management wants to know the average transaction amount for each user, and how they rank based on their averages. For this task:
Note: When multiple users share the same average transaction amount, they should receive the same rank, and the next distinct average should get the rank number that skips past the tied users (i.e., RANK-style numbering with gaps, not DENSE_RANK).
For this question, the following table is provided.
Example Input:

transaction_id | user_id | transaction_date | amount |
---|---|---|---|
1 | 1000 | 01/25/2021 | 50 |
2 | 1000 | 03/02/2021 | 150 |
3 | 2000 | 03/04/2021 | 300 |
4 | 3000 | 04/15/2021 | 100 |
5 | 2000 | 04/18/2021 | 200 |
6 | 3000 | 05/05/2021 | 100 |
7 | 4000 | 05/10/2021 | 500 |
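A possible solution, assuming the sample table above is named transactions:

```sql
WITH user_avgs AS (
  -- average transaction amount per user
  SELECT
    user_id,
    AVG(amount) AS avg_transaction_amount
  FROM transactions
  GROUP BY user_id
)
SELECT
  user_id,
  avg_transaction_amount,
  RANK() OVER (ORDER BY avg_transaction_amount DESC) AS user_rank
FROM user_avgs
ORDER BY user_rank;
```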
In this query, a Common Table Expression (CTE) first calculates the average transaction amount for every user using the AVG aggregate function. Then, the RANK window function ranks the users by their average transaction amount in descending order. The final output is ordered by rank.
For more window function practice, solve this Uber SQL Interview Question on DataLemur's interactive SQL code editor:
You are given a table of PayPal payments showing the payer, the recipient, and the amount paid. A two-way unique relationship is established when two people send money back and forth. Write a query to find the number of two-way unique relationships in this data.
Schema:

Column Name | Type |
---|---|
payer_id | integer |
recipient_id | integer |
amount | integer |
Example Input:

payer_id | recipient_id | amount |
---|---|---|
101 | 201 | 30 |
201 | 101 | 10 |
101 | 301 | 20 |
301 | 101 | 80 |
201 | 301 | 70 |
Example Output:

unique_relationships |
---|
2 |
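One approach, assuming the table is named payments: normalize each pair of users with LEAST/GREATEST so that (A, B) and (B, A) collapse into one row, keep only pairs where a payment exists in both directions, and count the distinct pairs.

```sql
WITH two_way AS (
  SELECT DISTINCT
    LEAST(p1.payer_id, p1.recipient_id)    AS user_a,
    GREATEST(p1.payer_id, p1.recipient_id) AS user_b
  FROM payments p1
  WHERE EXISTS (
    -- a payment in the reverse direction makes the relationship two-way
    SELECT 1
    FROM payments p2
    WHERE p2.payer_id = p1.recipient_id
      AND p2.recipient_id = p1.payer_id
  )
)
SELECT COUNT(*) AS unique_relationships
FROM two_way;
```

On the sample data, users 101 and 201 paid each other, and so did 101 and 301, while 201 paid 301 with no payment back, giving 2 two-way relationships.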
To see an additional solution solve this PayPal SQL Interview Question on DataLemur!
Suppose you are a data analyst at PayPal, and you have been asked to create a report that identifies all users who have sent payments of more than 5000 in the last month. We want to filter out any user whose account is flagged as "fraudulent".
Below are sample inputs and expected output for your question.
Example Input (Transactions):

transaction_id | user_id | transaction_date | transaction_type | amount |
---|---|---|---|---|
101 | 123 | 07/08/2022 00:00:00 | Sent | 750 |
102 | 265 | 07/10/2022 00:00:00 | Received | 6000 |
103 | 265 | 07/18/2022 00:00:00 | Sent | 1500 |
104 | 362 | 07/26/2022 00:00:00 | Received | 6000 |
105 | 981 | 07/05/2022 00:00:00 | Sent | 3000 |
Example Input (Users):

user_id | username | is_fraudulent |
---|---|---|
123 | Jessica | false |
265 | Daniel | true |
362 | Michael | false |
981 | Sophia | false |
Example Output:

user_id | username |
---|---|
362 | Michael |
981 | Sophia |
To get the required output, follow the query below:
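A query along these lines should work (table and column names are assumed from the samples above):

```sql
SELECT u.user_id, u.username
FROM transactions t
JOIN users u
  ON t.user_id = u.user_id
WHERE t.transaction_type = 'Sent'
  AND t.amount > 5000
  AND u.is_fraudulent = false
  -- keep only transactions from the past month
  AND t.transaction_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY u.user_id, u.username;
```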
This query joins the Transactions and Users tables using the user_id key. It then filters on the conditions specified: the transaction date falls within the past month, the user sent a payment of over 5000, and the user's account is not flagged as fraudulent. Finally, it groups the results by user_id and username to remove duplicate rows.
Imagine you've got a giant AF jigsaw puzzle with thousands of pieces, where each piece represents a piece of data in your database. You've spent hours organizing the puzzle into neat little piles, with each pile representing a table in your database. This process is called normalization, and it's a great way to make your database efficient, flexible, and easy to maintain.
But what if you want to solve the puzzle faster (aka make your queries run faster)?
That's where denormalization comes in – Denormalization is like the puzzle-solving equivalent of taking a shortcut!
Instead of keeping every piece in exactly one pile, you might decide to clone some of the pieces and put the same puzzle piece into multiple piles. Clearly, that breaks the rules of the puzzle, and that's just like denormalization: it breaks the normal rules of normalization (1st, 2nd, and 3rd normal forms).
By adding redundant puzzle pieces, it can be easier to find the pieces you need, but it also means that you have to be extra careful when you're moving pieces around or adding new ones (aka INSERT/UPDATE commands become more complex).
On the plus side, denormalization can improve the performance of your database and make it easier to use. On the downside, it can make your database more prone to errors and inconsistencies, and it can be harder to update and maintain. In short, denormalization can be a helpful tool, but it's important to use it wisely!
Given a hypothetical situation where PayPal runs several online marketing campaigns, they want to closely monitor the click-through conversion rate of their campaigns for optimization. The click-through conversion rate is the number of users who click on the advertisement and proceed to add a product (in this case, setting up a new PayPal account) divided by the total number of users who have clicked the ad.
The following tables present sample data. Calculate the daily click-through conversion rate for the first week of September 2022.
Ad clicks sample data:

click_id | user_id | click_time | ad_id |
---|---|---|---|
1 | 200 | 2022-09-01 10:14:00 | 4001 |
2 | 534 | 2022-09-01 11:30:00 | 4003 |
3 | 120 | 2022-09-02 14:43:00 | 4001 |
4 | 534 | 2022-09-03 16:15:00 | 4002 |
5 | 287 | 2022-09-04 17:20:00 | 4001 |
Account setups sample data:

setup_id | user_id | setup_time |
---|---|---|
1 | 200 | 2022-09-01 10:30:00 |
2 | 287 | 2022-09-04 17:40:00 |
3 | 534 | 2022-09-01 11:45:00 |
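A possible query, using the hypothetical table names clicks and signups (the original table names were not preserved above):

```sql
SELECT
  c.click_time::DATE AS day,
  -- multiply by 1.0 to force decimal (rather than integer) division
  COUNT(DISTINCT s.user_id) * 1.0
    / COUNT(DISTINCT c.user_id) AS conversion_rate
FROM clicks c
LEFT JOIN signups s
  ON  c.user_id = s.user_id
  AND c.click_time::DATE = s.setup_time::DATE
WHERE c.click_time >= '2022-09-01'
  AND c.click_time <  '2022-09-08'
GROUP BY c.click_time::DATE
ORDER BY day;
```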
The SQL logic above uses a LEFT JOIN to combine the ad clicks with the account setups on user ID and date. The WHERE clause filters the data to the first week of September 2022. Then, it counts the number of distinct users in each table for each day. To get the click-through conversion rate, it divides the number of account setups by the number of clicks.
To practice a related problem on DataLemur's free online SQL coding environment, attempt this SQL interview question asked by Facebook:
The CHECK constraint is used to specify a condition that the data in a column must meet. If a row is inserted or updated and the data in the column doesn't meet the condition specified by the CHECK constraint, the operation will sadly fail.
For example, you might use a CHECK constraint to ensure that a column contains only positive numbers, or that a date falls within a certain range.
If you had a table of PayPal employees, here's how you could use the CHECK constraint in a CREATE TABLE statement:
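A sketch (the table and column names are illustrative):

```sql
CREATE TABLE paypal_employees (
  employee_id INTEGER PRIMARY KEY,
  full_name   VARCHAR(100) NOT NULL,
  salary      DECIMAL(10, 2) CHECK (salary > 0),      -- only positive salaries allowed
  hire_date   DATE CHECK (hire_date >= '1998-12-01')  -- example of a date range check
);
```

With these constraints in place, an INSERT with a negative salary or an out-of-range hire date would be rejected by the database.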
As a data analyst at PayPal, your task is to identify the products which generate the highest total revenue for each month. Assume that each transaction on PayPal relates to a product purchased, and the revenue generated is the transaction amount. Each transaction is timestamped, and the product ID is also recorded.
Example Input:

transaction_id | user_id | transaction_date | product_id | transaction_amount |
---|---|---|---|---|
218 | 123 | 06/08/2022 00:00:00 | 50001 | 150.00 |
320 | 265 | 06/12/2022 00:00:00 | 69852 | 200.00 |
475 | 362 | 06/21/2022 00:00:00 | 50001 | 300.00 |
650 | 192 | 07/06/2022 00:00:00 | 69852 | 100.00 |
789 | 981 | 07/05/2022 00:00:00 | 69852 | 250.00 |
Example Output:

month | product | total_revenue |
---|---|---|
6 | 50001 | 450.00 |
6 | 69852 | 200.00 |
7 | 69852 | 350.00 |
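One possible solution, assuming the table above is named transactions:

```sql
SELECT
  EXTRACT(MONTH FROM transaction_date) AS month,
  product_id AS product,
  SUM(transaction_amount) AS total_revenue
FROM transactions
GROUP BY EXTRACT(MONTH FROM transaction_date), product_id
ORDER BY month, total_revenue DESC;
```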
This query starts by extracting the month from each transaction date using the EXTRACT function. It then groups the transactions by both month and product_id, and calculates the total revenue for each group using the SUM aggregate function. The results are ordered by total_revenue in descending order within each month, so the highest revenue-generating products appear at the top.
As a PayPal data analyst, you are tasked with identifying user profiles that were created with corporate email addresses, specifically those that end with '@paypal.com'.
Example Input:

customer_id | first_name | last_name | email | create_date |
---|---|---|---|---|
1 | John | Doe | johndoe@gmail.com | 2022-01-01 |
2 | Jane | Smith | janesmith@paypal.com | 2022-02-01 |
3 | Max | Lee | maxlee@yahoo.com | 2022-02-01 |
4 | Abby | Chen | abbychen@paypal.com | 2022-03-01 |
Example Output:

customer_id | first_name | last_name | email | create_date |
---|---|---|---|---|
2 | Jane | Smith | janesmith@paypal.com | 2022-02-01 |
4 | Abby | Chen | abbychen@paypal.com | 2022-03-01 |
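A query for this, assuming the table is named customer:

```sql
SELECT customer_id, first_name, last_name, email, create_date
FROM customer
WHERE email LIKE '%@paypal.com';
```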
With the PostgreSQL LIKE keyword, the '%' symbol is a wildcard that matches any sequence of characters in the pattern.
This SQL query scans the customer table and filters for records where the email column matches the pattern '%@paypal.com', which matches any value ending with '@paypal.com'. In this case, it returns the profiles of Jane Smith and Abby Chen, since their email addresses end with '@paypal.com'.
The UNIQUE constraint makes sure that all values in a column are distinct. It is often paired with other constraints, like NOT NULL, to ensure that the data follows certain rules.
For example, say you were an analyst on the marketing team at PayPal, and had access to a database on marketing campaigns:
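The campaigns table might look like this (a sketch; the column names are illustrative):

```sql
CREATE TABLE marketing_campaigns (
  campaign_id   INTEGER PRIMARY KEY,
  campaign_name VARCHAR(255) UNIQUE NOT NULL,  -- no two campaigns may share a name
  start_date    DATE,
  budget        DECIMAL(12, 2)
);
```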
In this example, the UNIQUE constraint is applied to the "campaign_name" field to ensure that each campaign has a unique name. This helps to ensure the integrity of the data in the table and prevents errors that could occur if two campaigns had the same name.
You're given two tables - "Users" and "Transactions". The "Users" table records PayPal's user base. Each row represents a different user, and includes fields for the user_id and signup_date. The "Transactions" table records transactions made by these users. Each row represents a different transaction and includes fields for transaction_id, user_id, transaction_date and transaction_amount.
Write a SQL query that calculates the total and average transaction amount for all transactions for each user. Include only users who have made at least two transactions.
Example Input (Users):

user_id | signup_date |
---|---|
1 | 2020-01-30 |
2 | 2020-02-15 |
3 | 2020-03-20 |
4 | 2020-04-01 |
Example Input (Transactions):

transaction_id | user_id | transaction_date | transaction_amount |
---|---|---|---|
101 | 1 | 2020-02-01 | 50.00 |
102 | 1 | 2020-02-02 | 100.00 |
103 | 2 | 2020-02-20 | 200.00 |
104 | 2 | 2020-02-25 | 500.00 |
105 | 3 | 2020-03-25 | 100.00 |
106 | 4 | 2020-05-05 | 300.00 |
Example Output:

user_id | total_amount | average_amount |
---|---|---|
1 | 150.00 | 75.00 |
2 | 700.00 | 350.00 |
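A possible solution, assuming the Transactions table above is named transactions:

```sql
SELECT
  user_id,
  SUM(transaction_amount) AS total_amount,
  AVG(transaction_amount) AS average_amount
FROM transactions
GROUP BY user_id
HAVING COUNT(*) >= 2;  -- only users with at least two transactions
```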
This query calculates the total and average transaction amount for all transactions for each user. It uses a GROUP BY clause to group the transactions by user_id, and a HAVING clause to only include users who have made at least two transactions.
Because joins come up frequently during SQL interviews, try this interactive Snapchat Join SQL question:
The key to acing a PayPal SQL interview is to practice, practice, and then practice some more! Beyond just solving the above PayPal SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions, which come from FAANG tech companies and tech startups.
Each exercise has hints to guide you and full answers, and most importantly, there is an online SQL code editor so you can write your query right in the browser and have it executed.
To prep for the PayPal SQL interview it is also wise to solve interview questions from other fintech companies like:
You should also look into PayPal's HERA system, especially if you are a Data Engineer or Software Engineer who wants a deeper understanding of PayPal's infrastructure for its System Design rounds:
For the PayPal Data Science Interview, in addition to SQL query questions, the following other types of questions are covered:
The best way to prepare for PayPal Data Science interviews is by reading Ace the Data Science Interview. The book's got:
Read more FinTech Data Science use cases and read how companies like PayPal and Zelle use Data Science!