Back to questions
You are given a table of PayPal payments showing the payer, the recipient, and the amount paid. A two-way unique relationship is established when two people send money back and forth. Write a query to find the number of two-way unique relationships in this data.
Assumption:
| Column Name | Type |
|---|---|
| payer_id | integer |
| recipient_id | integer |
| amount | integer |
| payer_id | recipient_id | amount |
|---|---|---|
| 101 | 201 | 30 |
| 201 | 101 | 10 |
| 101 | 301 | 20 |
| 301 | 101 | 80 |
| 201 | 301 | 70 |
| unique_relationships |
|---|
| 2 |
There are 2 unique two-way relationships between:
The dataset you are querying against may have different input & output - this is just an example!
In a two-way relationship, the payers must also be recipients at some point; therefore, the column must also contain the ID of a payer. Keeping this in mind, we can use the set operator to find the two-way relationships as shown below:
The query above shows the output for transaction between ID 101 and ID 201 as follows:
| payer_id | recipient_id |
|---|---|
| 101 | 201 |
| 201 | 101 |
The operator combines two statements and returns only the distinct results that are common to both queries (meaning there are no duplicates). That is, if there are many back-and-forth transactions between two people, we'll only obtain two rows, as displayed above. If two people have a one-way transaction relationship, those will be eliminated because the payer never becomes the recipient in this situation.
Because back-and-forth transactions between two people should be counted as a single unique relationship, we can simply wrap this query in a subquery (or CTE) and divide the number of rows by a factor of two.
The number of rows can be obtained with a function as shown below:
Output:
| unique_relationships |
|---|
| 1 |
You can also use a common table expression (CTE) instead of a subquery. Do you know the differences between a subquery and a CTE?
A CTE is a temporary data et to be used as part of a query that can be reused throughout the entire query session. A subquery is a nested query (a query within a query), and unlike a CTE, it can be used within that query only.
Both methods give the same output and perform fairly similarly. The difference is that a CTE is reusable, whereas a subquery can be used in and clauses.
Solution #1 Using Subquery
Solution #2 Using CTE