At Remitly, SQL is often used to analyze transaction data for fraud detection and to optimize remittance flow patterns for improved customer service. Unsurprisingly, this is why Remitly typically asks SQL query questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
As such, to help you study for the Remitly SQL interview, we'll cover 8 Remitly SQL interview questions – how many can you solve?
As a data scientist at Remitly, you are tasked with performing an analysis on the company's historical money transfer transactions. Your job is to write a SQL query that will determine the total amount of money transferred for each sender, the average amount of money transferred per transaction for each sender, and the rank of each sender based on the total amount of money transferred.
Below is sample data from the table:
transaction_id | sender_id | receiver_id | transfer_date | amount |
---|---|---|---|---|
1001 | 1 | 101 | 2022-06-08 | 500 |
1002 | 2 | 102 | 2022-06-10 | 200 |
1003 | 1 | 103 | 2022-06-18 | 300 |
1004 | 3 | 104 | 2022-07-26 | 1000 |
1005 | 3 | 105 | 2022-07-05 | 500 |
1006 | 2 | 106 | 2022-08-01 | 100 |
1007 | 2 | 107 | 2022-08-20 | 1000 |
1008 | 1 | 108 | 2022-09-04 | 1500 |
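A sketch of a solution in PostgreSQL, assuming the sample table is named `transactions`:

```sql
-- Sketch of a solution; assumes the table is named "transactions"
WITH total_transfers AS (
  -- total and average amount transferred per sender
  SELECT
    sender_id,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
  FROM transactions
  GROUP BY sender_id
),
sender_ranks AS (
  -- rank each sender by their total amount transferred
  SELECT
    sender_id,
    total_amount,
    avg_amount,
    RANK() OVER (ORDER BY total_amount DESC) AS sender_rank
  FROM total_transfers
)
SELECT *
FROM sender_ranks
ORDER BY sender_rank;
```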
This query is broken down into two main parts:
The first part (`total_transfers`) uses aggregate functions to calculate the total and average amount of money transferred for each sender.
The second part (`sender_ranks`) uses a window function to assign a rank to each sender based on the total amount transferred. The `RANK()` function provides this ranking, and the results are then selected from the ranking CTE.
For more window function practice, try this Uber SQL problem within DataLemur's online SQL code editor:
Remitly is a service that allows users to send money globally. You are given a table that contains fields such as `transaction_id`, `customer_id`, `send_date`, `send_country`, `receive_country`, `amount`, and `status`. The `status` column can contain the values `completed`, `pending`, and `failed`.
Your task is to write a SQL query to filter all customers who have made a successful transaction of more than 500 dollars from the USA to India between January 1, 2020 and March 31, 2020.
transaction_id | customer_id | send_date | send_country | receive_country | amount | status |
---|---|---|---|---|---|---|
101 | 1201 | 01/15/2020 00:00:00 | USA | India | 1024 | completed |
102 | 2434 | 02/07/2020 00:00:00 | USA | India | 540 | completed |
103 | 3544 | 03/21/2020 00:00:00 | USA | India | 780 | pending |
104 | 5500 | 12/26/2020 00:00:00 | USA | India | 777 | completed |
105 | 7322 | 01/31/2020 00:00:00 | USA | India | 430 | failed |
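A sketch of a solution, assuming the table is named `transactions`:

```sql
-- Sketch of a solution; assumes the table is named "transactions"
SELECT DISTINCT customer_id
FROM transactions
WHERE send_country = 'USA'
  AND receive_country = 'India'
  AND status = 'completed'
  AND amount > 500
  AND send_date BETWEEN '2020-01-01' AND '2020-03-31';
```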
This query uses a combination of `WHERE` and `AND` conditions to filter the customers based on the given business requirements. It checks that the `send_country` is 'USA', the `receive_country` is 'India', the `status` is 'completed', the `amount` is greater than 500, and the `send_date` is between '2020-01-01' and '2020-03-31'.
There are several advantages to normalizing a database, including less redundancy, more flexibility, and better performance.
Less Redundancy: Normalization reduces redundancy by breaking down a larger, more general table into smaller, more specific tables. This reduces the amount of duplicated data that needs to be stored and kept consistent.
More Flexibility: Normalization makes it easier to modify the structure of the database; because there is less redundancy, you can make changes to one table without affecting others. This makes it easier to adapt the database to changing business needs (a very real concern at Remitly!).
Better Performance: Normalization can improve the performance of the database by reducing the amount of data that needs to be stored and accessed in a single table. This can result in faster query times and better overall performance.
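As a hypothetical illustration, a single wide table that repeats customer details on every transfer could be normalized into two related tables:

```sql
-- Hypothetical example: customer details live in one table...
CREATE TABLE customers (
  customer_id   INT PRIMARY KEY,
  customer_name TEXT,
  country       TEXT
);

-- ...and each transfer references the customer by ID instead of
-- repeating the name and country on every row.
CREATE TABLE transfers (
  transfer_id   INT PRIMARY KEY,
  customer_id   INT REFERENCES customers (customer_id),
  amount        NUMERIC(10, 2),
  transfer_date DATE
);
```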
At Remitly, an online international money transfer service, it's important to understand the average amounts of transactions that are sent. For this question, let's consider a table that stores the details of every transaction made through Remitly. Write an SQL query to find the average transaction amount per country (receiving country) for the last month.
transaction_id | sender_id | receiver_id | transaction_date | amount | receiving_country |
---|---|---|---|---|---|
101 | 1 | 2 | 08/12/2022 | 200.00 | Philippines |
102 | 3 | 4 | 08/13/2022 | 150.00 | India |
103 | 5 | 6 | 08/15/2022 | 500.00 | Mexico |
104 | 7 | 8 | 08/17/2022 | 300.00 | Philippines |
105 | 9 | 10 | 08/18/2022 | 250.00 | India |
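A sketch of a solution in PostgreSQL, assuming the table is named `transactions` and interpreting "the last month" as the previous calendar month:

```sql
-- Sketch of a solution; assumes the table is named "transactions"
SELECT
  receiving_country,
  AVG(amount) AS avg_transaction_amount
FROM transactions
-- keep only transactions from the previous calendar month
WHERE transaction_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND transaction_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY receiving_country;
```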
This query first filters the transactions to those from the previous month using the WHERE clause. Then it groups transactions by receiving country and calculates the average transaction amount for each country. The average amount could provide insights into which countries are receiving larger sums of money on average from their inbound Remitly transactions.
The `HAVING` clause is used to filter the groups created by the `GROUP BY` clause. It's similar to the `WHERE` clause, but it is used to specify conditions on the groups created by the `GROUP BY` clause, rather than on the individual rows of the table.
For example, say you were analyzing salaries for analytics employees at Remitly:
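A hypothetical query, assuming an `employees` table with `department` and `salary` columns:

```sql
-- Hypothetical example; assumes an "employees" table with
-- "department" and "salary" columns
SELECT
  department,
  SUM(salary) AS total_salary
FROM employees
WHERE department LIKE '%Analytics%'  -- only the analytics teams
GROUP BY department
HAVING SUM(salary) > 1000000;        -- keep groups over $1M total
```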
This query retrieves the total salary for each Analytics department at Remitly and groups the rows by the specific department (i.e. the "Marketing Analytics", "Business Analytics", and "Sales Analytics" teams).
The `HAVING` clause then filters the groups to include only Remitly departments where the total salary is greater than $1 million.
Remitly is a digital remittance service, enabling users to send money to others in different countries. Given a table of these transactions, can you write a SQL query to find the average value of transactions sent to each country per month?
Please identify the month in the format 'YYYY-MM'. Assume the table has the following schema:
transaction_id | user_id | send_date | send_country | receive_country | amount |
---|---|---|---|---|---|
1750 | 101 | 2022-08-02 00:00:00 | US | PH | 200.00 |
2206 | 202 | 2022-08-10 00:00:00 | US | MX | 150.00 |
3094 | 303 | 2022-08-18 00:00:00 | US | PH | 250.00 |
4582 | 404 | 2022-09-05 00:00:00 | US | MX | 200.00 |
5397 | 505 | 2022-09-08 00:00:00 | US | PH | 300.00 |
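A sketch of a solution in PostgreSQL, assuming the table is named `transactions`:

```sql
-- Sketch of a solution; assumes the table is named "transactions"
SELECT
  TO_CHAR(send_date, 'YYYY-MM') AS month,
  receive_country,
  AVG(amount) AS avg_amount
FROM transactions
GROUP BY TO_CHAR(send_date, 'YYYY-MM'), receive_country
ORDER BY month, avg_amount DESC;
```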
This query first groups the transactions by both month (with year) and the receiving country. It then calculates the average transaction value to each country per month using the AVG() function. Finally, it orders the output by month and average amount in descending order.
month | receive_country | avg_amount |
---|---|---|
2022-08 | PH | 225.00 |
2022-08 | MX | 150.00 |
2022-09 | PH | 300.00 |
2022-09 | MX | 200.00 |
This shows that in August 2022, the average transaction amount sent to the Philippines (PH) was 225.00, and so on.
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes that can be used in a database, such as clustered and non-clustered indexes.
For a concrete example, say you had a table of Remitly customer payments with the following columns: payment_id, customer_id, payment_amount, and payment_date.
Here's what a clustered index on the `payment_date` column would look like:
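The sketch below uses SQL Server syntax, since PostgreSQL doesn't support creating a clustered index in this form, and assumes the table is simply named `payments`:

```sql
-- SQL Server syntax; assumes the table is named "payments"
CREATE CLUSTERED INDEX ix_payments_payment_date
ON payments (payment_date);
```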
A clustered index on the `payment_date` column would determine the physical order of the records in the table based on the `payment_date`. This means that the records with the earliest `payment_date` values would be stored together physically in the table, followed by records with later `payment_date` values.
Having a clustered index on the `payment_date` column can speed up queries that filter or sort the data based on the `payment_date`, as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of January, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
You currently work as a data analyst for Remitly, an American company that provides mobile payments services. One of your tasks is to analyze the transaction and customer data to draw useful insights that can inform business decisions.
Your customers table has the following schema:
customer_id | first_name | last_name | country |
---|---|---|---|
1 | John | Doe | USA |
2 | Jane | Smith | Canada |
3 | Ali | Ahmed | UK |
4 | Maria | Lopez | USA |
And your transactions table is:
transaction_id | customer_id | date | amount |
---|---|---|---|
1 | 2 | 2022-01-01 | 100.00 |
2 | 1 | 2022-02-15 | 250.00 |
3 | 3 | 2022-03-20 | 150.00 |
4 | 2 | 2022-04-05 | 200.00 |
5 | 4 | 2022-05-31 | 300.00 |
Write a SQL query to get the total transaction amount for each customer and join the output with customer information.
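A sketch of a solution in PostgreSQL, using the `customers` and `transactions` tables above:

```sql
-- Joins the two tables above and totals each customer's transactions
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  c.country,
  SUM(t.amount) AS total_amount
FROM customers AS c
JOIN transactions AS t
  ON c.customer_id = t.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.country
ORDER BY c.customer_id;
```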
With the above PostgreSQL query, we join the `customers` and `transactions` tables on the `customer_id` column, which is common to both tables. We then group the result by the customer's ID, first name, last name, and country, and calculate the sum of transaction amounts for each group, effectively giving us the total transaction amount for each customer.
customer_id | first_name | last_name | country | total_amount |
---|---|---|---|---|
1 | John | Doe | USA | 250.00 |
2 | Jane | Smith | Canada | 300.00 |
3 | Ali | Ahmed | UK | 150.00 |
4 | Maria | Lopez | USA | 300.00 |
Because joins come up so often during SQL interviews, take a stab at an interactive SQL join question from Spotify:
The best way to prepare for a Remitly SQL interview is to practice, practice, practice. In addition to solving the earlier Remitly SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each SQL question has multiple hints, step-by-step solutions and, best of all, there's an interactive SQL code editor so you can write your query right in the browser and have it executed.
To prep for the Remitly SQL interview, it can also be helpful to practice interview questions from other tech companies like:
However, if your SQL skills are weak, forget about going right into solving questions – go learn SQL with this DataLemur SQL tutorial.
This tutorial covers things like UNION vs. UNION ALL and filtering on multiple conditions using AND/OR/NOT – both of which show up often during SQL job interviews at Remitly.
Besides SQL interview questions, the other question categories covered in the Remitly Data Science Interview are:
The best way to prepare for Remitly Data Science interviews is by reading Ace the Data Science Interview. The book's got: