At Marqeta, SQL is used across the company for analyzing payment transactions and managing customer datasets in the fintech industry. That's why Marqeta almost always asks SQL questions during interviews for Data Science and Data Engineering positions.
To help you practice for the Marqeta SQL interview, here are 10 Marqeta SQL interview questions – can you solve them?
Marqeta, a global modern card issuing platform, has a number of individual cardholders who make transactions using their cards. Your goal as a Data Analyst is to write a SQL query that identifies the most active cardholders in terms of transaction frequency. Assume high activity means more than 30 transactions in a month.
To solve this, we have the following table named "transactions".
transaction_id | cardholder_id | transaction_date | transaction_amount |
---|---|---|---|
1567 | 231 | 06/07/2022 00:00:00 | 500 |
2546 | 342 | 06/12/2022 00:00:00 | 850 |
3564 | 231 | 06/14/2022 00:00:00 | 300 |
4892 | 415 | 07/20/2022 00:00:00 | 750 |
5011 | 342 | 07/24/2022 00:00:00 | 450 |
Here is a PostgreSQL query to calculate the most active users by transaction count in a month.
This query groups transactions by cardholder_id and month (derived from transaction_date) and counts the number of transactions per cardholder per month. It then filters to cardholders who made more than 30 transactions in a single month and returns them in descending order of transaction frequency.
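One way to write it, sketched here using the transactions table and columns from the sample data above:

```sql
SELECT
  cardholder_id,
  DATE_TRUNC('month', transaction_date) AS month,
  COUNT(*) AS txn_count
FROM transactions
GROUP BY cardholder_id, DATE_TRUNC('month', transaction_date)
HAVING COUNT(*) > 30          -- keep only "high activity" cardholders
ORDER BY txn_count DESC;      -- most active first
```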
Note: This is a hypothetical scenario and the threshold of 30 transactions is arbitrary. Calling a user who makes more than 30 transactions a "power user" might not apply in many situations and doesn't account for factors like the total amount of money transacted. For instance, a user who made 31 transactions totaling $3,000 could matter far less to the business than a user who made fewer, but much larger, transactions. Adjust the threshold and the columns you consider (such as transaction_amount) based on your specific needs.
To practice another SQL customer analytics question where you can code right in the browser and have your SQL solution instantly executed, try this Walmart SQL Interview Question:
Suppose you work for Marqeta, a company that provides card issuance and digital payment solutions and services. You've been tasked with analyzing transaction data to understand card usage patterns. Specifically, you've been asked to write a SQL query that computes the average number of transactions by card on a daily basis for each user.
For data, you have two main tables: a cards table mapping each card to its owner, and a transactions table with one row per card transaction.
Cards table, example input:
card_id | user_id |
---|---|
1 | A |
2 | A |
3 | B |
4 | B |
5 | C |
Transactions table, example input:
txn_id | card_id | txn_date |
---|---|---|
101 | 1 | 2022-08-01 |
102 | 1 | 2022-08-01 |
103 | 2 | 2022-08-02 |
104 | 1 | 2022-08-03 |
105 | 3 | 2022-08-01 |
106 | 4 | 2022-08-01 |
107 | 5 | 2022-08-01 |
108 | 3 | 2022-08-02 |
109 | 4 | 2022-08-02 |
110 | 2 | 2022-08-03 |
To compute the average daily transactions per card for each user, you can use a PostgreSQL window function. Here's an example query:
This query first joins the cards and transactions tables using the card_id column. It then groups by card_id and txn_date to count the number of transactions each card has on every unique day. Finally, it computes the average number of these daily transactions for each user by using the AVG() window function partitioned by user_id. The ORDER BY clause sorts the output by user_id.
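Here is one possible sketch, assuming a cards table (card_id, user_id) and a transactions table (txn_id, card_id, txn_date) matching the sample data; the exact table names are an assumption:

```sql
WITH daily_counts AS (
  -- count how many transactions each card made on each day
  SELECT
    c.user_id,
    t.card_id,
    t.txn_date,
    COUNT(*) AS daily_txn_count
  FROM cards c
  JOIN transactions t
    ON c.card_id = t.card_id
  GROUP BY c.user_id, t.card_id, t.txn_date
)
SELECT DISTINCT
  user_id,
  -- average the per-card daily counts across each user's cards and days
  AVG(daily_txn_count) OVER (PARTITION BY user_id) AS avg_daily_txns_per_card
FROM daily_counts
ORDER BY user_id;
```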
P.S. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur.
To discover records in one table that are not present in another, you can utilize a LEFT JOIN and then keep only the rows where the right-side table's columns are NULL.
For example, say you had a table of Marqeta customers and a 2nd table of all purchases made with Marqeta. To find all customers who did not make a purchase, you'd use the following query:
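A sketch of such a query, assuming a customers table with an id primary key and a purchases table with a customer_id foreign key (the column names are illustrative):

```sql
SELECT customers.*
FROM customers
LEFT JOIN purchases
  ON customers.id = purchases.customer_id
WHERE purchases.id IS NULL;  -- keep only customers with no matching purchase
```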
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, NULL values are returned for all of the right table's columns. The WHERE clause then keeps only the rows where the purchases.id column is NULL, leaving only customers who have not made a purchase.
Marqeta is a modern payment platform, enabling companies to create, manage, and optimize card-based payments. Imagine you are a part of the data engineering team at Marqeta. Your task is to design a database schema for tracking individual card transactions and retrieving aggregate expenditure for each user on a monthly basis.
Considerations:
How would you design your tables and relationships, and decide which columns belong to which tables, keeping in mind the performance of a SQL query that retrieves a monthly breakdown of transactions per user?
Here are some sample transactions:
transaction_id | card_id | transaction_date | amount |
---|---|---|---|
1001 | 2001 | 06/08/2022 00:00:00 | 50 |
1002 | 2002 | 06/10/2022 00:00:00 | 120 |
1003 | 2003 | 06/18/2022 00:00:00 | 75 |
1004 | 2001 | 07/26/2022 00:00:00 | 100 |
1005 | 2002 | 07/05/2022 00:00:00 | 80 |
And here is the mapping of cards to users:
card_id | user_id |
---|---|
2001 | 3001 |
2002 | 3002 |
2003 | 3001 |
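One possible schema sketch for this design, assuming a cards table that maps each card to its owner and a transactions table that references the card (names and types are assumptions, not a prescribed answer):

```sql
-- Maps each issued card to the user who owns it
CREATE TABLE cards (
  card_id BIGINT PRIMARY KEY,
  user_id BIGINT NOT NULL
);

-- One row per card transaction
CREATE TABLE transactions (
  transaction_id   BIGINT PRIMARY KEY,
  card_id          BIGINT NOT NULL REFERENCES cards (card_id),
  transaction_date TIMESTAMP NOT NULL,
  amount           NUMERIC(12, 2) NOT NULL
);

-- Speeds up grouping transactions by card and month
CREATE INDEX idx_transactions_card_date
  ON transactions (card_id, transaction_date);
```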
Get a report of total expenditure for each user per month:
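Sketched below in PostgreSQL, assuming the cards and transactions tables outlined above:

```sql
SELECT
  c.user_id,
  DATE_TRUNC('month', t.transaction_date) AS month,
  SUM(t.amount) AS total_spent
FROM transactions t
JOIN cards c
  ON t.card_id = c.card_id
GROUP BY c.user_id, DATE_TRUNC('month', t.transaction_date)
ORDER BY c.user_id, month;
```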
This query gives a breakdown of total monthly transactions for each user. It starts by truncating the transaction_date to retain only the month and year, then groups the results by this truncated date and user_id. SUM() calculates the total transaction amount per user per month, giving the total expenditure for each user on Marqeta's platform each month.
Database denormalization is when you add redundancy to a database, and break typical normalization rules (codified by the 1st, 2nd, 3rd normal forms).
Denormalization is typically used to improve the performance of a database, particularly when the database is being used for OLAP (Online Analytical Processing) purposes. By denormalizing a database, you can reduce the number of joins required to retrieve data, which can greatly improve query speed since joins are costly and slow. However, denormalization can also introduce problems, such as increased data redundancy and the need for more complex update and delete operations, since data has been duplicated across multiple tables.
In general, denormalization should be used with caution and only after careful consideration of the trade-offs involved. It's typically better to start with a well-normalized database design and then denormalize only if your database isn't scaling as well as you want.
As a data analyst at Marqeta, your manager asked you to filter down the customer data to identify specific customers. Write a SQL query to find all customers who live in California and either have made more than 10 transactions or have spent over $2000 in total.
Here are your tables:
customer_id | first_name | last_name | state |
---|---|---|---|
001 | John | Doe | California |
002 | Jane | Smith | New York |
003 | Bob | Johnson | California |
004 | Alice | Davis | California |
transaction_id | customer_id | amount | transaction_date |
---|---|---|---|
0001 | 001 | 500 | 2022-09-01 |
0002 | 001 | 700 | 2022-09-02 |
0003 | 002 | 800 | 2022-09-02 |
0004 | 003 | 250 | 2022-09-03 |
0005 | 003 | 1850 | 2022-09-04 |
0006 | 004 | 1200 | 2022-09-05 |
0007 | 004 | 1800 | 2022-09-06 |
Here is your PostgreSQL solution:
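A sketch of that solution, assuming the tables are named customers and transactions as in the sample data:

```sql
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  COUNT(t.transaction_id) AS num_transactions,
  SUM(t.amount) AS total_spent
FROM customers c
JOIN transactions t
  ON c.customer_id = t.customer_id
WHERE c.state = 'California'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(t.transaction_id) > 10
    OR SUM(t.amount) > 2000;
```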
This solution first joins the customers table with the transactions table on the customer_id column. Then, it filters to only customers in California. Finally, it counts the number of transactions and sums the amounts spent for each customer, filtering to only those who have made more than 10 transactions or have spent over $2000.
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Marqeta, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
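Something along these lines, with copy_text and image_url as assumed column names:

```sql
-- Every combination of ad copy and ad creative
SELECT
  ad_copy.copy_text,
  ad_creative.image_url
FROM ad_copy
CROSS JOIN ad_creative;
```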
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example of each one, say you had sales data exported from Marqeta's Salesforce CRM stored in a data warehouse which had two tables: sales and customers.
An INNER JOIN (which is a type of natural join) combines the two tables on the common customer_id column:
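For instance (column names here are illustrative):

```sql
SELECT
  sales.sale_id,
  customers.customer_name,
  sales.amount
FROM sales
INNER JOIN customers
  ON sales.customer_id = customers.customer_id;
```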
This query will return rows from the sales and customers tables that have matching customer_id values. Only rows with matching values will be included in the results.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
As part of Marqeta's internal audit system, they are interested in identifying all clients whose address contains the word "Avenue". Using the table provided below, write a SQL query to return all records where the client's address matches this criterion.
client_id | client_name | address |
---|---|---|
101 | James Smith | 123 Main Street |
102 | Jessica Johnson | 456 Maple Avenue |
103 | Roger Davis | 789 Oak Street |
104 | Laura Wilson | 1012 Pine Avenue |
105 | Bob Thomas | 345 Walnut Street |
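A sketch of the query, assuming the table is named clients as shown in the sample data:

```sql
SELECT *
FROM clients
WHERE address LIKE '%Avenue%';
```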
This SQL query uses the LIKE keyword within the WHERE clause to filter the records from the clients table. Any record where the address field contains the string "Avenue" will be returned. The '%' character is a wildcard that matches any number of characters, so it allows the matching string to appear at any position within the address field. The result is a table containing all client records where the address contains "Avenue".
Please note that LIKE pattern matching is case sensitive in PostgreSQL (ILIKE performs case-insensitive matching), so be careful when matching case-sensitive patterns.
Given two tables, a customers table and a transactions table, the task is to write a SQL query that returns a list of the top 5 customers who generated the highest revenue for Marqeta.
cust_id | first_name | last_name | sign_up_date | balance |
---|---|---|---|---|
101 | Tom | Holland | 2018-03-17 | 500 |
102 | Jasmine | Paul | 2016-02-26 | 2000 |
103 | Morgan | Freeman | 2015-01-01 | 10000 |
104 | James | Cameron | 2019-12-31 | 350 |
105 | Amelia | Vega | 2018-03-02 | 4500 |
trans_id | cust_id | trans_date | product_id | amount |
---|---|---|---|---|
8001 | 101 | 2021-06-01 | 201 | 300 |
8002 | 102 | 2021-06-20 | 202 | 700 |
8003 | 103 | 2021-07-17 | 203 | 500 |
8004 | 104 | 2021-08-31 | 204 | 700 |
8005 | 105 | 2021-10-10 | 205 | 650 |
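One way to write it, assuming customers and transactions as the table names behind the sample data:

```sql
SELECT
  c.cust_id,
  c.first_name,
  c.last_name,
  SUM(t.amount) AS total_revenue
FROM customers c
JOIN transactions t
  ON c.cust_id = t.cust_id
GROUP BY c.cust_id, c.first_name, c.last_name
ORDER BY total_revenue DESC
LIMIT 5;
```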
In this query, we're using a JOIN statement to combine the customers and transactions tables based on cust_id. We're then using an aggregation function (SUM) to calculate the total revenue for each customer. The results are ordered in descending order of total revenue and limited to the top 5 customers.
Since join questions come up routinely during SQL interviews, practice this Snapchat JOIN SQL interview question:
Using a join in SQL, you can retrieve data from multiple tables and merge the results into a single table.
In SQL, there are four distinct types of JOINs. To demonstrate each kind, imagine you had two database tables: an ads table that contains data on Google Ads keywords and their bid amounts, and a sales table with information on product sales and the Google Ads keywords that drove those sales.
An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the ads table and the sales table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the ads table matches the keyword in the sales table.
A LEFT JOIN retrieves all rows from the left table (in this case, the ads table) and any matching rows from the right table (the sales table). If there is no match in the right table, NULL values will be returned for the right table's columns.
A RIGHT JOIN retrieves all rows from the right table (in this case, the sales table) and any matching rows from the left table (the ads table). If there is no match in the left table, NULL values will be returned for the left table's columns.
A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
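As a rough illustration, here is how each of the four joins might look against hypothetical ads and sales tables (all table and column names are assumed):

```sql
-- INNER JOIN: only keywords present in both tables
SELECT a.keyword, a.bid_amount, s.sale_amount
FROM ads a
INNER JOIN sales s ON a.keyword = s.keyword;

-- LEFT JOIN: every ad keyword, with sale info where it exists (NULLs otherwise)
SELECT a.keyword, a.bid_amount, s.sale_amount
FROM ads a
LEFT JOIN sales s ON a.keyword = s.keyword;

-- RIGHT JOIN: every sale, with ad info where it exists (NULLs otherwise)
SELECT a.keyword, a.bid_amount, s.sale_amount
FROM ads a
RIGHT JOIN sales s ON a.keyword = s.keyword;

-- FULL OUTER JOIN: all rows from both tables, matched where possible
SELECT a.keyword, a.bid_amount, s.sale_amount
FROM ads a
FULL OUTER JOIN sales s ON a.keyword = s.keyword;
```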
The key to acing a Marqeta SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier Marqeta SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like FAANG and tech startups.
Each DataLemur SQL question has multiple hints, detailed solutions and, crucially, an online SQL coding environment, so you can run your query right in the browser and have it checked.
To prep for the Marqeta SQL interview, it can also be wise to solve SQL problems from other tech companies.
However, if your SQL coding skills are weak, forget about diving straight into solving questions – strengthen your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers things like RANK() window functions and UNION – both of these pop up often during SQL job interviews at Marqeta.
In addition to SQL query questions, the other topics tested in the Marqeta Data Science Interview are:
The best way to prepare for Marqeta Data Science interviews is by reading Ace the Data Science Interview. The book's got: