At Marqeta, SQL is used across the company for analyzing payment transactions and managing customer datasets in the fintech industry. That's why Marqeta almost always asks SQL questions during interviews for Data Science and Data Engineering positions.
To help you practice for the Marqeta SQL interview, here are 10 Marqeta SQL interview questions – can you solve them?
Marqeta, a global modern card issuing platform, has a number of individual cardholders who make transactions using their cards. Your goal as a Data Analyst is to write a SQL query to identify the most active cardholders in terms of transaction frequency. Assume high activity to be more than 30 transactions per month.
To solve this, we have the following table named "transactions".
`transactions` Example Input:

| transaction_id | cardholder_id | transaction_date | transaction_amount |
|---|---|---|---|
| 1567 | 231 | 06/07/2022 00:00:00 | 500 |
| 2546 | 342 | 06/12/2022 00:00:00 | 850 |
| 3564 | 231 | 06/14/2022 00:00:00 | 300 |
| 4892 | 415 | 07/20/2022 00:00:00 | 750 |
| 5011 | 342 | 07/24/2022 00:00:00 | 450 |
Here is a PostgreSQL query to find the most active cardholders by monthly transaction count:

```sql
SELECT
  cardholder_id,
  DATE_TRUNC('month', transaction_date) AS month,
  COUNT(transaction_id) AS transaction_count
FROM transactions
GROUP BY cardholder_id, month
HAVING COUNT(transaction_id) > 30
ORDER BY transaction_count DESC;
```
This query groups transactions by `cardholder_id` and month, counting the number of transactions per cardholder per month. The `HAVING` clause then keeps only cardholders who made more than 30 transactions in a single month, and the results are returned in descending order of transaction count.
Note: This is a hypothetical scenario and the threshold of 30 transactions is arbitrary. Calling a user who makes more than 30 transactions a "power user" might not apply in many situations, and it doesn't account for factors like the total amount of money transacted. For instance, a user who made 31 transactions totaling $3,000 may be less valuable than a user who made far fewer but much larger transactions. Adjust the threshold and features such as `transaction_amount` based on your specific needs.
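For instance, a variant that also requires a minimum total monthly spend (the $5,000 cutoff here is purely illustrative):

```sql
SELECT
  cardholder_id,
  DATE_TRUNC('month', transaction_date) AS month,
  COUNT(transaction_id) AS transaction_count,
  SUM(transaction_amount) AS total_spent
FROM transactions
GROUP BY cardholder_id, month
-- keep only months with both high frequency AND high spend
HAVING COUNT(transaction_id) > 30
   AND SUM(transaction_amount) > 5000
ORDER BY transaction_count DESC;
```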
To practice another SQL customer analytics question where you can code right in the browser and have your SQL solution instantly executed, try this Walmart SQL Interview Question:
Suppose you work for Marqeta, a company that provides card issuance and digital payment solutions and services. You've been tasked with analyzing transaction data to understand card usage patterns. Specifically, you've been asked to write a SQL query that computes the average number of transactions by card on a daily basis for each user.
For data, you have two main tables: `cards` and `transactions`.

`cards` Example Input:

| card_id | user_id |
|---|---|
| 1 | A |
| 2 | A |
| 3 | B |
| 4 | B |
| 5 | C |
`transactions` Example Input:

| txn_id | card_id | txn_date |
|---|---|---|
| 101 | 1 | 2022-08-01 |
| 102 | 1 | 2022-08-01 |
| 103 | 2 | 2022-08-02 |
| 104 | 1 | 2022-08-03 |
| 105 | 3 | 2022-08-01 |
| 106 | 4 | 2022-08-01 |
| 107 | 5 | 2022-08-01 |
| 108 | 3 | 2022-08-02 |
| 109 | 4 | 2022-08-02 |
| 110 | 2 | 2022-08-03 |
To compute the average daily transactions per card for each user, first count each card's transactions per day, then average those daily counts for each user. Here's an example query:

```sql
WITH daily_counts AS (
  SELECT
    c.user_id,
    t.card_id,
    t.txn_date,
    COUNT(t.txn_id) AS daily_txns
  FROM transactions t
  JOIN cards c ON t.card_id = c.card_id
  GROUP BY c.user_id, t.card_id, t.txn_date
)
SELECT
  user_id,
  AVG(daily_txns) AS average_daily_txns
FROM daily_counts
GROUP BY user_id
ORDER BY user_id;
```

This query first joins the `transactions` and `cards` tables on `card_id`. The `daily_counts` CTE groups by `user_id`, `card_id`, and `txn_date` to count the number of transactions each card has on each unique day (`daily_txns`). The outer query then averages those daily counts for each user with `AVG()`, and the `ORDER BY` clause sorts the output by `user_id`.
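Window functions can also express the per-user average; here is one sketch using the same tables (`DISTINCT` collapses the per-day rows down to one row per user):

```sql
SELECT DISTINCT
  c.user_id,
  AVG(t.daily_txns) OVER (PARTITION BY c.user_id) AS average_daily_txns
FROM (
  -- one row per card per day, with that day's transaction count
  SELECT card_id, txn_date, COUNT(*) AS daily_txns
  FROM transactions
  GROUP BY card_id, txn_date
) t
JOIN cards c ON t.card_id = c.card_id
ORDER BY user_id;
```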
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
To discover records in one table that are not present in another, you can use a `LEFT JOIN` and filter for `NULL` values in the right-side table.

For example, say you had a table of Marqeta customers and a second table of all purchases made with Marqeta. To find all customers who did not make a purchase, you'd use the following `LEFT JOIN`:

```sql
SELECT customers.*
FROM customers
LEFT JOIN purchases
  ON customers.id = purchases.customer_id
WHERE purchases.id IS NULL;
```
This query returns all rows from the `customers` table, along with any matching rows from the `purchases` table. If there is no matching row in the `purchases` table, `NULL` values are returned for all of the right table's columns. The `WHERE` clause then filters out any rows where `purchases.id` is not `NULL`, leaving only customers who have not made a purchase.
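The same anti-join can also be written with `NOT EXISTS`, which PostgreSQL typically plans just as efficiently and some readers find more explicit:

```sql
-- keep a customer only if no purchase row references them
SELECT customers.*
FROM customers
WHERE NOT EXISTS (
  SELECT 1
  FROM purchases
  WHERE purchases.customer_id = customers.id
);
```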
Marqeta is a modern payment platform, enabling companies to create, manage, and optimize card-based payments. Imagine you are a part of the data engineering team at Marqeta. Your task is to design a database schema for tracking individual card transactions and retrieving aggregate expenditure for each user on a monthly basis.
Considerations:
How would you design your tables, relationships, and which columns belong to which tables, considering performance for an SQL query that retrieves a monthly breakdown of transactions per user?
Here are some sample transactions:
`transactions` Example Input:

| transaction_id | card_id | transaction_date | amount |
|---|---|---|---|
| 1001 | 2001 | 06/08/2022 00:00:00 | 50 |
| 1002 | 2002 | 06/10/2022 00:00:00 | 120 |
| 1003 | 2003 | 06/18/2022 00:00:00 | 75 |
| 1004 | 2001 | 07/26/2022 00:00:00 | 100 |
| 1005 | 2002 | 07/05/2022 00:00:00 | 80 |
`cards` Example Input:

| card_id | user_id |
|---|---|
| 2001 | 3001 |
| 2002 | 3002 |
| 2003 | 3001 |
Get a report of total expenditure for each user per month:
```sql
SELECT
  DATE_TRUNC('month', t.transaction_date) AS mth,
  c.user_id,
  SUM(t.amount) AS total_amount
FROM transactions t
INNER JOIN cards c ON t.card_id = c.card_id
GROUP BY mth, c.user_id
ORDER BY mth, c.user_id;
```
This query gives a breakdown of total monthly transactions for each user. It starts by truncating `transaction_date` to retain only the month and year, then groups the results by that truncated date and `user_id`. `SUM()` calculates the total transaction amount per user per month, giving each user's total expenditure on Marqeta's platform each month.
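One way to sketch the underlying schema for this design (column names taken from the sample data above; the types and index are assumptions):

```sql
CREATE TABLE cards (
    card_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL
);

CREATE TABLE transactions (
    transaction_id BIGINT PRIMARY KEY,
    card_id BIGINT NOT NULL REFERENCES cards(card_id),
    transaction_date TIMESTAMP NOT NULL,
    amount NUMERIC(12, 2) NOT NULL
);

-- An index on (card_id, transaction_date) supports both the join to
-- cards and the monthly rollup without scanning the whole table.
CREATE INDEX idx_txn_card_date ON transactions (card_id, transaction_date);
```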
Database denormalization is when you add redundancy to a database, and break typical normalization rules (codified by the 1st, 2nd, 3rd normal forms).
Denormalization is typically used to improve the performance of a database, particularly when the database is being used for OLAP (Online Analytical Processing) purposes. By denormalizing a database, you can reduce the number of joins required to retrieve data, which can greatly improve query speed since joins are costly and slow. However, denormalization can also introduce problems, such as increased data redundancy and more complex update and delete operations, since data has been duplicated into multiple tables.

In general, denormalization should be used with caution and only after careful consideration of the trade-offs involved. It's typically better to start with a well-normalized database design, then denormalize only if your database isn't scaling as well as you want.
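As an illustrative sketch (table names hypothetical), a normalized design pays for a join on every report, while a denormalized reporting table trades duplicated data for a single-table read:

```sql
-- Normalized: every report must join transactions to users
SELECT u.user_name, SUM(t.amount) AS total_spent
FROM transactions t
JOIN users u ON t.user_id = u.user_id
GROUP BY u.user_name;

-- Denormalized: user_name is copied onto each transaction row, so the
-- report reads one table, at the cost of keeping the copies in sync
-- whenever a user is renamed
SELECT user_name, SUM(amount) AS total_spent
FROM transactions_denorm
GROUP BY user_name;
```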
As a data analyst at Marqeta, your manager asked you to filter down the customer data to identify specific customers. Write a SQL query to find all customers who live in California and either have made more than 10 transactions or have spent over $2000 in total.
Here are your tables:
`customers` Example Input:

| customer_id | first_name | last_name | state |
|---|---|---|---|
| 001 | John | Doe | California |
| 002 | Jane | Smith | New York |
| 003 | Bob | Johnson | California |
| 004 | Alice | Davis | California |
`transactions` Example Input:

| transaction_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 0001 | 001 | 500 | 2022-09-01 |
| 0002 | 001 | 700 | 2022-09-02 |
| 0003 | 002 | 800 | 2022-09-02 |
| 0004 | 003 | 250 | 2022-09-03 |
| 0005 | 003 | 1850 | 2022-09-04 |
| 0006 | 004 | 1200 | 2022-09-05 |
| 0007 | 004 | 1800 | 2022-09-06 |
Here is your PostgreSQL solution:
```sql
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  c.state,
  COUNT(t.transaction_id) AS transactions,
  SUM(t.amount) AS total_spent
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
WHERE c.state = 'California'
GROUP BY c.customer_id
HAVING COUNT(t.transaction_id) > 10 OR SUM(t.amount) > 2000;
```
This solution first joins the customers table with the transactions table on the customer_id column. Then, it filters to only customers in California. Finally, it counts the number of transactions and sums the amounts spent for each customer, filtering to only those who have made more than 10 transactions or have spent over $2000.
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Marqeta, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
```sql
SELECT
  ad_copy.copy AS ad_copy,
  ad_creative.image_url AS ad_creative
FROM ad_copy
CROSS JOIN ad_creative;
```
A natural join, on the other hand, combines rows from two or more tables based on columns that share the same name in both tables. It is called a "natural" join because the join condition is inferred automatically from those common columns rather than written out with `ON`.
For an example, say you had sales data exported from Marqeta's Salesforce CRM stored in a data warehouse with two tables: `sales` and `marqeta_customers`.

An `INNER JOIN` with an explicit `ON` condition matches the two tables on the customer id field, even though the columns are named differently (`customer_id` vs. `id`):

```sql
SELECT *
FROM sales
INNER JOIN marqeta_customers
  ON sales.customer_id = marqeta_customers.id;
```

This query returns only the rows from `sales` and `marqeta_customers` with matching customer id values; rows without a match in the other table are excluded.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
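For completeness, the natural-join syntax itself looks like this, assuming a hypothetical pair of tables that share a column literally named `customer_id`:

```sql
-- Joins on every identically named column (here, customer_id);
-- equivalent to an INNER JOIN with the matching condition written out.
SELECT *
FROM sales
NATURAL JOIN customer_accounts;
```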
As part of Marqeta's internal audit system, they are interested in identifying all clients whose address contains the word "Avenue". Using the `clients` table provided below, write a SQL query to return all records where the client's address matches this criterion.
`clients` Example Input:

| client_id | client_name | address |
|---|---|---|
| 101 | James Smith | 123 Main Street |
| 102 | Jessica Johnson | 456 Maple Avenue |
| 103 | Roger Davis | 789 Oak Street |
| 104 | Laura Wilson | 1012 Pine Avenue |
| 105 | Bob Thomas | 345 Walnut Street |
```sql
SELECT *
FROM clients
WHERE address LIKE '%Avenue%';
```
This SQL query uses the `LIKE` keyword within the `WHERE` clause to filter records from the `clients` table. Any record where the `address` field contains the string "Avenue" will be returned. The `%` character is a wildcard that matches any number of characters, so the matching string can appear at any position within the address field. The result is a table containing all client records where the address contains "Avenue".
Please note that `LIKE` queries are case sensitive in PostgreSQL, so be careful when matching case-sensitive patterns.
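If case-insensitive matching is desired, PostgreSQL provides `ILIKE`, which would also catch addresses containing "avenue" or "AVENUE":

```sql
-- case-insensitive pattern match (PostgreSQL extension)
SELECT *
FROM clients
WHERE address ILIKE '%avenue%';
```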
Given two tables, `customers` and `transactions`, the task is to write a SQL query that returns a list of the top 5 customers who generated the highest revenue for Marqeta.
`customers` Example Input:

| cust_id | first_name | last_name | sign_up_date | balance |
|---|---|---|---|---|
| 101 | Tom | Holland | 2018-03-17 | 500 |
| 102 | Jasmine | Paul | 2016-02-26 | 2000 |
| 103 | Morgan | Freeman | 2015-01-01 | 10000 |
| 104 | James | Cameron | 2019-12-31 | 350 |
| 105 | Amelia | Vega | 2018-03-02 | 4500 |
`transactions` Example Input:

| trans_id | cust_id | trans_date | product_id | amount |
|---|---|---|---|---|
| 8001 | 101 | 2021-06-01 | 201 | 300 |
| 8002 | 102 | 2021-06-20 | 202 | 700 |
| 8003 | 103 | 2021-07-17 | 203 | 500 |
| 8004 | 104 | 2021-08-31 | 204 | 700 |
| 8005 | 105 | 2021-10-10 | 205 | 650 |
```sql
SELECT
  c.first_name,
  c.last_name,
  SUM(t.amount) AS total_revenue
FROM customers c
JOIN transactions t ON c.cust_id = t.cust_id
GROUP BY c.cust_id
ORDER BY total_revenue DESC
LIMIT 5;
```
In this query, we use a `JOIN` to combine the `customers` and `transactions` tables on `cust_id`, then the aggregate function `SUM()` to calculate the total revenue for each customer (`total_revenue`). The results are ordered in descending order of `total_revenue` and limited to the top 5 customers.
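One caveat: `LIMIT 5` arbitrarily drops customers tied at the cutoff. If ties should be kept, a window-function variant (a sketch using the same tables) can rank by revenue instead:

```sql
SELECT first_name, last_name, total_revenue
FROM (
  SELECT
    c.first_name,
    c.last_name,
    SUM(t.amount) AS total_revenue,
    -- tied revenues share a rank, so ties at 5th place are all kept
    DENSE_RANK() OVER (ORDER BY SUM(t.amount) DESC) AS revenue_rank
  FROM customers c
  JOIN transactions t ON c.cust_id = t.cust_id
  GROUP BY c.cust_id, c.first_name, c.last_name
) ranked
WHERE revenue_rank <= 5;
```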
Since join questions come up routinely during SQL interviews, practice this Snapchat JOIN SQL interview question:
Using a join in SQL, you can retrieve data from multiple tables and merge the results into a single table.
In SQL, there are four distinct types of JOINs. To demonstrate each kind, imagine you had two database tables: an `Advertising_Campaigns` table that contains data on Google Ads keywords and their bid amounts, and a `Sales` table with information on product sales and the Google Ads keywords that drove those sales.
`INNER JOIN`: An INNER JOIN retrieves rows from both tables where there is a match on the shared key or keys. For example, an INNER JOIN between the `Advertising_Campaigns` table and the `Sales` table could use the keyword column as the shared key, retrieving only the rows where the keyword in `Advertising_Campaigns` matches the keyword in `Sales`.

`LEFT JOIN`: A LEFT JOIN retrieves all rows from the left table (in this case, the `Advertising_Campaigns` table) and any matching rows from the right table (the `Sales` table). If there is no match in the right table, `NULL` values are returned for the right table's columns.

`RIGHT JOIN`: A RIGHT JOIN retrieves all rows from the right table (in this case, the `Sales` table) and any matching rows from the left table (the `Advertising_Campaigns` table). If there is no match in the left table, `NULL` values are returned for the left table's columns.

`FULL OUTER JOIN`: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match on the shared key or keys. Where there is no match, `NULL` values are returned for the columns of the non-matching table.
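Assuming both tables share a column named `keyword` (a name inferred from the description above), the four variants differ only in the join keyword:

```sql
-- INNER: only keywords present in both tables
SELECT * FROM Advertising_Campaigns a
INNER JOIN Sales s ON a.keyword = s.keyword;

-- LEFT: all campaigns, sales columns NULL where no sale matched
SELECT * FROM Advertising_Campaigns a
LEFT JOIN Sales s ON a.keyword = s.keyword;

-- RIGHT: all sales, campaign columns NULL where no campaign matched
SELECT * FROM Advertising_Campaigns a
RIGHT JOIN Sales s ON a.keyword = s.keyword;

-- FULL OUTER: all rows from both sides, NULL-padded where unmatched
SELECT * FROM Advertising_Campaigns a
FULL OUTER JOIN Sales s ON a.keyword = s.keyword;
```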
The key to acing a Marqeta SQL interview is to practice, practice, and then practice some more!
Beyond just solving the earlier Marqeta SQL interview questions, you should also solve the 200+ SQL coding questions which come from companies like FAANG and tech startups.
Each DataLemur SQL question has multiple hints, detailed solutions and, crucially, an online SQL coding environment so you can run your query right in the browser and have it checked instantly.

To prep for the Marqeta SQL interview, it can also be wise to solve SQL problems from other tech companies like:
However, if your SQL coding skills are weak, forget about diving straight into solving questions – strengthen your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers things like RANK() window functions and UNION – both of these pop up often during SQL job interviews at Marqeta.
In addition to SQL query questions, the other topics tested in the Marqeta Data Science Interview are:
The best way to prepare for Marqeta Data Science interviews is by reading Ace the Data Science Interview. The book's got: