At Lufax, SQL is often used for analyzing large financial databases for trend prediction, and managing customer profile data for personalized marketing strategies. For this reason Lufax almost always asks SQL query questions in interviews for Data Analyst, Data Science, and BI jobs.
Thus, to help you ace the Lufax SQL interview, we'll cover 11 Lufax SQL interview questions – can you solve them?
As a part of the business intelligence team in Lufax, an online peer-to-peer lending platform, you are asked to identify the 'power users'. These users are defined as those who have borrowed money more than 5 times in any given quarter of a year. Additionally, the total amount they have borrowed in that quarter should be more than $10,000.
The 'Loans' table keeps track of all the user transactions. An entry is made in this table every time a user borrows money. The details included are the user ID of the person borrowing the loan, the date when the loan was borrowed, and the amount borrowed.
loan_id | user_id | loan_date | amount |
---|---|---|---|
101 | 332 | 2022-01-02 | 2000 |
102 | 456 | 2022-01-05 | 3000 |
103 | 332 | 2022-01-06 | 2500 |
104 | 789 | 2022-01-07 | 4000 |
105 | 332 | 2022-01-15 | 3500 |
106 | 456 | 2022-01-20 | 5000 |
107 | 332 | 2022-02-03 | 2000 |
108 | 789 | 2022-02-05 | 6000 |
109 | 332 | 2022-03-06 | 4500 |
110 | 456 | 2022-03-07 | 2500 |
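One possible approach in PostgreSQL is sketched below; it assumes the table is named loans with the columns shown in the sample data (the exact names may differ in the interview):

```sql
-- Sketch: power users have more than 5 loans totaling over $10,000 in a single quarter
SELECT
  user_id,
  EXTRACT(YEAR FROM loan_date)    AS yr,
  EXTRACT(QUARTER FROM loan_date) AS qtr,
  COUNT(*)                        AS num_loans,
  SUM(amount)                     AS total_borrowed
FROM loans
GROUP BY user_id, EXTRACT(YEAR FROM loan_date), EXTRACT(QUARTER FROM loan_date)
HAVING COUNT(*) > 5
   AND SUM(amount) > 10000;
```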
This query groups records by user as well as by the year and quarter of the loan date. It then counts the number of loans and sums the amount borrowed for each of those groups.
Finally, it filters these groups to keep only those groups where the number of loans is more than 5, and the total amount borrowed is greater than $10,000. These are the power users for Lufax for each quarter and year.
To practice a similar power-user data analysis problem on DataLemur's free interactive SQL code editor, try this Microsoft Teams Power User SQL Interview Question:
Lufax is one of the leading online personal financial services platforms, and its business operations include retail lending and wealth management. It possesses a large amount of customer transaction data.
Suppose you're given a transaction dataset and you're tasked to analyze the average investment (loan) amount per customer each month. Your aim is to determine the penetration of Lufax's services to its customers across different months.
Remember, for each customer, use only the latest transaction in each month.
Here's the transaction table:
transaction_id | customer_id | transaction_date | product_id | loan_amount |
---|---|---|---|---|
1011 | 435 | 01/08/2022 | 69001 | 1000 |
2307 | 999 | 01/15/2022 | 69001 | 900 |
4249 | 435 | 01/18/2022 | 69001 | 1200 |
3796 | 860 | 02/13/2022 | 69001 | 2000 |
8386 | 999 | 02/14/2022 | 69001 | 1500 |
Here's the expected table returned by your query:
yr_mn | avg_loan |
---|---|
2022-01 | 1050.00 |
2022-02 | 1750.00 |
You can use a windowing function in conjunction with a subquery to first determine the latest transaction per month for each customer and then average these transactions.
Here's a PostgreSQL query to do this:
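(This is a sketch; the table name transactions and a DATE-typed transaction_date column are assumptions based on the sample data.)

```sql
-- Rank each customer's transactions within each month, latest first
SELECT
  TO_CHAR(transaction_date, 'YYYY-MM') AS yr_mn,
  ROUND(AVG(loan_amount), 2)           AS avg_loan
FROM (
  SELECT
    customer_id,
    transaction_date,
    loan_amount,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id, DATE_TRUNC('month', transaction_date)
      ORDER BY transaction_date DESC
    ) AS rn
  FROM transactions
) latest
WHERE rn = 1   -- keep only each customer's latest transaction per month
GROUP BY TO_CHAR(transaction_date, 'YYYY-MM')
ORDER BY yr_mn;
```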
This query works by first creating a subquery that assigns a row number, partitioning by customer_id and the transaction month. Within each partition, we order by transaction_date in descending order, so each customer's latest transaction in a month gets row number 1.
Then, in the outer query, we keep only those rows (the latest transaction per month for each customer) and take the average of loan_amount for each month. This gives us the desired answer.
p.s. Window functions show up pretty often during SQL interviews, so practice the 27+ window function questions on DataLemur
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
For example, let's look at the Lufax sales database:
lufax_sales:
order_id | product_id | customer_id | quantity |
---|---|---|---|
1 | 222 | 1 | 2 |
2 | 333 | 1 | 1 |
3 | 444 | 2 | 3 |
4 | 555 | 3 | 1 |
In this table, product_id and customer_id could both be foreign keys. They reference the primary keys of other tables, such as a Products table and a Customers table, respectively. This establishes a relationship between the lufax_sales table and those other tables, such that each row in the sales database corresponds to a specific product and a specific customer.
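For illustration, here's a minimal sketch of how those foreign keys could be declared (the products and customers tables, and their primary keys, are assumed to exist):

```sql
CREATE TABLE lufax_sales (
  order_id    INTEGER PRIMARY KEY,
  product_id  INTEGER REFERENCES products (product_id),   -- foreign key to products
  customer_id INTEGER REFERENCES customers (customer_id), -- foreign key to customers
  quantity    INTEGER
);
```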
"
Imagine that you're a data analyst at Lufax, a leading online wealth management firm based in China. You're tasked with a project that involves analyzing the performance of different types of investments. Assume the database has an investments table that stores information about every investment made by users, and a users table that stores users' information.
investment_id | user_id | investment_type | amount_invested | created_at |
---|---|---|---|---|
1001 | 321 | Stocks | 1000 | 2022-01-01 00:00:00 |
1002 | 456 | Bonds | 2000 | 2022-01-01 00:00:00 |
1003 | 789 | Stocks | 3000 | 2022-02-01 00:00:00 |
1004 | 321 | Bonds | 4000 | 2022-02-01 00:00:00 |
1005 | 456 | Stocks | 5000 | 2022-03-01 00:00:00 |
user_id | name |
---|---|
321 | Alice |
456 | Bob |
789 | Charlie |
The project's goal is to find out the total amount invested by each user for each investment type for the year 2022.
Write a SQL query that returns each user's name, the investment type, and the total amount invested by the user in that particular type of investment.
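One possible PostgreSQL solution is sketched below; the table names investments and users are assumptions based on the sample data:

```sql
SELECT
  u.name,
  i.investment_type,
  SUM(i.amount_invested) AS total_invested
FROM investments i
JOIN users u
  ON i.user_id = u.user_id
WHERE EXTRACT(YEAR FROM i.created_at) = 2022   -- restrict to investments made in 2022
GROUP BY u.name, i.investment_type
ORDER BY u.name, total_invested DESC;
```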
This PostgreSQL query joins the investments and users tables on user_id, keeping only investments made in 2022. It groups the data by the user's name and the investment type, then computes the total amount invested by each user in each type of investment. The results are ordered by the user's name and the total amount invested in descending order.
A self-join is a type of join in which a table is joined to itself. To perform a self-join, you need to specify the table name twice in the FROM clause, and give each instance of the table a different alias. You can then join the two instances of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
Self-joins are the go-to technique for any data analysis that involves pairs of the same thing, like identifying pairs of products that are frequently purchased together like in this Walmart SQL interview question.
For another example, say you were doing an HR analytics project and needed to analyze how much all Lufax employees in the same department interact with each other. Here's a self-join query you could use to retrieve all pairs of Lufax employees who work in the same department:
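(A sketch; the employees table and its employee_id and department_id columns are hypothetical.)

```sql
SELECT
  e1.employee_id AS employee_1,
  e2.employee_id AS employee_2,
  e1.department_id
FROM employees e1
JOIN employees e2
  ON  e1.department_id = e2.department_id   -- same department
  AND e1.employee_id  <> e2.employee_id;    -- exclude self-pairs
```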
This query returns all pairs of Lufax employees who work in the same department, and excludes pairs where the employee's id is the same (since this would represent the same Lufax employee being paired with themselves).
Lufax Financial Services wishes to identify high-value clients in their customer database. Clients are considered high value if they have more than 3 loans, each with an original loan amount greater than $50,000, and a credit score above 800. Could you write a SQL query to find these customers?
customer_id | lastName | firstName | creditScore |
---|---|---|---|
101 | Smith | John | 850 |
102 | Johnson | Emily | 700 |
103 | Williams | David | 820 |
104 | Jones | Emma | 790 |
loan_id | customer_id | loanAmount |
---|---|---|
201 | 101 | 70000 |
202 | 101 | 80000 |
203 | 101 | 60000 |
204 | 102 | 50000 |
205 | 103 | 51000 |
206 | 103 | 52000 |
207 | 103 | 53000 |
208 | 103 | 54000 |
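One way to write this in PostgreSQL is sketched below; the table names customers and loans are assumptions based on the sample data:

```sql
SELECT
  c.customer_id,
  c.firstName,
  c.lastName,
  c.creditScore
FROM customers c
JOIN (
  -- customers with more than 3 loans over $50,000 each
  SELECT customer_id
  FROM loans
  WHERE loanAmount > 50000
  GROUP BY customer_id
  HAVING COUNT(*) > 3
) big_loans
  ON c.customer_id = big_loans.customer_id
WHERE c.creditScore > 800;
```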
This query first filters the loans table for high-value loans (loanAmount > 50000) and groups them by customer_id, then applies a HAVING clause to retain customers with more than 3 such loans. That result is inner joined with the customers table to get the full customer details, and a final WHERE clause filters for customers with a credit score above 800, giving us the desired high-value clients.
The primary key of a table is a column or combination of columns that serves to uniquely identify each row in the table. To define a primary key in a SQL database, you can use the PRIMARY KEY constraint.
For instance, consider a table of Lufax employees:
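A minimal sketch of what that table definition could look like (the table and column names here are hypothetical):

```sql
CREATE TABLE lufax_employees (
  employee_id INTEGER PRIMARY KEY,  -- uniquely identifies each employee
  full_name   TEXT,
  department  TEXT
);
```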
In this example, the employee_id column is the primary key of the Lufax employees table.
Primary keys are important in databases for several reasons: they guarantee that every row can be uniquely identified, they prevent duplicate and NULL values in the key column(s), and they give other tables a stable target to reference through foreign keys.
Lufax, being an online platform for the origination and trading of financial assets, offers various types of products to its clients, such as loans and insurance. Let's assume we're interested in analyzing the loans data. We want to find out, for each month, the average loan amount per product type. The product types can vary: home loan, student loan, business loan, auto loan, etc.
Below is the sample data in the table:
loan_id | user_id | loan_date | product_type | loan_amount |
---|---|---|---|---|
4601 | 521 | 03/12/2020 | student_loan | 50000 |
8521 | 754 | 03/23/2020 | business_loan | 200000 |
6213 | 215 | 04/05/2020 | student_loan | 70000 |
5312 | 981 | 04/30/2020 | home_loan | 400000 |
9821 | 365 | 04/12/2020 | auto_loan | 120000 |
5791 | 521 | 04/05/2020 | business_loan | 150000 |
6912 | 521 | 05/24/2020 | student_loan | 60000 |
We're interested in the answer to the following question: for each month and product type, what is the average loan amount?
month | product_type | avg_loan_amount |
---|---|---|
3 | student_loan | 50000.00 |
3 | business_loan | 200000.00 |
4 | student_loan | 70000.00 |
4 | home_loan | 400000.00 |
4 | auto_loan | 120000.00 |
4 | business_loan | 150000.00 |
5 | student_loan | 60000.00 |
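A possible PostgreSQL query is sketched below; it assumes the table is named loans and that loan_date is stored as a DATE:

```sql
SELECT
  EXTRACT(MONTH FROM loan_date) AS month,
  product_type,
  ROUND(AVG(loan_amount), 2)    AS avg_loan_amount
FROM loans
GROUP BY EXTRACT(MONTH FROM loan_date), product_type
ORDER BY month, product_type;
```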
This query extracts the month from loan_date and groups by both the month and the product type. The AVG function then calculates the average loan amount for each product type in each month.
As a Data Analyst at Lufax, you are tasked with analyzing the investment behaviors of customers. You have two tables at your disposal: customers and investments. The customers table stores information about each customer, while the investments table keeps a record of all the investments made by all customers.
The customers table has the following schema:
customer_id | first_name | last_name | date_of_birth | account_created |
---|---|---|---|---|
122 | John | Doe | 1980-12-10 | 2012-02-26 |
345 | Jane | Doe | 1976-06-15 | 2010-06-01 |
334 | Mary | Johnson | 1990-04-21 | 2014-03-15 |
And the investments table:
investment_id | customer_id | product_id | investment_amount | investment_date |
---|---|---|---|---|
101 | 122 | A001 | 5000.00 | 2019-04-25 |
124 | 345 | B002 | 7500.00 | 2020-03-26 |
212 | 334 | A001 | 6500.00 | 2021-05-10 |
The task is to write a SQL query that finds the total investment each customer has made, sorted in descending order of the total investment. Also, join this with the customer information - first name, last name and date of birth.
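One possible PostgreSQL query is sketched below (the table names customers and investments are assumed from the schemas above):

```sql
SELECT
  c.first_name,
  c.last_name,
  c.date_of_birth,
  SUM(i.investment_amount) AS total_investment
FROM customers c
JOIN investments i
  ON c.customer_id = i.customer_id
GROUP BY c.first_name, c.last_name, c.date_of_birth
ORDER BY total_investment DESC;
```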
This query starts by selecting the columns we want from the customers table (aliased 'c'): first_name, last_name, and date_of_birth. It also computes the sum of investment_amount from the investments table (aliased 'i') for each customer. This sum is given the alias total_investment for clarity.
The JOIN clause is used to combine rows from these two tables based on the field they have in common: customer_id.
The GROUP BY clause groups rows that share the same values in the specified columns so that aggregate functions like SUM can be applied. In our case, it groups by first_name, last_name, and date_of_birth.
Finally, the ORDER BY clause sorts the result set in descending order of total_investment.
Because joins come up so often during SQL interviews, practice this interactive Snapchat SQL Interview question using JOINS:
Cross joins and natural joins are two types of JOIN operations in SQL that are used to combine data from multiple tables. A cross join creates a new table by combining each row from the first table with every row from the second table, and is also known as a cartesian join. On the other hand, a natural join combines rows from two or more tables based on their common columns, forming a new table. One key difference between these types of JOINs is that cross joins do not require common columns between the tables being joined, while natural joins do.
Here's an example of a cross join:
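(A sketch using hypothetical products and colors tables.)

```sql
-- every product paired with every color: 20 products x 10 colors = 200 rows
SELECT
  p.product_name,
  col.color_name
FROM products p
CROSS JOIN colors col;
```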
If you have 20 products and 10 colors, that's 200 rows right there!
Here's a natural join example using two tables, Lufax employees and Lufax managers:
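(A sketch; the employees and managers tables are assumed to share a manager_id column, which is what NATURAL JOIN will match on.)

```sql
SELECT *
FROM employees
NATURAL JOIN managers;  -- joins on all columns the two tables have in common
```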
This natural join returns the rows where the employees and managers tables have matching values in their shared column(s), such as the assumed manager_id.
Lufax is a leading finance company. They offer various investment products with different interest rates. Your task is to write a SQL query to calculate the final amount of money an investor will get after a specific period, using the compound interest formula in PostgreSQL. The formula for compound interest is P(1 + r/n)^(nt), where P is the principal (initial investment), r is the annual interest rate, n is the number of times interest is compounded per year, and t is the time in years.
Assume that each product compounds interest annually (n=1).
For simplicity, round the final result to two decimal places.
investment_id | user_id | product_id | principal_amount | annual_interest_rate | years |
---|---|---|---|---|---|
101 | 1 | 1 | 1000 | 0.05 | 5 |
102 | 2 | 2 | 2000 | 0.03 | 3 |
103 | 3 | 3 | 1500 | 0.04 | 4 |
104 | 4 | 1 | 5000 | 0.05 | 2 |
105 | 5 | 2 | 3000 | 0.03 | 1 |
investment_id | final_amount |
---|---|
101 | 1276.28 |
102 | 2185.45 |
103 | 1754.79 |
104 | 5512.50 |
105 | 3090.00 |
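A possible PostgreSQL query is sketched below; the table name investments is an assumption, and the cast to numeric just ensures ROUND accepts two decimal places:

```sql
SELECT
  investment_id,
  ROUND(
    (principal_amount * POWER(1 + annual_interest_rate, years))::numeric,
    2
  ) AS final_amount   -- P * (1 + r)^t, since interest compounds annually (n = 1)
FROM investments;
```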
This SQL block calculates the final amount of each investment after the specified number of years using the compound interest formula. It uses the POWER function to compound the interest over time and the ROUND function to round the final amount to two decimal places.
To practice a very similar question try this interactive JPMorgan Chase Card Launch Success Question which is similar for calculating numerical results in SQL or this Microsoft Teams Power Users Question which is similar for calculating top values based on specified criteria.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier Lufax SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Netflix, Google, and Amazon.
Each SQL question has hints to guide you, step-by-step solutions, and best of all, there's an online SQL code editor so you can write and run your SQL query answer and have it graded.
To prep for the Lufax SQL interview it is also wise to solve SQL questions from other tech companies like:
But if your SQL skills are weak, don't worry about jumping right into solving questions – improve your SQL foundations with this free SQL tutorial.
This tutorial covers SQL concepts such as Self-Joins and AND/OR/NOT – both of these pop up frequently during Lufax interviews.
Besides SQL interview questions, the other types of questions covered in the Lufax Data Science Interview are:
To prepare for Lufax Data Science interviews read the book Ace the Data Science Interview because it's got: