Back to questions
Intuit, a company known for its tax filing products like TurboTax and QuickBooks, offers multiple versions of these products.
Write a query that identifies the user IDs of individuals who have filed their taxes using any version of TurboTax for three or more consecutive years. Each user is allowed to file taxes once a year using a specific product. Display the output in the ascending order of user IDs.
Column Name | Type |
---|---|
filing_id | integer |
user_id | varchar |
filing_date | datetime |
product | varchar |
filing_id | user_id | filing_date | product |
---|---|---|---|
1 | 1 | 4/14/2019 | TurboTax Desktop 2019 |
2 | 1 | 4/15/2020 | TurboTax Deluxe |
3 | 1 | 4/15/2021 | TurboTax Online |
4 | 2 | 4/07/2020 | TurboTax Online |
5 | 2 | 4/10/2021 | TurboTax Online |
6 | 3 | 4/07/2020 | TurboTax Online |
7 | 3 | 4/15/2021 | TurboTax Online |
8 | 3 | 3/11/2022 | QuickBooks Desktop Pro |
9 | 4 | 4/15/2022 | QuickBooks Online |
user_id |
---|
1 |
User 1 has consistently filed their taxes using TurboTax for 3 consecutive years. User 2 is excluded from the results because they missed filing in the third year and User 3 transitioned to using QuickBooks in their third year.
The dataset you are querying against may have different input & output - this is just an example!
Let's find the user IDs of individuals who have filed their taxes using any version of TurboTax for three or more consecutive years using the following steps:
Step 1: Filter TurboTax Filings
Start by filtering the data for users who filed their taxes using using TurboTax products.
The operator can be used to match the product names containing "TurboTax". To ensure case insensitivity, we convert the values in the column to lowercase using the function.
To filter the data by year, we can utilize the function which allows us to extract the specific components from a date. In this case, we want to extract only the year component from the .
Here, we're showing you the output for user id 1:
filing_id | user_id | filing_year |
---|---|---|
1 | 1 | 01/01/2019 00:00:00 |
2 | 1 | 01/01/2020 00:00:00 |
3 | 1 | 01/01/2021 00:00:00 |
Step 2: Retrieve Previous and Following Years
In the query, we create two additional columns, and using the and functions respectively. These functions allow us to access the previous and following years' filing dates for each user ordered by the filing date. The function is used to extract the year component from the column.
Displaying the results for user ID 1:
filing_id | user_id | filing_year | previous_year | following_year |
---|---|---|---|---|
1 | 1 | 01/01/2019 00:00:00 | 01/01/2020 00:00:00 | |
2 | 1 | 01/01/2020 00:00:00 | 01/01/2019 00:00:00 | 01/01/2021 00:00:00 |
3 | 1 | 01/01/2021 00:00:00 | 01/01/2020 00:00:00 |
We understand that it may seem complicated, but we're here to simplify it for you.
Let's break down the usage of the and functions and provide a clear interpretation of the output for user id 1.
The function takes the previous value for each row, while the function takes the subsequent value.
Now, let's interpret the output for user id 1:
By observing this pattern, we can see that the field represents the previous filing year, while the field represents the subsequent filing year.
Remember, the term "LAG" implies taking the value from the previous row, while "LEAD" suggests taking the value from the subsequent row.
Step 3: Implement Logic for Consecutive Filings
The query is organized within a CTE called .
In the main query, we focus on identifying cases of 3 or more consecutive filings. To achieve this, we compare the and values with the current within the WHERE clause.
It implies consecutive filings where:
We limit the results to users with a count of equal to or greater than 3 in the HAVING clause. This ensures we include only those users who filed taxes for at least 3 consecutive years.