In our earlier tutorial, you learned about window functions that help with ranking values. Now, we'll introduce you to another window function designed for analyzing time-series data.
LEAD()
and LAG()
are time-series window functions used to access data from rows that come after, or before the current row within a result set based on a specific column order.
Think of LEAD()
as a function that lets you peek into the future ๐ฎ, and LAG()
as a way to glance into the past โช. They're like time-travel functions for your data! We frequently use this to create a 7-day moving average metric, or a 28-day rolling count metric.
Here's the general syntax:
LEAD(column_name, offset) OVER ( -- Compulsory expression PARTITION BY partition_column -- Optional expression ORDER BY order_column) -- Compulsory expression LAG(column_name, offset) OVER ( -- Compulsory expression PARTITION BY partition_column -- Optional expression ORDER BY order_column) -- Compulsory expression
Let's use the following question as an example to illustrate the application of the LEAD()
and LAG()
functions in analyzing stock performance data.
To begin, execute this query to retrieve the foundational table that we'll be working with
SELECT * FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG' ORDER BY date;
Here's the data table:
date | ticker | open | high | low | close |
---|---|---|---|---|---|
01/01/2023 00:00:00 | GOOG | 89.83 | 101.58 | 85.57 | 99.87 |
02/01/2023 00:00:00 | GOOG | 99.74 | 108.82 | 88.86 | 90.30 |
03/01/2023 00:00:00 | GOOG | 90.16 | 107.51 | 89.77 | 104.00 |
04/01/2023 00:00:00 | GOOG | 102.67 | 109.63 | 102.38 | 108.22 |
05/01/2023 00:00:00 | GOOG | 107.72 | 127.05 | 104.50 | 123.37 |
Suppose you want to calculate the difference in closing prices between consecutive months of the year 2023 for the stock with ticker 'GOOG'. Here's the query to achieve that:
SELECT date, close, LEAD(close) OVER (ORDER BY date) AS next_month_close, LAG(close) OVER (ORDER BY date) AS prev_month_close FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';
When we use the LEAD()
function, it's like taking a peek at the closing price of the next month (next_month_close
) and putting it alongside the current month's data (close
):
And, when we use the LAG()
function, it's like bringing back the closing price from the previous month (prev_month_close
) and placing it next to the current month's data (close
):
Now, since we have both values in the same row as the closing price, we can easily figure out the difference between closing prices of consecutive months and their dates:
SELECT date, close, LEAD(close) OVER (ORDER BY date) AS next_month_close, LEAD(close) OVER (ORDER BY date) - close as difference FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';
Mathematically, this can be expressed as: next_month_close - close = difference
.
This tells us how much the closing price of the next month differs from the current month's closing price. For the last row, where there is no subsequent month, the difference
column contains NULL
.
Let's say we want to calculate the difference between the current month's closing price and the closing price from 3 months ago. Using the same question, we can do so using the offset
feature within the function itself:
SELECT date, close, LAG(close, 3) OVER (ORDER BY date) AS three_months_ago_close, close - LAG(close, 3) OVER (ORDER BY date) AS difference FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';
The offset
value lets you decide how many rows you want to inspect, whether it's in the future or the past. By default, the offset
is set to 1, meaning it considers the very next or previous row.
For example, if you set the offset
to 1, it means you're looking 1 row ahead (future) or 1 row behind (past). If you set it to 2, you're looking 2 rows ahead or behind, and so on.
In the output below, the 3_mths_ago_close
column represents the closing price from 3 months ago calculated using the LAG()
function. The difference_from_past
column calculates the difference between the current month's closing price (close
) and the closing price from three months ago (3_mths_ago
).
Now that you know how LEAD()
and LAG()
functions work, here's some real-world scenarios where these functions might come in handy:
Scenario: Forecast inventory requirements by analyzing upcoming sales trends in order to adjust stock levels accordingly.
Example Query:
SELECT sales_date, product_id, sales_quantity, LEAD(sales_quantity) OVER (ORDER BY sales_date) AS next_day_sales FROM sales_data;
Results:
sales_date | product_id | sales_quantity | next_day_sales |
---|---|---|---|
2023-08-01 | A001 | 100 | 75 |
2023-08-02 | A001 | 75 | 50 |
2023-08-03 | A001 | 50 | 60 |
2023-08-04 | A001 | 60 | 80 |
2023-08-05 | A001 | 80 | 70 |
Test your understanding with a question below!
Scenario: Identify changes in user engagement metrics over time by analyzing page views.
Example Query:
SELECT visit_date, page_views, LAG(page_views) OVER (ORDER BY visit_date) AS prev_day_views, LEAD(page_views) OVER (ORDER BY visit_date) AS next_day_views FROM website_traffic;
Results:
visit_date | page_views | prev_day_views | next_day_views |
---|---|---|---|
2023-08-01 | 1200 | NULL | 1100 |
2023-08-02 | 1100 | 1200 | 1350 |
2023-08-03 | 1350 | 1100 | 980 |
2023-08-04 | 980 | 1350 | 1500 |
2023-08-05 | 1500 | 980 | 1050 |
Scenario: Optimize route planning based on historical travel times, assisting in identifying traffic patterns.
Example Query:
SELECT route_id, departure_time, arrival_time, LAG(arrival_time) OVER (PARTITION BY route_id ORDER BY departure_time) AS prev_arrival_time FROM route_schedule;
Results:
route_id | departure_time | arrival_time | prev_arrival_time |
---|---|---|---|
R001 | 08:00 AM | 10:30 AM | NULL |
R001 | 10:30 AM | 12:45 PM | 10:30 AM |
R001 | 12:45 PM | 03:00 PM | 12:45 PM |
R001 | 03:00 PM | 05:15 PM | 03:00 PM |
R001 | 05:15 PM | 07:30 PM | 05:15 PM |
Scenario: Monitor student performance and predict future academic outcomes, identifying areas of improvement or decline.
Example Query:
SELECT student_id, semester, final_grade, LAG(final_grade) OVER (PARTITION BY student_id ORDER BY semester) AS prev_semester_grade FROM student_scores;
Results:
student_id | semester | final_grade | prev_semester_grade |
---|---|---|---|
S001 | Fall 2022 | B | B |
S001 | Spring 2023 | A | B |
S001 | Fall 2023 | A | A |
S001 | Spring 2024 | B | A |
S001 | Fall 2024 | B | B |
Practice LEAD()
and LAG()
functions in this SQL interview question asked by furniture e-commerce company Wayfair!
We'll revisit the concept of a SQL JOIN, but this time join a table to itself using a self-join!
Next Lesson
SQL SELF-JOINS ๐ค