SQL Time-Series Window Functions: LEAD & LAG Tutorial

In our earlier tutorial, you learned about window functions that help with ranking values. Now, we'll introduce you to another window function designed for analyzing time-series data.

How do LEAD() and LAG() Window Functions work?

LEAD() and LAG() are time-series window functions used to access data from rows that come after, or before the current row within a result set based on a specific column order.

Think of LEAD() as a function that lets you peek into the future ๐Ÿ”ฎ, and LAG() as a way to glance into the past โช. They're like time-travel functions for your data! We frequently use this to create a 7-day moving average metric, or a 28-day rolling count metric.

Here's the general syntax:

LEAD(column_name, offset) OVER ( -- Compulsory expression PARTITION BY partition_column -- Optional expression ORDER BY order_column) -- Compulsory expression LAG(column_name, offset) OVER ( -- Compulsory expression PARTITION BY partition_column -- Optional expression ORDER BY order_column) -- Compulsory expression

LEAD AND LAG Example: Analyzing Stocks Data

Let's use the following question as an example to illustrate the application of the LEAD() and LAG() functions in analyzing stock performance data.

To begin, execute this query to retrieve the foundational table that we'll be working with

SELECT * FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG' ORDER BY date;

Here's the data table:

datetickeropenhighlowclose
01/01/2023 00:00:00GOOG89.83101.5885.5799.87
02/01/2023 00:00:00GOOG99.74108.8288.8690.30
03/01/2023 00:00:00GOOG90.16107.5189.77104.00
04/01/2023 00:00:00GOOG102.67109.63102.38108.22
05/01/2023 00:00:00GOOG107.72127.05104.50123.37

Suppose you want to calculate the difference in closing prices between consecutive months of the year 2023 for the stock with ticker 'GOOG'. Here's the query to achieve that:

SELECT date, close, LEAD(close) OVER (ORDER BY date) AS next_month_close, LAG(close) OVER (ORDER BY date) AS prev_month_close FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';

When we use the LEAD() function, it's like taking a peek at the closing price of the next month (next_month_close) and putting it alongside the current month's data (close):

And, when we use the LAG() function, it's like bringing back the closing price from the previous month (prev_month_close) and placing it next to the current month's data (close):

Now, since we have both values in the same row as the closing price, we can easily figure out the difference between closing prices of consecutive months and their dates:

SELECT date, close, LEAD(close) OVER (ORDER BY date) AS next_month_close, LEAD(close) OVER (ORDER BY date) - close as difference FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';

Mathematically, this can be expressed as: next_month_close - close = difference.

This tells us how much the closing price of the next month differs from the current month's closing price. For the last row, where there is no subsequent month, the difference column contains NULL.

SQL Window Function Example: Analyzing Stocks Data with Row Gaps

Let's say we want to calculate the difference between the current month's closing price and the closing price from 3 months ago. Using the same question, we can do so using the offset feature within the function itself:

SELECT date, close, LAG(close, 3) OVER (ORDER BY date) AS three_months_ago_close, close - LAG(close, 3) OVER (ORDER BY date) AS difference FROM stock_prices WHERE EXTRACT(YEAR FROM date) = 2023 AND ticker = 'GOOG';

The offset value lets you decide how many rows you want to inspect, whether it's in the future or the past. By default, the offset is set to 1, meaning it considers the very next or previous row.

For example, if you set the offset to 1, it means you're looking 1 row ahead (future) or 1 row behind (past). If you set it to 2, you're looking 2 rows ahead or behind, and so on.

In the output below, the 3_mths_ago_close column represents the closing price from 3 months ago calculated using the LAG() function. The difference_from_past column calculates the difference between the current month's closing price (close) and the closing price from three months ago (3_mths_ago).

Real-Life Scenarios with LEAD() and LAG()

Now that you know how LEAD() and LAG() functions work, here's some real-world scenarios where these functions might come in handy:

Retail Management: Forecasting Sales ๐Ÿ›๏ธ

Scenario: Forecast inventory requirements by analyzing upcoming sales trends in order to adjust stock levels accordingly.

Example Query:

SELECT sales_date, product_id, sales_quantity, LEAD(sales_quantity) OVER (ORDER BY sales_date) AS next_day_sales FROM sales_data;

Results:

sales_dateproduct_idsales_quantitynext_day_sales
2023-08-01A00110075
2023-08-02A0017550
2023-08-03A0015060
2023-08-04A0016080
2023-08-05A0018070

Test your understanding with a question below!

Web Analytics: Monitoring Web Traffic ๐ŸŒŽ

Scenario: Identify changes in user engagement metrics over time by analyzing page views.

Example Query:

SELECT visit_date, page_views, LAG(page_views) OVER (ORDER BY visit_date) AS prev_day_views, LEAD(page_views) OVER (ORDER BY visit_date) AS next_day_views FROM website_traffic;

Results:

visit_datepage_viewsprev_day_viewsnext_day_views
2023-08-011200NULL1100
2023-08-02110012001350
2023-08-0313501100980
2023-08-0498013501500
2023-08-0515009801050

Logistics and Transportation: Planning Routes ๐Ÿšš

Scenario: Optimize route planning based on historical travel times, assisting in identifying traffic patterns.

Example Query:

SELECT route_id, departure_time, arrival_time, LAG(arrival_time) OVER (PARTITION BY route_id ORDER BY departure_time) AS prev_arrival_time FROM route_schedule;

Results:

route_iddeparture_timearrival_timeprev_arrival_time
R00108:00 AM10:30 AMNULL
R00110:30 AM12:45 PM10:30 AM
R00112:45 PM03:00 PM12:45 PM
R00103:00 PM05:15 PM03:00 PM
R00105:15 PM07:30 PM05:15 PM

Education: Monitor Student Performance ๐Ÿ“š

Scenario: Monitor student performance and predict future academic outcomes, identifying areas of improvement or decline.

Example Query:

SELECT student_id, semester, final_grade, LAG(final_grade) OVER (PARTITION BY student_id ORDER BY semester) AS prev_semester_grade FROM student_scores;

Results:

student_idsemesterfinal_gradeprev_semester_grade
S001Fall 2022BB
S001Spring 2023AB
S001Fall 2023AA
S001Spring 2024BA
S001Fall 2024BB

LEAD LAG SQL Interview Question

Practice LEAD() and LAG() functions in this SQL interview question asked by furniture e-commerce company Wayfair!

What's Next: SQL Self-Joins

We'll revisit the concept of a SQL JOIN, but this time join a table to itself using a self-join!

SQL SELF-JOIN


Next Lesson

SQL SELF-JOINS ๐Ÿค