Mastering Feature Engineering for Time Series: A Comprehensive Guide
The Importance of Feature Engineering in Time Series Machine Learning
The success of a machine learning pipeline hinges largely on feature engineering: the features you build are the foundation the model stands on. For time series data, two of the most powerful techniques are lag features and rolling features. Mastering them can significantly improve performance on tasks such as sales forecasting, stock price prediction, and demand planning.
In this guide, we will delve into lag and rolling features, highlighting their significance and demonstrating Python implementation methods. We also aim to address potential challenges that may arise during implementation, complete with working code examples.
What is Feature Engineering in Time Series?
Feature engineering means transforming raw temporal data into input variables that let machine learning models recognize temporal patterns. Unlike static datasets, time series data has a sequential structure: past observations influence future ones, and the features must capture that dependence.
Most traditional machine learning models, such as XGBoost, LightGBM, and Random Forests, have no built-in notion of time. They need explicit features that encode what happened before each observation, and lag and rolling features serve precisely this purpose.
What Are Lag Features?
A lag feature is a past value of a variable, shifted forward in time so it lines up with the current row. For instance, today’s sales can be predicted from yesterday’s sales, or from sales seven or thirty days ago.
Why Lag Features Matter
- Connection Across Time: Lag features illustrate relationships between different time periods.
- Seasonality Representation: They allow the encoding of seasonal and cyclical patterns without complicated transformations.
- Simplicity: Lag features are computationally straightforward while yielding clear results.
- Model Compatibility: They work seamlessly with tree-based and linear models alike.
Implementing Lag Features in Python
```python
import pandas as pd
import numpy as np

# Create a sample time series dataset
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)

# Create lag features
df['lag_1'] = df['sales'].shift(1)
df['lag_3'] = df['sales'].shift(3)
df['lag_7'] = df['sales'].shift(7)

print(df.head(12))
```
The NaN values in the first rows are the data loss that lagging introduces: a lag of k leaves the first k rows without a value. Keep this in mind when deciding how many lags to create, and drop or impute those rows before training.
Choosing the Right Lag Values
Selecting the optimal lags involves a systematic approach, avoiding random selections. Effective methods include:
- Domain Knowledge: Understanding the nature of your data helps in determining appropriate lags. For instance, if dealing with weekly sales data, consider adding lags at 7, 14, and 28 days.
- Autocorrelation Function (ACF): This statistic measures how strongly the series correlates with its own past values, revealing which lags carry real signal.
- Feature Importance: After training a model, inspect which lag features it ranks as most important and prune the rest.
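To see the ACF approach in action, pandas’ Series.autocorr can score candidate lags directly (statsmodels’ plot_acf gives a fuller picture with confidence bands). The series below is synthetic, built with a weekly cycle purely for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly cycle (illustrative data)
np.random.seed(42)
t = np.arange(120)
sales = 200 + 20 * np.sin(2 * np.pi * t / 7) + np.random.randn(120) * 5
series = pd.Series(sales, index=pd.date_range("2024-01-01", periods=120, freq="D"))

# Autocorrelation at each candidate lag; large positive values flag informative lags
for lag in range(1, 15):
    print(f"lag {lag:2d}: {series.autocorr(lag):+.2f}")
# With a weekly cycle, the correlation peaks around lag 7 (and again near lag 14)
```

Lags with correlations near zero rarely earn their place as features; here the weekly period stands out clearly, which matches the domain-knowledge heuristic of using lags 7, 14, and 28 for weekly-seasonal data.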
What Are Rolling (Window) Features?
Rolling features are computed over a window that slides through time. Instead of a single past value, they provide aggregated statistics, such as the mean, median, standard deviation, minimum, or maximum, over the last N periods.
Why Rolling Features Matter
- Noise Reduction: They smooth out noise, revealing the underlying trend.
- Short-term Dynamics: Rolling features summarize recent fluctuations within a specific timeframe.
- Anomaly Detection: They flag unusual behavior when the current value deviates sharply from its rolling average.
Common rolling aggregation practices include:
- Rolling Mean: Often used for trend smoothing.
- Rolling Standard Deviation: Indicates the degree of variability over a specified time window.
- Rolling Min/Max: Tracks the highest and lowest values during defined intervals.
- Rolling Median: More robust than the mean for noisy data with outliers.
- Rolling Sum: Monitors total volumes or counts over time.
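The difference between the rolling mean and the rolling median shows up clearly on data with a spike; the toy series below is illustrative:

```python
import pandas as pd

# A small series with one outlier spike (illustrative data)
s = pd.Series([10, 11, 10, 500, 11, 10, 12, 11])

roll_mean = s.rolling(window=3).mean()
roll_median = s.rolling(window=3).median()

print(pd.DataFrame({"value": s, "mean_3": roll_mean, "median_3": roll_median}))
# The spike drags the 3-period mean up for every window that contains it,
# while the rolling median stays near the typical level throughout.
```

This is why the median is the safer choice as a smoothing feature when outliers are expected.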
Implementing Rolling Features in Python
```python
import pandas as pd
import numpy as np

np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)

# Rolling features with window sizes of 3 and 7
df['roll_mean_3'] = df['sales'].shift(1).rolling(window=3).mean()
df['roll_std_3'] = df['sales'].shift(1).rolling(window=3).std()
df['roll_max_3'] = df['sales'].shift(1).rolling(window=3).max()
df['roll_mean_7'] = df['sales'].shift(1).rolling(window=7).mean()

print(df.round(2))
```
It’s essential to call .shift(1) before .rolling() so that each rolling window ends at the previous observation and the calculation relies solely on historical data.
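The effect of that shift is easy to demonstrate on a toy series: without it, the current value leaks into its own window:

```python
import pandas as pd

s = pd.Series([100, 100, 100, 400, 100])

leaky = s.rolling(window=3).mean()          # window includes the current value
safe = s.shift(1).rolling(window=3).mean()  # window ends at the previous value

print(pd.DataFrame({"value": s, "leaky": leaky, "safe": safe}))
# At the spike (index 3), the leaky feature already "knows" the 400 (mean 200.0),
# while the safe feature still reflects only past values (mean 100.0).
```

In a forecasting setup, the leaky version would look spuriously predictive in training and then fail in production, where the current value is exactly what you are trying to predict.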
Combining Lag and Rolling Features: A Production-Ready Example
In real-world machine learning workflows, combining both lag and rolling features into a hybrid feature set tends to yield the best results. Here’s a complete feature engineering function you can implement in your projects.
```python
import pandas as pd
import numpy as np

def create_time_features(df, target_col, lags=[1, 3, 7], windows=[3, 7]):
    """
    Create lag and rolling features for time series ML.

    Parameters:
        df : DataFrame with datetime index
        target_col : Name of the target column
        lags : List of lag periods
        windows : List of rolling window sizes

    Returns:
        DataFrame with new features
    """
    df = df.copy()

    # Lag features
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)

    # Rolling features (shift by 1 to avoid leakage)
    for window in windows:
        shifted = df[target_col].shift(1)
        df[f'roll_mean_{window}'] = shifted.rolling(window).mean()
        df[f'roll_std_{window}'] = shifted.rolling(window).std()
        df[f'roll_max_{window}'] = shifted.rolling(window).max()
        df[f'roll_min_{window}'] = shifted.rolling(window).min()

    return df.dropna()  # Drop rows with NaN from lag/rolling

# Sample usage
np.random.seed(0)
dates = pd.date_range('2024-01-01', periods=60, freq='D')
sales = 200 + np.cumsum(np.random.randn(60) * 5)
df = pd.DataFrame({'sales': sales}, index=dates)

df_features = create_time_features(df, 'sales', lags=[1, 3, 7], windows=[3, 7])
print(f"Original shape: {df.shape}")
print(f"Engineered shape: {df_features.shape}")
print(f"\nFeature columns:\n{list(df_features.columns)}")
print(f"\nFirst few rows:\n{df_features.head(3).round(2)}")
```
Common Mistakes and How to Avoid Them
One of the most critical mistakes in time series feature engineering is data leakage, which can lead to misleading model performance. Here are some key pitfalls to be aware of:
- Order of Operations: Always apply .shift(1) before .rolling(). Otherwise, the current observation is included in its own rolling window, leaking the target into the features.
- Data Loss Through Lags: Each lag adds NaN rows at the start of the series. Be mindful of this loss, especially in smaller datasets.
- Window Size Testing: Different data characteristics call for different window sizes. Test both short windows (3 to 5) and long windows (14 to 30).
- Production Data: At inference time, compute lag and rolling features from the most recent historical data available at prediction time, exactly as they were computed during training.
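The last point, computing features at inference time, can be sketched as a small helper that builds one feature row from trailing history only. The latest_feature_row function and its toy inputs below are illustrative, not part of the article’s pipeline:

```python
import pandas as pd

def latest_feature_row(history, lags=(1, 3, 7), windows=(3, 7)):
    """Build the feature row for the NEXT time step from trailing history only.

    history : pd.Series of past target values, most recent last.
    """
    feats = {}
    for lag in lags:
        feats[f"lag_{lag}"] = history.iloc[-lag]
    for w in windows:
        window = history.iloc[-w:]
        feats[f"roll_mean_{w}"] = window.mean()
        feats[f"roll_std_{w}"] = window.std()
    return pd.Series(feats)

history = pd.Series([200, 215, 198, 230, 245, 210, 225, 260])
row = latest_feature_row(history)
print(row.round(2))
# lag_1 is the most recent observation (260), and roll_mean_3 equals the
# mean of the last three observations, matching shift(1).rolling(3)
# evaluated at the next step.
```

Because the helper mirrors the training-time definitions (lag k = value k steps back; rolling stats end at the previous observation), the model sees features with the same meaning in training and in production.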
When to Use Lag vs. Rolling Features
| Use Case | Recommended Features |
|---|---|
| Strong autocorrelation in data | Lag features (lag-1, lag-7) |
| Noisy signal, need smoothing | Rolling mean |
| Seasonal patterns (weekly) | Lag-7, lag-14, lag-28 |
| Trend detection | Rolling mean over long windows |
| Anomaly detection | Deviation from rolling mean |
| Capturing variability / risk | Rolling standard deviation, rolling range |
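As a sketch of the anomaly-detection row in the table, the deviation from a trailing rolling mean can serve as a feature or a flag; the series and the threshold of 30 below are illustrative choices:

```python
import pandas as pd

# Flag points that sit far from their trailing rolling mean
s = pd.Series([50, 52, 51, 49, 53, 120, 52, 50])

baseline = s.shift(1).rolling(window=3).mean()  # trailing mean, no leakage
deviation = (s - baseline).abs()
is_anomaly = deviation > 30  # threshold chosen for this toy series

print(pd.DataFrame({"value": s, "baseline": baseline,
                    "deviation": deviation, "anomaly": is_anomaly}))
# Only the spike at index 5 is flagged: 120 is far from its trailing mean of 51.
```

Note that the spike also contaminates the next few windows of the baseline; a rolling median baseline, as discussed above, reduces that effect.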
Conclusion
Lag and rolling features form the backbone of effective time series machine learning pipelines. They bridge the gap between raw sequential data and the tabular format models need for training. With careful data handling and well-chosen window sizes, these features can significantly improve forecasting accuracy.
Best of all, they provide clear insights with minimal computing resources and are compatible with nearly any machine learning model. Whether you’re using XGBoost for demand forecasting or an LSTM for anomaly detection, these features can drive substantial improvements in performance.
Feel free to connect with me:
Gen AI Intern at Analytics Vidhya
Department of Computer Science, Vellore Institute of Technology, Vellore, India
I am working to innovate AI-driven solutions, empowering businesses to leverage data effectively. Let’s connect! [Your Email]