Structural Break Detection — ADIA Lab Challenge Solution

Below I detail my solution that earned 7th place in the ADIA Lab Structural Break Challenge with an ROC AUC score of 88.34% on the out-of-sample dataset. This solution aims to build a general and interpretable feature-based classifier capable of detecting various types of structural breaks in univariate time series.

Acknowledgments

I would like to express my sincere gratitude to the ADIA Lab competition organizers and the CrunchDAO team for hosting such a challenging and intellectually stimulating competition. Their commitment to advancing the field of time series analysis and making this opportunity accessible to a global audience is deeply appreciated.

Competition Overview

The challenge, titled “Can You Predict the Break?”, focuses on detecting structural changes in univariate time series. Each time series contains a potential break point where the underlying data generating process may have changed. The goal is to build a model that outputs the likelihood (between 0 and 1) that a structural break has occurred.

This problem is highly relevant in multiple domains, such as:

  • Climatology: Detecting regime shifts in temperature or precipitation patterns.
  • Industrial monitoring: Identifying machinery degradation or fault onset.
  • Finance: Recognizing shifts in volatility, regime switching, or market anomalies.
  • Healthcare: Detecting sudden physiological changes from biomedical signals.

By leveraging a large labeled dataset, data-driven approaches such as feature extraction and machine learning can effectively model and anticipate structural changes across these contexts.

Solution Overview

My solution follows a feature-based supervised approach using a CatBoostClassifier trained on thousands of carefully engineered features extracted from multiple time series regions and transformations.

The solution is designed to handle a wide variety of break types, including abrupt, gradual, subtle, variance, persistence, regime switching, multiple, dependence, distributional, predictive, spatial, causal, and threshold-triggered breaks. To effectively capture these, I extract features across multiple temporal regions:

  • Full series
  • Before and After segments
  • Adaptive break regions (fractions of 0.1, 0.3, and 0.6)

For each region, I compute feature ratios (e.g., after_features / before_features) to emphasize relative shifts and cancel out absolute-scale effects, which helps the features generalize across domains.
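As an illustration, such ratio features can be computed with a small guard against zero denominators. This is a minimal sketch: the helper name and the epsilon safeguard are my own, not necessarily what the full pipeline does.

```python
import numpy as np
import pandas as pd

def ratio_features(before_feats: pd.DataFrame, after_feats: pd.DataFrame,
                   eps: float = 1e-12) -> pd.DataFrame:
    """Element-wise after/before feature ratios, aligned by column order.

    Replacing exact zeros in the denominator with a tiny epsilon is a
    hypothetical safeguard against infinities.
    """
    denom = before_feats.replace(0.0, eps)
    ratios = after_feats.values / denom.values
    return pd.DataFrame(ratios, columns=[f"ratio_{c}" for c in after_feats.columns])
```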

Feature Extraction Pipeline

The core idea is to capture both statistical and structural properties of the time series through layered extractors:

  • Frequency Features Extractor: Captures dominant periodicities and spectral changes.
  • Comparison Features Extractor: Performs statistical tests (e.g., Levene, Mann–Whitney, Bartlett) between “before” and “after” segments.
  • Comprehensive Features Extractor: Combines multiple feature groups, including:
    • Basic statistics (mean, variance, skewness, kurtosis, etc.)
    • Inequality metrics (Gini, Theil, entropy, etc.)
    • Entropy-based features (Shannon, Sample, Permutation, etc.)
    • Rolling and dispersion metrics (mean absolute deviation, std ratio, etc.)
    • Signal-based measures (energy, spectral entropy, turning points, etc.)
    • Quantile slope and tail concentration features, among others
    • Complexity and recurrence measures (RQA, Lempel–Ziv, etc.)
    • Linear trend and ADF stationarity statistics, and more
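As a concrete example of the Comparison Features Extractor, the two-sample tests can be sketched with `scipy.stats`. This is a simplified illustration; the exact set of tests and feature names in my pipeline differs.

```python
import numpy as np
from scipy import stats

def extract_comparison_features(before: np.ndarray, after: np.ndarray) -> dict:
    """Two-sample statistical tests between the 'before' and 'after' segments."""
    feats = {}
    # Levene and Bartlett probe for a change in variance.
    feats["levene_s"], feats["levene_p"] = stats.levene(before, after)
    feats["bartlett_s"], feats["bartlett_p"] = stats.bartlett(before, after)
    # Mann-Whitney U probes for a shift in location/distribution.
    feats["mannwhitney_s"], feats["mannwhitney_p"] = stats.mannwhitneyu(
        before, after, alternative="two-sided")
    # Fligner-Killeen: a robust test of variance homogeneity.
    feats["fligner_s"], feats["fligner_p"] = stats.fligner(before, after)
    return feats
```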

In total, the extractor computes several hundred time-series descriptors across different transformations and window segments.

These extractors are applied to multiple data variants—raw, cumulative sum, and absolute-transformed signals—to enhance sensitivity to both mean-level and volatility-based breaks.

The complete feature extraction workflow is illustrated below. It summarizes how the data flows through successive stages — from raw signal processing and regional feature extraction to ratio computation and model training.

```mermaid
flowchart TD
    classDef startEnd fill:#4f46e5,stroke:#3730a3,stroke-width:1px,color:#fff,rx:8,ry:8;
    classDef process fill:#eef2ff,stroke:#6366f1,stroke-width:1px,color:#111,rx:8,ry:8;
    classDef feature fill:#c7d2fe,stroke:#4f46e5,stroke-width:1px,color:#111,rx:8,ry:8;
    classDef model fill:#a5b4fc,stroke:#3730a3,stroke-width:1px,color:#111,rx:8,ry:8;
    A([Start]):::startEnd --> B["train()"]:::process
    B --> C["extract_enhanced_features(df)"]:::feature
    C --> D["Extract comprehensive features for full, before, and after segments"]:::feature
    D --> E["Compute ratios of before and after features"]:::feature
    E --> F["extract_comparison_features(before, after)"]:::feature
    F --> G["Split data into adaptive windows (10%, 30%, 60%)"]:::feature
    G --> H["Extract comprehensive features for each window"]:::feature
    H --> I["Compute before/after window ratios"]:::feature
    I --> J["extract_comparison_features(before_window, after_window)"]:::feature
    J --> K["Concatenate all extracted features"]:::process
    K --> L["Train CatBoostClassifier on final dataset"]:::model
    L --> M([End]):::startEnd
```

Code Snippet: Feature Extraction Pipeline

Below is a simplified excerpt of the core feature extraction logic. The full pipeline processes the time series in multiple segments and transformations, then computes ratio-based features to highlight relative changes:

```python
def extract_comprehensive_features(x: np.ndarray, break_index=None):
    df_features = []
    data_variants = {
        "raw": x,
        "cumsum": np.cumsum(x),
        "abs": np.abs(x),
    }

    for name, v in data_variants.items():
        # simple=True selects a reduced feature set for the transformed variants
        simple = name != "raw"
        features = extract_comprehensive_features_core(v, simple, break_index)
        df = pd.DataFrame([features])
        df.columns = [f"{name}_{col}" for col in df.columns]
        df_features.append(df)

    # Combine all variants into a single feature row
    final_df = pd.concat(df_features, axis=1)
    return final_df
```
        
```python
def extract_enhanced_features(df: pd.DataFrame) -> pd.DataFrame:

    full_series = df["value"].values

    # Split into before/after segments
    before = df[df["period"] == 0]["value"].values
    after  = df[df["period"] == 1]["value"].values

    # Extract features from full series, segments, and ratios
    full_feats   = extract_comprehensive_features(full_series)
    before_feats = extract_comprehensive_features(before)
    after_feats  = extract_comprehensive_features(after)
    ratio_feats  = after_feats / before_feats  # Emphasize relative change

    # Also process adaptive windows near the break (e.g., last 10% of 'before');
    # max(1, ...) guards against empty windows on very short segments
    for frac in [0.1, 0.3, 0.6]:
        win_before = before[-max(1, int(len(before) * frac)):]
        win_after  = after[:max(1, int(len(after) * frac))]
        # ... extract features and ratios for these windows

    # Add statistical tests (e.g., Levene, Mann-Whitney) between segments
    test_feats = extract_comparison_features(before, after)

    return combine_all_features(full_feats, before_feats, after_feats,
                                ratio_feats, test_feats, ...)
```

Each call to extract_comprehensive_features computes dozens of robust, scale-invariant metrics (e.g., dispersion ratios, entropy, tail concentration, recurrence statistics) on variants of the signal (raw, absolute, cumulative sum). This multi-view, ratio-based strategy enables the model to generalize across diverse break types and domains.
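For illustration, a few of these descriptors can be computed directly from their standard definitions. This is a simplified sketch of a handful of metrics, not the full extractor, and the function name is my own.

```python
import numpy as np

def dispersion_and_entropy(x: np.ndarray, n_bins: int = 16) -> dict:
    """A few scale-aware time-series descriptors from textbook definitions."""
    feats = {}
    mean = np.mean(x)
    # Index of dispersion: variance-to-mean ratio (guard against a zero mean).
    feats["index_of_dispersion"] = np.var(x) / mean if mean != 0 else np.nan
    # Shannon entropy of a histogram of the values, in nats.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    feats["shannon_entropy"] = float(-np.sum(p * np.log(p)))
    # Turning points: count of local maxima/minima, a simple structure measure.
    d = np.diff(x)
    feats["turning_points"] = int(np.sum(d[:-1] * d[1:] < 0))
    return feats
```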

Model Training and Rationale

Extracted features are grouped by id and used as input to a CatBoostClassifier with the following configuration:

  • Iterations: 794
  • Learning rate: 0.023
  • Depth: 6
  • Evaluation metric: AUC

Initially, the model exhibited heavy reliance on a single feature — index of dispersion (full series) — contributing up to 42% of the total feature importance. Through iterative refinement and the addition of more diverse and domain-independent features, I successfully reduced this dominance to about 24%, achieving a more balanced and generalizable classifier.

Feature Importance

After training the CatBoostClassifier on the enhanced feature set, I analyzed the feature importances to identify which descriptors most influenced the model’s predictions. The results show a strong emphasis on dispersion and statistical comparison metrics.

Top 5 Most Important Features

| Feature | Importance |
| --- | --- |
| val_raw_index_of_dispersion_mag | 23.37 |
| bp1_fligner_s | 1.75 |
| val_fligner_s | 1.68 |
| segment_ratio_abs_ratio_value_number_to_time_series_length | 1.29 |
| bp1_bhattacharyya_distance | 1.19 |

Total Importance per Segment

| Prefix | Total Importance |
| --- | --- |
| val_ | 39.72 |
| segment_ | 21.85 |
| bp1_ | 15.41 |
| bp3_ | 12.76 |
| bp2_ | 10.12 |

Features derived from the full-series statistics (val_*) contributed the most to predictive performance, followed by segment-level ratio and breakpoint-based comparisons (bp1_*, bp2_*, bp3_*). This suggests the model relies heavily on global dispersion and comparative stability features to detect structural breaks.

The feature space includes many additional metrics beyond the top 5 — covering statistical, spectral, entropy-based, and signal-processing domains — ensuring diverse coverage across different types of structural changes.
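The per-segment totals are simple sums of individual importances grouped by prefix. A small helper makes this concrete; the column names (`feature`, `importance`) are my own convention, not necessarily what CatBoost's importance export uses.

```python
import pandas as pd

def importance_by_prefix(importances: pd.DataFrame) -> pd.Series:
    """Sum feature importances grouped by segment prefix (e.g. 'val_', 'bp1_')."""
    prefix = importances["feature"].str.split("_").str[0] + "_"
    return importances.groupby(prefix)["importance"].sum().sort_values(ascending=False)

# Toy example with three features from two prefixes
imp = pd.DataFrame({
    "feature": ["val_index_of_dispersion", "val_fligner_s", "bp1_fligner_s"],
    "importance": [23.37, 1.68, 1.75],
})
totals = importance_by_prefix(imp)
```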

Reflections and Insights

This solution was developed through a mix of intuition, iteration, and experimentation rather than strict theoretical guarantees. Many design choices came from testing different feature combinations and observing what seemed to capture changes more effectively. While I don’t have formal statistical evidence for each component, the following are some practical insights I gathered along the way:

  • Using multiple time segments felt useful for covering different types of breaks — whether abrupt or gradual — since each region highlights a different aspect of the signal.
  • Taking ratios between “before” and “after” features seemed to stabilize the model across scales and domains, making the features more comparable across series.
  • Combining diverse types of features — from statistical summaries to entropy and trend measures — gave the model more flexibility to detect various break patterns.
  • CatBoost worked reliably without much preprocessing and handled mixed feature relationships well, which made it a solid final choice.

Overall, the final setup is the result of many small iterations rather than a single theoretical insight — guided more by curiosity and empirical tuning than by formal proofs.

Experiments and Iterations

This section summarizes several experimental setups I tested while refining the feature extraction and windowing strategies. Each variation aimed to better capture the most informative break patterns between the “before” and “after” periods.

  • Fixed Window Splits: The data was split into exact proportions of the segment length:

        before_win = before[-int(0.1 * len(after)):]
        after_win  = after[:int(0.1 * len(after))]

    This process was repeated for window sizes of 0.1, 0.3, and 0.6.
  • Adaptive Informative Windows: Iterative tests were run for split ratios from 0.1 to 0.9 (step 0.1) to identify the most predictive region. The best performance came from:

        before_win = int(0.1 * len(before))
        after_win  = int(0.6 * len(after))

    This setup reached an ROC AUC of 85–87%, though features like full_series_index_of_dispersion still ranked highly in importance.
  • Half-Segment Feature Interactions: The “before” segment was further split into halves:

        before_start = before[:len(before)//2]
        before_end   = before[len(before)//2:]

    Features were extracted from before_start, before_end, and after, then combined through multiple interaction forms:

        interaction1 = after_features / before_start_features
        interaction2 = after_features / before_end_features
        interaction  = interaction1 * interaction2
  • Signal Transformations: Applied advanced transformations to enhance sensitivity:
    • Hilbert Transform
    • Teager–Kaiser Energy Operator
    • Moving Average
    • Moving Energy Average
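The transformations listed above can be sketched as follows. The window length and edge handling are my own choices for illustration, not the settings used in the experiments.

```python
import numpy as np
from scipy.signal import hilbert

def transform_variants(x: np.ndarray, window: int = 5) -> dict:
    """Signal transformations used to enhance break sensitivity."""
    variants = {}
    # Hilbert transform: the analytic signal's magnitude gives the envelope.
    variants["hilbert_envelope"] = np.abs(hilbert(x))
    # Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    variants["tkeo"] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Simple moving average and moving energy (mean of squared values).
    kernel = np.ones(window) / window
    variants["moving_average"] = np.convolve(x, kernel, mode="valid")
    variants["moving_energy"] = np.convolve(x ** 2, kernel, mode="valid")
    return variants
```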

More experimental results will be added later as I continue refining and documenting the work.