Structural Break Detection — ADIA Lab Challenge Solution

Below I detail my solution that earned 7th place in the ADIA Lab Structural Break Challenge with an ROC AUC score of 88.34% on the out-of-sample dataset. This solution aims to build a general and interpretable feature-based classifier capable of detecting various types of structural breaks in univariate time series.

Acknowledgments

I would like to express my sincere gratitude to the ADIA Lab competition organizers and the CrunchDAO team for hosting such a challenging and intellectually stimulating competition. Their commitment to advancing the field of time series analysis and making this opportunity accessible to a global audience is deeply appreciated.

Competition Overview

The challenge, titled “Can You Predict the Break?”, focuses on detecting structural changes in univariate time series. Each time series contains a potential break point where the underlying data generating process may have changed. The goal is to build a model that outputs the likelihood (between 0 and 1) that a structural break has occurred.

This problem is highly relevant in multiple domains, such as:

  • Climatology: Detecting regime shifts in temperature or precipitation patterns.
  • Industrial monitoring: Identifying machinery degradation or fault onset.
  • Finance: Recognizing shifts in volatility, regime switching, or market anomalies.
  • Healthcare: Detecting sudden physiological changes from biomedical signals.

By leveraging a large labeled dataset, data-driven approaches such as feature extraction and machine learning can effectively model and anticipate structural changes across these contexts.

Solution Overview

My solution follows a feature-based supervised approach using a CatBoostClassifier trained on thousands of carefully engineered features extracted from multiple time series regions and transformations.

The solution is designed to handle a wide variety of break types, including abrupt, gradual, subtle, variance, persistence, regime switching, multiple, dependence, distributional, predictive, spatial, causal, and threshold-triggered breaks. To effectively capture these, I extract features across multiple temporal regions:

  • Full series
  • Before and After segments
  • Adaptive break regions (fractions of 0.1, 0.3, and 0.6)

For each region, I compute feature ratios (e.g., after_features / before_features) to emphasize relative shifts and cancel out absolute-scale effects, which helps the features generalize across domains.
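As an illustration, such ratio features can be computed with a small guard against zero denominators. This is a minimal sketch: the helper name and the epsilon safeguard are my own, not necessarily what the full pipeline does.

```python
import numpy as np
import pandas as pd

def ratio_features(before_feats: pd.DataFrame, after_feats: pd.DataFrame,
                   eps: float = 1e-12) -> pd.DataFrame:
    """Element-wise after/before feature ratios, aligned by column order.

    Replacing exact zeros in the denominator with a tiny epsilon is a
    hypothetical safeguard against infinities.
    """
    denom = before_feats.replace(0.0, eps)
    ratios = after_feats.values / denom.values
    return pd.DataFrame(ratios, columns=[f"ratio_{c}" for c in after_feats.columns])
```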

Feature Extraction Pipeline

The core idea is to capture both statistical and structural properties of the time series through layered extractors:

  • Frequency Features Extractor: Captures dominant periodicities and spectral changes.
  • Comparison Features Extractor: Performs statistical tests (e.g., Levene, Mann–Whitney, Bartlett) between “before” and “after” segments.
  • Comprehensive Features Extractor: Combines multiple feature groups, including:
    • Basic statistics (mean, variance, skewness, kurtosis, etc.)
    • Inequality metrics (Gini, Theil, entropy, etc.)
    • Entropy-based features (Shannon, Sample, Permutation, etc.)
    • Rolling and dispersion metrics (mean absolute deviation, std ratio, etc.)
    • Signal-based measures (energy, spectral entropy, turning points, etc.)
    • Quantile slope and tail concentration features, among others
    • Complexity and recurrence measures (RQA, Lempel–Ziv, etc.)
    • Linear trend and ADF stationarity statistics, and more
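As a concrete example of the Comparison Features Extractor, the two-sample tests can be sketched with `scipy.stats`. This is a simplified illustration; the exact set of tests and feature names in my pipeline differs.

```python
import numpy as np
from scipy import stats

def extract_comparison_features(before: np.ndarray, after: np.ndarray) -> dict:
    """Two-sample statistical tests between the 'before' and 'after' segments."""
    feats = {}
    # Levene and Bartlett probe for a change in variance.
    feats["levene_s"], feats["levene_p"] = stats.levene(before, after)
    feats["bartlett_s"], feats["bartlett_p"] = stats.bartlett(before, after)
    # Mann-Whitney U probes for a shift in location/distribution.
    feats["mannwhitney_s"], feats["mannwhitney_p"] = stats.mannwhitneyu(
        before, after, alternative="two-sided")
    # Fligner-Killeen: a robust test of variance homogeneity.
    feats["fligner_s"], feats["fligner_p"] = stats.fligner(before, after)
    return feats
```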

In total, the extractor computes several hundred time-series descriptors across different transformations and window segments.

These extractors are applied to multiple data variants—raw, cumulative sum, and absolute-transformed signals—to enhance sensitivity to both mean-level and volatility-based breaks.

The complete feature extraction workflow is illustrated below. It summarizes how the data flows through successive stages — from raw signal processing and regional feature extraction to ratio computation and model training.

```mermaid
flowchart TD
    classDef startEnd fill:#4f46e5,stroke:#3730a3,stroke-width:1px,color:#fff,rx:8,ry:8;
    classDef process fill:#eef2ff,stroke:#6366f1,stroke-width:1px,color:#111,rx:8,ry:8;
    classDef feature fill:#c7d2fe,stroke:#4f46e5,stroke-width:1px,color:#111,rx:8,ry:8;
    classDef model fill:#a5b4fc,stroke:#3730a3,stroke-width:1px,color:#111,rx:8,ry:8;
    A([Start]):::startEnd --> B["train()"]:::process
    B --> C["extract_enhanced_features(df)"]:::feature
    C --> D["Extract comprehensive features for full, before, and after segments"]:::feature
    D --> E["Compute ratios of before and after features"]:::feature
    E --> F["extract_comparison_features(before, after)"]:::feature
    F --> G["Split data into adaptive windows (10%, 30%, 60%)"]:::feature
    G --> H["Extract comprehensive features for each window"]:::feature
    H --> I["Compute before/after window ratios"]:::feature
    I --> J["extract_comparison_features(before_window, after_window)"]:::feature
    J --> K["Concatenate all extracted features"]:::process
    K --> L["Train CatBoostClassifier on final dataset"]:::model
    L --> M([End]):::startEnd
```

Code Snippet: Feature Extraction Pipeline

Below is a simplified excerpt of the core feature extraction logic. The full pipeline processes the time series in multiple segments and transformations, then computes ratio-based features to highlight relative changes:

```python
def extract_comprehensive_features(x: np.ndarray, break_index=None):
    df_features = []
    data_variants = {
        "raw": x,
        "cumsum": np.cumsum(x),
        "abs": np.abs(x),
    }

    for name, v in data_variants.items():
        # simple=True selects a reduced feature set for the transformed variants
        simple = name != "raw"
        features = extract_comprehensive_features_core(v, simple, break_index)
        df = pd.DataFrame([features])
        df.columns = [f"{name}_{col}" for col in df.columns]
        df_features.append(df)

    # Combine all variants into a single feature row
    final_df = pd.concat(df_features, axis=1)
    return final_df
```
        
```python
def extract_enhanced_features(df: pd.DataFrame) -> pd.DataFrame:

    full_series = df["value"].values

    # Split into before/after segments
    before = df[df["period"] == 0]["value"].values
    after  = df[df["period"] == 1]["value"].values

    # Extract features from full series, segments, and ratios
    full_feats   = extract_comprehensive_features(full_series)
    before_feats = extract_comprehensive_features(before)
    after_feats  = extract_comprehensive_features(after)
    ratio_feats  = after_feats / before_feats  # Emphasize relative change

    # Also process adaptive windows near the break (e.g., last 10% of 'before');
    # max(1, ...) guards against empty windows on very short segments
    for frac in [0.1, 0.3, 0.6]:
        win_before = before[-max(1, int(len(before) * frac)):]
        win_after  = after[:max(1, int(len(after) * frac))]
        # ... extract features and ratios for these windows

    # Add statistical tests (e.g., Levene, Mann-Whitney) between segments
    test_feats = extract_comparison_features(before, after)

    return combine_all_features(full_feats, before_feats, after_feats,
                                ratio_feats, test_feats, ...)
```

Each call to extract_comprehensive_features computes dozens of robust, scale-invariant metrics (e.g., dispersion ratios, entropy, tail concentration, recurrence statistics) on variants of the signal (raw, absolute, cumulative sum). This multi-view, ratio-based strategy enables the model to generalize across diverse break types and domains.
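For illustration, a few of these descriptors can be computed directly from their standard definitions. This is a simplified sketch of a handful of metrics, not the full extractor, and the function name is my own.

```python
import numpy as np

def dispersion_and_entropy(x: np.ndarray, n_bins: int = 16) -> dict:
    """A few scale-aware time-series descriptors from textbook definitions."""
    feats = {}
    mean = np.mean(x)
    # Index of dispersion: variance-to-mean ratio (guard against a zero mean).
    feats["index_of_dispersion"] = np.var(x) / mean if mean != 0 else np.nan
    # Shannon entropy of a histogram of the values, in nats.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    feats["shannon_entropy"] = float(-np.sum(p * np.log(p)))
    # Turning points: count of local maxima/minima, a simple structure measure.
    d = np.diff(x)
    feats["turning_points"] = int(np.sum(d[:-1] * d[1:] < 0))
    return feats
```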

Model Training and Rationale

Extracted features are grouped by id and used as input to a CatBoostClassifier with the following configuration:

  • Iterations: 794
  • Learning rate: 0.023
  • Depth: 6
  • Evaluation metric: AUC

Initially, the model exhibited heavy reliance on a single feature — index of dispersion (full series) — contributing up to 42% of the total feature importance. Through iterative refinement and the addition of more diverse and domain-independent features, I successfully reduced this dominance to about 24%, achieving a more balanced and generalizable classifier.

Feature Importance

After training the CatBoostClassifier on the enhanced feature set, I analyzed the feature importances to identify which descriptors most influenced the model’s predictions. The results show a strong emphasis on dispersion and statistical comparison metrics.

Top 5 Most Important Features

| Feature | Importance |
| --- | --- |
| val_raw_index_of_dispersion_mag | 23.37 |
| bp1_fligner_s | 1.75 |
| val_fligner_s | 1.68 |
| segment_ratio_abs_ratio_value_number_to_time_series_length | 1.29 |
| bp1_bhattacharyya_distance | 1.19 |

Total Importance per Segment

| Prefix | Total Importance |
| --- | --- |
| val_ | 39.72 |
| segment_ | 21.85 |
| bp1_ | 15.41 |
| bp3_ | 12.76 |
| bp2_ | 10.12 |

Features derived from the full-series statistics (val_*) contributed the most to predictive performance, followed by segment-level ratio and breakpoint-based comparisons (bp1_*, bp2_*, bp3_*). This suggests the model relies heavily on global dispersion and comparative stability features to detect structural breaks.

The feature space includes many additional metrics beyond the top 5 — covering statistical, spectral, entropy-based, and signal-processing domains — ensuring diverse coverage across different types of structural changes.
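The per-segment totals are simple sums of individual importances grouped by prefix. A small helper makes this concrete; the column names (`feature`, `importance`) are my own convention, not necessarily what CatBoost's importance export uses.

```python
import pandas as pd

def importance_by_prefix(importances: pd.DataFrame) -> pd.Series:
    """Sum feature importances grouped by segment prefix (e.g. 'val_', 'bp1_')."""
    prefix = importances["feature"].str.split("_").str[0] + "_"
    return importances.groupby(prefix)["importance"].sum().sort_values(ascending=False)

# Toy example with three features from two prefixes
imp = pd.DataFrame({
    "feature": ["val_index_of_dispersion", "val_fligner_s", "bp1_fligner_s"],
    "importance": [23.37, 1.68, 1.75],
})
totals = importance_by_prefix(imp)
```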

Reflections and Insights

This solution was developed through a mix of intuition, iteration, and experimentation rather than strict theoretical guarantees. Many design choices came from testing different feature combinations and observing what seemed to capture changes more effectively. While I don’t have formal statistical evidence for each component, the following are some practical insights I gathered along the way:

  • Using multiple time segments felt useful for covering different types of breaks — whether abrupt or gradual — since each region highlights a different aspect of the signal.
  • Taking ratios between “before” and “after” features seemed to stabilize the model across scales and domains, making the features more comparable across series.
  • Combining diverse types of features — from statistical summaries to entropy and trend measures — gave the model more flexibility to detect various break patterns.
  • CatBoost worked reliably without much preprocessing and handled mixed feature relationships well, which made it a solid final choice.

Overall, the final setup is the result of many small iterations rather than a single theoretical insight — guided more by curiosity and empirical tuning than by formal proofs.

Experiments and Iterations

This section summarizes several experimental setups I tested while refining the feature extraction and windowing strategies. Each variation aimed to better capture the most informative break patterns between the “before” and “after” periods.

  • Fixed Window Splits: The data was split into exact proportions of the segment length:

        before_win = before[-int(0.1 * len(after)):]
        after_win  = after[:int(0.1 * len(after))]

    This process was repeated for window sizes of 0.1, 0.3, and 0.6.
  • Adaptive Informative Windows: Iterative tests were run for split ratios from 0.1 to 0.9 (step 0.1) to identify the most predictive region. The best performance came from:

        before_win = int(0.1 * len(before))
        after_win  = int(0.6 * len(after))

    This setup reached an ROC AUC of 85–87%, though features like full_series_index_of_dispersion still ranked highly in importance.
  • Half-Segment Feature Interactions: The “before” segment was further split into halves:

        before_start = before[:len(before)//2]
        before_end   = before[len(before)//2:]

    Features were extracted from before_start, before_end, and after, then combined through multiple interaction forms:

        interaction1 = after_features / before_start_features
        interaction2 = after_features / before_end_features
        interaction  = interaction1 * interaction2
  • Signal Transformations: Applied advanced transformations to enhance sensitivity:
    • Hilbert Transform
    • Teager–Kaiser Energy Operator
    • Moving Average
    • Moving Energy Average
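The transformations listed above can be sketched as follows. The window length and edge handling are my own choices for illustration, not the settings used in the experiments.

```python
import numpy as np
from scipy.signal import hilbert

def transform_variants(x: np.ndarray, window: int = 5) -> dict:
    """Signal transformations used to enhance break sensitivity."""
    variants = {}
    # Hilbert transform: the analytic signal's magnitude gives the envelope.
    variants["hilbert_envelope"] = np.abs(hilbert(x))
    # Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    variants["tkeo"] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Simple moving average and moving energy (mean of squared values).
    kernel = np.ones(window) / window
    variants["moving_average"] = np.convolve(x, kernel, mode="valid")
    variants["moving_energy"] = np.convolve(x ** 2, kernel, mode="valid")
    return variants
```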

More experimental results will be added later as I continue refining and documenting the work.