Structural Break Detection — ADIA Lab Challenge Solution
Below I detail my solution that earned 7th place in the ADIA Lab Structural Break Challenge with an ROC AUC score of 88.34% on the out-of-sample dataset. This solution aims to build a general and interpretable feature-based classifier capable of detecting various types of structural breaks in univariate time series.
Acknowledgments
I would like to express my sincere gratitude to the ADIA Lab competition organizers and the CrunchDAO team for hosting such a challenging and intellectually stimulating competition. Their commitment to advancing the field of time series analysis and making this opportunity accessible to a global audience is deeply appreciated.
Competition Overview
The challenge, titled “Can You Predict the Break?”, focuses on detecting structural changes in univariate time series. Each time series contains a potential break point where the underlying data generating process may have changed. The goal is to build a model that outputs the likelihood (between 0 and 1) that a structural break has occurred.
This problem is highly relevant in multiple domains, such as:
- Climatology: Detecting regime shifts in temperature or precipitation patterns.
- Industrial monitoring: Identifying machinery degradation or fault onset.
- Finance: Recognizing shifts in volatility, regime switching, or market anomalies.
- Healthcare: Detecting sudden physiological changes from biomedical signals.
By leveraging a large labeled dataset, data-driven approaches such as feature extraction and machine learning can effectively model and anticipate structural changes across these contexts.
Solution Overview
My solution follows a feature-based supervised approach using a CatBoostClassifier trained on thousands of carefully engineered features extracted from multiple time series regions and transformations.
The solution is designed to handle a wide variety of break types, including abrupt, gradual, subtle, variance, persistence, regime switching, multiple, dependence, distributional, predictive, spatial, causal, and threshold-triggered breaks. To effectively capture these, I extract features across multiple temporal regions:
- Full series
- Before and After segments
- Adaptive break regions (fractions of 0.1, 0.3, and 0.6)
For each region, I compute feature ratios (e.g., after_features / before_features) to emphasize relative shifts and suppress scale sensitivity, ensuring generalization across domains.
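As a minimal sketch of this ratio idea (the helper name `ratio_features` and the small guard term are my own, not the competition code), the after/before computation might look like:

```python
import numpy as np
import pandas as pd

def ratio_features(before_feats: pd.DataFrame, after_feats: pd.DataFrame,
                   eps: float = 1e-12) -> pd.DataFrame:
    """Element-wise after/before ratios, guarded against division by zero."""
    ratios = after_feats.to_numpy() / (before_feats.to_numpy() + eps)
    return pd.DataFrame(ratios, columns=[f"ratio_{c}" for c in after_feats.columns])

# Toy example: a variance ratio well above 1 flags a volatility shift
before = pd.DataFrame([{"variance": 1.0, "kurtosis": 3.0}])
after = pd.DataFrame([{"variance": 4.0, "kurtosis": 3.0}])
print(ratio_features(before, after))
```

Because the ratio cancels the series' absolute scale, the same feature stays comparable whether the raw values are in the hundredths or the thousands.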
Feature Extraction Pipeline
The core idea is to capture both statistical and structural properties of the time series through layered extractors:
- Frequency Features Extractor: Captures dominant periodicities and spectral changes.
- Comparison Features Extractor: Performs statistical tests (e.g., Levene, Mann–Whitney, Bartlett) between “before” and “after” segments.
- Comprehensive Features Extractor: Combines multiple feature groups, including:
- Basic statistics (mean, variance, skewness, kurtosis, etc.)
- Inequality metrics (Gini, Theil, entropy, etc.)
- Entropy-based features (Shannon, Sample, Permutation, etc.)
- Rolling and dispersion metrics (mean absolute deviation, std ratio, etc.)
- Signal-based measures (energy, spectral entropy, turning points, etc.)
- Quantile slope and tail concentration features, among others
- Complexity and recurrence measures (RQA, Lempel–Ziv, etc.)
- Linear trend and ADF stationarity statistics, and more
In total, the extractor computes several hundred time-series descriptors across different transformations and window segments.
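As an illustration of the Comparison Features Extractor, a sketch built on SciPy's two-sample tests might look like the following. The function name and feature keys are assumptions; the choice of tests (Levene, Bartlett, Fligner, Mann-Whitney) follows the text:

```python
import numpy as np
from scipy import stats

def extract_comparison_features(before: np.ndarray, after: np.ndarray) -> dict:
    """Two-sample tests comparing the pre- and post-break segments."""
    feats = {}
    # Variance-change tests (classical and robust)
    feats["levene_s"], feats["levene_p"] = stats.levene(before, after)
    feats["bartlett_s"], feats["bartlett_p"] = stats.bartlett(before, after)
    feats["fligner_s"], feats["fligner_p"] = stats.fligner(before, after)
    # Location / distribution-change tests
    feats["mannwhitney_s"], feats["mannwhitney_p"] = stats.mannwhitneyu(before, after)
    feats["ks_s"], feats["ks_p"] = stats.ks_2samp(before, after)
    return feats
```

Both the statistic and the p-value of each test are kept as features, so the classifier can learn its own thresholds rather than relying on a fixed significance level.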
These extractors are applied to multiple data variants—raw, cumulative sum, and absolute-transformed signals—to enhance sensitivity to both mean-level and volatility-based breaks.
The complete feature extraction workflow is illustrated below. It summarizes how the data flows through successive stages — from raw signal processing and regional feature extraction to ratio computation and model training.
Code Snippet: Feature Extraction Pipeline
Below is a simplified excerpt of the core feature extraction logic. The full pipeline processes the time series in multiple segments and transformations, then computes ratio-based features to highlight relative changes:
```python
import numpy as np
import pandas as pd

def extract_comprehensive_features(x: np.ndarray, break_index=None) -> pd.DataFrame:
    df_features = []
    data_variants = {
        "raw": x,
        "cumsum": np.cumsum(x),
        "abs": np.abs(x),
    }
    for name, v in data_variants.items():
        simple = name != "raw"  # full feature set only for the raw signal
        features = extract_comprehensive_features_core(v, simple, break_index)
        df = pd.DataFrame([features])
        df.columns = [f"{name}_{col}" for col in df.columns]
        df_features.append(df)
    # Combine the per-variant feature frames side by side
    return pd.concat(df_features, axis=1)


def extract_enhanced_features(df: pd.DataFrame) -> pd.DataFrame:
    full_series = df["value"].values
    # Split into before/after segments
    before = df[df["period"] == 0]["value"].values
    after = df[df["period"] == 1]["value"].values
    # Extract features from the full series, the segments, and their ratios
    full_feats = extract_comprehensive_features(full_series)
    before_feats = extract_comprehensive_features(before)
    after_feats = extract_comprehensive_features(after)
    ratio_feats = after_feats / before_feats  # emphasize relative change
    # Also process adaptive windows near the break (e.g., last 10% of 'before')
    for frac in [0.1, 0.3, 0.6]:
        win_before = before[-int(len(before) * frac):]
        win_after = after[:int(len(after) * frac)]
        # ... extract features and ratios for these windows
    # Add statistical tests (e.g., Levene, Mann-Whitney) between segments
    test_feats = extract_comparison_features(before, after)
    return combine_all_features(full_feats, before_feats, after_feats,
                                ratio_feats, test_feats, ...)
```

Each call to extract_comprehensive_features computes dozens of robust, scale-invariant metrics (e.g., dispersion ratios, entropy, tail concentration, recurrence statistics) on variants of the signal (raw, absolute, cumulative sum). This multi-view, ratio-based strategy enables the model to generalize across diverse break types and domains.
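For intuition, a core routine covering a small subset of these descriptors could be sketched as follows; the function name and the particular metric selection are illustrative, not the actual extractor:

```python
import numpy as np
from scipy import stats

def core_features(x: np.ndarray) -> dict:
    """A small subset of the descriptors discussed above (illustrative only)."""
    mean = np.mean(x)
    var = np.var(x)
    return {
        "mean": mean,
        "variance": var,
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        # Index of dispersion: variance-to-mean ratio, the dominant feature in the final model
        "index_of_dispersion": var / mean if mean != 0 else np.nan,
        # Turning points: count of local extrema, a simple roughness measure
        "turning_points": int(np.sum(np.diff(np.sign(np.diff(x))) != 0)),
    }
```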
Model Training and Rationale
Extracted features are grouped by id and used as input to a CatBoostClassifier with the following configuration:
- Iterations: 794
- Learning rate: 0.023
- Depth: 6
- Evaluation metric: AUC
Initially, the model exhibited heavy reliance on a single feature — index of dispersion (full series) — contributing up to 42% of the total feature importance. Through iterative refinement and the addition of more diverse and domain-independent features, I successfully reduced this dominance to about 24%, achieving a more balanced and generalizable classifier.
Feature Importance
After training the CatBoostClassifier on the enhanced feature set, I analyzed the feature importances to identify which descriptors most influenced the model’s predictions. The results show a strong emphasis on dispersion and statistical comparison metrics.
Top 5 Most Important Features
| Feature | Importance |
|---|---|
| val_raw_index_of_dispersion_mag | 23.37 |
| bp1_fligner_s | 1.75 |
| val_fligner_s | 1.68 |
| segment_ratio_abs_ratio_value_number_to_time_series_length | 1.29 |
| bp1_bhattacharyya_distance | 1.19 |
Total Importance per Segment
| Prefix | Total Importance |
|---|---|
| val_ | 39.72 |
| segment_ | 21.85 |
| bp1_ | 15.41 |
| bp3_ | 12.76 |
| bp2_ | 10.12 |
Features derived from the full-series statistics (val_*) contributed the most to predictive performance, followed by segment-level ratio and breakpoint-based comparisons (bp1_*, bp2_*, bp3_*). This suggests the model relies heavily on global dispersion and comparative stability features to detect structural breaks.
The feature space includes many additional metrics beyond the top 5 — covering statistical, spectral, entropy-based, and signal-processing domains — ensuring diverse coverage across different types of structural changes.
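Prefix-level totals such as those in the table can be produced by summing CatBoost's per-feature importances; a small sketch (the helper name is my own):

```python
import pandas as pd

def importance_by_prefix(names, importances) -> pd.Series:
    """Sum per-feature importances by their segment prefix (e.g. 'val_', 'bp1_')."""
    df = pd.DataFrame({"feature": names, "importance": importances})
    df["prefix"] = df["feature"].str.split("_").str[0] + "_"
    return df.groupby("prefix")["importance"].sum().sort_values(ascending=False)

# With a fitted CatBoost model this would be called as:
# importance_by_prefix(model.feature_names_, model.get_feature_importance())
```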
Reflections and Insights
This solution was developed through a mix of intuition, iteration, and experimentation rather than strict theoretical guarantees. Many design choices came from testing different feature combinations and observing what seemed to capture changes more effectively. While I don’t have formal statistical evidence for each component, the following are some practical insights I gathered along the way:
- Using multiple time segments felt useful for covering different types of breaks — whether abrupt or gradual — since each region highlights a different aspect of the signal.
- Taking ratios between “before” and “after” features seemed to stabilize the model across scales and domains, making the features more comparable across series.
- Combining diverse types of features — from statistical summaries to entropy and trend measures — gave the model more flexibility to detect various break patterns.
- CatBoost worked reliably without much preprocessing and handled mixed feature relationships well, which made it a solid final choice.
Overall, the final setup is the result of many small iterations rather than a single theoretical insight — guided more by curiosity and empirical tuning than by formal proofs.
Experiments and Iterations
This section summarizes several experimental setups I tested while refining the feature extraction and windowing strategies. Each variation aimed to better capture the most informative break patterns between the “before” and “after” periods.
- Fixed Window Splits: The data was split into exact proportions of the segment length:

  ```python
  before_win = before[-int(0.1 * len(after)):]
  after_win = after[:int(0.1 * len(after))]
  ```

  This process was repeated for window sizes of 0.1, 0.3, and 0.6.
- Adaptive Informative Windows: Iterative tests were run for split ratios from 0.1 to 0.9 (step 0.1) to identify the most predictive region. The best performance came from:

  ```python
  before_win = int(0.1 * len(before))
  after_win = int(0.6 * len(after))
  ```

  This setup reached a ROC AUC of 85–87%, though features like `full_series_index_of_dispersion` still ranked highly in importance.
- Half-Segment Feature Interactions: The "before" segment was further split into halves:

  ```python
  before_start = before[:len(before) // 2]
  before_end = before[len(before) // 2:]
  ```

  Features were extracted from `before_start`, `before_end`, and `after`, then combined through multiple interaction forms:

  ```python
  interaction1 = after_features / before_start_features
  interaction2 = after_features / before_end_features
  interaction = interaction1 * interaction2
  ```

- Signal Transformations: Applied advanced transformations to enhance sensitivity:
- Hilbert Transform
- Teager-Kaiser Energy Operator
- Moving Average
- Moving Energy Average
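These four transformations could be sketched as follows; the function name and the window size of 5 are my own choices, and the Teager-Kaiser operator follows its standard definition:

```python
import numpy as np
from scipy.signal import hilbert

def transform_variants(x: np.ndarray, window: int = 5) -> dict:
    """Signal transformations tried during experimentation (sketch)."""
    envelope = np.abs(hilbert(x))  # Hilbert amplitude envelope
    # Teager-Kaiser Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]
    tkeo = x[1:-1] ** 2 - x[:-2] * x[2:]
    kernel = np.ones(window) / window
    moving_avg = np.convolve(x, kernel, mode="valid")          # moving average
    moving_energy = np.convolve(x ** 2, kernel, mode="valid")  # moving energy average
    return {"hilbert_env": envelope, "tkeo": tkeo,
            "moving_avg": moving_avg, "moving_energy": moving_energy}
```

The envelope and energy views emphasize volatility-type breaks that a plain moving average can miss, which is presumably why they were worth testing here.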
More experimental results will be added later as I continue refining and documenting the work.