Biathlon Big-Break Section Explanation
Overview
The big-break section is the application layer of the breakthrough prediction system in biathlon analysis. This section takes the trained models from feat-select-break and applies them to predict 2026 breakthrough candidates, generate historical comparisons, and export results to Excel files. It represents the culmination of the breakthrough analysis pipeline.
Main Functions and Components
1. Core Function: predict_2026_breakthroughs()
This is the central prediction function that applies trained models to identify 2026 breakthrough candidates.
Input Validation and Setup
if (!is.data.frame(current_data)) stop("current_data is not a data frame")
if (nrow(current_data) == 0) stop("current_data is empty")
if (is.null(breakthrough_model)) stop("breakthrough_model is NULL")
if (is.null(top_predictors) || length(top_predictors) == 0) stop("No top_predictors provided")
Biathlon-Specific Predictor Mapping
predictor_mapping <- c(
"Prev_Pelo" = "Pelo",
"Prev_Individual" = "Individual_Pelo",
"Prev_Sprint" = "Sprint_Pelo",
"Prev_Pursuit" = "Pursuit_Pelo",
"Prev_MassStart" = "MassStart_Pelo",
"Prev_Pct_of_Max_Points" = "Pct_of_Max_Points",
"Age" = "Age"
)
2. Career History Analysis
2025 Season Data Extraction
- Focuses on 2025 as the most recent complete season
- Takes most recent entry for each skier to avoid duplicates
- Validates data availability and reports skier counts
Career Maximum Calculation
career_maximums <- current_data %>%
filter(!is.na(Pct_of_Max_Points)) %>%
group_by(Skier) %>%
summarise(
Career_Max_Pct = max(Pct_of_Max_Points, na.rm = TRUE),
.groups = "drop"
)
Breakthrough Candidate Identification
Inclusion Criteria:
- Age ≥ 16 (junior age minimum)
- Career_Max_Pct < 0.4 (never achieved 40% breakthrough - biathlon specific)
- Pct_of_Max_Points > 0.01 (has competitive results in 2025)
- Complete data for key variables
Notable Features:
- No upper age limit: Recognizes breakthrough can occur at any career stage
- 40% threshold: Biathlon-specific breakthrough definition
- Career-based filtering: Excludes athletes who already achieved breakthrough
3. Debug Functionality
Athlete-Specific Debugging
if ("Campbell Wright" %in% prediction_data$Skier) {
wright_idx <- which(prediction_data$Skier == "Campbell Wright")
cat("\n=== DEBUG: Campbell Wright Breakthrough Model Input ===\n")
# Display detailed input data for verification
}
if ("Jeanne Richard" %in% prediction_data$Skier) {
# Similar debugging for ladies' representative athlete
}
Comprehensive Data Quality Checks
- Missing value analysis for each predictor
- Range validation and extreme value detection
- Factor level compatibility with training data
- Infinite value detection and replacement
4. Model Application and Prediction
Feature Mapping Process
for (prev_feature in names(feature_mapping)) {
current_feature <- feature_mapping[[prev_feature]]
if (prev_feature %in% top_predictors && current_feature %in% names(prediction_data)) {
prediction_data[[prev_feature]] <- prediction_data[[current_feature]]
cat(sprintf("✓ Mapped %s -> %s\n", current_feature, prev_feature))
}
}
Prediction Generation
breakthrough_probs <- tryCatch({
predict(breakthrough_model, newdata = prediction_clean, type = "prob")
}, error = function(e) {
# Fallback with na.action = na.pass
predict(breakthrough_model, newdata = prediction_clean, type = "prob", na.action = na.pass)
})
5. Results Processing and Classification
Probability-Based Classification
Likelihood = case_when(
is.na(Breakthrough_Prob) ~ "Unknown",
Breakthrough_Prob >= 0.6 ~ "Very High",
Breakthrough_Prob >= 0.4 ~ "High",
Breakthrough_Prob >= 0.2 ~ "Moderate",
Breakthrough_Prob >= 0.1 ~ "Low",
TRUE ~ "Very Low"
)
Performance Metrics
- Points to Threshold:
pmax(0, 0.4 - Pct_of_Max_Points, na.rm = TRUE) - Age-based filtering: Under-25 subset for young prospects
- Probability distributions: Summary statistics and likelihood categories
6. Excel Export System
Breakthrough Candidates Files
# Men's breakthrough candidates
men_file <- file.path(output_dir, "mens_breakthrough_candidates_2026.xlsx")
write.xlsx(men_breakthrough_workbook, men_file, rowNames = FALSE)
# Ladies breakthrough candidates
ladies_file <- file.path(output_dir, "ladies_breakthrough_candidates_2026.xlsx")
write.xlsx(ladies_breakthrough_workbook, ladies_file, rowNames = FALSE)
Historical Comparison Files
# Comparative analysis with historical breakthroughs
comparative_men_file <- file.path(output_dir, "mens_breakthrough_comparison_historical_vs_2026.xlsx")
comparative_ladies_file <- file.path(output_dir, "ladies_breakthrough_comparison_historical_vs_2026.xlsx")
7. Historical Breakthrough Comparison
Function: predict_historical_breakthrough()
- Applies 2026 models to historical breakthrough cases
- Maps current performance to “previous” variables for model compatibility
- Handles missing values with median imputation
- Generates “what would the model have predicted” probabilities
Comparative Analysis Structure
Excel Output Columns:
- Name, Nation, Age
- Pre-Breakthrough Pct (performance before breakthrough)
- Breakthrough Result (actual breakthrough performance)
- Predicted Prob (what model predicted)
- Season, Type (“Historical Success” vs “2026 Prediction”)
8. Comprehensive Result Validation
Multi-Level Validation
- Input validation: Model existence, predictor availability
- Data quality: Missing values, infinite values, extreme outliers
- Model compatibility: Factor levels, data structure alignment
- Output validation: Probability ranges, result completeness
- Export validation: File creation, data integrity
Error Handling Strategy
tryCatch({
# Main operation
}, error = function(e) {
cat("Detailed error context:", e$message, "\n")
# Fallback procedures or graceful failure
})
Key Technical Features
Sport-Specific Adaptations
Biathlon Breakthrough Definition
- 40% threshold: Represents significant competitive achievement
- Career-based exclusion: Athletes who already achieved breakthrough
- No age restrictions: Recognizes breakthrough can occur at various career stages
ELO Rating Integration
- Maps discipline-specific ELO ratings (Individual, Sprint, Pursuit)
- Handles MassStart exclusions due to data quality issues
- Maintains consistency with training data preparation
Advanced Data Processing
Missing Value Strategy
- Uses quartile imputation to match training preparation
- Preserves data distribution characteristics
- Reports imputation statistics for transparency
Feature Engineering
- Dynamic mapping between training and prediction features
- Handles temporal data structure (Prev_ → current mapping)
- Validates predictor availability and compatibility
Integration with Analysis Pipeline
Upstream Dependencies
- Models: From
feat-select-breaksection (logistic regression, random forest) - Predictors: Top-ranked features from importance analysis
- Thresholds: Sport-specific breakthrough definitions
- Data: Cleaned training datasets with career histories
Downstream Outputs
- Excel files: Formatted for analysis and decision-making
- Probability scores: Quantitative breakthrough likelihood
- Historical validation: Model reliability assessment
- Debug information: Comprehensive data lineage
Expected Outputs and Results
Console Reporting
- Candidate identification statistics
- Age distribution analysis
- Probability distribution summaries
- High-potential candidate identification
- Export confirmation and file locations
Excel Files Generated
mens_breakthrough_candidates_2026.xlsxladies_breakthrough_candidates_2026.xlsxmens_breakthrough_comparison_historical_vs_2026.xlsxladies_breakthrough_comparison_historical_vs_2026.xlsx- Debug files: Comprehensive predictor and prediction data
Data Objects Created
breakthrough_predictions_men: Complete men’s prediction resultsbreakthrough_predictions_ladies: Complete ladies’ prediction results- Age-stratified subsets (under-25 prospects)
- Summary statistics and performance metrics
Analytical Insights
- Quantitative Rankings: Probability-based candidate prioritization
- Historical Validation: Model performance on past breakthrough cases
- Age Patterns: Young prospect identification and analysis
- Performance Gaps: Points needed to reach breakthrough threshold
Quality Assurance Features
Comprehensive Debugging
- Athlete-specific tracking: Campbell Wright, Jeanne Richard verification
- Data lineage: Input → processing → output validation
- Statistical summaries: Distribution analysis and outlier detection
- Model diagnostics: Prediction quality and reliability assessment
Error Recovery
- Graceful degradation: Continues analysis when components fail
- Fallback procedures: Alternative approaches for edge cases
- Detailed logging: Error context and troubleshooting information
- Data validation: Multi-stage quality checks throughout pipeline
This section represents the practical application of the breakthrough prediction system, transforming statistical models into actionable insights for identifying future biathlon stars and validating the approach against historical breakthrough patterns.