Biathlon Weekly Prediction Methodology

Overview

The biathlon weekly prediction system produces forecasts for every race on an upcoming competition weekend and is built around the sport's multi-day competition structure. Rather than predicting one race at a time, it models the weekend as a whole, covering both individual and team events, while keeping the PELO (Penalty-adjusted ELO) shooting-accuracy integration that distinguishes biathlon from other winter sports.

Data Flow Architecture

1. Weekend Race Processing and Multi-Format Coordination

Python Weekend Orchestration Script (startlist-scrape-weekend.py)

  • Weekend-Centric Processing: Identifies and processes all races scheduled for a specific weekend date
  • Multi-Race Coordination: Handles up to 2 individual races plus multiple relay formats simultaneously
  • Race Format Classification: Automatically separates individual, relay, mixed relay, and single mixed relay events
  • Priority-Based Selection: Prioritizes earliest race dates when multiple options exist within a weekend
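The classification and selection steps above can be sketched in plain Python. This is a minimal illustration, not the script's code. The `Race` record, its field names, and the `plan_weekend` helper are hypothetical. The sketch also assumes that each race carries a weekend key separate from its own race date, mirroring the `Date`/`Race_Date` columns in the snippet below.

```python
from dataclasses import dataclass
from datetime import date

# Relay formats are handled on a separate track from individual races
RELAY_TYPES = ("Relay", "Mixed Relay", "Single Mixed Relay")


@dataclass
class Race:
    weekend: date    # hypothetical weekend key shared by all races of one event
    race_date: date  # day the race is actually held
    race_type: str   # e.g. "Sprint", "Pursuit", "Relay", "Mixed Relay"
    sex: str         # "M", "L" or "Mixed"


def plan_weekend(races, weekend, sex, max_individual=2):
    """Split one gender's weekend schedule into individual races and relay buckets."""
    this_weekend = [r for r in races if r.weekend == weekend]
    # Individual races for this gender, earliest first, capped at max_individual
    individual = sorted(
        (r for r in this_weekend if r.sex == sex and r.race_type not in RELAY_TYPES),
        key=lambda r: r.race_date,
    )[:max_individual]
    # Relay buckets: gender-specific relays plus the mixed formats
    relays = {
        t: [r for r in this_weekend if r.race_type == t and r.sex in (sex, "Mixed")]
        for t in RELAY_TYPES
    }
    return individual, relays


if __name__ == "__main__":
    wk = date(2025, 1, 10)
    schedule = [
        Race(wk, date(2025, 1, 12), "Mass Start", "M"),
        Race(wk, date(2025, 1, 10), "Sprint", "M"),
        Race(wk, date(2025, 1, 11), "Pursuit", "M"),
        Race(wk, date(2025, 1, 11), "Relay", "M"),
        Race(date(2025, 1, 17), date(2025, 1, 17), "Individual", "M"),
    ]
    individual, relays = plan_weekend(schedule, wk, "M")
    print([r.race_type for r in individual])  # earliest two: Sprint, Pursuit
```

The two-race cap keeps the earliest events only because the races are sorted by race date before truncation, the same order of operations as the `head(2)` step.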

Weekend Race Selection Logic

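# Weekend race identification and prioritization (pandas)
relay_types = ["Relay", "Mixed Relay", "Single Mixed Relay"]
next_weekend_races = weekends_df[weekends_df["Date"] == next_weekend_date]
# Individual formats only (assumes the RaceType column used by the R filters)
valid_races = next_weekend_races[~next_weekend_races["RaceType"].isin(relay_types)]
sorted_races = valid_races.sort_values("Race_Date")  # Earliest first
races_to_process = sorted_races.head(2)  # Maximum 2 individual races per weekend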

Specialized Relay Weekend Processing

  • Standard Relay Weekend: Gender-specific relay processing (4x7.5 km men, 4x6 km women) with team ELO aggregation
  • Mixed Relay Weekend: Combined-gender team analysis for the 2+2 format (two women and two men per team)
  • Single Mixed Relay Weekend: Processing for the alternating one-woman, one-man format using hybrid individual-team metrics
  • Skip Control Environment: The environment variable SKIP_WEEKLY_PICKS prevents the R script from being executed twice
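The skip-control mechanism works because a child process inherits whatever environment it is given. The sketch below makes a hypothetical assumption: that the weekly-picks step checks `SKIP_WEEKLY_PICKS` with a test like `should_run_weekly_picks`. The actual check inside the R pipeline is not shown in this document, and both helper names are illustrative.

```python
import os
import subprocess
import sys


def should_run_weekly_picks(env=None) -> bool:
    """Hypothetical guard: skip the R step when a parent set SKIP_WEEKLY_PICKS=1."""
    env = os.environ if env is None else env
    return env.get("SKIP_WEEKLY_PICKS") != "1"


def run_relay_script(cmd):
    """Run a relay sub-script with a copied environment that disables the R step."""
    env = os.environ.copy()  # do not mutate the parent's own environment
    env["SKIP_WEEKLY_PICKS"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The child sees the flag; the parent's environment is unchanged
    child = run_relay_script(
        [sys.executable, "-c", "import os; print(os.environ['SKIP_WEEKLY_PICKS'])"]
    )
    print(child.stdout.strip(), should_run_weekly_picks())
```

Copying `os.environ` rather than assigning to it keeps the flag scoped to the relay subprocesses, so the parent run can still execute the weekly picks once.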

2. Enhanced Multi-Race Startlist Assembly

Comprehensive Athlete Pool Creation

# Season-wide athlete inclusion for complete coverage
all_season_skiers_df = create_season_startlist(
    elo_path=elo_path,
    race_info=races_df.iloc[0],
    gender=gender,
    host_nation=host_nation,
    prob_column="temp_prob"
)

# Merge with race-specific startlists
consolidated_df = merge_race_dataframes(consolidated_df, race_df, prob_column)

Advanced Startlist Integration

  • URL-Based Startlist Processing: IBU official startlist scraping with fuzzy name matching
  • Season Fallback Mechanism: Builds the complete athlete pool from chronological data when startlists are unavailable
  • Probability Column Management: Race-specific participation probabilities (Race1_Prob, Race2_Prob)
  • Host Nation Integration: Home advantage modeling through nation-specific flags
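Fuzzy name matching is needed because startlists and the ELO database can spell the same athlete differently (surname-first ordering, upper case, accents). The document does not say which matcher the scraper uses. The sketch below therefore swaps in Python's standard-library `difflib.get_close_matches`, applied after accent stripping and token sorting. The `match_names` helper and the 0.85 cutoff are illustrative assumptions. Letters such as ø and æ have no Unicode NFKD decomposition, so accent stripping leaves them in place and the fuzzy step has to absorb those differences.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lower-case, strip combining accents and sort tokens ('PERROT Éric' -> 'eric perrot')."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    return " ".join(sorted(plain.split()))


def match_names(scraped, known, cutoff=0.85):
    """Map each scraped startlist name to the closest known ELO name, or None."""
    keyed = {normalize(k): k for k in known}
    candidates = list(keyed)
    matches = {}
    for name in scraped:
        hits = difflib.get_close_matches(normalize(name), candidates, n=1, cutoff=cutoff)
        matches[name] = keyed[hits[0]] if hits else None
    return matches


if __name__ == "__main__":
    known = ["Éric Perrot", "Sturla Holm Lægreid"]
    print(match_names(["PERROT Eric", "Sturla Holm Laegreid", "Unknown Athlete"], known))
```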

3. Weekend-Specific Probability Framework

Multi-Race Probability Architecture

  • Race-Specific Columns: Dynamic probability columns for each weekend race
  • Participation Modeling: 100% probability for confirmed startlist entries, 0% for season athletes not entered
  • Cross-Race Athlete Tracking: Maintains athlete records across multiple weekend events
  • Consolidated Startlist Generation: Single comprehensive file containing all weekend race probabilities
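A small consolidated-startlist builder can illustrate the probability rules above. This is a sketch under stated assumptions: athletes are plain name strings, and the `Skier` key and `build_consolidated_startlist` helper are hypothetical. A race whose startlist is not yet available is passed as `None`, which yields 0.0 for every athlete, as in the assignment snippet below.

```python
def build_consolidated_startlist(season_athletes, race_startlists):
    """One row per athlete with a RaceN_Prob column for each weekend race.

    season_athletes: names from the season-wide pool.
    race_startlists: per weekend race (in race order), the set of names on the
        official startlist, or None if no startlist is available yet.
    """
    # Track athletes across races: season pool plus anyone only seen on a startlist
    published = [s for s in race_startlists if s]
    everyone = set(season_athletes).union(*published)
    rows = []
    for athlete in sorted(everyone):
        row = {"Skier": athlete}
        for i, entrants in enumerate(race_startlists, start=1):
            # Confirmed startlist entry -> 1.0, otherwise 0.0
            row[f"Race{i}_Prob"] = 1.0 if entrants and athlete in entrants else 0.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in build_consolidated_startlist({"A", "B", "C"}, [{"A", "B"}, {"A", "D"}]):
        print(row)
```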

Weekend Race Probability Assignment

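import pandas as pd

# Race probability assignment based on startlist availability
# (pd.notna is checked first because bool(pd.NA) raises a TypeError)
if pd.notna(url) and url:
    row_data[prob_column] = 1.0  # Startlist URL available: row comes from the official startlist = 100%
else:
    row_data[prob_column] = 0.0  # No startlist URL for this race = 0% probability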

Statistical Modeling Framework

1. Weekend-Adapted Feature Engineering

R Statistical Processing (weekly-picks2.R)

  • Current Date Filtering: Processes only races scheduled for the current UTC date
  • Weekend Race Structure: Handles multiple individual races plus relay events within unified framework
  • Enhanced Logging: Comprehensive process logging for weekend-specific operations
  • Race Classification: Separate processing tracks for men’s, ladies’, and mixed events

Weekend Race Variables

library(dplyr)

# Weekend race configuration
men_races <- next_weekend_races %>%
  filter(Sex == "M", !RaceType %in% c("Relay", "Mixed Relay", "Single Mixed Relay"))

ladies_races <- next_weekend_races %>%
  filter(Sex == "L", !RaceType %in% c("Relay", "Mixed Relay", "Single Mixed Relay"))

mixed_races <- next_weekend_races %>%
  filter(Sex == "Mixed")

2. Multi-Event Participation Integration

Weekend Participation Modeling

  • Cross-Race Participation: Athletes may compete in multiple events within the same weekend
  • Format-Specific Rates: Different participation patterns for Sprint, Individual, Pursuit, Mass Start
  • Relay Team Selection: Nation-based team participation separate from individual event entry
  • Historical Weekend Analysis: 5-year rolling windows for weekend-specific participation rates
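A participation rate over a rolling window can be computed as the share of races of a given format that an athlete actually started. The sketch below assumes that definition and plain `(date, format)` tuples, and it approximates a five-year window as 1,826 days. `participation_rate` is a hypothetical helper, not the script's own function.

```python
from datetime import date, timedelta


def participation_rate(starts, races_held, race_format, as_of, years=5):
    """Fraction of `race_format` races held in the window that the athlete started.

    starts:     (date, format) pairs for races the athlete started.
    races_held: (date, format) pairs for every race held for that gender.
    The window is [as_of - ~years, as_of); returns 0.0 if no such races were held.
    """
    cutoff = as_of - timedelta(days=round(365.25 * years))

    def count(pairs):
        return sum(1 for d, fmt in pairs if fmt == race_format and cutoff <= d < as_of)

    held = count(races_held)
    return count(starts) / held if held else 0.0


if __name__ == "__main__":
    held = [(date(2024, 12, 1), "Sprint"), (date(2023, 12, 1), "Sprint"),
            (date(2024, 12, 3), "Pursuit")]
    started = [(date(2024, 12, 1), "Sprint")]
    print(participation_rate(started, held, "Sprint", date(2025, 1, 1)))  # 1 of 2 -> 0.5
```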

Enhanced Feature Set for Weekend Predictions

# Weekend-specific explanatory variables
explanatory_vars <- c("Prev_Points_Weighted", 
                      "Sprint_Pelo_Pct", "Individual_Pelo_Pct", 
                      "MassStart_Pelo_Pct", "Pursuit_Pelo_Pct", 
                      "Pelo_Pct", "Period", "Elevation_Flag")

3. Weekend-Optimized PELO Integration

Multi-Race PELO Framework

  • Weekend Form Assessment: PELO ratings aggregated across weekend format variations
  • Shooting Condition Adaptation: Venue-specific range condition adjustments for weekend events
  • Cross-Format PELO Application: Shooting accuracy metrics adapted for different weekend race types
  • Weekend-Specific Normalization: PELO percentiles calculated within weekend competition context
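Normalizing PELO within the competition context means comparing each athlete only to the field expected to start. The sketch below uses one plausible reading of a `_Pct` feature: each rating divided by the best rating among likely starters. A percentile rank would be the alternative reading. Both that definition and the `pelo_pct` helper are assumptions for illustration.

```python
def pelo_pct(pelos, start_probs):
    """Ratio of each expected starter's PELO to the field's best PELO.

    pelos: athlete -> PELO rating (positive, Elo-scale numbers).
    start_probs: athlete -> participation probability for this race;
        athletes with probability 0 are excluded from the field.
    """
    field = {name: r for name, r in pelos.items() if start_probs.get(name, 0.0) > 0}
    if not field:
        return {}
    best = max(field.values())
    return {name: r / best for name, r in field.items()}


if __name__ == "__main__":
    print(pelo_pct({"A": 1800, "B": 1500, "C": 1700}, {"A": 1.0, "B": 1.0, "C": 0.0}))
```

Because the field is rebuilt for each race, the same athlete can receive a different normalized value in each weekend event.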

Advanced Weekend-Specific Adjustments

1. Multi-Day Competition Modeling

Weekend Race Interaction Effects

  • Fatigue Modeling: Performance degradation across multiple weekend events
  • Strategic Pacing: Athletes’ race selection and intensity distribution over weekend
  • Recovery Windows: Time gaps between weekend events affecting performance
  • Equipment Optimization: Venue-specific equipment choices across weekend formats
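The fatigue and recovery-window ideas above need per-athlete features that describe how loaded the weekend is before each race. The sketch below derives two illustrative features: the number of earlier starts that weekend and the rest days since the previous start. The feature names and the `weekend_load_features` helper are hypothetical, and the document does not specify how fatigue enters the model.

```python
from datetime import date


def weekend_load_features(race_dates):
    """Per-race load features for one athlete's starts within a weekend.

    Returns, in date order, dicts with the number of earlier starts this
    weekend and the days since the previous start (None for the first race).
    """
    features = []
    previous = None
    for prior_starts, day in enumerate(sorted(race_dates)):
        rest_days = (day - previous).days if previous is not None else None
        features.append({"date": day, "prior_starts": prior_starts, "rest_days": rest_days})
        previous = day
    return features


if __name__ == "__main__":
    for f in weekend_load_features([date(2025, 1, 12), date(2025, 1, 10), date(2025, 1, 11)]):
        print(f)
```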

Weekend Context Variables

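library(dplyr)

# Weekend-specific race context
next_weekend_date <- min(next_races$Date, na.rm = TRUE)
next_weekend_races <- next_races %>% filter(Date == next_weekend_date)
weekend_race_count <- sum(next_weekend_races$Sex %in% c("M", "L"))  # men's + ladies' races, gendered relays included
host_nation <- next_weekend_races$Country[1]  # all races of a weekend share one venue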
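For readers following the Python side, the same weekend context, plus the host-nation flag mentioned under startlist integration, can be sketched with plain dictionaries. Field names mirror the R snippet (`Date`, `Sex`, `Country`). The athlete-to-nation mapping and the `weekend_context` helper are hypothetical.

```python
from datetime import date


def weekend_context(upcoming, athlete_nations):
    """Next weekend date, men's/ladies' race count, host nation and host flags.

    upcoming: race dicts with "Date", "Sex" and "Country" keys ("Date" may be None).
    athlete_nations: athlete name -> nation, compared against the host nation.
    """
    dated = [r for r in upcoming if r["Date"] is not None]  # like na.rm = TRUE
    if not dated:
        raise ValueError("no upcoming races with a date")
    next_date = min(r["Date"] for r in dated)
    weekend = [r for r in dated if r["Date"] == next_date]
    race_count = sum(r["Sex"] in ("M", "L") for r in weekend)
    host = weekend[0]["Country"]
    host_flags = {name: int(nation == host) for name, nation in athlete_nations.items()}
    return next_date, race_count, host, host_flags


if __name__ == "__main__":
    races = [{"Date": date(2025, 1, 10), "Sex": "M", "Country": "Austria"},
             {"Date": date(2025, 1, 17), "Sex": "L", "Country": "Germany"}]
    print(weekend_context(races, {"X": "Austria", "Y": "Norway"}))
```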

2. Enhanced Relay Weekend Processing

Weekend Relay Coordination

  • Team Selection Dynamics: Nation-specific team composition strategies for weekend events
  • Individual-Team Performance Correlation: Weekend individual results affecting relay selection probability
  • Mixed Format Integration: Coordination between standard and mixed relay weekend events
  • Team ELO Weekend Adjustment: Weekend-specific team chemistry and preparation factors
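Team ELO aggregation for the three relay formats can be sketched in two steps: pick each nation's highest-rated eligible athletes for the required legs, then average their ratings. Both the top-ELO selection rule and mean aggregation are assumptions made for illustration, since the text above notes that real team selection also reflects nation-specific strategy. `LEG_QUOTAS` and `relay_team_elo` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Athletes per sex for each format: men's and women's relays use four athletes,
# the mixed relay two women and two men, the single mixed relay one of each
LEG_QUOTAS = {
    "Relay_M": {"M": 4},
    "Relay_L": {"L": 4},
    "Mixed Relay": {"L": 2, "M": 2},
    "Single Mixed Relay": {"L": 1, "M": 1},
}


def relay_team_elo(athletes, quota):
    """Mean ELO of each nation's top-rated athletes filling the quota.

    athletes: (name, nation, sex, elo) tuples. Nations that cannot fill
    every slot are left out.
    """
    by_nation = defaultdict(list)
    for _name, nation, sex, elo in athletes:
        by_nation[nation].append((sex, elo))
    teams = {}
    for nation, members in by_nation.items():
        chosen = []
        for sex, legs in quota.items():
            pool = sorted((elo for s, elo in members if s == sex), reverse=True)
            if len(pool) < legs:
                break  # incomplete team for this format
            chosen.extend(pool[:legs])
        else:  # runs only if no break: every slot was filled
            teams[nation] = mean(chosen)
    return teams
```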

Weekend Team Processing Architecture

import os
import subprocess

import pandas as pd

# Weekend relay processing with skip control
def process_relay_and_team_races(relay_races: pd.DataFrame) -> None:
    # race_types, script_path and temp_file are defined outside this excerpt
    for race_type in race_types:
        env = os.environ.copy()
        env["SKIP_WEEKLY_PICKS"] = "1"  # Prevent duplicate R script execution

        subprocess.run([script_path, temp_file], env=env)

3. Weekend Output Generation and Integration

Comprehensive Weekend Results

  • Multi-Event Spreadsheets: Individual race predictions plus relay team forecasts
  • Weekend Summary Analysis: Aggregated performance expectations across all weekend events
  • Cross-Event Probability Matrices: Unified position probabilities accounting for multi-event participation
  • Weekend Performance Tracking: Historical weekend success rates for model validation
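Combining participation probabilities with position probabilities rests on a simple identity: the unconditional chance of, say, a podium is P(starts) × P(podium | starts). The sketch below applies this identity to one athlete's weekend. The expected podium count follows from linearity of expectation and holds regardless of how races correlate. The "at least one podium" figure additionally assumes independent races, which fatigue and form effects make only an approximation. The function names are hypothetical.

```python
import math


def unconditional(p_start, p_event_given_start):
    """P(event) = P(starts) * P(event | starts); no start means no result."""
    return p_start * p_event_given_start


def weekend_podium_summary(races):
    """races: (P(start), P(podium | start)) pairs for one athlete's weekend races.

    Returns the expected number of podiums (exact, by linearity of expectation)
    and P(at least one podium) under an independence assumption.
    """
    probs = [unconditional(ps, pp) for ps, pp in races]
    expected = sum(probs)
    at_least_one = 1 - math.prod(1 - p for p in probs)
    return expected, at_least_one


if __name__ == "__main__":
    print(weekend_podium_summary([(1.0, 0.5), (0.0, 0.9), (1.0, 0.2)]))
```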

Weekend-Specific File Architecture

# Weekend output structure (sprintf fills in gender; R strings do not interpolate "{gender}")
output_path <- sprintf("~/ski/elo/python/biathlon/polars/excel365/startlist_weekend_%s.csv", gender)
log_file <- "~/ski/elo/python/biathlon/polars/excel365/weekly-predictions/weekly_picks_processing.log"

Key Methodological Innovations for Weekend Predictions

1. Multi-Race Probability Integration

Handles complex weekend scenarios where athletes compete in multiple events with varying participation probabilities

2. Weekend-Optimized PELO Application

Adapts shooting accuracy metrics for the unique challenges of multi-day competition formats

3. Enhanced Team-Individual Coordination

Integrates individual weekend performance with relay team selection and performance modeling

4. Dynamic Race Priority Management

Prioritizes race processing by date when multiple events fall within the same weekend

5. Comprehensive Season Athlete Pool

Ensures complete coverage by including all active season athletes with appropriate zero probabilities for non-participants

6. Weekend-Aware Relay Processing

Coordinates multiple relay formats (standard, mixed, single mixed) within unified weekend prediction framework

7. Advanced Weekend Context Integration

Incorporates host nation effects, elevation adjustments, and period effects specific to weekend competition dynamics

8. Cross-Format Performance Modeling

Bridges individual and team event predictions within cohesive weekend forecasting methodology

This weekend methodology extends single-race biathlon forecasting to multi-day competition. It models the sport's weekend structure of individual and relay events while retaining the PELO shooting-accuracy integration and the statistical framework used for individual race predictions.