Biathlon Championships Predictions - Implementation Scope
Based on the Nordic Combined champs-predictions.R file, here’s what the biathlon version should include:
Core Structure (1519 lines in Nordic Combined)
1. Library Imports & Setup (Lines 1-32)
- Standard R libraries (dplyr, tidyr, openxlsx, arrow, mgcv, leaps, logger, purrr, lubridate, slider)
- Biathlon-specific points systems (regular_points, mass_start_points)
- Logging configuration to ~/ski/elo/python/biathlon/polars/excel365/champs-predictions/
2. Points Systems & Helper Functions (Lines 13-117)
- Biathlon Points: Regular (40 positions) and Mass Start (30 positions) - ✅ DONE
- replace_na_with_quartile() function for missing value imputation
- get_points() function with race type logic for Mass Start vs other races - ✅ DONE
- enforce_probability_constraints() - ensures Win ≤ Podium ≤ Top5 ≤ Top10
- normalize_position_probabilities() - complex normalization with race participation scaling
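A minimal sketch of the ordering constraint, not the Nordic Combined implementation. The column names and the choice to raise the looser threshold (rather than lower the tighter one) are assumptions made for illustration.

```r
# Hypothetical column names; the real script's names may differ.
enforce_probability_constraints <- function(df,
                                            cols = c("Win", "Podium", "Top5", "Top10")) {
  # Walk from the tightest threshold to the loosest. Each column is raised to
  # at least the value of the one before it, so Win <= Podium <= Top5 <= Top10.
  for (i in seq_along(cols)[-1]) {
    df[[cols[i]]] <- pmax(df[[cols[i]]], df[[cols[i - 1]]])
  }
  df
}

# Two athletes whose raw model outputs violate the ordering.
probs <- data.frame(
  Win    = c(0.30, 0.05),
  Podium = c(0.25, 0.20),
  Top5   = c(0.40, 0.15),
  Top10  = c(0.60, 0.50)
)
fixed <- enforce_probability_constraints(probs)
```

Raising the looser threshold keeps the Win estimate untouched. Lowering the tighter one is an equally valid policy if the looser models are trusted more.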
3. Data Reading & Race Filtering (Lines 34-99)
- Read weekends.csv and filter for Championships (Championship == 1)
- Separate races by type:
- Individual races: Men/Ladies for Individual, Sprint, Pursuit, Mass Start
- Relay races: Men/Ladies for Relay, Mixed Relay, Single Mixed Relay
- Race dataframes with original race numbering for Excel sheet naming
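The filtering step could look roughly like this. Only the Championship column is taken from the notes above. Race, Sex, and RaceType are assumed column names, and the inline text stands in for weekends.csv. Base R is used so the sketch runs without extra packages.

```r
# Inline stand-in for weekends.csv (assumed columns; the real file has more).
weekends_csv <- "Race,Sex,RaceType,Championship
1,M,Sprint,1
2,L,Sprint,1
3,M,Relay,1
4,Mixed,Mixed Relay,1
5,M,Pursuit,0"
weekends <- read.csv(text = weekends_csv, stringsAsFactors = FALSE)

# Keep only Championships races.
champs <- weekends[weekends$Championship == 1, ]

individual_types <- c("Individual", "Sprint", "Pursuit", "Mass Start")
relay_types <- c("Relay", "Mixed Relay", "Single Mixed Relay")

# Split by race family; the Race column keeps the original numbering,
# which can later be used for Excel sheet names.
individual_races <- champs[champs$RaceType %in% individual_types, ]
relay_races <- champs[champs$RaceType %in% relay_types, ]

men_races <- individual_races[individual_races$Sex == "M", ]
ladies_races <- individual_races[individual_races$Sex == "L", ]
```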
4. Data Preprocessing Function (Lines 411-528)
- preprocess_data() - handles historical race data
- Points calculation using appropriate points system
- Weighted previous points calculation (last 5 races, recent weighted higher)
- ELO/PELO column creation and percentage normalization
- Race type filtering (exclude Offseason)
- Season period assignment (4 periods per season)
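A sketch of the weighted previous points step for a single athlete. The notes only say "last 5 races, recent weighted higher", so the linear weights (1 for the oldest race in the window, up to 5 for the newest) are an assumption, as is the function name.

```r
# points: one athlete's points in chronological order.
# Returns, for each race, the weighted mean of up to `window` earlier races.
weighted_prev_points <- function(points, window = 5) {
  n <- length(points)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i == 1) next                         # no history before the first race
    prev <- points[max(1, i - window):(i - 1)]
    w <- seq_along(prev)                     # assumed weights: oldest = 1, newest = largest
    out[i] <- weighted.mean(prev, w)
  }
  out
}

prev_weighted <- weighted_prev_points(c(10, 20, 30))
```

The current race is deliberately excluded from its own window, so the feature only uses information available before the start.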
5. Startlist Data Preparation (Lines 229-409)
- prepare_startlist_data() - prepares prediction data
- Handles both individual and team startlists
- Team Prev_Points_Weighted calculation from individual member data
- ELO percentage column creation (Elo_Pct, Pelo_Pct)
- Race probability column preservation
- Team vs individual data path logic
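One plausible reading of the percentage columns is each rating divided by the highest rating in the same startlist. That definition is an assumption, not confirmed by the notes, and add_pct_columns() is a hypothetical helper name.

```r
# Adds a <col>_Pct column for every rating column listed in `cols`.
add_pct_columns <- function(df, cols) {
  for (col in cols) {
    max_val <- max(df[[col]], na.rm = TRUE)   # best rating in this field
    df[[paste0(col, "_Pct")]] <- df[[col]] / max_val
  }
  df
}

startlist <- data.frame(
  Skier = c("A", "B", "C"),
  Elo   = c(1500, 1200, 900),
  Pelo  = c(1400, 1400, 700)
)
startlist <- add_pct_columns(startlist, c("Elo", "Pelo"))
```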
6. Individual Championships Processing (Lines 530-1004)
- process_gender_championships() - main individual prediction function
- GAM model creation using regsubsets for feature selection
- Position probability models for thresholds (1, 3, 5, 10, 30)
- Training on PELO (pre-race), prediction on ELO (post-race)
- Period adjustments using t-tests
- Brier score evaluation for model quality
- Excel output generation with race-by-race sheets
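For reference, the Brier score is the mean squared difference between predicted probabilities and 0/1 outcomes. A small sketch with made-up numbers (the variable names are illustrative only):

```r
# Lower is better; 0 means perfect probabilistic predictions.
brier_score <- function(predicted, outcome) {
  stopifnot(length(predicted) == length(outcome))
  mean((predicted - outcome)^2)
}

podium_prob <- c(0.8, 0.1, 0.5, 0.2)   # predicted podium probabilities
made_podium <- c(1, 0, 1, 0)           # what actually happened
score <- brier_score(podium_prob, made_podium)
```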
7. Team Championships Processing (Lines 1030-1348)
- process_team_championships() - handles relay predictions
- Team-specific ELO columns (Avg_Individual_Elo, etc.)
- Team position models (fewer thresholds: 1, 3, 5, 10)
- Team startlist handling for different relay types
- Team summary and race-by-race Excel outputs
8. Race Probability Calculation (Lines 1350-1486)
- calculate_championships_race_probabilities() - calculates participation probabilities
- Historical participation analysis (5-year lookback)
- Biathlon-specific: 6-person quota constraint per nation per race (vs 3 for Nordic Combined)
- Base probability scaling and normalization
- Updates startlist files with race probabilities
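A sketch of how the 6-person quota might be applied: if a nation's participation probabilities sum to more than the quota, they are scaled down proportionally. The scaling rule and the Nation and Race_Prob column names are assumptions; the Nordic Combined script may enforce the constraint differently.

```r
apply_nation_quota <- function(startlist, quota = 6) {
  # Expected number of starters per nation, repeated on each athlete's row.
  nation_total <- ave(startlist$Race_Prob, startlist$Nation, FUN = sum)
  # Scale down only the nations that exceed the quota.
  scale <- ifelse(nation_total > quota, quota / nation_total, 1)
  startlist$Race_Prob <- startlist$Race_Prob * scale
  startlist
}

startlist <- data.frame(
  Skier = paste0("S", 1:10),
  Nation = c(rep("NOR", 8), rep("SLO", 2)),
  Race_Prob = 0.9
)
capped <- apply_nation_quota(startlist)
```

Here the eight NOR athletes drop from 0.9 to 0.75 each so that their expected starters total 6, while the two SLO athletes are left unchanged.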
9. Team Member Points Calculation (Lines 1006-1028)
- calculate_team_prev_points() - calculates team Prev_Points_Weighted
- Individual member analysis from chrono data
- Weighted average of last 5 individual races
- Biathlon-specific: Include Individual, Sprint, Pursuit, Mass Start (exclude Offseason)
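A rough sketch of the team calculation under several assumptions: chrono columns named Skier, RaceType, Date, and Points; linear weights within each member's last 5 races; a plain mean across members; and 0 for a member with no qualifying races. None of these details are confirmed by the notes.

```r
individual_types <- c("Individual", "Sprint", "Pursuit", "Mass Start")

calculate_team_prev_points <- function(chrono, members, window = 5) {
  member_values <- vapply(members, function(m) {
    # Keep this member's individual-format races only (drops Offseason).
    rows <- chrono[chrono$Skier == m & chrono$RaceType %in% individual_types, ]
    rows <- rows[order(rows$Date), ]
    recent <- tail(rows$Points, window)
    if (length(recent) == 0) return(0)          # assumed fallback for no history
    weighted.mean(recent, seq_along(recent))    # newest race weighted highest
  }, numeric(1))
  mean(member_values)                           # assumed team aggregation
}

chrono <- data.frame(
  Skier = c("A", "A", "A", "B", "B"),
  RaceType = c("Sprint", "Offseason", "Pursuit", "Sprint", "Mass Start"),
  Date = as.Date(c("2024-01-05", "2024-01-10", "2024-01-12",
                   "2024-01-05", "2024-01-14")),
  Points = c(30, 99, 60, 10, 40)
)
team_value <- calculate_team_prev_points(chrono, c("A", "B"))
```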
Biathlon-Specific Adaptations Needed
1. Points Systems
- ✅ Already adapted: regular_points (40 positions), mass_start_points (30 positions)
- ✅ get_points() function with Mass Start logic
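For orientation, the Mass Start branching could be sketched as below. The two vectors are placeholders with the right lengths (40 and 30), not the real points tables, and the signature of get_points() is assumed.

```r
# Placeholder tables: correct lengths only, NOT the actual points values.
regular_points <- seq(40, 1)
mass_start_points <- seq(30, 1)

# Scalar lookup: points for one finishing place in one race.
get_points <- function(place, race_type) {
  points_table <- if (race_type == "Mass Start") mass_start_points else regular_points
  # Places outside the table (or missing) score nothing.
  if (is.na(place) || place < 1 || place > length(points_table)) return(0)
  points_table[place]
}

sprint_31st <- get_points(31, "Sprint")
mass_start_31st <- get_points(31, "Mass Start")
```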
2. File Paths
- Change all paths from nordic-combined to biathlon
- Output directory: ~/blog/daehl-e/content/post/biathlon/drafts/champs-predictions/
3. Race Types
- Individual: Individual, Sprint, Pursuit, Mass Start (vs Individual, IndividualCompact, Mass Start)
- Relay: Use “Relay” in RaceType (vs “Team”)
- Mixed: Mixed Relay (2M+2L), Single Mixed Relay (1M+1L)
4. ELO Column Names
- ✅ Use MassStart_Elo not Mass_Start_Elo
- Individual_Elo, Sprint_Elo, Pursuit_Elo, MassStart_Elo
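One way to avoid the Mass_Start_Elo mistake is to derive the column name from the race type by dropping spaces. elo_column_for() is a hypothetical helper, shown only to illustrate the naming rule.

```r
# "Mass Start" -> "MassStart_Elo"; single-word race types are unchanged.
elo_column_for <- function(race_type) {
  paste0(gsub(" ", "", race_type, fixed = TRUE), "_Elo")
}

elo_cols <- elo_column_for(c("Individual", "Sprint", "Pursuit", "Mass Start"))
```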
5. Quota System
- 6-person quota per nation (vs 3 for Nordic Combined)
- Update quota constraints in race probability calculation
6. Relay Processing
- 4 separate relay types: men_relay, ladies_relay, mixed_relay, single_mixed_relay
- Team composition logic for mixed relays (2M+2L vs 1M+1L)
- Different chrono file paths for each relay type
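The mixed relay compositions can be kept in a small lookup and checked against a proposed team. The list and the is_valid_team() helper are illustrative names, and the "M"/"L" sex codes are an assumption.

```r
# Required number of men and ladies per mixed relay type.
mixed_composition <- list(
  mixed_relay        = c(M = 2, L = 2),
  single_mixed_relay = c(M = 1, L = 1)
)

# member_sex: character vector of "M"/"L", one entry per team member.
is_valid_team <- function(member_sex, relay_type) {
  required <- mixed_composition[[relay_type]]
  stopifnot(!is.null(required))                 # unknown relay type is an error
  counts <- c(M = sum(member_sex == "M"), L = sum(member_sex == "L"))
  all(counts == required)
}

ok_mixed <- is_valid_team(c("M", "L", "M", "L"), "mixed_relay")
bad_single <- is_valid_team(c("M", "M"), "single_mixed_relay")
```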
7. Output Files
- 6 Excel workbooks: men, ladies, men_relay, ladies_relay, mixed_relay, single_mixed_relay
- Race-by-race sheets with proper naming
Implementation Priority
- High Priority (Core functionality):
  - Data preprocessing function with biathlon race types
  - Individual championships processing (men/ladies)
  - Race probability calculation with 6-person quota
  - Basic Excel output generation
- Medium Priority (Relay functionality):
  - Relay championships processing for all 4 types
  - Team member points calculation
  - Mixed relay logic (2M+2L, 1M+1L)
- Low Priority (Advanced features):
  - Period adjustments and t-test analysis
  - Advanced GAM model diagnostics
  - Brier score calculations
Key Functions to Implement
- preprocess_data() - adapt for biathlon race types
- prepare_startlist_data() - handle biathlon ELO columns
- process_gender_championships() - individual predictions
- process_relay_championships() - relay predictions (new function)
- calculate_championships_race_probabilities() - 6-person quota
- calculate_team_prev_points() - biathlon race type filtering
The file should run to roughly 1500 lines or more, matching the complexity and functionality of the Nordic Combined version.