Derby 2026

Methodology

About

How these predictions are built — and what they're not.

The model in 60 seconds

  1. Training data: every horse in this weekend's BRIS Single PP file carries up to 10 prior races. Across 552 runners, that's ~3,000 labeled (horse, prior race) observations.
  2. Features: 18 per-horse signals (recent form, pace style, class trend, surface/distance fit, layoff days), each computed strictly from races dated before the target race, so nothing from the race being predicted leaks into its own features.
  3. Regressor: a LightGBM gradient-boosted model trained with Huber loss predicts each horse's expected speed figure for the upcoming race.
  4. Field-level conversion: within each race, predicted speed figures are turned into finish-order probabilities with a Plackett-Luce model (a softmax over the figures divided by a learned temperature of 10.95 fig units) → Win / Place / Show / ITM probabilities.
  5. SHAP attributions explain each individual prediction — see the "Why this prediction?" panel on any horse page.
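The leakage rule in step 2 comes down to one filter: when building features for a target race, look only at races dated strictly before it. The Python sketch below shows that filter with four illustrative features. The `PastRace` type, the function names, and the feature definitions are hypothetical stand-ins, not the model's actual 18 signals.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class PastRace:
    race_date: date
    speed_fig: float


def features_as_of(history: list[PastRace],
                   target_date: date) -> Optional[dict[str, float]]:
    """Illustrative features built only from races strictly before target_date."""
    prior = sorted((r for r in history if r.race_date < target_date),
                   key=lambda r: r.race_date)
    if not prior:
        return None  # no earlier races, so nothing to featurize
    last3 = prior[-3:]
    return {
        "last_fig": prior[-1].speed_fig,
        "mean_last3_fig": mean(r.speed_fig for r in last3),
        "fig_trend": prior[-1].speed_fig - last3[0].speed_fig,
        "layoff_days": float((target_date - prior[-1].race_date).days),
    }


def build_training_rows(history: list[PastRace]) -> list[tuple[dict[str, float], float]]:
    """Turn one horse's past races into (features, label) rows.

    Each past race serves as a label; its features see only races before it.
    """
    rows = []
    for target in history:
        feats = features_as_of(history, target.race_date)
        if feats is not None:
            rows.append((feats, target.speed_fig))
    return rows
```

Under this construction a horse's earliest listed race has no earlier races to draw on, so it produces no training row.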
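The usual reason to train with Huber loss, as in step 3, is robustness: it is quadratic for residuals up to a threshold δ and linear beyond it, so one badly off-form figure pulls on the fit far less than it would under squared error. With y the actual speed figure and ŷ the prediction, the standard form is shown below. The page does not state the δ used; in LightGBM that threshold is set by the `alpha` parameter when the objective is `huber`.

```latex
L_\delta(y, \hat{y}) =
\begin{cases}
  \frac{1}{2}\,(y - \hat{y})^2 & \text{if } |y - \hat{y}| \le \delta \\
  \delta\,\left(|y - \hat{y}| - \frac{1}{2}\,\delta\right) & \text{otherwise}
\end{cases}
```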
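Step 4 can be sketched as follows, assuming each horse's Plackett-Luce strength is exp(predicted fig / 10.95), which makes the win probabilities exactly a softmax of the figures over the temperature, and assuming Place and Show mean finishing in the top 2 and top 3. The page does not define ITM separately, so the sketch leaves it out. The function name and the exact-enumeration approach are illustrative choices, not the site's code.

```python
import math
from itertools import permutations


def plackett_luce_top_k(figs: list[float], temperature: float = 10.95,
                        k: int = 3) -> list[list[float]]:
    """Return probs[i][j] = P(horse i finishes in the top j+1), for j = 0..k-1.

    Plackett-Luce: the winner is drawn with probability proportional to
    strength, then the next finisher from the horses left, and so on.
    Exact enumeration over ordered top-k sequences (fine for race-sized fields).
    """
    n = len(figs)
    k = min(k, n)
    top = max(figs)  # subtract the max before exp() for numerical stability
    strength = [math.exp((f - top) / temperature) for f in figs]
    total = sum(strength)
    # p_exact[i][j] = P(horse i finishes exactly in position j+1)
    p_exact = [[0.0] * k for _ in range(n)]
    for order in permutations(range(n), k):
        prob, remaining = 1.0, total
        for i in order:
            prob *= strength[i] / remaining
            remaining -= strength[i]
        for pos, i in enumerate(order):
            p_exact[i][pos] += prob
    # cumulative sums give win, top-2 (place), top-3 (show)
    return [[sum(row[: j + 1]) for j in range(k)] for row in p_exact]
```

Enumerating ordered top-3 sequences costs n(n-1)(n-2) terms, 6,840 for a 20-horse field, so exact computation is cheap here; sampling finish orders is the usual alternative when k or the field gets larger.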

Validation

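Validation uses 5-fold, time-ordered walk-forward cross-validation on the 2,533-row training set, so each fold is scored only on races run after the ones it was trained on. Mean absolute error on held-out speed figures is 4.95 fig points, a 1.05-point improvement over the naive baseline of predicting that a horse's next speed figure equals its last one (baseline MAE 6.00). The model beats the baseline on all 5 folds.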
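Expanding-window walk-forward CV against the naive baseline can be sketched as below. The page does not say how the 2,533 rows are divided into folds, so the sketch assumes the date-sorted rows are cut into six equal chunks, with fold k training on the first k chunks and testing on chunk k+1. The `Row` layout and the `fit` callback are hypothetical; in the real pipeline `fit` would train the LightGBM model.

```python
from statistics import mean
from typing import Callable, Iterator, Sequence

# Hypothetical row layout: (ISO race date, horse's last speed figure, actual figure)
Row = tuple[str, float, float]


def walk_forward_folds(rows: Sequence[Row], n_folds: int = 5
                       ) -> Iterator[tuple[list[Row], list[Row]]]:
    """Expanding-window splits: fold k trains on chunks 1..k, tests on chunk k+1."""
    ordered = sorted(rows, key=lambda r: r[0])  # ISO dates sort chronologically
    chunk = len(ordered) // (n_folds + 1)
    if chunk == 0:
        raise ValueError("need at least n_folds + 1 rows")
    for k in range(1, n_folds + 1):
        train = ordered[: k * chunk]
        # the last fold also absorbs any leftover rows
        end = (k + 1) * chunk if k < n_folds else len(ordered)
        yield train, ordered[k * chunk : end]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate(rows: Sequence[Row],
             fit: Callable[[list[Row]], Callable[[Row], float]],
             n_folds: int = 5) -> list[tuple[float, float]]:
    """Per fold: (model MAE, naive MAE), where naive predicts the last race's figure."""
    results = []
    for train, test in walk_forward_folds(rows, n_folds):
        predict = fit(train)  # fit never sees rows dated after the test rows
        actual = [r[2] for r in test]
        model_mae = mae(actual, [predict(r) for r in test])
        naive_mae = mae(actual, [r[1] for r in test])
        results.append((model_mae, naive_mae))
    return results
```

Returning a (model, baseline) pair per fold makes the fold-by-fold comparison with the baseline direct, rather than relying on one pooled average.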

Limitations

  • Predictions are baked at build time. Live odds, scratches, and other late changes are not reflected.
  • First-time and second-time starters have very limited prior data; their predictions are inherently noisy.
  • The "value edge" overlay uses morning-line odds, not live-money tote prices.
  • Trip notes (the "comment" column) are not yet ingested as features.
  • The model has no explicit representation of race-level pace shape, traffic, or post-position bias beyond what is implicit in the training data.
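The page does not spell out how the morning-line "value edge" is computed. One common reading, assumed here, is the model's win probability minus the probability implied by the morning-line odds, where fractional odds a-b imply b/(a+b). Both function names are hypothetical.

```python
def ml_implied_prob(odds: str) -> float:
    """Implied win probability from fractional morning-line odds such as '5-2' or '8-1'."""
    num, den = (int(x) for x in odds.split("-"))
    return den / (num + den)


def value_edge(model_win_prob: float, morning_line: str) -> float:
    """Assumed edge definition: model win probability minus ML-implied probability."""
    return model_win_prob - ml_implied_prob(morning_line)
```

Implied probabilities from a morning line usually sum to more than 1 across a field, because the line anticipates pari-mutuel prices that include the takeout; normalizing them to sum to 1 before differencing is a common refinement.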

Last build: 2026-04-29 19:27 UTC. Track: Churchill Downs (CD).