John-Z's Overlay · Trifecta ATM

Methodology

About

Research-grade methodology. Transparent backtest. Tipsheet for handicappers — not a betting service.

App Developer John B. Clayton · Lexington, KY · Built with Claude Code

For a more detailed technical guide to reading the predictions and constructing exotic wagers, see How to Use.

The model in 60 seconds

  1. Training data: 157,869 labeled (horse, prior race) observations from a 2024-2025 archive of BRIS Single PP files spanning 7 tracks (AQU, CDX, DMR, GPX, KEE, OPX, SAR), plus 7,188 matched (race, horse) rank labels from Equibase XRD result charts.
  2. Features: 63 per-horse signals computed strictly from races dated before the target race (no temporal leakage). The major families:
    • Form — recent BRIS Speed Rating distribution (last_3, last_5, career mean, career max, trend slope, consistency)
    • Pace & class — BRIS 2f / 4f / Late Pace figures (last 5), BRIS Speed Par for class level (rolling mean + trend)
    • Fitness & surface fit — days since last race, starts in last 60/180 days, surface and distance averages
    • Track context — today_track_code as a categorical, plus per-track speed-fig aggregates
    • Finish form — mean / last / won / placed / btn-lengths over recent races
    • Race design — post position, field size, weight, log-purse, change-from-last-race (distance / surface / field-size / weight)
    • Trainer / jockey — meet win rates, log-starts, 365-day combo win-rate and ROI
    • Today's-context constants — BRIS Prime Power, dirt/mud/turf/distance Pedigree Ratings, Quirin Speed Points, run-style (E / EP / P / S)
  3. Regressor: a LightGBM model with Huber loss predicts each horse's expected BRIS Speed Rating ŷ_i for the upcoming race (num_leaves=127, n_estimators=5000, min_data_in_leaf=100). It is NaN-safe: missing feature values are routed inside the trees, with no imputation step.
  4. Rank model blend: a separate LambdaRank model trained on the XRD result-chart corpus learns within-race ordering directly. Per-race z-scored scores from the regressor and the rank model are blended (alpha=0.5) before probabilities are computed.
  5. Field-level conversion: within each race, the blended figures go through a temperature-scaled softmax (learned temperature 11.11 fig units) to give win probabilities, and a Plackett-Luce model extends those to Place / Show / ITM marginals. The temperature is fit by maximum likelihood, and a per-track multiplicative correction is then applied to it.
  6. Closing-odds model: a separate LightGBM trained on historical ML→closing-odds drift predicts where each horse's final tote odds will land. Its MAE against actual closing odds is about 13% lower than that of the raw morning line. It is used as a pre-tote fallback in the EV simulator and surfaced as the "Pred Close" column in the field grid.
  7. Explainability: SHAP TreeExplainer values are computed per upcoming-race row at export time and embedded in each horse's JSON. The "Why this prediction?" panel surfaces the top-5 positive and top-5 negative contributions in fig-point units.
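Step 5 can be illustrated with a minimal sketch. It assumes the win probabilities are a softmax of predicted figs divided by the temperature, and that Place and Show marginals come from summing exact Plackett-Luce probabilities over every possible top three. The function names are hypothetical, and the step-4 blend and the per-track correction are left out.

```python
import math
from itertools import permutations


def win_probs(figs, temperature=11.11):
    """Temperature-scaled softmax over predicted figs (temperature in fig units)."""
    top = max(figs)  # subtract the max for numerical stability
    weights = [math.exp((f - top) / temperature) for f in figs]
    total = sum(weights)
    return [w / total for w in weights]


def wps_marginals(figs, temperature=11.11):
    """Exact Plackett-Luce Win / Place / Show marginals (needs 3+ runners)."""
    p = win_probs(figs, temperature)
    n = len(p)
    win = list(p)
    place = [0.0] * n  # finishes 1st or 2nd
    show = [0.0] * n   # finishes 1st, 2nd, or 3rd
    for a, b, c in permutations(range(n), 3):
        # P(a 1st, b 2nd, c 3rd): renormalise over the horses still running
        prob = p[a] * p[b] / (1 - p[a]) * p[c] / (1 - p[a] - p[b])
        place[a] += prob
        place[b] += prob
        show[a] += prob
        show[b] += prob
        show[c] += prob
    return win, place, show


win, place, show = wps_marginals([95, 90, 85, 80, 75])
for i, (w, pl, sh) in enumerate(zip(win, place, show)):
    print(f"horse {i}: win {w:.3f}  place {pl:.3f}  show {sh:.3f}")
```

Enumerating ordered triples is cheap for normal field sizes (a 12-horse field has 1,320 of them), so the marginals need no sampling.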

Validation

5-fold time-based walk-forward cross-validation. The model is retrained at each fold cut on rows with prior-race date ≤ cut, then evaluated against the next-window held-out rows. The naive baseline is "next race ≈ horse's most-recent speed fig" — what an honest by-eye handicapper would do without a model.
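A rough sketch of the fold logic described above, assuming an expanding training window and test windows that each cover an equal number of distinct race dates. The helper names are hypothetical, and model fitting is omitted. Only the split and the MAE comparison against the naive baseline are shown.

```python
from datetime import date, timedelta


def walk_forward_folds(dates, n_folds=5):
    """Yield (train_idx, test_idx); every train date <= cut < every test date."""
    order = sorted(set(dates))
    # Split distinct dates into n_folds + 1 blocks; the first block is train-only.
    block = len(order) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        cut = order[k * block - 1]
        end = order[(k + 1) * block - 1] if k < n_folds else order[-1]
        train = [i for i, d in enumerate(dates) if d <= cut]
        test = [i for i, d in enumerate(dates) if cut < d <= end]
        yield train, test


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data: one row per day for 60 days.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(60)]
folds = list(walk_forward_folds(dates))
for k, (train, test) in enumerate(folds, start=1):
    print(f"fold {k}: train rows {len(train)}, test rows {len(test)}")

# Naive baseline: predict the next fig as the horse's most recent fig.
actual_figs = [80, 90]
last_figs = [85, 88]
naive_mae = mae(actual_figs, last_figs)
```

In a real run, the model would be refit on each `train` index set and its MAE on `test` compared with `naive_mae` computed on the same rows.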

Stage                                                                       | Model MAE | Naive MAE | Lift   | Test N/fold
Original CD-only baseline (3K rows)                                         | 5.70      | 8.00      | +2.30  | ~200
Archive baseline, n_estimators=500                                          | 6.575     | 8.69      | +2.115 | ~14,000
+ tuned hyperparameters                                                     | 6.138     | 8.69      | +2.552 | ~14,000
+ track-aware features (today_track_code categorical, per-track aggregates) | 6.108     | 8.69      | +2.582 | ~14,000
Current: final tuned, honest CV (today-snapshot features excluded)          | 6.57      | 8.69      | +2.12  | ~14,000

Correction (May 2026): the intermediate rows above were measured with the model's today-snapshot features (prime power, trainer/jockey stats, run-style) left in the cross-validation. Because those values are constant for a horse, the same horse could appear in both train and test folds carrying an identical fingerprint, inflating the measured lift. The Current row excludes them from the CV for an honest out-of-sample number. The production model still uses every feature — they are all available before a race; they simply can't be honestly back-tested on history we only have a single snapshot of.

The right metric is lift over naive: ~24% reduction in MAE (2.12 / 8.69) vs. the last-speed-fig baseline, on a held-out 70K-row test set across the 5 folds. The model still wins on every fold.
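The lift arithmetic is simple enough to check by hand. The sketch below recomputes it from the MAE values quoted in the validation table; the function name is illustrative only.

```python
def lift(model_mae, naive_mae):
    """Return (absolute lift in fig points, percent reduction vs. naive)."""
    absolute = naive_mae - model_mae
    return absolute, 100.0 * absolute / naive_mae


NAIVE_MAE = 8.69  # last-speed-fig baseline on the archive
stages = {
    "archive baseline": 6.575,
    "tuned hyperparameters": 6.138,
    "track-aware features": 6.108,
    "current (honest CV)": 6.57,
}
for name, model_mae in stages.items():
    absolute, pct = lift(model_mae, NAIVE_MAE)
    print(f"{name}: lift {absolute:.3f} fig points, {pct:.1f}% reduction")
```

For the Current row this gives a lift of 2.12 fig points, or about 24.4% of the naive MAE.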

The XRD-trained rank model (LambdaRank) was evaluated separately on 7,188 matched (race, horse) rows: top-1 hit rate 0.296 and NDCG@3 0.5902, beating BRIS Prime Power alone by +1.3pp top-1 and +0.019 NDCG@3. It is wired into production as an alpha=0.5 blend with the speed-fig regressor.
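One way the blend could look, as a sketch under stated assumptions: scores are z-scored within a single race using the population standard deviation, and alpha weights the regressor side. The page does not say which model alpha applies to; at alpha=0.5 the choice makes no difference. Function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardise one race's scores; a constant field maps to all zeros."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def blend(reg_scores, rank_scores, alpha=0.5):
    """alpha * z(regressor) + (1 - alpha) * z(rank model), within one race."""
    z_reg, z_rank = zscore(reg_scores), zscore(rank_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(z_reg, z_rank)]


# Predicted figs (regressor) and raw LambdaRank scores for a 4-horse race.
blended = blend([92.0, 88.5, 85.0, 79.0], [0.8, 1.4, -0.2, -1.1])
print([round(b, 3) for b in blended])
```

Z-scoring first puts both outputs on a common scale, so neither model dominates the blend just because its raw scores have a larger range.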

The honest punchline. Walk-forward backtests on 2,922 out-of-sample races show flat $2 WIN bets on the model's top-1 horse return roughly −17%; PLACE −15%; SHOW −17%. The 29.6% top-1 hit rate (vs. ~12.5% random on 8-horse fields) is real handicapping skill, but pari-mutuel takeout (~16% on WPS, 19–26% on trifecta) is too steep a floor for that edge to clear. This is why the site is positioned as a tipsheet, not a betting service.
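The gap between hit rate and profit can be made concrete with flat-bet arithmetic. For a fixed stake, ROI = hit rate × average winning payout / stake − 1. The sketch below applies that identity to the figures quoted above, on the assumption that the 29.6% hit rate and the −17% WIN return describe the same set of bets. Function names are illustrative.

```python
def flat_bet_roi(hit_rate, avg_payout, stake=2.0):
    """ROI of flat bets given the hit rate and mean payout per winning ticket."""
    return hit_rate * avg_payout / stake - 1.0


def breakeven_payout(hit_rate, stake=2.0):
    """Mean winning payout needed for ROI = 0."""
    return stake / hit_rate


def implied_avg_payout(hit_rate, roi, stake=2.0):
    """Mean winning payout implied by an observed hit rate and ROI."""
    return stake * (1.0 + roi) / hit_rate


needed = breakeven_payout(0.296)
implied = implied_avg_payout(0.296, -0.17)
print(f"break-even $2 WIN payout: ${needed:.2f}")
print(f"payout implied by -17% ROI: ${implied:.2f}")
```

Under that assumption, the top pick would need to average about $6.76 per winning $2 ticket to break even, while a −17% return implies an average of about $5.61. The model's winners are mostly horses the public has already bet down.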

Pipeline architecture

Two decoupled halves: an offline Python pipeline that emits a JSON tree, and an Astro static site that pre-renders pages from the JSON at build time. Predictions are baked at build, not generated on the server.

brisnet/parse.py       → BRIS Single PP CSV → races / runners / history DataFrames
brisnet/results.py     → Equibase XRD result charts → rank-model training labels
brisnet/features.py    → leakage-safe feature builders for train + predict (63 features)
brisnet/model.py       → LightGBM Huber regressor + LambdaRank model + closing-odds model
brisnet/calibrate.py   → Plackett-Luce W/P/S + per-track temperature correction
brisnet/validate.py    → walk-forward CV + markdown/PNG report
brisnet/ev_sim.py      → Plackett-Luce trifecta EV simulator (50K Gumbel-max samples)
brisnet/export.py      → write per-meet JSON tree consumed by Astro
brisnet/finalize.py    → cross-track index files (index.json, tracks.json, horse-*.json)
                       ↓
web/public/data/       → per-meet JSON tree under track-{code}/{meet-id}/
                       ↓
web/  (Astro + Tailwind 4 + ECharts island for charts; React 19 islands)
                       ↓
deploy/refresh.sh      → npm run build + tar over ssh + nginx atomic-swap
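As an illustration of the export step, here is a minimal sketch that writes one meet's payload under the track-{code}/{meet-id}/ layout shown above. The function name, the `races.json` filename, the meet id, and the payload shape are all hypothetical; the actual schema written by brisnet/export.py is not documented on this page.

```python
import json
import tempfile
from pathlib import Path


def write_meet_json(root, track_code, meet_id, payload, filename="races.json"):
    """Write one meet's payload to <root>/track-<code>/<meet-id>/<filename>."""
    out_dir = Path(root) / f"track-{track_code}" / meet_id
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(json.dumps(payload, indent=2))
    return path


# Demo against a throwaway directory standing in for web/public/data/.
with tempfile.TemporaryDirectory() as root:
    path = write_meet_json(root, "CD", "2026-spring", {"races": []})
    relative = path.relative_to(root).as_posix()
    loaded = json.loads(path.read_text())

print(relative)
```

Because the site is pre-rendered, the Astro build only needs to read these files from disk; nothing queries the model at request time.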

Research foundations

Each component of the pipeline maps to a well-established line of research. Grouped by what the citations back:

Probability model — speed fig to W/P/S/ITM

  • Plackett, R. L. (1975). "The analysis of permutations." Journal of the Royal Statistical Society, Series C. — foundational paper for the rank model that converts predicted speed figs into joint finish-order probabilities.
  • Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley. — the choice axiom underpinning Plackett-Luce.
  • Bolton, R. N. & Chapman, R. G. (1986). "Searching for positive returns at the track: A multinomial logit model for handicapping horse races." Management Science 32(8). — canonical application of conditional logit (≈ PL win-marginal) to racing data.
  • Benter, W. (1994). "Computer based horse race handicapping and wagering systems: A report." (in Hausch & Ziemba, eds.) — the famous Hong Kong practitioner paper showing PL + conditional logit can systematically beat the pools; same architecture as ours.

Pari-mutuel market structure — where mispricings live

  • Hausch, D. B., Ziemba, W. T. & Rubinstein, M. (1981). "Efficiency of the market for racetrack betting." Management Science 27(12). — the "Dr. Z" paper. Shows the win pool is roughly efficient but place/show (and by extension trifecta) pools systematically misprice.
  • Thaler, R. H. & Ziemba, W. T. (1988). "Anomalies: Parimutuel betting markets — Racetracks and lotteries." Journal of Economic Perspectives 2(2). — the favorite/longshot bias the model partially exploits.
  • Snowberg, E. & Wolfers, J. (2010). "Explaining the favorite-longshot bias: Is it risk-love or misperceptions?" Journal of Political Economy 118(4). — modern empirical confirmation across millions of races.

Regressor — gradient boosted trees

  • Friedman, J. H. (2001). "Greedy function approximation: A gradient boosting machine." Annals of Statistics 29(5). — original gradient boosting.
  • Chen, T. & Guestrin, C. (2016). "XGBoost: A scalable tree boosting system." KDD '16. — the modern boosted-tree paradigm.
  • Ke, G. et al. (2017). "LightGBM: A highly efficient gradient boosting decision tree." NeurIPS. — the specific implementation used here, with leaf-wise growth and histogram-based split finding.
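For orientation, the regressor settings quoted in the model summary map onto LightGBM's scikit-learn wrapper roughly as follows. This is a sketch, not the project's training code: only the four values below are stated on this page, and everything else (learning rate, early stopping, and so on) is left at library defaults here because it is not given.

```python
# Hyperparameters stated on this page, in LGBMRegressor keyword form.
params = {
    "objective": "huber",      # Huber loss
    "num_leaves": 127,
    "n_estimators": 5000,
    "min_child_samples": 100,  # scikit-learn-API name for min_data_in_leaf
}

try:
    from lightgbm import LGBMRegressor

    model = LGBMRegressor(**params)  # constructed only; fitting needs data
except ImportError:
    model = None  # lightgbm not installed; the params dict is still valid

print(params)
```

LightGBM learns a default direction for missing values at each split, which is why NaN features can be passed in without imputation.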

Learning to rank — within-race ordering from XRD result charts

  • Burges, C. J. C. (2010). "From RankNet to LambdaRank to LambdaMART: An overview." Microsoft Research Technical Report MSR-TR-2010-82. — the LambdaRank algorithm we use to learn within-race ordering directly from finish positions.
  • Järvelin, K. & Kekäläinen, J. (2002). "Cumulated gain-based evaluation of IR techniques." ACM TOIS 20(4). — NDCG, the rank-quality metric we evaluate the rank model on (NDCG@3 = 0.5902 in production).
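A small sketch of how NDCG@3 can be computed for one race. Two assumptions are not confirmed by this page: relevance grades of 3 / 2 / 1 / 0 for first / second / third / unplaced, and the exponential gain 2^rel − 1 (Järvelin and Kekäläinen's original formulation uses the grade itself as the gain).

```python
import math


def dcg_at_k(relevances, k=3):
    """Discounted cumulative gain over the first k items, exponential gain."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(scores, relevances, k=3):
    """NDCG@k for one race: rank by model score, compare with the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance by horse: winner = 3, second = 2, third = 1, unplaced = 0.
relevance = [3, 2, 1, 0]
perfect = ndcg_at_k([0.9, 0.5, 0.1, 0.0], relevance)
reversed_order = ndcg_at_k([0.0, 0.1, 0.5, 0.9], relevance)
print(perfect, round(reversed_order, 4))
```

A reported figure such as 0.5902 would then be the mean of this per-race value over all evaluated races.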

Feature attributions — the "Why this prediction?" panel

  • Lundberg, S. M. & Lee, S.-I. (2017). "A unified approach to interpreting model predictions." NeurIPS. — original SHAP framework.
  • Lundberg, S. M. et al. (2020). "From local explanations to global understanding with explainable AI for trees." Nature Machine Intelligence 2. — TreeSHAP, the polynomial-time exact algorithm we call.
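The selection step behind the "Why this prediction?" panel can be sketched without the model itself. The example assumes per-feature contributions for one horse are already available (for instance from `shap.TreeExplainer(model).shap_values(X)`) and just picks the five largest positive and five most negative. The function name, most feature names, and all values below are made up for illustration.

```python
def top_contributions(contribs, k=5):
    """Return (top-k positive, top-k negative) contributions, largest first."""
    items = list(contribs.items())
    positive = sorted((t for t in items if t[1] > 0), key=lambda t: t[1], reverse=True)
    negative = sorted((t for t in items if t[1] < 0), key=lambda t: t[1])
    return positive[:k], negative[:k]


# Hypothetical contributions in fig points for one horse.
contribs = {
    "last_3": 2.4,
    "last_5": 1.1,
    "prime_power": 3.0,
    "days_since_last_race": -1.8,
    "post_position": -0.3,
    "field_size": 0.0,
    "log_purse": 0.6,
}
base_value = 82.0  # hypothetical expected-value baseline of the explainer

positive, negative = top_contributions(contribs)
# SHAP values are additive: baseline + sum of contributions = the prediction.
prediction = base_value + sum(contribs.values())
print(positive, negative, prediction)
```

Because the contributions are additive in the model's output units, each one can be read directly as fig points added to or subtracted from the baseline.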

Monte Carlo finish-order sampling — the EV simulator

  • Yellott, J. I. (1977). "The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution." Journal of Mathematical Psychology 15(2). — connects PL to the Gumbel distribution; basis for the Gumbel-max trick we use to vector-sample 50,000 finish orders per race.
  • Maddison, C. J., Mnih, A. & Teh, Y. W. (2016). "The Concrete distribution: A continuous relaxation of discrete random variables." ICLR. — modern treatment of the same Gumbel-perturbation sampler.
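A minimal NumPy sketch of the Gumbel-max sampler: adding independent standard Gumbel noise to each horse's log-strength and sorting in descending order yields a finish order distributed according to Plackett-Luce. The function names and the four-horse field are illustrative; in the pipeline the log-strengths would presumably be the blended figs divided by the temperature.

```python
import numpy as np


def sample_trifectas(log_strengths, n_samples=50_000, seed=0):
    """Draw Plackett-Luce finish orders; return the top three of each sample."""
    rng = np.random.default_rng(seed)
    s = np.asarray(log_strengths, dtype=float)
    noise = rng.gumbel(size=(n_samples, s.size))  # one Gumbel draw per horse
    order = np.argsort(-(s + noise), axis=1)      # descending perturbed score
    return order[:, :3]


def trifecta_prob(samples, combo):
    """Fraction of samples whose exact top three equals combo."""
    return float(np.mean(np.all(samples == np.asarray(combo), axis=1)))


# Four horses with win probabilities 0.4 / 0.3 / 0.2 / 0.1.
p = np.array([0.4, 0.3, 0.2, 0.1])
samples = sample_trifectas(np.log(p))
estimate = trifecta_prob(samples, (0, 1, 2))
# Closed-form Plackett-Luce probability of the 0-1-2 trifecta, for comparison.
exact = p[0] * p[1] / (1 - p[0]) * p[2] / (1 - p[0] - p[1])
print(f"sampled {estimate:.4f} vs exact {exact:.4f}")
```

All 50,000 orders come from a single array operation, which is what makes per-race Monte Carlo cheap enough to run for every race at export time.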

Speed-figure handicapping framework

  • Beyer, A. (1975). Picking Winners: A Horseplayer's Guide. Houghton Mifflin. — popular not academic, but the seminal text on speed-figure handicapping. BRIS Speed Rating descends from the same family of pace-and-final-time-adjusted figures.
  • Quirin, W. L. (1979). Winning at the Races: Computer Discoveries in Thoroughbred Handicapping. William Morrow. — original Quirin Speed Points (one of our features).
  • Lessmann, S., Sung, M.-C. & Johnson, J. E. V. (2010). "A new methodology for generating and combining statistical forecasting models with application to UK horse racing." European Journal of Operational Research 200(2). — modern ensemble-based handicapping benchmark.

Integrated reference

  • Hausch, D. B., Lo, V. S. Y. & Ziemba, W. T., eds. (2008). Efficiency of Racetrack Betting Markets. World Scientific. — the standard anthology covering the entire research lineage above in one volume; includes the Benter chapter.

Combining these into a deployed pari-mutuel-EV pipeline is bespoke, but each building block has decades of peer-reviewed support. The model isn't novel statistical research — it's a careful integration of known-good techniques applied to a single meet.

Limitations & honest claims

This site is a tipsheet, not a betting service. Walk-forward backtests across 2,922 out-of-sample races (7 tracks, 2024–2025) confirm the model picks horses well — top-1 hit rate of 29.6% on 8-horse fields vs. ~12.5% random — but pari-mutuel takeout (16% on WPS, 19–26% on trifecta) is too steep a hurdle for that edge to clear. Flat $2 WIN/PLACE/SHOW bets on the top-1 horse return roughly −15% to −17%. The product surfaces these results transparently rather than burying them.

  • The speed-fig predictions are baked at build time. Weather / track-surface condition changes and late jockey replacements after the morning's BRIS file was downloaded are not re-fed to the model. (Scratches and live tote odds can be applied between builds, and when shown on the site they reflect the most recent update — but the per-horse speed predictions don't recompute.)
  • First/second-time starters depend almost entirely on the BRIS Pedigree Ratings; their predictions are inherently noisier than horses with a few starts on the books.
  • The payout estimates use a closing-odds prediction model + per-track empirical payout calibration re-fit on closing-odds-based predictions. They're meaningfully better than raw morning-line, but still residually optimistic on longshot triples; treat the displayed payouts as informational ranges, not commitments.
  • The model has no explicit pace-shape interaction at the prediction level — a lone-speed horse and a contested-pace horse with the same Quirin score get the same speed-fig prediction. The Pace Shape panel on each race page is informational, not a feature input.
  • The model uses trainer/jockey meet stats and 365d combo stats as features, but doesn't have access to richer angle-specific splits (first-off-claim, 2nd-off-layoff, etc.). Adding those would likely give modest lift.
  • The takeout floor is a structural constraint, not a calibration bug. Bigger training corpora and richer features can sharpen rank order and reduce variance, but at the takeout levels in US pari-mutuel pools (16–26%), no realistic data scale guarantees a +EV automated strategy. By Bill Benter's published account, his Hong Kong syndicate's success took 20 years and a team, and Hong Kong has materially lower takeout than US tracks.

Last build: 2026-06-06 22:28 UTC. Track: Churchill Downs (CD). Source on the developer's machine; not yet a public repo.