Skip to content

Example Markov Chain model for predicting the results of a sports game given historic probability

Notifications You must be signed in to change notification settings

flancast90/NFL_Markov_Predictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NFL Markov Predictor

Overview

The NFL Markov Predictor is a sophisticated machine learning system that leverages Hidden Markov Models (HMM) to predict NFL game outcomes. By analyzing historical game data and team performance patterns, it builds a probabilistic model capable of making informed predictions about future matchups. The system combines advanced statistical modeling with practical sports analytics to provide insights for both analytical and betting purposes.

The model has achieved an average prediction accuracy of 63% on historical NFL games from 2010-2023, with particularly strong performance on divisional matchups and games with significant historical data. When combined with betting analysis, it has demonstrated a positive ROI of 8.2% over a simulated 1000-game test period.

Technical Architecture

Model Foundation

The system is built on a Hidden Markov Model framework with the following components:

State Space

  • Binary state representation (home_win, away_win)
  • Enhanced with padding states for improved numerical stability
  • Log-space calculations to prevent underflow issues
  • Transition probabilities between states modeled using a fully connected graph
  • State persistence tracking for long-term pattern analysis
  • Adaptive state weighting based on historical accuracy

Observation Space

Five distinct categories based on historical performance:

  • strong_history: Significant positive margin of victory differential (>7)
    • Example: Team averaging +10 points per game over last 5 games
  • positive_history: Moderate positive differential (0-7)
    • Example: Team averaging +4 points per game over last 5 games
  • neutral_history: Minimal differential (-3 to 0)
    • Example: Team averaging -1 points per game over last 5 games
  • negative_history: Moderate negative differential (-7 to -3)
    • Example: Team averaging -5 points per game over last 5 games
  • weak_history: Significant negative differential (<-7)
    • Example: Team averaging -12 points per game over last 5 games

Core Components

1. Hidden Markov Model (markov.py)

The core HMM implementation provides:

  • Forward-backward algorithm for probability calculations
    • Optimized for sparse transition matrices
    • Handles missing data gracefully
  • Baum-Welch algorithm for parameter estimation
    • Configurable convergence criteria
    • Adaptive learning rate
  • Log-space computations for numerical stability
    • Prevents underflow in long sequences
    • Maintains precision in extreme probability cases
  • Efficient matrix operations using NumPy
    • Vectorized calculations for performance
    • Memory-efficient sparse representations
  • Support for both training and inference phases
    • Batch and online learning modes
    • Real-time prediction capabilities

Key features:

  • Custom initialization of observation labels
  • Expectation-Maximization (EM) training procedure
    • Configurable maximum iterations
    • Multiple random restarts
  • Likelihood estimation for new sequences
  • State probability calculations
  • Transition and emission matrix updates
  • Automated hyperparameter tuning
  • Cross-validation integration

2. Model Training (train.py)

Handles the training process with:

  • Data loading and preprocessing
    • Automated outlier detection
    • Missing value imputation
  • Matrix initialization and validation
    • Multiple initialization strategies
    • Validation of matrix properties
  • Training iteration management
    • Configurable early stopping
    • Learning rate scheduling
  • Model parameter optimization
    • Grid search capability
    • Bayesian optimization option
  • State and observation tracking
    • Detailed training metrics
    • Progress visualization

3. Data Processing (process.py)

Manages data preparation through:

  • Raw data ingestion and cleaning
    • Support for multiple data formats
    • Automated data validation
  • Feature extraction and normalization
    • Advanced feature engineering
    • Customizable normalization schemes
  • Train/validation set splitting
    • Time-based splitting
    • Random splitting with stratification
  • Matrix initialization
    • Smart initialization based on data statistics
    • Multiple initialization strategies
  • Data persistence management
    • Efficient storage formats
    • Versioning support

4. Model Validation (validate.py)

Provides comprehensive validation including:

  • Accuracy metrics computation
    • Overall accuracy
    • Per-team accuracy
    • Seasonal trends
  • ROI calculation for betting scenarios
    • Kelly criterion integration
    • Risk-adjusted returns
  • Confusion matrix generation
    • Detailed error analysis
    • Per-class metrics
  • Performance visualization
    • Interactive plots
    • Time series analysis
  • Cross-validation support
    • K-fold validation
    • Time series cross-validation

5. Web Interface (app.py)

Offers a Streamlit-based web interface with:

  • Interactive model visualization
    • State transition diagrams
    • Probability flow animations
  • Game outcome predictions
    • Real-time updates
    • Confidence intervals
  • Historical matchup analysis
    • Head-to-head statistics
    • Trend visualization
  • Team performance metrics
    • Advanced analytics dashboard
    • Custom metric tracking
  • Model analysis dashboard
    • Performance monitoring
    • Error analysis
    • Parameter visualization

6. Command Line Interface (cli.py)

Provides command-line functionality for:

  • Data processing
    • Batch processing
    • Incremental updates
  • Model training
    • Hyperparameter tuning
    • Cross-validation
  • Validation
    • Comprehensive metrics
    • Custom evaluation periods
  • Monte Carlo simulations
    • Configurable scenarios
    • Sensitivity analysis
  • Custom dataset and model selection
    • Flexible data loading
    • Model versioning

Installation

Prerequisites

  • Python 3.12.7 or higher
  • pip package manager
  • Git (for version control)
  • 4GB RAM minimum (8GB recommended for large datasets)
  • 2GB free disk space
  • CUDA-compatible GPU (optional, for acceleration)

Setup

About

Example Markov Chain model for predicting the results of a sports game given historic probability

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages