The goal of this project is to build a machine learning model that predicts wildfire ignition risk in California from publicly available meteorological and fire activity data.
Wildfires are common, destructive and deadly natural disasters. Current meteorology-based wildfire risk prediction methods can be improved upon by:
- Applying modern data pipeline automation and machine learning techniques
- Using historical wildfire data for model training and validation
This project uses a parallel LSTM neural network to predict geospatially resolved wildfire ignition risk in California. The model was trained on a combined dataset produced from the USDA historical wildfire activity dataset (1) and meteorological data from NOAA's North American Regional Reanalysis (2). The project is currently in the deployment phase; live predictions up to 7 days into the future will be available via an API. For more background information, please see the full project proposal.
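The combined dataset pairs meteorological grid points and days with an ignition label. A minimal sketch of that join, assuming each fire record carries a discovery day and an ignition latitude/longitude, and that a fire is assigned to the nearest grid point by great-circle distance. All function and field names here are hypothetical; the project's actual pipeline may assign fires to grid cells differently.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest_grid_point(lat, lon, grid_points):
    """Index of the (lat, lon) grid point closest to the given location."""
    return min(range(len(grid_points)),
               key=lambda i: haversine_km(lat, lon, *grid_points[i]))

def label_ignitions(weather_rows, fires, grid_points):
    """Copy each weather row and add ignition=1 if any fire started at that
    grid point on that day, else ignition=0.

    Hypothetical schemas: weather rows have 'grid_index' and 'day' keys;
    fire records have 'lat', 'lon' and 'day' keys.
    """
    ignited = {(nearest_grid_point(f["lat"], f["lon"], grid_points), f["day"])
               for f in fires}
    return [dict(row, ignition=int((row["grid_index"], row["day"]) in ignited))
            for row in weather_rows]

# Toy example: two grid points, one fire near the first on 1 August 2015.
grid = [(34.0, -118.0), (37.0, -120.0)]
weather = [{"grid_index": 0, "day": date(2015, 8, 1), "air_temp_k": 305.0},
           {"grid_index": 1, "day": date(2015, 8, 1), "air_temp_k": 300.0}]
fires = [{"lat": 34.1, "lon": -118.2, "day": date(2015, 8, 1)}]
labeled = label_ignitions(weather, fires, grid)
print([row["ignition"] for row in labeled])  # [1, 0]
```

The brute-force nearest-point search scales with fires × grid points; for a full reanalysis grid, a spatial index such as a k-d tree would be the usual replacement.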
First, clone the repo:
```bash
git clone https://github.com/gperdrizet/firecast.ai.git
```
Next, install the required packages using one of two options.
Option 1 (conda): this installs a complete copy of the development environment, including all dependencies.
```bash
cd firecast.ai
conda env create -f environment.yml
```
Option 2 (pip with a Python virtual environment):
```bash
python3 -m venv firecast.ai
source firecast.ai/bin/activate
cd firecast.ai
pip install -r requirements.txt
```
Due to size constraints, only the final training dataset and its derivatives are included in this repo. Raw and intermediate data files created by the training data pipeline are not hosted on GitHub, but can be found here. Note: their total size on disk is 326 GB, spread across roughly 2,500 files.
- Exploratory data analysis
- Classifier model evaluation
- Feature engineering
- Fully stratified sampling
- XGBoost optimization
- Deep neural network optimization
- Single LSTM optimization
- Geospatially parallel LSTM
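The list above names a fully stratified sampling step without detailing it. A minimal sketch of proportional stratified sampling, assuming strata are defined by a caller-supplied key such as the ignition label (the `ignition` field is hypothetical). Ignition days are presumably rare compared with non-ignition days, so the sketch always keeps at least one row per stratum; the project's actual stratification scheme may differ.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw roughly the same fraction of rows from every stratum.

    Strata are the groups of rows sharing the same key(row). At least one
    row is kept from each stratum, so rare strata are never dropped.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    rng = random.Random(seed)  # seeded for reproducible samples
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Toy data: 10 ignition rows among 1,000 (hypothetical 'ignition' field).
rows = [{"id": i, "ignition": int(i < 10)} for i in range(1000)]
sample = stratified_sample(rows, key=lambda r: r["ignition"], fraction=0.1)
print(len(sample), sum(r["ignition"] for r in sample))  # 100 1
```

Returning a tuple from `key`, for example `(row["ignition"], row["month"])`, stratifies on several variables at once.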
- Historical wildfire activity: United States Department of Agriculture Research Data Archive, Spatial wildfire occurrence data for the United States, 1992-2015 (1)
- Historical meteorology data: National Oceanic and Atmospheric Administration, North American Regional Reanalysis (2)
- Machine Learning
- Gradient boosted decision trees
- Deep neural networks
- Long short-term memory (LSTM) neural networks
- Cartographic Projection
- Time Series Analysis
- Feature Engineering
- Hyperparameter optimization
- Metaparameter optimization
- Gaussian process optimization
- Box-Cox and quantile normalization
- Kolmogorov–Smirnov test
- Recursive sample stratification
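The Kolmogorov–Smirnov test checks whether two samples plausibly come from the same distribution. One plausible use, assumed here rather than taken from the project, is confirming that a stratified subsample preserves a feature's distribution. Below is a minimal standard-library sketch of the two-sample statistic with its large-sample critical value c(α)·sqrt((n + m)/(n·m)), where c(α) = sqrt(-ln(α/2)/2); in practice `scipy.stats.ks_2samp` computes the statistic together with a p-value.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    F_a and F_b are empirical CDFs. Both are step functions that only change
    at observed values, so evaluating them at every pooled observation
    finds the maximum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        # bisect_right counts the observations <= x in each sorted sample
        d = max(d, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return d

def ks_rejects(sample_a, sample_b, alpha=0.05):
    """Large-sample test: True if D exceeds the asymptotic critical value,
    i.e. the two samples look drawn from different distributions."""
    n, m = len(sample_a), len(sample_b)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(sample_a, sample_b) > c_alpha * math.sqrt((n + m) / (n * m))

full = [x / 10 for x in range(1000)]  # stand-in for one feature column
subsample = full[::10]                # every tenth value
print(round(ks_statistic(full, subsample), 3), ks_rejects(full, subsample))  # 0.009 False
```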
- Python
- PySpark
- Luigi
- Flask
- TensorFlow
- Keras
- Scikit-learn
- Pandas
- NumPy
- Shapely
- GeoPandas
- Xarray
- Matplotlib
- Seaborn
Team Lead (Contact): George Perdrizet
1. Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
2. NCEP Reanalysis data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at https://www.esrl.noaa.gov/psd/