Firecast.ai - machine learning wildfire risk forecasting

The goal of this project is to build a machine learning model that predicts wildfire ignition risk in California from publicly available meteorology and fire activity data.

-- Project Status: [Active]

Project Description

Wildfires are common, destructive, and deadly natural disasters. Current meteorology-based wildfire risk prediction methods can be improved by:

  1. Applying modern data pipeline automation and machine learning techniques
  2. Using historical wildfire data for model training and validation

This project uses a parallel LSTM neural network to predict geospatially resolved wildfire ignition risk in California. The model was trained on a combined dataset built from the USDA historical wildfire activity dataset (1) and meteorological data from NOAA's North American Regional Reanalysis (2). The project is currently in the deployment phase. Live prediction data will be available via API for 7 days into the future. For more background information, please see the full project proposal.
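As a rough illustration of what combining the two data sources can look like, the sketch below bins fire records and weather rows onto a shared latitude/longitude grid and flags each weather row with whether an ignition was recorded in the same cell on the same day. The grid spacing, the field names (`lat`, `lon`, `day`, `temp_c`), and the helper functions are all assumptions made for this example; the project's actual pipeline may work differently.

```python
import math
from datetime import date

# Hypothetical grid spacing in degrees; the real NARR grid and the
# project's binning scheme may differ.
CELL_DEG = 0.5

def cell_for(lat, lon, cell_deg=CELL_DEG):
    """Map a latitude/longitude pair to an integer (row, col) grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def label_weather_rows(weather_rows, fire_records, cell_deg=CELL_DEG):
    """Attach an 'ignition' flag (0 or 1) to each weather row.

    weather_rows: dicts with 'lat', 'lon', 'day' plus any weather fields.
    fire_records: dicts with 'lat', 'lon', 'day' for each recorded ignition.
    """
    # Collect every (cell, day) pair in which at least one fire started.
    ignitions = set()
    for fire in fire_records:
        ignitions.add((cell_for(fire["lat"], fire["lon"], cell_deg), fire["day"]))

    labeled = []
    for row in weather_rows:
        key = (cell_for(row["lat"], row["lon"], cell_deg), row["day"])
        labeled.append({**row, "ignition": int(key in ignitions)})
    return labeled

# Made-up illustrative values, not real data.
weather = [
    {"lat": 38.6, "lon": -121.4, "day": date(2015, 7, 1), "temp_c": 36.0},
    {"lat": 38.6, "lon": -121.4, "day": date(2015, 7, 2), "temp_c": 31.5},
    {"lat": 34.1, "lon": -118.3, "day": date(2015, 7, 1), "temp_c": 29.0},
]
fires = [{"lat": 38.72, "lon": -121.31, "day": date(2015, 7, 1)}]

labeled = label_weather_rows(weather, fires)
```

Only the first weather row shares both a grid cell and a day with the fire record, so it is the only one flagged. Binning in plain degrees is a simplification; a real pipeline would likely work in the projected coordinates of the weather grid.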

Using this repository

First, clone the repo:

git clone https://github.com/gperdrizet/firecast.ai.git

Next, you have two options to install required packages:

A) Conda.

This will install a complete copy of the development environment, including all dependencies.

cd firecast.ai
conda env create -f environment.yml

B) Pip and venv.

python3 -m venv firecast.ai
source firecast.ai/bin/activate
cd firecast.ai
pip install -r requirements.txt
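After either install route, a quick check that the main libraries are importable can save debugging time later. The sketch below uses only the standard library. The module names are assumptions taken from the Technologies list in this README (for example, Scikit-Learn imports as `sklearn`); `environment.yml` and `requirements.txt` remain the authoritative list.

```python
from importlib.util import find_spec

# Import names assumed from the Technologies list in this README;
# requirements.txt / environment.yml remain the authoritative source.
EXPECTED_MODULES = [
    "pyspark", "luigi", "flask", "tensorflow", "keras", "sklearn", "pandas",
    "numpy", "shapely", "geopandas", "xarray", "matplotlib", "seaborn",
]

def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules(EXPECTED_MODULES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it with the environment activated. `find_spec` only locates each module without importing it, so the check is fast but will not catch a package that is installed and broken.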

Due to size constraints, only the final training dataset and its derivatives are included in this repo. Raw and intermediate data files created by the training data pipeline are not hosted on GitHub, but can be found here. Note: total size on disk is 326G across roughly 2,500 files.

Featured notebooks

  1. Exploratory data analysis
  2. Classifier model evaluation
  3. Feature engineering
  4. Fully stratified sampling
  5. XGBoost optimization
  6. Deep neural network optimization
  7. Single LSTM optimization
  8. Geospatially parallel LSTM
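For readers new to the LSTM notebooks (7 and 8), the sketch below shows one common way to prepare time series data for a recurrent model: slicing a per-day sequence into a window of past days paired with the labels for the days that follow. The function name, the lookback length, and the sample values are hypothetical; the 7-day horizon simply mirrors the forecast range mentioned in the project description, and the notebooks may prepare their inputs differently.

```python
def make_windows(series, lookback, horizon):
    """Split a per-day sequence into (past_window, future_targets) pairs.

    series: list of (features, ignition_label) tuples ordered by day.
    lookback: number of past days fed to the model.
    horizon: number of future days to predict.
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        # Features from the lookback window.
        past = [features for features, _ in series[start:start + lookback]]
        # Labels for the days immediately after the window.
        future = [label for _, label in
                  series[start + lookback:start + lookback + horizon]]
        samples.append((past, future))
    return samples

# Ten days of made-up values for one grid cell: ([temp_c, humidity], label)
cell_series = [([20.0 + d, 50.0 - d], int(d in (6, 8))) for d in range(10)]
samples = make_windows(cell_series, lookback=3, horizon=7)
```

With ten days of data, a 3-day lookback, and a 7-day horizon, exactly one sample fits. A shorter horizon or a longer series yields one sample per valid start day.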

Data Sources

  1. Historical wildfire activity: United States Department of Agriculture Research Data Archive, Spatial wildfire occurrence data for the United States, 1992-2015 (1)
  2. Historical meteorology data: National Oceanic and Atmospheric Administration, North American Regional Reanalysis (2)

Methods Used

  • Machine Learning
  • Gradient boosted decision trees
  • Deep neural networks
  • Long short-term memory neural networks
  • Cartographic Projection
  • Time Series Analysis
  • Feature Engineering
  • Hyperparameter optimization
  • Metaparameter optimization
  • Gaussian process optimization
  • Box-Cox quantile normalization
  • Kolmogorov–Smirnov test
  • Recursive sample stratification
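Two of the methods above, the Box-Cox transform and the Kolmogorov–Smirnov statistic, are often used together: the transform reshapes a skewed feature, and the KS statistic measures how far the result still is from a normal distribution. The sketch below is one plausible pairing written with the standard library only. The candidate lambda values and the sample data are made up, and this README does not state how the project itself combines the two.

```python
import math
from statistics import NormalDist, mean, pstdev

def box_cox(values, lam):
    """Box-Cox power transform; every value must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def ks_distance_to_normal(values):
    """One-sample Kolmogorov-Smirnov statistic against a normal distribution
    fitted with the sample mean and (population) standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), pstdev(ordered))
    d = 0.0
    for i, v in enumerate(ordered, start=1):
        cdf = fitted.cdf(v)
        # Largest gap between the empirical and fitted CDFs at this point.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Made-up right-skewed sample (for example, daily precipitation totals).
skewed = [0.2, 0.3, 0.5, 0.8, 1.1, 1.9, 3.5, 6.0, 12.0, 40.0]
candidates = [-0.5, 0.0, 0.5, 1.0]  # hypothetical lambda grid
scores = {lam: ks_distance_to_normal(box_cox(skewed, lam)) for lam in candidates}
best_lam = min(scores, key=scores.get)
```

A smaller score means the transformed values sit closer to a normal distribution. For this sample the log transform (lambda 0) scores well below lambda 1, which only shifts the raw data. In practice a library routine such as `scipy.stats.boxcox` would usually pick lambda by maximum likelihood instead of a coarse grid.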

Technologies

  • Python
  • PySpark
  • Luigi
  • Flask
  • TensorFlow
  • Keras
  • scikit-learn
  • Pandas
  • NumPy
  • Shapely
  • GeoPandas
  • Xarray
  • Matplotlib
  • Seaborn

Contributing Members

Team Lead (Contact): George Perdrizet

References

  1. Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
  2. NCEP Reanalysis data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at https://www.esrl.noaa.gov/psd/