Avalanche prediction for 22 massifs of French Alps

by Kamila Hamalcikova

View of the Mont Blanc massif from Lac Blanc at sunrise, source: Kamila Hamalcikova

Dataset

My dataset is a compilation of 3 sources:

1. Multiple Excel and PDF reports about the occurrence of avalanche accidents in France from winter season 2010/11 to 2019/20. The original reports can be found on the ANENA (Association Nationale pour l’Étude de la Neige et des Avalanches) website.
2. An Excel report about avalanche events from Data-avalanche.org.
3. netCDF4 files and shapefiles for various meteo and snow variables from The S2M meteorological and snow cover reanalysis in the French mountainous areas (1958 - present).

Installation

Anaconda
Jupyter Notebook
Python 3.6
Python libraries (Pandas, Numpy, Geopandas, Scikit-Learn, Bokeh, Matplotlib, Seaborn, Glob, Json, Datetime, Pathlib, NetCDF4, Xarray)
Flask
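
Before opening the notebooks, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of this repository, and the import names (for example `sklearn` for Scikit-Learn) are the usual ones for these packages.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. Glob, Json, Datetime and
# Pathlib ship with Python itself, so they should always be found.
REQUIRED = [
    "pandas", "numpy", "geopandas", "sklearn", "bokeh", "matplotlib",
    "seaborn", "netCDF4", "xarray", "flask",
    "glob", "json", "datetime", "pathlib",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed into the Anaconda environment before running the notebooks.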

Main findings

Data exploration:

Avalanche accidents (with people injured, killed or in need of a rescue team) and avalanche events (natural events where people might or might not be involved) are most frequent in March, at altitudes from 2000 to 3000 meters. Avalanche accidents happen most often to groups of 2 people while hiking. Accidents are most frequent in the Vanoise massif, while avalanche events are most frequent in the Haute Maurienne massif. Dataviz of avalanche events is also accessible via Tableau and dataviz of avalanche accidents is here.

Avalanche events (left) and avalanche accidents (right) on the map of the French Alps, source: Kamila Hamalcikova/Bokeh
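
The kind of counting behind findings such as "most frequent in March" and "from 2000 to 3000 meters" can be sketched as follows. This is only an illustration with the standard library: the records are made-up values, not data from the reports, and the helper `altitude_band` is hypothetical. The notebooks themselves are not reproduced here.

```python
from collections import Counter

# Hypothetical records: (month, altitude in meters). Not real data.
records = [(3, 2400), (3, 2750), (2, 1900), (3, 2100), (1, 3100), (12, 2600)]


def altitude_band(altitude_m, width=1000):
    """Return the lower bound of the altitude band, e.g. 2400 -> 2000."""
    return (altitude_m // width) * width


# Count records per month and per 1000 m altitude band.
by_month = Counter(month for month, _ in records)
by_band = Counter(altitude_band(alt) for _, alt in records)

# The most common month and band in the sample.
peak_month = by_month.most_common(1)[0][0]
peak_band = by_band.most_common(1)[0][0]
print(peak_month, peak_band)
```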

Avalanche prediction with machine learning:

Random Forest (RF) was the machine learning model used to predict both avalanche events and avalanche accidents. Results were better for avalanche events because the independent variables in the dataset were better suited to them. The dataset is strongly imbalanced (99.6% of cases without an avalanche and only 0.4% with one), so I focused mainly on recall for days with an avalanche and on the weighted F1 score as performance metrics, rather than on accuracy, which is misleading for imbalanced datasets. The best choice was an RF model with summer months removed from the sample. It reached a recall of 0.58 for avalanche days and a very high F1 score of 0.9976, a minor improvement over the baseline value of 0.55 from a simple RF without feature selection. Other variants were RF restricted to data from a limited altitude range (1500-3600 meters), RF without the division into 22 massifs, and a modified RF model (Balanced Random Forest), but these gave less satisfying results. More details about the machine learning part can be found in my article on TowardsDataScience.com.
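
To illustrate why accuracy is misleading at this level of imbalance, here is a minimal sketch in plain Python. It assumes a synthetic sample of 1000 days with the same 99.6% / 0.4% split and a trivial model that never predicts an avalanche; the function names are hypothetical, and the weighted F1 follows the usual definition (per-class F1 averaged by class support).

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    return sum(
        precision_recall_f1(y_true, y_pred, label)[2] * y_true.count(label) / total
        for label in set(y_true)
    )


# Synthetic sample: 996 days without an avalanche (0), 4 days with one (1).
y_true = [0] * 996 + [1] * 4
always_no = [0] * 1000  # trivial model that never predicts an avalanche

acc = accuracy(y_true, always_no)                      # 0.996
recall_avalanche = precision_recall_f1(y_true, always_no, 1)[1]  # 0.0
wf1 = weighted_f1(y_true, always_no)                   # about 0.994
print(acc, recall_avalanche, round(wf1, 3))
```

In this toy case both accuracy and weighted F1 stay above 0.99 even though no avalanche day is detected, so recall for avalanche days is the number that separates useful models from useless ones.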

XGBoost (XGB) was used only for the prediction of avalanche events. This model again showed that the best option for prediction is removing summer months from the sample. Moreover, XGB gave a much better recall for avalanche days than Random Forest (0.8-0.9 instead of 0.58 from RF). The F1 score of XGB was lower (0.992-0.995 instead of 0.997-0.998 for RF), due to slightly lower values of other metrics, such as recall for days without an avalanche and precision for avalanche days. But these metrics are far less important than recall for avalanche days, so XGBoost is considerably better at predicting avalanches than Random Forest.
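
The text above does not state how the XGBoost model was configured. One common heuristic for imbalanced data, shown here purely as an assumption, is to set XGBoost's `scale_pos_weight` parameter to the ratio of negative to positive samples. The sketch below only computes that ratio for the class split quoted earlier (99.6% vs 0.4%); the helper name is hypothetical.

```python
def negative_to_positive_ratio(n_negative, n_positive):
    """Ratio of negative to positive samples, a typical starting value
    for XGBoost's scale_pos_weight on imbalanced binary problems."""
    if n_positive <= 0:
        raise ValueError("need at least one positive sample")
    return n_negative / n_positive


# With a 99.6% / 0.4% split, every 1000 samples contain 996 negatives
# and 4 positives.
ratio = negative_to_positive_ratio(996, 4)
print(ratio)  # 249.0
```

A value of this size tells the model to weight each avalanche day roughly as much as 249 days without one, which tends to raise recall for avalanche days at the cost of precision.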

How to run the code

Data exploration: see avalanche_events_EDA.ipynb and avalanche_accidents_EDA.ipynb; the PNG images can also be viewed without opening the ipynb files.
Avalanche prediction with machine learning: to repeat the data wrangling behind my final dataset, download the source nc files from The S2M meteorological and snow cover reanalysis (link above) and then run the following files in this order:
1. scripts in year_select.py and one_year_meteo.py to get data about meteo variables into csv files for different years
2. snow_variables.ipynb to get data about snow variables into a csv file, then meteo_variables.ipynb to get data about meteo variables into the final csv file and merge it with the snow variables data
3. final_merge.ipynb to get the final dataset for machine learning
4. random_forest_avalanche_events_v2_final.ipynb and random_forest-avalanche_accidents.ipynb to get results of the Random Forest model and xgboost_avalanche_events.ipynb to get results of the XGBoost model.
Flask application: in the folder flask_app (if you are not familiar with Flask, you can view a video of the app here or simply check this website, where the app is hosted).
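
The run order of the four numbered steps above can be sketched as a small script. This is an assumption-laden illustration, not a script shipped with the repository: it assumes the two .py files run without arguments, that notebooks are executed with `jupyter nbconvert --to notebook --execute`, and that the S2M nc files are already in place. By default it only prints the commands.

```python
import subprocess
import sys

# Files in the order given above. The accidents notebook is named with a
# hyphen in the repository file listing.
STEPS = [
    "year_select.py",
    "one_year_meteo.py",
    "snow_variables.ipynb",
    "meteo_variables.ipynb",
    "final_merge.ipynb",
    "random_forest_avalanche_events_v2_final.ipynb",
    "random_forest-avalanche_accidents.ipynb",
    "xgboost_avalanche_events.ipynb",
]


def command_for(path):
    """Build the command that would run one script or notebook."""
    if path.endswith(".py"):
        return [sys.executable, path]
    if path.endswith(".ipynb"):
        return ["jupyter", "nbconvert", "--to", "notebook", "--execute", path]
    raise ValueError(f"unsupported file type: {path}")


def run_all(steps, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for path in steps:
        cmd = command_for(path)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_all(STEPS)  # dry run: prints the commands without executing them
```

Calling `run_all(STEPS, dry_run=False)` would execute the steps in sequence and stop at the first one that fails.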

License

This project is licensed under the MIT License - see the license file for details

Sources

ANENA: for avalanche accidents reports
The S2M meteorological and snow cover reanalysis in the French mountainous areas (1958 - present): for snow and meteo variables
Data-avalanche.org: for avalanche events reports
Bagging and Random Forest for Imbalanced Classification: for dealing with an imbalanced dataset
Growing RForest - 97% Recall and 100% Precision: for ways to improve recall
Implementing a Random Forest Classification Model in Python: for the Random Forest model
Adding CSS styling to your website: for changes in CSS styles
Build a Python Web Server with Flask: for building the Flask app
How to Configure XGBoost for Imbalanced Classification: for XGBoost on an imbalanced dataset
A Gentle Introduction to XGBoost for Applied Machine Learning: for an introduction to the XGBoost model
