
jurjoroa/pmx_gasoline_theft_prediction

MSc. Data Science for Public Policy

First Semester
Machine Learning
Professor: Slava Jankin, PhD
Teaching Assistant: Paulina García Corral, MSc

Machine Learning Project

Description

Pemex is a Mexican state-owned company that produces, transports, refines and markets oil and natural gas. Its oil distribution pipelines run throughout Mexico's 32 states. The number of clandestine intakes detected in Pemex pipelines increased by 2,197% between 2008 and 2015. These attacks have been reported in 24 federal entities of the country and the database is available here.

Violence and criminal activity have risen in Mexico over the past 20 years. One growing problem is fuel theft committed by criminal groups. It is serious because it could increase violence among criminal groups competing for control of different pipelines. A central challenge is predicting which areas are most likely to experience gasoline theft, so this research aims to predict which municipalities are most susceptible to it. Our contribution is threefold: first, to understand which features influence pipeline theft in Mexico; second, to use a machine learning approach to classify which municipalities are more susceptible to gasoline theft by criminal groups; third, to promote quantitative research on the phenomenon of gasoline theft in Mexico.

The project analyzed 839 municipalities that have pipelines and classified each as "susceptible" or not. We implemented a Logistic Regression model, a Decision Tree Classifier and a Random Forest Classifier (RF). According to the results, the RF performed best, with 82.14% accuracy, 82.35% for each of precision, recall, and F1 score, and an MCC score of 64.28%.
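For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1 and MCC are computed from a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical: the counts are not taken from the repository, they are simply one set of values that is arithmetically consistent with the percentages reported above.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted "susceptible" that are correct
    recall = tp / (tp + fn)     # share of actual "susceptible" that are found
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: uses all four cells of the matrix
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (an assumption for illustration, not the project's actual output)
metrics = classification_metrics(tp=70, fp=15, fn=15, tn=68)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

When precision and recall are equal, as in this example, F1 takes the same value, which is why the three figures coincide. MCC is lower than accuracy because it measures correlation between predictions and labels on a scale from -1 to 1 rather than the share of correct predictions.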

REPO

Our repository is divided as follows:

📁 analysis: Scripts numbered 01 to 10 covering data wrangling, merging, the final analysis, and testing of the different models.

📁 data: Processed data files used for our analysis.

📁 data-raw: Raw data to be wrangled.

📁 figures: Figures generated for the analysis.

📁 presentation: Quanteda presentation

📁 video: A three-minute video presenting our results.
