Machine Learning Project

MSc. Data Science for Public Policy

First Semester
Machine Learning
Professor: Slava Jankin, PhD
Teaching Assistant: Paulina García Corral, MSc


Description

Pemex is a Mexican state-owned company that produces, transports, refines, and markets oil and natural gas. It operates pipelines across Mexico's 32 federal entities to distribute oil. The number of clandestine intakes (illegal taps) detected in Pemex pipelines increased by 2,197% between 2008 and 2015. These taps have been reported in 24 federal entities, and the database is available here.

Violence and criminal activity have risen in Mexico over the past 20 years. One growing problem is fuel theft committed by criminal groups. This is serious because competition among criminal groups for control of different pipelines could increase violence. A central challenge in addressing clandestine gasoline theft is predicting which areas are most likely to be targeted by criminal groups. This research therefore aims to predict which municipalities are most susceptible to gasoline theft. Our contribution is threefold: first, to understand which features influence pipeline theft in Mexico; second, to use a machine learning approach to classify which municipalities are most susceptible to gasoline theft by criminal groups; and third, to promote quantitative research on gasoline theft in Mexico.

The project analyzed 839 municipalities that have pipelines and classified each one as "susceptible" or not. We implemented a Logistic Regression model, a Decision Tree Classifier, and a Random Forest Classifier (RF). The RF performed best, with an accuracy of 82.14%; precision, recall, and F1 scores of 82.35% each; and a Matthews correlation coefficient (MCC) of 64.28%.
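To show how the reported scores relate to one another, here is a minimal sketch that computes accuracy, precision, recall, F1, and MCC from the four cells of a binary confusion matrix. The counts used (TP = 70, FP = 15, FN = 15, TN = 68, i.e. a hypothetical test set of 168 municipalities, about 20% of 839) are an assumption for illustration only: they were chosen because they reproduce the reported RF scores, and the project's actual test split and confusion matrix may differ. The helper `binary_metrics` is likewise hypothetical and not part of the repository.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return standard binary classification scores from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted "susceptible" that are correct
    recall = tp / (tp + fn)     # share of truly "susceptible" that are found
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (not taken from the repository), chosen only because
# they reproduce the reported Random Forest scores.
scores = binary_metrics(tp=70, fp=15, fn=15, tn=68)
for name, value in scores.items():
    print(f"{name}: {value * 100:.2f}%")
```

Running it prints an accuracy of 82.14%, precision, recall, and F1 of 82.35%, and an MCC of 64.28%. Because precision equals recall whenever FP = FN, their harmonic mean (F1) equals them too, which is why the three scores coincide; MCC is lower because it also penalizes errors relative to the true negatives.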

REPO

Our repository is divided as follows:

📁 analysis: Scripts numbered 01 to 10 covering data wrangling, merging, the final analysis, and testing of the different models.

📁 data: Processed data files used for our analysis.

📁 data-raw: Raw data to be wrangled.

📁 figures: Figures generated for the analysis.

📁 presentation: Quanteda presentation.

📁 video: A three-minute video presenting our results.

Repository: jurjoroa/pmx_gasoline_theft_prediction