masanchis/python-data-driven-decisions
python-data-driven-decisions

Use Python, Pandas, Spark, etc. to demonstrate that correlation can be used as a basis for decision making.

This project consists of finding the correlation between GDP (Gross Domestic Product) and social and economic indicators, such as population growth, fertility rates, investment in specific sectors, or prices.
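To make the core measurement concrete, here is a minimal sketch of the Pearson correlation coefficient between a GDP series and one indicator, written with the Python standard library only. The numbers are invented placeholders rather than FAOSTAT data, and `pearson_r` is a hypothetical helper. In Pandas, `Series.corr` returns the same Pearson coefficient by default.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two series and the spread of each one
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for one country over five years (not real FAOSTAT figures)
gdp = [100.0, 110.0, 125.0, 130.0, 150.0]
fertility = [3.1, 3.0, 2.8, 2.7, 2.5]

r = pearson_r(gdp, fertility)
print(f"r = {r:.3f}")  # close to -1: GDP rises while fertility falls
```

A value close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship between the two series.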

The study will be based on data from FAOSTAT (https://www.fao.org/faostat/en/#home), as updated on May 30, 2022.

The tools used will be Python, Pandas and Spark.

The main steps will be:

1- Extract the data from FAO.

2- Join the different files.

3- Filter out irrelevant information and clean the data.

4- Build a complete data table organized by categories (year, country, indicator...).

5- Perform a statistical analysis (correlation coefficients and graphs).

6- Based on the previous results, look for more specific correlations by grouping by year(s) or region(s).

7- Draw quantitative and qualitative conclusions.
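Steps 2 to 4 could look roughly like the following standard-library sketch. It assumes each downloaded file uses a long layout with `Area`, `Item`, `Year` and `Value` columns (an assumption; check the headers of the actual FAOSTAT exports), stacks the rows of all files, drops rows without a numeric value, and pivots the result into one record per country and year. The inline CSV text, country names and numbers are hypothetical stand-ins for real files. With Pandas, the same steps would typically use `pd.concat`, `pd.to_numeric(..., errors="coerce")` followed by `dropna`, and `pivot_table`.

```python
import csv
import io

# Hypothetical stand-ins for two downloaded FAOSTAT CSV files.
# The long layout (Area, Item, Year, Value) is an assumption for this sketch;
# all names and numbers are placeholders, not real data.
GDP_CSV = """Area,Item,Year,Value
Country A,GDP,2018,100
Country A,GDP,2019,110
Country B,GDP,2018,120
Country B,GDP,2019,
"""
POPULATION_CSV = """Area,Item,Year,Value
Country A,Population growth,2018,0.5
Country A,Population growth,2019,0.4
Country B,Population growth,2018,0.2
Country B,Population growth,2019,0.3
"""


def read_rows(text):
    """Parse one CSV export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 3: keep only rows with a numeric Value, using normalized field names."""
    kept = []
    for row in rows:
        try:
            value = float(row["Value"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        kept.append({
            "country": row["Area"],
            "indicator": row["Item"],
            "year": int(row["Year"]),
            "value": value,
        })
    return kept


def build_table(rows):
    """Step 4: pivot long rows into {(country, year): {indicator: value}}."""
    table = {}
    for row in rows:
        key = (row["country"], row["year"])
        table.setdefault(key, {})[row["indicator"]] = row["value"]
    return table


# Step 2: stack the rows of all files; the pivot then lines them up
# by country and year.
all_rows = read_rows(GDP_CSV) + read_rows(POPULATION_CSV)
table = build_table(clean(all_rows))
for key in sorted(table):
    print(key, table[key])
```

Because the missing 2019 GDP value for "Country B" is dropped during cleaning, that country-year record ends up holding only the population-growth indicator, which later steps have to account for.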

Progress on this project will be recorded weekly and can be found in the Wiki section. In addition, an in-depth explanation of the programming part can be found in the code file itself.

