python-data-driven-decisions

Use Python, Pandas, Spark, etc. to demonstrate that correlation can be used as a basis for decision making.

This project consists of finding the correlation between GDP (Gross Domestic Product) and social and economic indicators, such as population growth, fertility rates, investment in specific sectors, or prices.
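
As a minimal sketch of what "finding the correlation" means here, the snippet below computes a Pearson correlation coefficient between GDP and one indicator with Pandas. The column names and all values are made up for illustration; they are not FAOSTAT data.

```python
import pandas as pd

# Illustrative, made-up values (not FAOSTAT data); column names are assumptions.
df = pd.DataFrame({
    "gdp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fertility_rate": [5.0, 4.0, 3.5, 2.5, 2.0],
})

# Pearson correlation coefficient: a value in [-1, 1].
# Negative means the indicator tends to fall as GDP rises.
r = df["gdp"].corr(df["fertility_rate"])
print(f"Pearson r = {r:.3f}")
```

A coefficient close to -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.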

The study is going to be based on the data from FAOSTAT (https://www.fao.org/faostat/en/#home), updated on May 30, 2022.

The tools used are going to be Python, Pandas and Spark.

The main steps to follow are going to be:

1- Extract the data from FAO.

2- Join the different files.

3- Filter out the information that is not useful and clean the rest.

4- Build a complete data table organized by categories (year, country, indicator...).

5- Perform statistical analysis (through coefficients and graphs).

6- Based on the previous results, find more specific correlations by grouping by year(s) or region(s).

7- Get quantitative and qualitative conclusions.
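
Steps 2 to 6 could be sketched with Pandas roughly as follows. This is only an illustration under stated assumptions: the two small data frames stand in for extracted files, and the column names (`country`, `year`, `indicator`, `value`) and all values are hypothetical, not the real FAOSTAT schema or data.

```python
import pandas as pd

# Hypothetical stand-ins for two extracted files (step 1); values are made up.
gdp = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "indicator": ["gdp"] * 6,
    "value": [10.0, 12.0, 15.0, 20.0, 21.0, None],
})
population = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "indicator": ["population_growth"] * 6,
    "value": [2.0, 1.8, 1.7, 0.9, 0.8, 0.7],
})

# Step 2: join the different files by stacking their rows.
raw = pd.concat([gdp, population], ignore_index=True)

# Step 3: drop rows that carry no value.
clean = raw.dropna(subset=["value"])

# Step 4: one row per (country, year), one column per indicator.
table = clean.pivot_table(
    index=["country", "year"], columns="indicator", values="value"
).reset_index()

# Step 5: correlation matrix between indicators over the whole table.
overall = table[["gdp", "population_growth"]].corr()

# Step 6: the same correlation, computed separately for each country.
by_country = table.groupby("country")[["gdp", "population_growth"]].apply(
    lambda g: g["gdp"].corr(g["population_growth"])
)

print(overall)
print(by_country)
```

Missing values are ignored pairwise by `corr`, so a country-year with only one of the two indicators does not contribute to the coefficient. For data too large for a single machine, the same steps could be expressed with Spark data frames instead.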

Progress on this project is recorded weekly and can be found in the Wiki section. In addition, an in-depth explanation of the programming part can be found in the code file itself.
