Use Python, Pandas, Spark, etc. to demonstrate that correlation can be used as a basis for decision making.
This project consists of finding the correlation between GDP (Gross Domestic Product) and social and economic indicators, such as population growth, fertility rates, investment in specific sectors, or prices.
The study is based on data from FAOSTAT (https://www.fao.org/faostat/en/#home), as updated on May 30, 2022.
The tools used are Python, Pandas and Spark.
The main steps are:
1- Extract the data from FAO.
2- Join the different files.
3- Filter the data and clean out the information that is not useful.
4- Build a complete data table organized by categories (year, country, indicator...).
5- Perform statistical analysis (through coefficients and graphs).
6- Based on the previous results, find more specific correlations by grouping by year(s) or region(s).
7- Draw quantitative and qualitative conclusions.
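
Steps 2 to 6 above can be sketched in Pandas as follows. This is only a minimal illustration under stated assumptions, not the project's actual code: the two small data frames are made-up stand-ins for downloaded files, the column names (country, year, indicator, value) are hypothetical, the helper names build_table and correlation_with_gdp are invented for this example, and the Pearson coefficient (the Pandas default) is assumed as the correlation measure. Spark is not used here.

```python
import pandas as pd

# Made-up stand-ins for two downloaded files, in long format
# (one row per country, year and indicator). Column names are assumptions.
gdp = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "indicator": ["gdp"] * 6,
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})
population = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "indicator": ["population_growth"] * 6,
    "value": [3.0, 2.0, 1.0, 1.0, 2.0, None],  # one missing value to clean
})


def build_table(frames):
    """Join the files, drop missing values, and build one column per indicator."""
    long_table = pd.concat(frames, ignore_index=True)      # step 2: join
    long_table = long_table.dropna(subset=["value"])       # step 3: clean
    wide = long_table.pivot_table(                         # step 4: organize
        index=["country", "year"], columns="indicator", values="value"
    )
    return wide.reset_index()


def correlation_with_gdp(table, group_by=None):
    """Pearson correlation of every indicator with GDP, optionally per group."""
    indicators = [c for c in table.columns if c not in ("country", "year", "gdp")]
    if group_by is None:
        # step 5: one coefficient per indicator over the whole table
        return table[indicators].corrwith(table["gdp"])
    # step 6: one row of coefficients per group (for example per country)
    rows = {
        key: group[indicators].corrwith(group["gdp"])
        for key, group in table.groupby(group_by)
    }
    return pd.DataFrame(rows).T


table = build_table([gdp, population])
print(correlation_with_gdp(table))
print(correlation_with_gdp(table, group_by="country"))
```

Passing group_by="year" instead of "country" would give one row of coefficients per year. Missing values are ignored pair by pair when the coefficient is computed, so a gap in one indicator does not discard the rest of that country's data.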
Progress on this project will be recorded weekly (see the Wiki section). In addition, an in-depth explanation of the programming part can be found in the code file itself.