Read in Portuguese

GitHub Insights - Exploring the Mozambican Developer Ecosystem


This project analyzes publicly available data from the GitHub API, focusing on users in Mozambique. We extract insights from user data, their repositories, and their starred repositories to understand patterns, identify development trends, and explore the dynamics of the developer community in Mozambique.

Methodology

The data analysis in the project follows a methodological approach that involves the following steps:

  1. Data Collection: The data used in this project is obtained through the GitHub API. Through the API endpoints, we collect information about users, their repositories, their interactions with other repositories, and other GitHub elements. It is important to note that the analyses are based solely on the publicly available data in the GitHub API, and the availability and accessibility of this data are subject to the platform's policies and limitations.

  2. Data Preprocessing: After collection, we perform a preprocessing step to clean and structure the data. This involves removing irrelevant data, handling null values, converting formats, and other necessary transformations to make the data suitable for analysis.

  3. Analysis: We explore trends and other insights from the collected data. This includes analyzing the entry of new users over time, identifying trending technologies, analyzing the popularity of specific programming languages, and other relevant trends for the GitHub ecosystem.

  4. Interactive Visualizations: We create interactive visualizations to facilitate data exploration and understanding. We use visualization libraries such as Matplotlib to create charts, plots, and other visual representations of the analysis results.
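The first three steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in the project's notebooks: the function names (`build_search_url`, `clean_users`, `signups_per_year`) are hypothetical, and it assumes users are found through the GitHub Search API's `location:` qualifier and that each record carries the `id`, `login`, `location`, and `created_at` fields returned by the users endpoint.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlencode

SEARCH_USERS_URL = "https://api.github.com/search/users"


def build_search_url(location, page=1, per_page=100):
    """Build a user-search URL filtered by profile location (collection step)."""
    query = urlencode({"q": f"location:{location}", "page": page, "per_page": per_page})
    return f"{SEARCH_USERS_URL}?{query}"


def clean_users(raw_users):
    """Drop records missing an id or creation date and parse timestamps (preprocessing step)."""
    cleaned = []
    for user in raw_users:
        if user.get("id") is None or not user.get("created_at"):
            continue  # assumption: such records are unusable for the time analysis
        cleaned.append({
            "id": user["id"],
            "login": user.get("login") or "",
            "location": (user.get("location") or "").strip(),
            # GitHub timestamps look like "2015-03-12T10:20:30Z"
            "created_at": datetime.strptime(user["created_at"], "%Y-%m-%dT%H:%M:%SZ"),
        })
    return cleaned


def signups_per_year(users):
    """Count new accounts per year (analysis step)."""
    counts = Counter(user["created_at"].year for user in users)
    return dict(sorted(counts.items()))
```

The resulting year-to-count mapping could then be handed to a plotting library such as Matplotlib for the visualization step.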

Results and Insights

Some examples of obtained insights include:

  • Distribution of users by province: Allows identifying which provinces have a significant presence of users on the platform.

  • Popularity of programming languages in Mozambique over time: This analysis helps us understand trends and preferences regarding programming languages in the country.

  • Entry of Mozambicans on GitHub over time: Displays the growth and adoption trends of GitHub by the developer community in Mozambique.

  • Trending topics in Mozambique: Identifies trending topics within the developer community in Mozambique by analyzing repository topics. This helps us understand the areas of interest and focus of the country's developer community.

  • Percentage of users who starred national repositories: Allows us to evaluate how much Mozambican developers engage with and support local projects and initiatives.
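Two of the insights listed above can be computed along these lines. This is a hedged sketch, not the project's implementation: the function names and input shapes are assumptions (repositories as dicts with a `language` field and a precomputed `created_year`, and stars as a mapping from each user id to the owner ids of the repositories they starred). It also assumes that starring one's own repository should not count as supporting a national project.

```python
from collections import Counter, defaultdict


def languages_by_year(repos):
    """Count repositories per primary language for each creation year."""
    counts = defaultdict(Counter)
    for repo in repos:
        language = repo.get("language")
        if language:  # the API reports null when no language is detected
            counts[repo["created_year"]][language] += 1
    return counts


def share_starring_national(starred_owner_ids, national_ids):
    """Percentage of users who starred at least one repository owned by another national user."""
    if not starred_owner_ids:
        return 0.0
    supporters = 0
    for user_id, owners in starred_owner_ids.items():
        # Assumption: ignore self-stars before checking for national owners.
        if (set(owners) - {user_id}) & national_ids:
            supporters += 1
    return 100.0 * supporters / len(starred_owner_ids)
```

For example, with four users of whom two starred a repository owned by another user in the national set, `share_starring_national` returns `50.0`.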

Limitations

It is important to acknowledge the limitations of the project in order to interpret the results carefully and understand its constraints. The main limitations include:

  1. Data Limitations: The analyses are based on the data available in the GitHub API. Therefore, any limitations or restrictions imposed by the API, such as request rate limits or the unavailability of specific data, can affect the scope and accuracy of the analyses. Additionally, the quality and consistency of the data depend on how accurate and up to date the information provided by GitHub users is.

  2. Public Access Restrictions: Not all repositories on GitHub are publicly available, so the analysis is limited to accessible public data. Private repositories are excluded.

  3. Assumptions and Generalizations: During the analysis, assumptions and generalizations may be made based on the available data. These assumptions may not be applicable to all contexts or may not fully reflect the complexity and diversity of projects and contributions on GitHub.

  4. Analysis Biases: Data analysis is subject to inherent biases, such as selection biases or sampling biases. For example, the choice of specific repositories or contributors for analysis may introduce biases in interpreting the results. It is important to be aware of these biases and interpret the results with caution.

  5. Location: It is necessary to take into account that the analyses performed in this project are restricted to users from Mozambique who have registered their location on GitHub. There may be other Mozambican users with outdated or missing location information, which may impact the representativeness of the obtained results.
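Two of the limitations above, request limits and free-text locations, can be made concrete with a short sketch. Everything named here is an illustrative assumption rather than the project's code: `PROVINCE_KEYWORDS` is a deliberately partial, hypothetical keyword table, `guess_province` is a simple substring heuristic, and `seconds_until_reset` assumes the response headers are available as a plain dict keyed by GitHub's documented `X-RateLimit-Remaining` and `X-RateLimit-Reset` names.

```python
import unicodedata

# Hypothetical, partial mapping from province to keywords seen in profile locations.
PROVINCE_KEYWORDS = {
    "Maputo": ["maputo", "matola"],
    "Sofala": ["sofala", "beira"],
    "Nampula": ["nampula"],
    "Zambezia": ["zambezia", "quelimane"],
    "Cabo Delgado": ["cabo delgado", "pemba"],
    "Tete": ["tete"],
}


def normalize(text):
    """Lowercase the text and strip accents so 'Zambézia' matches 'zambezia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def guess_province(location):
    """Return a province for a free-text location, or None when nothing matches."""
    if not location:
        return None  # users without a location cannot be placed
    normalized = normalize(location)
    for province, keywords in PROVINCE_KEYWORDS.items():
        if any(keyword in normalized for keyword in keywords):
            return province
    return None  # e.g. a profile that only says "Mozambique"


def seconds_until_reset(headers, now):
    """Seconds to wait before the next request, based on rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", now))  # epoch seconds
    return max(0, reset_at - int(now))
```

Profiles that return `None` from `guess_province` illustrate the representativeness problem described in the Location item: they are real users who drop out of any per-province breakdown.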

Prerequisites

Before running the project, make sure you meet the following prerequisites:

Execution Sequence:

| Step | Description | Notebook Path |
|------|-------------|---------------|
| 1 | Data Collection: Get Users IDs | data_collect/get_users_ids.ipynb |
| 2 | Cleaning and Structuring: Structure IDs | cleaning_and_structuring/structure_ids.ipynb |
| 3 | Data Collection: Get Users Data | data_collect/get_users_data.ipynb |
| 4 | Data Collection: Get Repos Data | data_collect/get_repos_data.ipynb |
| 5 | Data Collection: Get Starred Data | data_collect/get_starred_data.ipynb |
| 6 | Visualization: Users Insights | visualization/users_insights.ipynb |
| 7 | Visualization: Repos Insights | visualization/repos_insights.ipynb |
| 8 | Visualization: Starred Insights | visualization/starred_insights.ipynb |
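
The execution sequence above could also be run non-interactively. The sketch below is one possible approach, not something the project provides: it assumes Jupyter is installed, that the script is run from the repository root, and that the notebooks need no manual input. `build_command` and `run_pipeline` are hypothetical helper names. It relies on `jupyter nbconvert --to notebook --execute --inplace`, which runs a notebook and saves the outputs back into the same file.

```python
import subprocess

# Notebook paths in the order given by the execution sequence.
NOTEBOOKS = [
    "data_collect/get_users_ids.ipynb",
    "cleaning_and_structuring/structure_ids.ipynb",
    "data_collect/get_users_data.ipynb",
    "data_collect/get_repos_data.ipynb",
    "data_collect/get_starred_data.ipynb",
    "visualization/users_insights.ipynb",
    "visualization/repos_insights.ipynb",
    "visualization/starred_insights.ipynb",
]


def build_command(notebook):
    """Return the nbconvert command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_pipeline(notebooks=NOTEBOOKS, dry_run=True):
    """Build the commands in order and, unless dry_run is set, execute each one."""
    commands = [build_command(notebook) for notebook in notebooks]
    if not dry_run:
        for command in commands:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline()` only returns the commands for inspection; `run_pipeline(dry_run=False)` actually executes the notebooks, which triggers real GitHub API requests in the collection steps.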

Contribution

If you wish to contribute to this project, feel free to open an issue with your suggestions or submit a pull request with your changes. Your contribution will be greatly appreciated!


Made with

Python, Jupyter Notebook, Pandas, Matplotlib, NumPy

Author