A little tool collection to help you collect data from GitHub for research. This tool is based on my blog post: Systematic review of repositories on GitHub with python (Game Dev Style)
Note: This repository is inspired by work from the Department of Information and Computing Sciences, Utrecht University: A Systematic Review of Open Source Clinical Software on GitHub for Improving Software Reuse in Smart Healthcare by Zhengru Shen and Marco Spruit.
```
$ git clone https://github.com/simonrenger/collect-data-from-github.git
$ pip install PyGithub
$ pip install pandas
```
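These two packages are the tool's dependencies: PyGithub talks to the GitHub API and pandas handles the tabular output. As a rough sketch of how that combination works (illustrative only, not the actual code of `collect.py`; the token and query string are placeholders):

```python
from github import Github
import pandas as pd

# Authenticate against the GitHub API (the token is a placeholder).
gh = Github("my_token")

# Search repositories using GitHub's search syntax.
repos = gh.search_repositories(query="game engine language:c++")

# Collect a few attributes per repository and write them out as CSV.
rows = [{"name": r.full_name, "stars": r.stargazers_count, "language": r.language}
        for r in repos[:10]]
pd.DataFrame(rows).to_csv("repos.csv", index=False)
```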
Call the help function:
```
python collect.py --help
```
You need to provide a `config.json` file:
| Field | Type | Optional | Description |
|---|---|---|---|
| `token` | string | Yes | If present, it must contain a valid GitHub token. You can obtain it here: Settings/Token (scope: `repos`). If not provided, `--token {TOKEN}` must be passed on the command line. |
| `readme_dir` | string | Yes | If present, the tool will automatically download the GitHub readme files into this location. |
| `output` | string | Yes | If present, the tool will store the collected data in this location. Default: `./` |
| `format` | string | Yes | If present, it determines the output format. Valid values: `JSON`, `CSV`, `HTML`, `MARKDOWN`. Default: `CSV` |
| `criteria` | object | No | Must contain an entry called `time` with the fields `min` or `max`. |
| `terms` | array | No | List of search terms following the GitHub search syntax (see: Understanding the search syntax). |
| `attrs` | array | No | List of attributes from the GitHub REST API `repo` object. |
Note: There is a sample config in the `samples` folder.
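For orientation, a minimal config could look like the sketch below. All values are illustrative assumptions (in particular the date format under `criteria.time`); consult the sample in the `samples` folder for the authoritative layout.

```json
{
    "token": "ghp_yourTokenHere",
    "readme_dir": "./readmes",
    "output": "./",
    "format": "CSV",
    "criteria": {
        "time": { "min": "2018-01-01", "max": "2020-12-31" }
    },
    "terms": ["game engine language:c++"],
    "attrs": ["full_name", "stargazers_count", "language"]
}
```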
The help output above gives you an idea of how to run the tool, but there is a faster way:
```
python collect.py config.json
```
And if you want to pass a token along:
```
python collect.py --token my_token config.json
```
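Since the default output format is CSV, the collected data can be loaded straight back into pandas for analysis. A small sketch (the file name is hypothetical; check your configured `output` location for the file the tool actually wrote):

```python
import pandas as pd

# The file name below is hypothetical -- look in your configured
# `output` location for the actual file the tool produced.
df = pd.read_csv("output.csv")
print(df.head())
```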
- Add more criteria for filtering repositories, e.g. by language
- Add an option to exclude archived repositories