GitHub - TestRoots/travistorrent-tools: Tools used to create the data on TravisTorrent (http://travistorrent.testroots.org).

This repository contains the tools used to generate the data on TravisTorrent. These include:

Travis Poker (bin/travis_poker.rb), which checks en masse whether projects have a Travis build history
Travis Harvester (bin/travis_harvester.rb), which downloads Travis build logs
Travis BuildLog Analyzer (bin/buildlog_analysis.rb)
Build Metadata extractor (bin/build_data_extraction.rb)

Installing required dependencies

The following works on Debian Jessie:

$ apt-get install ruby ruby-dev bundler pkg-config libmysqlclient-dev
$ git clone git@github.com:TestRoots/travistorrent-tools.git
$ cd travistorrent-tools
$ bundle install

Running the data extraction process

The file projects.txt contains a list of non-toy, non-fork, active GitHub projects. It was retrieved from GHTorrent by running the query:

select u.login, p.name, p.language, count(*)
from projects p, users u, watchers w
where
    p.forked_from is null and
    p.deleted is false and
    w.repo_id = p.id and
    u.id = p.owner_id
group by p.id
having count(*) > 50
order by count(*) desc

You can then call the Travis Poker to see whether these projects use Travis CI or not. Projects will be annotated with a binary flag indicating this.

To further process the list generated by Travis Poker, do

grep "true" results.csv > travis_enabled
sed -i 's/\([^,]*\),\([^,]*\).*/\1 \2/' travis_enabled
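
For readers who prefer to stay in Ruby, the grep and sed steps above can be sketched as a small filter. This is only an illustration: it assumes each line of results.csv begins with the owner and the project name as its first two comma-separated fields and contains the literal text "true" when Travis is enabled. The method name travis_enabled and the sample rows are hypothetical and not part of this repository.

```ruby
# Sketch of the grep + sed post-processing in plain Ruby.
# Assumption: every line has at least two comma-separated fields
# (owner, name) and carries the text "true" if the project uses Travis.
def travis_enabled(lines)
  lines
    .select { |line| line.include?('true') }   # same effect as: grep "true"
    .map do |line|
      owner, name = line.chomp.split(',', 3)   # keep the first two fields
      "#{owner} #{name}"                       # sed joins them with a space
    end
end

# Hypothetical sample rows, for illustration only.
sample = [
  "rails,rails,Ruby,true\n",
  "foo,bar,Java,false\n"
]
puts travis_enabled(sample)
```

Like the grep call, this matches "true" anywhere in the line, so a project whose owner or name contains that text would also be kept.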

This list can now be passed to the Travis Harvester, for which we use parallel.

Retrieve the build logs of 20 GitHub projects simultaneously (beware: depending on your network connection, this puts a heavy load on Travis CI!)

cat travis_enabled | parallel -j 20 --colsep ' ' ruby bin/travis_harvester.rb

Extracting GitHub features about each build

To extract features for one project, do

ruby -Ibin bin/build_data_extraction.rb stripe brushfire github-token

where github-token is a valid GitHub OAuth token used to download information about commits. To configure access to the required GHTorrent MySQL and MongoDB databases, copy config.yaml.tmpl to config.yaml and edit accordingly. You can have direct access to the GHTorrent MySQL and MongoDB databases using this link.

To extract features for multiple projects in parallel, you need

A file (project-list) of projects, in the format specified above
A file (token-list) of one or more GitHub tokens, one token per line

Then, run

./bin/project_token.rb project-list token-list | sort -R > projects-tokens
./bin/all_projects.sh -p 4 -d data projects-tokens

This will create a file (projects-tokens) with tokens equi-distributed across projects, create a directory data, and start 4 instances of the build_data_extraction.rb script.
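
To make "equi-distributed" concrete, here is a minimal round-robin sketch. It is not the code of bin/project_token.rb, whose actual logic and output layout are not shown here; the method name distribute_tokens and the "project token" line format are assumptions made for illustration.

```ruby
# Round-robin sketch: pair every project with a token so that each token
# ends up serving roughly the same number of projects.
# Assumption: the "<project> <token>" output layout is hypothetical.
def distribute_tokens(projects, tokens)
  raise ArgumentError, 'need at least one token' if tokens.empty?

  projects.each_with_index.map do |project, i|
    "#{project} #{tokens[i % tokens.size]}"    # cycle through the tokens
  end
end

# Hypothetical inputs, for illustration only.
projects = ['stripe brushfire', 'rails rails', 'jruby jruby']
tokens   = %w[token-a token-b]
puts distribute_tokens(projects, tokens)
```

Spreading projects evenly over tokens keeps any single token from absorbing all of the GitHub API requests.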

Analyzing Buildlogs

Our buildlog dispatcher handles everything that you typically want: it generates one convenient output file (a CSV) per project directory and automatically dispatches to the correct buildlog analyzer. You can start the per-project analysis (typically on a directory structure checked out through travis-harvester) via

ruby bin/buildlog_analysis.rb directory-of-project-to-analyze

To start to analyze all buildlogs, parallel helps us again:

ls build_logs | parallel -j 5 ruby bin/buildlog_analysis.rb "build_logs/{}"

Travis Breaking the Build

http://docs.travis-ci.com/user/customizing-the-build/

broken <- (errored|failed)
errored <- infrastructure
failed <- tests
canceled <- user abort

Breaking the Build

If any of the commands in the first four stages returns a non-zero exit code, Travis CI considers the build to be broken.

When any of the steps in the before_install, install or before_script stages fails with a non-zero exit code, the build is marked as errored.

When any of the steps in the script stage fails with a non-zero exit code, the build is marked as failed.

Note that the script section has different semantics to the other steps. When a step defined in script fails, the build doesn’t end right away, it continues to run the remaining steps before it fails the build.

Currently, neither after_success nor after_failure has any influence on the build result. Travis has plans to change this behaviour.
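
The rules in this section can be summarised in a short Ruby sketch. It is an illustration of the semantics described above, not code from this repository or from Travis CI: the method build_status, the SETUP_STAGES constant and the input hash layout are hypothetical, and :passed is assumed as the name for a build that is neither errored nor failed.

```ruby
# Stages whose failure marks the build as errored, per the rules above.
SETUP_STAGES = %w[before_install install before_script].freeze

# exit_codes maps a stage name to the exit codes of its steps, in order.
# Hypothetical input layout, chosen only for this sketch.
def build_status(exit_codes)
  SETUP_STAGES.each do |stage|
    # The first setup stage with a non-zero exit code errors the build.
    return :errored if exit_codes.fetch(stage, []).any? { |code| code != 0 }
  end

  # Every script step runs to completion; the build fails if any of them
  # returned a non-zero exit code.
  return :failed if exit_codes.fetch('script', []).any? { |code| code != 0 }

  # after_success and after_failure are deliberately ignored.
  :passed
end

puts build_status({ 'install' => [0, 1], 'script' => [0] })
puts build_status({ 'install' => [0], 'script' => [0, 1, 0] })
puts build_status({ 'script' => [0], 'after_success' => [1] })
```

In the grammar above, both :errored and :failed count as broken.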
