Test automation

Our testing system is suffering from a mild case of feature creep: everybody is working on their own pet project, and there is no consensus on where the system as a whole should end up.

So this page is intended to collect final goals. Once they are implemented well enough that all final goals are visible in the test code, this page can go away.

This page is currently in "brainstorming mode": Add new ideas, but do not judge, modify, or delete ideas. Every idea should get its own paragraph and be signed off with the author's name so we can see who has an interest in what areas, can ask for clarification, etc. (Evaluation and consolidation will come in later phases.)

Desired properties of changes

Each change should have concrete advantages. (toolforger)

No feature should require a long-term commitment of manpower. Rationale: SymPy is about doing symbolic math, and our primary expertise is math, not testing infrastructure. We don't want to tie up manpower in infrastructure maintenance; the infrastructure should "just work" as far as possible. (toolforger)

Each change should have a clear implementation path. (toolforger)

Environmental variations that tests could/should run under

Python version: all versions that SymPy advertises as supported, i.e. 2.5 to 3.2 (?). (toolforger)

Ground types: Python, NumPy, maybe others. (toolforger)

Operating systems: Windows, Linux, Mac OS X (?). This is mostly to avoid imposing OS constraints on contributors, though plotting may need to be tested across operating systems. (toolforger)

Processor: Intel, AMD, maybe more. This can be relevant for tests that involve floating-point calculations, because the least significant bits of a result can vary. The IEEE floating-point standard isn't as completely defined as one might think, and if you offload floating-point work to the GPU you don't even get IEEE guarantees. (toolforger)
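One common way to make such tests robust is to compare against a tolerance instead of testing for exact equality. The following is only a minimal sketch: the helper name `assert_close` and the default tolerance are illustrative assumptions, and `math.isclose` requires Python 3.5 or later.

```python
import math

def assert_close(actual, expected, rel_tol=1e-12, abs_tol=0.0):
    """Fail unless actual and expected agree within the given tolerance."""
    if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
        raise AssertionError(
            "%r != %r within rel_tol=%g" % (actual, expected, rel_tol))

# 0.1 + 0.2 is not exactly 0.3 in binary floating point,
# but the two values agree to well within the tolerance.
assert_close(0.1 + 0.2, 0.3)
```

A suitable tolerance depends on the calculation under test, so it would have to be chosen per test rather than globally.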

Word size: 32 vs. 64 bit. Python's hash-based containers (dicts, sets) place entries into different slots depending on the word size. Tests that directly or indirectly rely on the order in which entries come out of such a container will therefore see different results on 32-bit and 64-bit builds, causing spurious failures. (toolforger)
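Tests can avoid depending on that order by comparing unordered collections as sets, or by sorting before comparing. A small sketch, with made-up names standing in for real test data:

```python
def sorted_names(names):
    """Return the names in a deterministic order, independent of hash layout."""
    return sorted(names)

free = {"x", "y", "z"}  # set iteration order is an implementation detail

# Fragile: list(free) == ["x", "y", "z"] may or may not hold.
# Robust: compare as sets, or sort before comparing.
assert free == {"z", "y", "x"}
assert sorted_names(free) == ["x", "y", "z"]
```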

Proposed modes of operation

Background testing for contributors while they are working on a branch: Whenever a file is modified, re-run all tests that may be affected by it. Finding out which files depend on which other files is nontrivial, so maybe this can't be done. (toolforger)
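To illustrate both the idea and its limits, here is a rough sketch that finds the test files importing a changed module by parsing their source with the standard `ast` module. It is an approximation under stated assumptions: it sees only direct, static, absolute imports and misses transitive dependencies, relative imports, and dynamic imports, which is much of what makes the problem nontrivial. The function names and the file names in the example are hypothetical.

```python
import ast

def imported_modules(source):
    """Return the set of module names that `source` imports statically."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules

def affected_tests(changed_module, test_sources):
    """Return, sorted, the test files whose source imports `changed_module`.

    test_sources maps a test file name to its source text.
    """
    return sorted(
        name for name, source in test_sources.items()
        if changed_module in imported_modules(source)
    )

# Hypothetical test files, given as source text for the sake of the example.
tests = {
    "test_basic.py": "from sympy.core import basic\n",
    "test_plot.py": "import sympy.plotting\n",
}
print(affected_tests("sympy.core", tests))
```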

Background testing for contributors while they prepare to upload to a pull request: Provide a script that clones the working directory and runs the full test suite. It reports back in case of failures and pushes to the pull request on GitHub in case of success. (toolforger)

Full testing for reviewers: Provide a script that runs a full test suite on somebody else's pull request. Failures are uploaded as comments to the pull request. (toolforger)

Full testing for project admins: Provide a script that merges a pull request to sympy/sympy, runs the full test suite, and either reports failures as comments on the pull request or commits the merge and pushes it back to sympy/sympy on GitHub. (toolforger)

Every mode of operation that takes some action on success should have an option that restricts it to merely reporting success to the person who started the test suite. (toolforger)

Tests could be run locally on the developer's machine, or remotely on a testing server. The latter case may actually be the domain of some CI software that we shouldn't write ourselves; we should just make sure the test suite interoperates well with the CI-provided environment. (toolforger)

There should be a mode that runs the tests most likely to fail first. (toolforger)
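One possible heuristic for "most likely to fail" is the failure rate observed on previous runs. The sketch below assumes such a history is available as a mapping from test name to (failures, runs); how that history would be recorded is left open, and the function name is hypothetical.

```python
def order_by_failure_rate(tests, history):
    """Sort test names so that the historically most failure-prone come first.

    history maps a test name to (failures, runs). Tests without history
    are treated as most likely to fail, so new tests run early.
    """
    def failure_rate(name):
        failures, runs = history.get(name, (1, 1))
        return float(failures) / runs if runs else 1.0

    # sorted() is stable, so tests with equal rates keep their given order.
    return sorted(tests, key=lambda name: -failure_rate(name))

history = {"test_a": (0, 10), "test_b": (5, 10)}
print(order_by_failure_rate(["test_a", "test_b", "test_c"], history))
```

Other signals, such as which files a change touched, could be folded into the same sort key.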

There should be a mode that keeps as many CPU cores busy as possible. (toolforger)
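A minimal sketch of one way to do this with the standard library: run each test command in its own process and let a thread pool keep as many processes running as there are cores. The commands in the example are placeholders, not real SymPy test invocations, and `subprocess.run` assumes Python 3.5 or later.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_command(command):
    """Run one test command in its own process; return (command, exit code)."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return command, completed.returncode

def run_in_parallel(commands, workers=None):
    """Run all commands, keeping up to `workers` processes busy at once."""
    workers = workers or os.cpu_count() or 1
    # The threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_command, commands))

# Placeholder commands: one that succeeds and one that fails.
commands = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "raise SystemExit(1)"],
]
results = run_in_parallel(commands)
```

Running each test file in a separate process also isolates tests from each other's global state, at the cost of paying the interpreter start-up time once per process.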

There should be a mode that allows running tests in order or reverse order of expected running time. (toolforger)
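Given expected running times, for example recorded on a previous run, the ordering itself is a plain sort. The sketch below assumes a mapping from test name to seconds; the function and test names are hypothetical.

```python
def order_by_duration(durations, slowest_first=False):
    """Return test names sorted by expected running time.

    durations maps a test name to its expected time in seconds.
    """
    return sorted(durations, key=durations.get, reverse=slowest_first)

durations = {"test_a": 3.0, "test_b": 0.5, "test_c": 1.0}
print(order_by_duration(durations))
print(order_by_duration(durations, slowest_first=True))
```

Fastest-first gives early feedback from many tests. Slowest-first tends to combine well with parallel execution, since a long test started last would otherwise leave the other cores idle while it finishes.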

There should be a mode that runs an individual test with the same command syntax and the same result output, regardless of whether it is a doctest or a regular test. (toolforger)

Mechanisms

The test suite should have a mode in which it simply enumerates all tests. An outer layer can then schedule and prioritize individual tests according to various criteria. (toolforger)
