Baital is a test suite generator for large configurable systems with high t-wise coverage. The tool takes a set of constraints on features of the configurable system and outputs a set of configurations of a specified size and computes their t-wise coverage. In addition, the tool incorporates novel approximation techniques for the computation of t-wise coverage.
In the latest version we have added the support for the multi-valued or numerical features. baital_nf.py
is the entry point, all optional arguments are identical to the baital.py
. The description of the input format is provided below and its workflow here.
Linux OS with Python 3.6 or higher, pip3, and git.
Tested on: Ubuntu, Debian, and CentOS.
Detailed instructions can be found in Install.md.
- Install libraries required by additional tools:
build-essential
cmake
graphviz
libgmp-dev
libmpfr-dev
libmpc-dev
zlib1g-dev
libboost-program-options-dev
libboost-serialization-dev
z3
.sudo apt install
on Debian-based systems
- Install additional python libraries.
pip3 install -r requirements.txt
- (Optional) Install d4 and copy d4 binary to
bin/
folder and ensure execute permission. (Required only for WAPS sampling).
An alternative for installation is to build a docker image with:
docker build ./ -t baital
In order to perform sampling using docker, container inputs and outputs has to be passed via volumes. The run command shall be similar to:
docker run --mount type=bind,source=/path/to/test.cnf,target=/baital/src/test.cnf \
--mount type=bind,source=/path/to/results/,target=/baital/src/results/ baital baital.py test.cnf
Note that docker requires absolute paths for binding. There must be at least two bindings: an inputfile (or a folder on the path to inputfile, but in this case the remaining part of the path must be included in baital.py argument) and a folder where the results would be generated. Other arguments can be passed as usual; note that all files in the arguments must be injected with volumes. In the command above, the first mount binds the test.cnf file and the second mount binds the results folder. If --outputdir
is used for Baital, the second mount target shall be modified accordingly. For --preprocess-file
, the file should either be placed in /path/to/results/ folder (or alternative monted folder) or be mounted separately.
For the binary features, Baital expects input files in Dimacs CNF format. FeatureIDE can be used to generate input files from configurable systems. For configurable systems with multi-valued/numerical features Baital expects constraints described in QF_BV logic. We also partially support Clafer-like format as input that is converted into QF_BV automatically. Benchmarks can be found in this repository or at or at .
Baital uses an adaptive weighted sampling for the test suite generation for configurable systems. Contrary to uniform sampling where all samples are taken uniformly, Baital sampling is performed in rounds: at the beginning of each round a probability distribution for sampling is set by assigning weights to each literal. For an assigned set of weights two sampling tools are supported: CMSGen (default) and WAPS with option --waps. Note that CMSGen is significantly faster and provides better coverage. The selection of weights for each round is controlled by one of the strategies. All strategies compute weights for each literal; the computation is based on parameters listed below.
- Strategy 1: ratio between the number of combinations with a literal in the current sample set and the number of combinations with the literal allowed by constraints of the configurable system.
- Strategy 2: ratio between the number of combinations with a literal in the current sample set and choice of twise distinct elements in NVariables.
- Strategy 3: ratio between the approximate number of combinations with a literal in the current sample set and the number of combinations with the literal allowed by constraints of the configurable system.
- Strategy 4: ratio between the approximate number of combinations with a literal in the current sample set and choice of twise distinct elements in NVariables.
- Strategy 5: the number of samples the literal appears in the current sample set.
Baital performs 3 steps:
- Step 1: Preprocessing. Used for strategies 1 and 3. Preprocessing results are stored in a file (.comb or .acomb extensions) and can be reused later. Has 2 options that are switched with --preprocess-approximate option:
- (i) Lists the combinations of size allowed by cnf constraints (could take hours for twise=2, infeasible for large models for twise=3). Results could be reused at step 3. Precomputed file can be provided with --preprocess-file option, expected to have .comb extension.
- (ii) Lists the approximate number of combinations of size for each literal (Could take 1 hour for twise=2, several hours for twise=3, tens of hours for twise>3). Precomputed file can be provided with --preprocess-file option, expected to have .acomb extension. --preprocess-delta and --preprocess-epsilon set the PAC guarantees.
- Step 2: Sampling. Generation of samples with high t-wise coverage. Samples are stored in / (default results/<cnf_filename>.samples).
- twise option provides a size of combinations to maximize coverage
- strategy defines how the weights are generated between rounds
- samples number of samples to generate
- rounds number of rounds for sample generation, weights are updated between rounds. Higher coverage is expected with large number of rounds, however each round requires additional computations that might affect performance.
- weight-function a function transforming the ratios computed by strategies 1, 2, 3, and 4 into weights. Varying this parameter might affect the resulted coverage
- Step 3: Computation of twise coverage. It computes a ratio between the number of distinct combinations in the sample set and the number of distinct feature combinations satisfying the constraints of the system. The latter number is obtained from the results of Step 1.(i) or runs Step 1.(i) (could take hours for twise=2, infeasible for large models for twise=3). Alternatively, both elements of the ratio can be computed approximately. --cov-approximate forces approximate computation. Precomputed file can be provided with --maxcov-file (expects .comb extension). --no-maxcov computes only number of distinct combinations in the sample set.
Each step can be executed separately. Options --preprocess-only, --preprocess-file, and --no-sampling control the execution. Execution of a single step can also be achieved by calling corresponding scripts: cnf_combinations.py
for step 1, sampling.py
for step 2, for step 3 elements of the ratio are computed with sample_set_combinations.py
and cnf_combinations.py
.
The sampling with multi-valued features is performed by converting them into CNF and running baital's sampling. In particular, the first step is the convertation of constraints into CNF with z3. The next steps are Baital's preprocessing (step 1 above) and sampling (step 2 above). Any strategy can be used. The resulted samples are converted back into the original features. The last step is the coverage computation with the generalized algorithms supporting both exact and approximate computations. Note that unlike the binary case, preprocessing results cannot be reused for coverage computation.
Run command python3 baital.py <arguments>
or python3 baital_nf.py <arguments>
from src
folder for systems with binary and multi-valued features, respectively.
-
Required
- cnf_file.cnf in dimacs format for
baital.py
- qf_bv_file.smt or clafer_file.txt for
baital_nf.py
- cnf_file.cnf in dimacs format for
-
Optional
Argument | Type | Default value | Description |
---|---|---|---|
--outputdir | str | "./results/" | Output directory |
--outputfile | str | "" | Output file for samples, placed in outputdir, default <cnf_filename>.samples |
--twise | int | 2 | t value for t-wise coverage |
--seed | int | random seed | |
--preprocess-only | Only perform preprocessing | ||
--no-sampling | Computes the coverage of an existing test suite. Requires --samples-file argument | ||
--preprocess-file | str | "" | Precomputed file for skipping preprocessing step. Shall have .comb, or .acomb extension |
--preprocess-approximate | Approximate computation of preprocessing | ||
--preprocess-delta | float | 0.25 | Delta for approximate counting at preprocessing. Unused if not --preprocess-approximate |
--preprocess-epsilon | float | 0.25 | Epsilon for approximate counting at preprocessing. Unused if not --preprocess-approximate |
--strategy | int | 5 | Weight generation strategy |
--rounds | int | 10 | Number of rounds for sample generation |
--samples | int | 500 | Total number of samples to generate |
--desired-coverage | float | Samples are genereted until the desired coverage is reached or --rounds is completed. Cannot be used with --samples | |
--extra-samples | Each round 10x samples are generated and the best are selected | ||
--waps | Use WAPS instead of CMSGen as a sampling engine. | ||
--samples-per-round | int | 50 | Number of samples generated per round if --desired-coverage is set |
--weight-function | int | 2 | Function number between 1 and 7 for weight generation, used in strategies 1, 2, 3, and 4 |
--samples-file | str | File with samples to compute the coverage for --no-sampling option. Shall have .samples extension (or .nsamples for baital_nf.py) | |
--maxcov-file | str | File with pregenerated list of satisfiable feature combinations for the step 3. Shall have .comb extension | |
--no-maxcov | Compute only the number of distinct combinations in a test suite instead of coverage | ||
--cov-approximate | Compute the coverage approximately | ||
--cov-delta | float | 0.05 | Delta for approximate computation of coverage. Unused if not --cov-approximate |
--cov-epsilon | float | 0.05 | Epsilon for approximate computation of coverage. Unused if not --cov-approximate |
--outputfile-readable | str | "" | Only for baital_nf.py. Output file for samples in readable format, placed in outputdir, default <qf_bv_file>.rsamples |
We assume that features are numbered from 1 to nFeatures. Constraints of configurable system are expected in Dimacs CNF format.
Outputs are located in --outputdir (./results/ by default). Output file contains the list of generated samples. Each line contains sample index, comma, and a list of space-separated literals. Positive literals correspond to selected features and negative to non-selected. For example, line 2, 1 -2 3
corresponds to a second sample in which variables 1 and 3 are True (selected) and variable 2 is False (non-selected). This format is expected for --samples-file option with --no-sampling. Outputfile.txt contains the number of combinations in the generated samples and the t-wise coverage if computed.
Preprocessing can output the following types of files that can be reused with --preprocess-file or --maxcov-file (only .comb file) options:
- cnf_file_x.comb files where x in [1, twise] contain lists of feature combinations of size x that are satisfiable by the input cnf formula. Each line contains a space-separated satisfiable combination of literals.
- cnf_file_x.acomb file contains a list of approximations of the number of feature combinations of size x involving a literal for each literal. Each line contains a literal and a number of combinations.
Last lines of the console output contain the following information:
- Time taken by each step and total time.
- (Approximate) number of feature combinations in the sample set.
- (Approximate) coverage (skipped if --no-maxcov selected).
The constraints shall be provided with QF_BV logic in SMTLIB2 format. All features that have 2 values have to be declared as Bool, all features that have more values have to be declared with the smallest BitVector. First 3 lines are expected to be comments (starting with ;;
). The first line contains space-separated pairs feature_name=nValues
where nValues is the number of possible values for the feature. The order of features in the output sample set is the same as in this line. Contents of the second and third comment lines are optional and used to recover original values for numerical features, otherwise it is assumed that they have values between 0
and nValues-1
. Second line might contain space-separated pairs feature_name=minValue
indicating that the original feature take values between minValue
and minValue+nValues-1
. Third line might contain space-separated pairs feature_name=[0:val1,1:val2,...(nValues-1):valn]
indicating that the original feature has a set of values val1...valn
. Note that constraints should be for the features with values between 0
and nValues-1
. For example, if feature1 can take values 5, 6, 7, and 8, and feature2 has values 2, 4, 8, and 16, the first lines of the file are:
;; feature1=4 feature2=4
;; feature1=5
;; feature2=[0:2,1:4,2:8,3:16]
Alternative input is a clafer-like textual description. First line is always abstract <something> <top-level-feature>
, where top-level feature is binary and always included in any valid configuration. The following lines define features and currently the following options are supported 1) binary feature name; 2) xor <binary_feature_name>
followed by several lines with binary features that have higher tab level - a selection of binary_feature_name
implies a selection of an exactly one feature from the next lines; 3) mv_feature -> integer
- a multi-valued feature 4) [mv_feature_constraint]
- simple constraint defining possible values for a previously defined multi-valued feature, several types of constraints are supported: i) [mv_feature </> int_value]
ii) [mv_feature = v1 || mv_feature_name = v2 || ...]
iii) [mv_feature1 + mv_feature2 </> int_value]
under assumption that both features have sequential values starting from 0. Examples of supported inputs can be found in benchmarks (Note that Trimesh.txt cannot be automatically converted).
baital_nf.py outputs two files with samples (by default with extensions .nsamples
and .rsamples
). .nsamples
file has in the first line a space-separated list of the number of possible values for each feature in the same order as in the first line of the SMTLIB2 input file. The following lines represent generated samples: sample index, comma, and a list of space-separated values between 0 and the maximum value for each feature (order is the same as in the first line). This file is used for the coverage computation. .rsamples
contains a readable format: each line starts with sample index, followed by a comma, followed by a space-separated pairs <feature_name>=value
, where value is the original value of the feature. For the example above, the .nsamples
file could look like:
4 4
1, 1 2
2, 3 0
and .rsamples
:
1, feature1=6 feature2=16
2, feature2=7 feature2=2
Preprocessing step could take large time depending on the type of preprocessing, twise, and the number of variables. Several hours are required for exact computation for twise=2
.
For large benchmarks and twise >=3
approximate methods shall be used: for preprocessing --preprocess-approximate option; for sampling - strategies 3, 4, or 5; for computation of coverage --cov-approximate.
Evaluation results for Baital with WAPS engine have been published at DOI (Note that Strategy 4 in the paper is Strategy 5 in the current version of Baital). Evaluation results for the approximation algorithms have been published at DOI. Evaluation results for Strategy 3 and for CMSGen engine have been published at DOI.
Evaluation results and options used for their generation are in the repository.
If you use our tool, please cite us with the following bibtex:
@inproceedings{BLM20,
title={Baital: An Adaptive Weighted Sampling Approach for Improved t-wise Coverage},
author={Baranov, Eduard and Legay, Axel and Meel, Kuldeep S},
booktitle={Proc. 28th European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
year={2020}
}
@inproceedings{BCLMV22,
title={A Scalable t-wise Coverage Estimator},
author={Baranov, Eduard and Chakraborty, Sourav and Legay, Axel and Meel, Kuldeep S and Variyam, Vinodchandran N.},
booktitle={Proc. 44th International Conference on Software Engineering (ICSE 2022)},
year={2022}
}
@inproceedings{BL22,
title={Baital: an adaptive weighted sampling platform for configurable systems},
author={Baranov, Eduard and Legay, Axel},
booktitle={Proceedings of the 26th ACM International Systems and Software Product Line Conference-Volume B},
pages={46--49},
year={2022}
}