-
Notifications
You must be signed in to change notification settings - Fork 1
Modified version of Daikon
Daikon is an open-source tool that applies machine learning techniques to detect likely invariants in programs by monitoring test executions. This monitoring process involves observing the program state at designated program points, initially considering all possible invariants as valid. Those invariants that are not violated by any execution are reported as likely invariants. Daikon operates by analyzing an instrumented version of a software execution, generated by an instrumenter or front-end. This instrumentation produces a declaration file and a data trace file. The declaration file describes the structure of program points in terms of input and output variables. The data trace file contains the values assigned to the variables in each execution. Instrumenters are available for various programming languages and data formats, including Java, Perl, C++, and CSV.
Overall, our customized version of Daikon supports a total of 105 distinct types of invariants for REST APIs, classified into five categories:
-
Arithmetic comparisons (48 invariants). Specify numerical bounds (e.g.,
size(return.artists[]) >= 1
) and relations between numerical fields (e.g.,input.limit >= size(return.items[])
). -
Array properties (23 invariants). Represent comparisons between arrays, such as subsets, supersets, or fields that are always member of an array (e.g.,
return.hotel.hotelId in input.hotelIds[]
). -
Specific formats (22 invariants). Specify restrictions regarding the expected format (e.g.,
return.href is Url
) or length (e.g.,LENGTH(return.id)==22
) of string fields. -
Specific values (9 invariants). Restrict the possible values of fields (e.g.,
return.visibility one of {"public", "private"}
). -
String comparisons (3 invariants). Specify relations between string fields, such as equality (e.g.,
input.name == return.name
) or substrings (e.g.,input.id is a substring of return.href
).
This section describes the flags and configuration options available when running the provided Docker image containing our modified Daikon version.
-
use_modified_daikon_version
(default:true
): If true, the invariants will be detected using the modified version of Daikon proposed in our paper. If false, it will use the default version of Daikon, except for 13 types of invariants that were suppressed to avoid a combinatorial explosion of reported invariants. -
suppress_input_invariants
(default:true
): If true, the invariants that only affect input variables will not be reported. -
suppress_arithmetic_comparisons
(default:false
): If true, the invariant of the Arithmetic comparisons category will not be reported. -
suppress_string_comparisons
(default:false
): If true, the invariant of the String comparisons category will not be reported. -
suppress_specific_formats
(default:false
): If true, the invariant of the Specific formats category will not be reported. -
suppress_specific_values
(default:false
): If true, the invariant of the Specific values category will not be reported. -
suppress_array_properties
(default:false
): If true, the invariant of the Array properties category will not be reported.
For instance, to run our modified version of Daikon suppressing the invariants of the Arithmetic comparisons category, we use the following command:
docker run --rm -v C:\myUser\instrumentation:/files java -jar javalenzuela/daikon_modified.jar /files/declsFile.decls /files/dtraceFile.dtrace suppress_arithmetic_comparisons=true > invariants_without_arithmetic_comparisons.csv
This section describes the changes performed in Daikon version 5.8.10 (released on November 2021)
Section Invariants disabled in the default Daikon configuration contains the 13 invariants that we disabled in the default Daikon configuration to avoid a combinatorial explosions of string comparisons and a high number of false positives.
-
IsUrl: Indicates that the value of a string variable is always a URL. Represented as
x is Url
. -
SequenceStringElementsAreUrl: Indicates that all elements of an array of strings are URLs. Represented as
All the elements of x are URLs
. -
FixedLengthString: Indicates that the value of a string variable always has a fixed length n. Represented as
LENGTH(x)==n
. -
SequenceFixedLengthString: Indicates that all the elements of an array of strings have a fixed length n. Represented as
All the elements of x have LENGTH=n
. -
IsNumeric: Indicates that the characters of a string variable are always numeric. Represented as
x is Numeric
. -
SequenceStringElementsAreNumeric: Indicates that the characters of all the elements of an array of strings are always numeric. Represented as
All the elements of x are Numeric
. -
IsEmail: Indicates that the value of a string type variable is always an email. Represented as
x is Email
. -
SequenceStringElementsAreEmail: Indicates that all elements of an array of strings are emails. Represented as
All the elements of x are emails
. -
IsDateDDMMYYYY: Indicates that the value of a string variable is always a date following the format DD/MM/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: DD/MM/YYYY
. -
SequenceStringElementsAreDateDDMMYYYY: Indicates that all the elements of an array of strings are dates following the format DD/MM/YYYY. Represented as
All the elements of x are dates
. Format: DD/MM/YYYY. -
IsDateMMDDYYYY: Indicates that the value of a string variable is always a date following the format MM/DD/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: MM/DD/YYYY
. -
SequenceStringElementsAreDateMMDDYYYY: Indicates that all the elements of an array of strings are dates following the format MM/DD/YYYY. Represented as
All the elements of x are dates
. Format: MM/DD/YYYY. -
IsDateYYYYMMDD: Indicates that the value of a string variable is always a date following the format YYYY/MM/DD (the separator can be “/” or “-“). Represented as
x is a Date. Format: YYYY/MM/DD
. -
SequenceStringElementsAreDateYYYYMMDD: Indicates that all the elements of an array of strings are dates following the format YYYY/MM/DD. Represented as
All the elements of x are dates. Format: YYYY/MM/DD
. -
IsHour: Indicates that the value of a string variable is always a time in 24-hour format. Represented as
x is Hour: HH:MM 24-hour format, optional leading 0
. -
SequenceStringElementsAreHour: Indicates that all elements of an array of strings are hours in 24-hour format. Represented as
All the elements of x are Hours: HH:MM 24-hour format, optional leading 0
. -
IsHourAMPM: Indicates that the value of a string variable is always a time in 12-hour format. Represented as
x is Hour: HH:MM 12-hour format, optional leading 0
. -
SequenceStringElementsAreHourAMPM: Indicates that all elements of an array of strings are hours in 12-hour format. Represented as
All the elements of x are Hours: HH:MM 12-hour format, optional leading 0, mandatory meridiems (AM/PM)
. -
IsHourWithSeconds: Indicates that the value of a string variable is always a time in 24-hour format, including seconds. Represented as
x is Hour: HH:MM:SS 24-hour format with optional leading 0
. -
SequenceStringElementsAreHourWithSeconds: Indicates that all elements of an array of strings are hours in 24-hour format, including seconds. Represented as
All the elements of x are Hours: HH:MM:SS 24-hour format with optional leading 0
. -
IsTimestampYYYYMMHHThhmmssmm: Indicates that the value of a string variable is always a timestamp. Represented as x is Timestamp. Format:
YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
. -
SequenceStringElementsAreTimestampYYYYMMHHThhmmssmm: Indicates that all elements of an array of strings are timestamps. Represented as
All the elements of x are Timestamps. Format: YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
.
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.unary.scalar.NonZero
- daikon.inv.unary.scalar.NonZeroFloat
- daikon.inv.unary.scalar.RangeInt.PowerOfTwo
- daikon.inv.unary.sequence.EltNonZero
- daikon.inv.unary.sequence.EltNonZeroFloat
- daikon.inv.unary.sequence.EltRangeInt.PowerOfTwo
- daikon.inv.binary.twoScalar.IntNonEqual
- daikon.inv.binary.twoScalar.FloatNonEqual
- daikon.inv.binary.twoScalar.LinearBinary
- daikon.inv.binary.twoScalar.LinearBinaryFloat
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual
- daikon.inv.ternary.threeScalar.LinearTernary
- daikon.inv.ternary.threeScalar.LinearTernaryFloat
- daikon.inv.binary.twoScalar.NumericInt.Divides
- daikon.inv.binary.twoScalar.NumericInt.Square
- daikon.inv.binary.twoScalar.NumericFloat.Divides
- daikon.inv.binary.twoScalar.NumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Square
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseLinearBinary
- daikon.inv.binary.twoSequence.PairwiseLinearBinaryFloat
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
-
daikon.inv.binary.twoString.StdString.SubString: We applied two modifications on this invariant:
-
We added a property that sets a minimum string length (2 by default) in order to avoid reporting that a string that is always empty is always a substring of every other string in the program point.
-
This invariant may report redundant information that would bloat our results, for example, if x is a substring of y and y is a substring of z, Daikon would report the invariants: “x is substring of y”, “y is substring of z” and “x is substring of z”. In this situation we consider the “x is substring of z” invariant redundant and modified Daikon so it would not report redundant invariants, in order to avoid bloating the results.
-
-
daikon.inv.binary.twoSequence.SubSequence
-
daikon.inv.binary.twoSequence.SubSequenceFloat
-
daikon.inv.binary.twoSequence.SubSet
-
daikon.inv.binary.twoSequence.SubSetFloat
-
daikon.inv.binary.twoSequence.SuperSequence
-
daikon.inv.binary.twoSequence.SuperSequenceFloat
-
daikon.inv.binary.twoSequence.SuperSet
-
daikon.inv.binary.twoSequence.SuperSetFloat
At the end of the Daikon execution, the computeConfidence
function is executed for every invariant that has not been falsified, returning a number between 0 and 1.
If this number achieves a certain threshold, the invariant is considered to be statistically justified and it is reported to the user. We found some invariants for which
Daikon always returned a confidence of 1.
protected double computeConfidence() {
return CONFIDENCE_JUSTIFIED;
}
We modified these functions to return 1 only if there was at least one occurrence of the invariant during the program execution.
protected double computeConfidence() {
if (ppt.num_samples() == 0) {
return CONFIDENCE_UNJUSTIFIED;
}
return CONFIDENCE_JUSTIFIED;
}
These are the invariants for which we modified the computeConfidence
function:
- Unary:
- Scalar:
- RangeFloat
- RangeInt
- Sequence:
- EltRangeFloat
- EltRangeInt
- Scalar:
- Binary:
- SequenceString:
- MemberString
- SequenceScalar:
- Member
- MemberFloat
- TwoSequence:
- Reverse
- ReverseFloat
- SubSequence
- SubSequenceFloat
- Subset
- SequenceString:
A derived variable is an expression that does not appear directly in the instrumented program,
but which Daikon uses as a variable for invariant detection.
For example, for array variables, Daikon generates a derived variable whose value is its size,
allowing to detect invariants such as input.limit >= size(return.items)
.
However, there are derived variables that do not provide any relevant information, this is the case of the derived variable
orig()
, which indicates the value of a variable at the input program point and is used to make comparisons with
the value of the same variable at the output program point. In the context of black box testing of RESTful APIs, the value
of the input parameters does not change at any time, so Daikon would only detect equality invariants between the parameter
value at the input and output (input.limit == orig(input.limit)
) that will always be satisfied.
For this reason, this derived variable has been disabled.
In some cases, Daikon applies the shift
derived variable (consisting on decreasing the value of a numerical variable by a shift value)
to try to find relations between numerical variables (e.g., input-1 > return
), we disabled this derived variable because it would most likely
report misleading information.
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual