-
Notifications
You must be signed in to change notification settings - Fork 1
Modified version of Daikon
Daikon is an open-source tool that applies machine learning techniques to detect likely invariants in programs by monitoring test executions. This monitoring process involves observing the program state at designated program points, initially considering all possible invariants as valid. Those invariants that are not violated by any execution are reported as likely invariants. Daikon operates by analyzing an instrumented version of a software execution, generated by an instrumenter or front-end. This instrumentation produces a declaration file and a data trace file. The declaration file describes the structure of program points in terms of input and output variables. The data trace file contains the values assigned to the variables in each execution. Instrumenters are available for various programming languages and data formats, including Java, Perl, C++, and CSV.
Overall, our customized version of Daikon supports a total of 106 distinct types of invariants for REST APIs, classified into five categories:
-
Arithmetic comparisons (48 invariants). Specify numerical bounds (e.g.,
size(return.artists[]) >= 1
) and relations between numerical fields (e.g.,input.limit >= size(return.items[])
). -
Array properties (23 invariants). Represent comparisons between arrays, such as subsets, supersets, or fields that are always member of an array (e.g.,
return.hotel.hotelId in input.hotelIds[]
). -
Specific formats (22 invariants). Specify restrictions regarding the expected format (e.g.,
return.href is Url
) or length (e.g.,LENGTH(return.id)==22
) of string fields. -
Specific values (9 invariants). Restrict the possible values of fields (e.g.,
return.visibility one of {"public", "private"}
). -
String comparisons (4 invariants). Specify relations between string fields, such as equality (e.g.,
input.name == return.name
) or substrings (e.g.,input.id is a substring of return.href
).
This section describes the flags and configuration options available when running the provided Docker image containing our modified Daikon version.
-
use_modified_daikon_version
(default:true
): If true, the invariants will be detected using the modified version of Daikon proposed in our paper. If false, it will use the default version of Daikon, except for 13 types of invariants that were suppressed to avoid a combinatorial explosion of reported invariants. -
use_agora_pp
(default:true
). If true, the heuristic defined in section Minimizing false positives for automatically suppressing false positives caused by invariants of the arithmetic comparison category is applied. Unless explicitly stated otherwise, if the default configuration of Daikon is being used (i.e.,use_modified_daikon_version=false
), this parameter is set to false by default. -
suppress_input_invariants
(default:true
): If true, the invariants that only affect input variables will not be reported. -
suppress_arithmetic_comparisons
(default:false
): If true, the invariant of the Arithmetic comparisons category will not be reported. -
suppress_string_comparisons
(default:false
): If true, the invariant of the String comparisons category will not be reported. -
suppress_specific_formats
(default:false
): If true, the invariant of the Specific formats category will not be reported. -
suppress_specific_values
(default:false
): If true, the invariant of the Specific values category will not be reported. -
suppress_array_properties
(default:false
): If true, the invariant of the Array properties category will not be reported.
For instance, to run our modified version of Daikon suppressing the invariants of the Arithmetic comparisons category, we use the following command:
docker run --rm -v C:\myUser\instrumentation:/files javalenzuela/daikon_agora java -jar daikon_modified.jar /files/declsFile.decls /files/dtraceFile.dtrace suppress_arithmetic_comparisons=true > invariants_without_arithmetic_comparisons.csv
This section describes the changes performed in Daikon version 5.8.10 (released on November 2021)
Section Invariants disabled in the default Daikon configuration contains the 13 invariants that we disabled in the default Daikon configuration to avoid a combinatorial explosions of string comparisons and a high number of false positives.
In the evaluation of the first version of AGORA, we identified the
invariants of the arithmetic comparison category as the cause of 3 out of every 4 false positives. Specifically, most
false positives were invariants comparing two independent output properties. For example, the invariant
return.duration_ms > size(return.artists[])
states that the duration of a song in milliseconds should always be
greater than the number of artists in the song. While this may be true in most cases, it clearly does not represent the
expected API behaviour and is considered a false positive. Finding counterexamples to automatically rule out these false
positives is extremely unlikely, and the number of reported false positives can become unbearable if the API has several
numerical response fields.
To address this, AGORA integrates a novel heuristic to minimize false positives derived from arithmetic comparisons based on the following observation: parameters with a different value range are unlikely to be related. In the previous example, for instance, the duration of a song in milliseconds may oscillate in the order of thousands, whereas the number of artists of a song may vary from 1 to around 10.
In practice, we propose a heuristic that discards the invariants that represent arithmetic inequalities
(i.e., x < y
, x <= y
, x > y
and x >= y
) if there is no overlap between the values of the two variables
involved. Specifically, we modified a total of 32 Daikon invariants so that invariants are not reported if the maximum
observed value for one of the involved variables (e.g., 10 artists) is less than the minimum observed value of the other
variable (e.g., 30K milliseconds).
This heuristic is enabled by default, to disable it, set the boolean parameter use_agora_pp
of the modified version
of Daikon to false.
-
IsUrl: Indicates that the value of a string variable is always a URL. Represented as
x is Url
. -
SequenceStringElementsAreUrl: Indicates that all elements of an array of strings are URLs. Represented as
All the elements of x are URLs
. -
FixedLengthString: Indicates that the value of a string variable always has a fixed length n. Represented as
LENGTH(x)==n
. -
SequenceFixedLengthString: Indicates that all the elements of an array of strings have a fixed length n. Represented as
All the elements of x have LENGTH=n
. -
IsNumeric: Indicates that the characters of a string variable are always numeric. Represented as
x is Numeric
. -
SequenceStringElementsAreNumeric: Indicates that the characters of all the elements of an array of strings are always numeric. Represented as
All the elements of x are Numeric
. -
IsEmail: Indicates that the value of a string type variable is always an email. Represented as
x is Email
. -
SequenceStringElementsAreEmail: Indicates that all elements of an array of strings are emails. Represented as
All the elements of x are emails
. -
IsDateDDMMYYYY: Indicates that the value of a string variable is always a date following the format DD/MM/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: DD/MM/YYYY
. -
SequenceStringElementsAreDateDDMMYYYY: Indicates that all the elements of an array of strings are dates following the format DD/MM/YYYY. Represented as
All the elements of x are dates
. Format: DD/MM/YYYY. -
IsDateMMDDYYYY: Indicates that the value of a string variable is always a date following the format MM/DD/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: MM/DD/YYYY
. -
SequenceStringElementsAreDateMMDDYYYY: Indicates that all the elements of an array of strings are dates following the format MM/DD/YYYY. Represented as
All the elements of x are dates
. Format: MM/DD/YYYY. -
IsDateYYYYMMDD: Indicates that the value of a string variable is always a date following the format YYYY/MM/DD (the separator can be “/” or “-“). Represented as
x is a Date. Format: YYYY/MM/DD
. -
SequenceStringElementsAreDateYYYYMMDD: Indicates that all the elements of an array of strings are dates following the format YYYY/MM/DD. Represented as
All the elements of x are dates. Format: YYYY/MM/DD
. -
IsHour: Indicates that the value of a string variable is always a time in 24-hour format. Represented as
x is Hour: HH:MM 24-hour format, optional leading 0
. -
SequenceStringElementsAreHour: Indicates that all elements of an array of strings are hours in 24-hour format. Represented as
All the elements of x are Hours: HH:MM 24-hour format, optional leading 0
. -
IsHourAMPM: Indicates that the value of a string variable is always a time in 12-hour format. Represented as
x is Hour: HH:MM 12-hour format, optional leading 0
. -
SequenceStringElementsAreHourAMPM: Indicates that all elements of an array of strings are hours in 12-hour format. Represented as
All the elements of x are Hours: HH:MM 12-hour format, optional leading 0, mandatory meridiems (AM/PM)
. -
IsHourWithSeconds: Indicates that the value of a string variable is always a time in 24-hour format, including seconds. Represented as
x is Hour: HH:MM:SS 24-hour format with optional leading 0
. -
SequenceStringElementsAreHourWithSeconds: Indicates that all elements of an array of strings are hours in 24-hour format, including seconds. Represented as
All the elements of x are Hours: HH:MM:SS 24-hour format with optional leading 0
. -
IsTimestampYYYYMMHHThhmmssmm: Indicates that the value of a string variable is always a timestamp. Represented as x is Timestamp. Format:
YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
. -
SequenceStringElementsAreTimestampYYYYMMHHThhmmssmm: Indicates that all elements of an array of strings are timestamps. Represented as
All the elements of x are Timestamps. Format: YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
.
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.unary.scalar.NonZero
- daikon.inv.unary.scalar.NonZeroFloat
- daikon.inv.unary.scalar.RangeInt.PowerOfTwo
- daikon.inv.unary.sequence.EltNonZero
- daikon.inv.unary.sequence.EltNonZeroFloat
- daikon.inv.unary.sequence.EltRangeInt.PowerOfTwo
- daikon.inv.binary.twoScalar.IntNonEqual
- daikon.inv.binary.twoScalar.FloatNonEqual
- daikon.inv.binary.twoScalar.LinearBinary
- daikon.inv.binary.twoScalar.LinearBinaryFloat
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual
- daikon.inv.ternary.threeScalar.LinearTernary
- daikon.inv.ternary.threeScalar.LinearTernaryFloat
- daikon.inv.binary.twoScalar.NumericInt.Divides
- daikon.inv.binary.twoScalar.NumericInt.Square
- daikon.inv.binary.twoScalar.NumericFloat.Divides
- daikon.inv.binary.twoScalar.NumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Square
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseLinearBinary
- daikon.inv.binary.twoSequence.PairwiseLinearBinaryFloat
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
-
daikon.inv.binary.twoString.StdString.SubString: We applied two modifications on this invariant:
-
We added a property that sets a minimum string length (2 by default) in order to avoid reporting that a string that is always empty is always a substring of every other string in the program point.
-
This invariant may report redundant information that would bloat our results, for example, if x is a substring of y and y is a substring of z, Daikon would report the invariants: “x is substring of y”, “y is substring of z” and “x is substring of z”. In this situation we consider the “x is substring of z” invariant redundant and modified Daikon so it would not report redundant invariants, in order to avoid bloating the results.
-
-
daikon.inv.binary.twoSequence.SubSequence
-
daikon.inv.binary.twoSequence.SubSequenceFloat
-
daikon.inv.binary.twoSequence.SubSet
-
daikon.inv.binary.twoSequence.SubSetFloat
-
daikon.inv.binary.twoSequence.SuperSequence
-
daikon.inv.binary.twoSequence.SuperSequenceFloat
-
daikon.inv.binary.twoSequence.SuperSet
-
daikon.inv.binary.twoSequence.SuperSetFloat
At the end of the Daikon execution, the computeConfidence
function is executed for every invariant that has not been falsified, returning a number between 0 and 1.
If this number achieves a certain threshold, the invariant is considered to be statistically justified and it is reported to the user. We found some invariants for which
Daikon always returned a confidence of 1.
protected double computeConfidence() {
return CONFIDENCE_JUSTIFIED;
}
We modified these functions to return 1 only if there was at least one occurrence of the invariant during the program execution.
protected double computeConfidence() {
if (ppt.num_samples() == 0) {
return CONFIDENCE_UNJUSTIFIED;
}
return CONFIDENCE_JUSTIFIED;
}
These are the invariants for which we modified the computeConfidence
function:
- Unary:
- Scalar:
- RangeFloat
- RangeInt
- Sequence:
- EltRangeFloat
- EltRangeInt
- Scalar:
- Binary:
- SequenceString:
- MemberString
- SequenceScalar:
- Member
- MemberFloat
- TwoSequence:
- Reverse
- ReverseFloat
- SubSequence
- SubSequenceFloat
- Subset
- SequenceString:
A derived variable is an expression that does not appear directly in the instrumented program,
but which Daikon uses as a variable for invariant detection.
For example, for array variables, Daikon generates a derived variable whose value is its size,
allowing to detect invariants such as input.limit >= size(return.items)
.
However, there are derived variables that do not provide any relevant information, this is the case of the derived variable
orig()
, which indicates the value of a variable at the input program point and is used to make comparisons with
the value of the same variable at the output program point. In the context of black box testing of RESTful APIs, the value
of the input parameters does not change at any time, so Daikon would only detect equality invariants between the parameter
value at the input and output (input.limit == orig(input.limit)
) that will always be satisfied.
For this reason, this derived variable has been disabled.
In some cases, Daikon applies the shift
derived variable (consisting on decreasing the value of a numerical variable by a shift value)
to try to find relations between numerical variables (e.g., input-1 > return
), we disabled this derived variable because it would most likely
report misleading information.
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual