At v0.0.3 and prior, the underlying tree data structure was modeled as a collection of nodes. Specifically, the following object types were used:
core.Node
with membersparent
- another node orNone
if this is a rootconstraints
- constraints associated with the branch to the nodemodel
- an object that implements apredict
method.- The model is also used for linking to the children of a node using a specialized class,
core.ChildSelector
. AChildSelector
:- Holds an array of children
- Implements the logic of checking each child's conditions when attempting to
predict
.
- The model is also used for linking to the children of a node using a specialized class,
This design was elegant in its simplicity of definitions. Particularly, defining prediction in terms or recursion simplifies implementation and makes it easy to implement model trees.
In attempting to reimplement Trepan and other prior model explanation methods, it has become clear that there are certain needs not met by the v0.0.3 data strucutre. These needs are:
- Ability to efficiently keep track of the number of nodes
- Querying depth
- Attaching auxillary information to nodes
- In addition to the constraints on the path to the node
- For Trepan:
- Training set samples that reach the node
- Pointer to sample generator
- Fidelity and Reach estimates
- Ability to pretty-print the tree
We've considered the anytree
package, the followind doesn't quite fit our use case:
- Querying depth/number of nodes is still a tree walk
- Nodes have names as text
Going forward, we'll be using our own tree data structure, defined in the tree
subpackage.