Fix issue with negative leaf depth (#57)
* Update dev from master (#47)

* Working version of insert_point

* Fix bug with query and insertpoint

* Cleanup disp and codisp functions

* Reorder methods

* Insertpoint operational, add docstrings

* Small fix

* Remove old files

* Allow empty tree; handle duplicates in insert_point and forget_point

* Account for duplicates in tree construction

* Add ability to print tree

* Update docstrings and minor fixes

* Docstring fix

* Minor fixes

* Bugfix for forget_point

* Add image

* Update readme

* Update README.md

* Type check for point

* Update README.md

* Update README.md

* Store bounding boxes

* Minor changes

* Fix bbox bug, add unit tests

* Add support for CI

* Remove Python 3.7

* Update README.md

* Fix duplicate precision bug

* Fix duplicates issue?

* Use new indexing strategy with forget_point

* Update n bug

* Return 0 for codisp if leaf is root

* Add efficient shingle

* Add sine wave image

* Update README.md

* Minor cleanup

* sklearn test

* Add classification notebook

* Updated gitignore

* Edit classification notebook

* taxi data test

* removed swamp

* taxi data 200 tree run

* IF test

* Add OC-SVM example with sine wave

* Add OC-SVM example with taxi data

* Add IF example with sine wave

* Minor changes

* Minor updates

* Delete old sine_ocsvm_test notebook

* Delete old taxi_ocsvm_test notebook

* Delete old sine_if notebook

* Add IF example with sine wave

* Add OC-SVM example with sine wave

* Add OC-SVM example with taxi data

* Delete old taxi_ocsvm notebook

* Add OC-SVM example with taxi data

* sine wave comparison

* table1 notebook

* rrcf notebook

* renamed rrcf

* taxi data if

* Fix shingle bug; clean up classification example

* Set theme jekyll-theme-minimal

* Add index labels

* Update batch image

* Update README.md

* Update README.md

* Update README.md

* Create _config.yml

* Create default.html

* Create index.md

* Update index.md

* Create nav.html

* Create nav.yml

* Create tree-construction.html

* Rename tree-construction.html to tree-construction.md

* Create insert-and-delete.md

* Create anomaly-scoring.md

* Create batch.md

* Create streaming.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update insert-and-delete.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update insert-and-delete.md

* Update anomaly-scoring.md

* Create related-work.md

* Update nav.yml

* Update related-work.md

* Create random-cut-tree.md

* Update nav.yml

* Create modifying-rctree.md

* Update nav.yml

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update modifying-rctree.md

* Update random-cut-tree.md

* Create scoring-rctree.md

* Update nav.yml

* Update scoring-rctree.md

* Update README.md

* Update README.md

* Update index.md

* Update index.md

* Update index.md

* Create paper.md

* Create paper.bib

* Add files via upload

* Update paper.md

* Update README.md

* Delete figure_1.png

* Add files via upload

* Update paper.md

* Add files via upload

* Create taxi.md

* Update nav.yml

* Update batch.md

* Update streaming.md

* Update streaming.md

* Update streaming.md

* Update streaming.md

* Update taxi.md

* Update related-work.md

* Update related-work.md

* Update related-work.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update anomaly-scoring.md

* Update related-work.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update taxi.md

* Update README.md

* Update batch.md

* Update streaming.md

* Updates to authors

* Update paper

* updated abhi orcid

* Create rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update nav.yml

* Update rctree-api.md

* Update rctree-api.md

* Update docstrings

* Update rctree-api.md

* Update rctree-api.md

* Update anomaly-scoring.md

* Update random-cut-tree.md

* Update rctree-api.md

* Update related-work.md

* Update rctree-api.md

* Update setup.py

* Update __init__.py

* Update paper.md

* Update paper.md

* JOSS review updates (#48)

* Update README.md

* Update paper.bib

* Update paper.md

* Update paper.md

* JOSS review suggested changes

* Add license badge

* Move installation instructions; list dependencies

* Add coveralls support

* Add classification and comparison to docs

* Move notebook material into documentation

* Add example data

* Update data locations in docs

* Fix error with coveralls build

* Add coveralls badge

* Add init for pytest

* Increase test coverage

* Add version numbers to dependencies

* Update README.md

* Update index.md

* Add caveats documentation

* Minor edit to caveats

* Fix spacing in comparisons documentation

* Spacing updates to docs

* Update version post-JOSS review

* Fix capitalization

* Add JOSS badge

* Added random state to the constructor of RCTree (#55)

* added 'seed' constructor parameter for the random number generation

* added proper random state to RCTree class

* added random_state to docstring, wrote a test to increase coverage

* Fix issue with negative depth

* Add test for insert edge case
mdbartos authored May 23, 2019
1 parent 317e325 commit 0b4eb01
Showing 2 changed files with 46 additions and 7 deletions.

rrcf/rrcf.py (32 changes: 25 additions, 7 deletions)
@@ -14,6 +14,10 @@ class RCTree:
     X: np.ndarray (n x d) (optional)
         Array containing n data points, each with dimension d.
         If no data provided, an empty tree is created.
+    random_state: int, RandomState instance or None (optional) (default=None)
+        If int, random_state is the seed used by the random number generator;
+        If RandomState instance, random_state is the random number generator;
+        If None, the random number generator is the RandomState instance used by np.random.
     Attributes:
     -----------
@@ -56,7 +60,15 @@ class RCTree:
     >>> tree.forget_point(100)
     """
 
-    def __init__(self, X=None, index_labels=None, precision=9):
+    def __init__(self, X=None, index_labels=None, precision=9,
+                 random_state=None):
+        # Random number generation with provided seed
+        if isinstance(random_state, int):
+            self.rng = np.random.RandomState(random_state)
+        elif isinstance(random_state, np.random.RandomState):
+            self.rng = random_state
+        else:
+            self.rng = np.random
         # Initialize dict for leaves
         self.leaves = {}
         # Initialize tree root
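
The constructor hunk maps `random_state` onto a generator stored as `self.rng`. Below is a minimal standalone sketch of the same normalization, assuming only NumPy; the helper name `make_rng` is hypothetical and not part of rrcf. One deliberate difference: it tests `numbers.Integral` instead of `int`, so NumPy integer seeds such as `np.int64(0)` are also treated as seeds (with the `isinstance(random_state, int)` check in the diff, such a value falls through to the global `np.random` module).

```python
import numbers

import numpy as np


def make_rng(random_state=None):
    """Normalize a seed-like value into an object exposing .choice/.uniform.

    Hypothetical helper mirroring the branching in the RCTree constructor.
    """
    if isinstance(random_state, numbers.Integral):
        # Integer seed (Python or NumPy int): build a fresh, independent generator
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # Caller-supplied generator: used as-is, so its state is shared
        return random_state
    # None (or anything else): fall back to NumPy's global generator
    return np.random


# Generators built from the same seed produce identical draws, which is
# why identically seeded trees make identical random cuts.
a = make_rng(0)
b = make_rng(np.random.RandomState(0))
draws_a = [a.uniform(0, 1) for _ in range(3)]
draws_b = [b.uniform(0, 1) for _ in range(3)]
print(draws_a == draws_b)  # True
```

Passing a `RandomState` instance shares its state with the caller, so drawing from it elsewhere between insertions changes the cuts; an integer seed gives the tree a generator of its own.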
@@ -134,9 +146,9 @@ def _cut(self, X, S, parent=None, side='l'):
         l = xmax - xmin
         l /= l.sum()
         # Determine dimension to cut
-        q = np.random.choice(self.ndim, p=l)
+        q = self.rng.choice(self.ndim, p=l)
         # Determine value for split
-        p = np.random.uniform(xmin[q], xmax[q])
+        p = self.rng.uniform(xmin[q], xmax[q])
         # Determine subset of points to left
         S1 = (X[:, q] <= p) & (S)
         # Determine subset of points to right
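
The `_cut` hunk draws the cut dimension with probability proportional to each dimension's range, then draws the split value uniformly within that range, now from `self.rng` rather than the global `np.random`. A minimal standalone sketch of that step follows, assuming a plain (n, d) NumPy array `X` instead of the tree's masked subset `S`; the function name `random_cut` is hypothetical.

```python
import numpy as np


def random_cut(X, rng):
    """Pick (dimension, value) for one random cut over the points in X.

    Sketch only: X is an (n, d) array with at least one dimension of
    nonzero range; rng is a np.random.RandomState (or the np.random module).
    """
    xmax = X.max(axis=0)
    xmin = X.min(axis=0)
    # Probability of cutting each dimension is proportional to its range
    l = xmax - xmin
    l = l / l.sum()
    q = rng.choice(X.shape[1], p=l)
    # Cut value is uniform over the chosen dimension's range
    p = rng.uniform(xmin[q], xmax[q])
    # Boolean mask of points falling on the left side of the cut
    left = X[:, q] <= p
    return q, p, left


X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
q, p, left = random_cut(X, np.random.RandomState(42))
# Only dimension 0 has nonzero range, so q is always 0 here
print(q, p, left)
```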
@@ -338,11 +350,12 @@ def forget_point(self, index):
             del parent
             # Set sibling as new root
             sibling.u = None
-            if isinstance(sibling, Leaf):
-                sibling.d = 0
             self.root = sibling
             # Update depths
-            self.map_leaves(sibling, op=self._increment_depth, inc=-1)
+            if isinstance(sibling, Leaf):
+                sibling.d = 0
+            else:
+                self.map_leaves(sibling, op=self._increment_depth, inc=-1)
             return self.leaves.pop(index)
         # Find grandparent
         grandparent = parent.u
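
The `forget_point` hunk is the fix named in the commit title. Before it, when the removed point's sibling was a single `Leaf`, the code set `sibling.d = 0` and then still called `map_leaves(..., inc=-1)` on that leaf, leaving the new root at depth -1 (the scenario `test_insert_depth` exercises). After it, only a `Branch` sibling has its leaves' depths decremented. A toy sketch of both orderings, using hypothetical minimal `Leaf`/`Branch` classes and a hypothetical `decrement_depths` helper in place of rrcf's `map_leaves`/`_increment_depth`:

```python
class Leaf:
    def __init__(self, d):
        self.d = d  # depth of this leaf below the root


class Branch:
    def __init__(self, l, r):
        self.l, self.r = l, r


def decrement_depths(node):
    """Subtract 1 from the depth of every leaf under node (node itself if a leaf)."""
    if isinstance(node, Branch):
        decrement_depths(node.l)
        decrement_depths(node.r)
    else:
        node.d -= 1


def promote_sibling_old(sibling):
    # Pre-fix ordering: reset a leaf's depth, then decrement unconditionally
    if isinstance(sibling, Leaf):
        sibling.d = 0
    decrement_depths(sibling)
    return sibling


def promote_sibling_fixed(sibling):
    # Post-fix ordering: a leaf root has depth 0; only branches shift up a level
    if isinstance(sibling, Leaf):
        sibling.d = 0
    else:
        decrement_depths(sibling)
    return sibling


print(promote_sibling_old(Leaf(d=1)).d)    # -1
print(promote_sibling_fixed(Leaf(d=1)).d)  # 0
branch = promote_sibling_fixed(Branch(Leaf(2), Leaf(2)))
print(branch.l.d, branch.r.d)              # 1 1
```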
@@ -429,6 +442,7 @@ def insert_point(self, point, index, tolerance=None):
         parent = node.u
         maxdepth = max([leaf.d for leaf in self.leaves.values()])
         depth = 0
+        branch = None
         for _ in range(maxdepth + 1):
             bbox = node.b
             cut_dimension, cut = self._insert_point_cut(point, bbox)
@@ -452,6 +466,10 @@ def insert_point(self, point, index, tolerance=None):
                     parent = node
                     node = node.r
                     side = 'r'
+        try:
+            assert branch is not None
+        except:
+            raise AssertionError('Error with program logic: a cut was not found.')
         # Set parent of new leaf and old branch
         node.u = branch
         leaf.u = branch
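
The two `insert_point` hunks initialize `branch = None` before the bounded descent loop and fail with a clear message if no iteration found a cut. Previously an exhausted loop would leave `branch` unassigned, so its first use would raise an `UnboundLocalError`. A generic sketch of the same sentinel-then-check pattern follows; `first_match` is a hypothetical name. It raises explicitly instead of using `assert`, because Python strips `assert` statements when run with `-O`, which would silently skip the check in the committed `try`/`assert` form.

```python
def first_match(items, predicate, max_steps):
    """Return the first item satisfying predicate within max_steps items.

    Hypothetical helper illustrating the sentinel-then-check pattern.
    """
    found = None  # sentinel: no match yet
    for step, item in enumerate(items):
        if step >= max_steps:
            break
        if predicate(item):
            found = item
            break
    # Explicit check survives `python -O`, unlike a bare assert
    if found is None:
        raise RuntimeError('Error with program logic: no match was found.')
    return found


print(first_match([3, 8, 11], lambda x: x > 5, max_steps=5))  # 8
```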
@@ -834,7 +852,7 @@ def _insert_point_cut(self, point, bbox):
         bbox_hat[-1, :] = np.maximum(bbox[-1, :], point)
         b_span = bbox_hat[-1, :] - bbox_hat[0, :]
         b_range = b_span.sum()
-        r = np.random.uniform(0, b_range)
+        r = self.rng.uniform(0, b_range)
         span_sum = np.cumsum(b_span)
         cut_dimension = np.inf
         for j in range(len(span_sum)):
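
The last `rrcf/rrcf.py` hunk changes where `_insert_point_cut` draws `r`: the bounding box is expanded to include the new point, `r` is drawn uniformly over the sum of the expanded box's side lengths, and the cumulative spans are then scanned to choose the cut dimension (the hunk shows only the start of that loop). The hunk stops before the cut value is computed, so in this sketch the mapping from `r` to a position inside the chosen dimension is an assumption, as is the helper name `insert_cut`. It uses `np.searchsorted`, which returns the first index whose cumulative span is at least `r`, in place of an explicit loop, and draws against the last cumulative sum so `r` can never fall past the final index.

```python
import numpy as np


def insert_cut(point, bbox, rng):
    """Draw (cut_dimension, cut) for inserting point into a bounding box.

    Sketch under assumptions: bbox is a (2, d) array of [mins; maxs],
    point is a length-d array, rng is a np.random.RandomState.
    """
    bbox_hat = np.empty(bbox.shape)
    # Expand the box so that it also contains the new point
    bbox_hat[0, :] = np.minimum(bbox[0, :], point)
    bbox_hat[-1, :] = np.maximum(bbox[-1, :], point)
    b_span = bbox_hat[-1, :] - bbox_hat[0, :]
    span_sum = np.cumsum(b_span)
    # Uniform draw over the total side length of the expanded box
    r = rng.uniform(0, span_sum[-1])
    # First dimension whose cumulative span reaches r
    cut_dimension = int(np.searchsorted(span_sum, r))
    # Assumed mapping: offset inside that dimension's interval
    cut = bbox_hat[0, cut_dimension] + span_sum[cut_dimension] - r
    return cut_dimension, cut


bbox = np.array([[0.0, 0.0], [1.0, 1.0]])
point = np.array([3.0, 0.5])
dim, cut = insert_cut(point, bbox, np.random.RandomState(0))
# The expanded box is [0, 3] x [0, 1], so the cut lies inside that range
print(dim, cut)
```

Because dimension 0 now spans 3 of the 4 units of total side length, it is chosen about three times as often as dimension 1.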

test/test_rrcf.py (21 changes: 21 additions, 0 deletions)
@@ -11,6 +11,9 @@
 tree = rrcf.RCTree(X)
 duplicate_tree = rrcf.RCTree(Z)
 
+tree_seeded = rrcf.RCTree(random_state=0)
+duplicate_tree_seeded = rrcf.RCTree(random_state=np.random.RandomState(0))
+
 deck = np.arange(n, dtype=int)
 np.random.shuffle(deck)
 indexes = deck[:5]
@@ -128,3 +131,21 @@ def test_shingle():
     step_0 = next(shingle)
     step_1 = next(shingle)
     assert (step_0[1] == step_1[0]).all()
+
+def test_random_state():
+    # The two trees should have the exact same random-cuts
+    points = np.random.uniform(size=(100, 5))
+    for idx, point in enumerate(points):
+        tree_seeded.insert_point(point, idx)
+        duplicate_tree_seeded.insert_point(point, idx)
+    assert str(tree_seeded) == str(duplicate_tree_seeded)
+
+def test_insert_depth():
+    tree = rrcf.RCTree()
+    tree.insert_point([0., 0.], index=0)
+    tree.insert_point([0., 0.], index=1)
+    tree.insert_point([0., 0.], index=2)
+    tree.insert_point([0., 1.], index=3)
+    tree.forget_point(index=3)
+    min_depth = min(leaf.d for leaf in tree.leaves.values())
+    assert min_depth >= 0
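
`test_insert_depth` checks only that no leaf depth is negative. A stricter invariant one might test, sketched here as an assumption rather than something this commit adds, is that each leaf's stored depth `d` equals the number of parent (`u`) links between it and the root. The helper `depths_consistent` is hypothetical; it relies only on the `leaves`, `u` and `d` attributes that appear in the diff, and the toy tree uses `SimpleNamespace` objects so the sketch runs without rrcf installed.

```python
from types import SimpleNamespace


def depths_consistent(tree):
    """Return True if every leaf's depth d equals its distance to the root.

    Duck-typed sketch: tree.leaves maps index -> leaf, and every node has
    a parent link u that is None at the root.
    """
    for leaf in tree.leaves.values():
        hops, node = 0, leaf
        while node.u is not None:
            node = node.u
            hops += 1
        if leaf.d != hops:
            return False
    return True


# Toy tree: a root branch with two leaves directly below it
root = SimpleNamespace(u=None)
a = SimpleNamespace(u=root, d=1)
b = SimpleNamespace(u=root, d=1)
toy = SimpleNamespace(leaves={0: a, 1: b})
print(depths_consistent(toy))  # True

# A leaf promoted to root must have depth 0, not -1
lone = SimpleNamespace(u=None, d=-1)
print(depths_consistent(SimpleNamespace(leaves={0: lone})))  # False
```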
