Updates to documentation, unit tests and version (#52)
* Update dev from master (#47)

* Working version of insert_point

* Fix bug with query and insertpoint

* Cleanup disp and codisp functions

* Reorder methods

* Insertpoint operational, add docstrings

* Small fix

* Remove old files

* Allow empty tree; handle duplicates in insert_point and forget_point

* Account for duplicates in tree construction

* Add ability to print tree

* Update docstrings and minor fixes

* Docstring fix

* Minor fixes

* Bugfix for forget_point

* Add image

* Update readme

* Update README.md

* Type check for point

* Update README.md

* Update README.md

* Store bounding boxes

* Minor changes

* Fix bbox bug, add unit tests

* Add support for CI

* Remove Python 3.7

* Update README.md

* Fix duplicate precision bug

* Fix duplicates issue?

* Use new indexing strategy with forget_point

* Update n bug

* Return 0 for codisp if leaf is root

* Add efficient shingle

* Add sine wave image

* Update README.md

* Minor cleanup

* sklearn test

* Add classification notebook

* Updated gitignore

* Edit classification notebook

* taxi data test

* removed swamp

* taxi data 200 tree run

* IF test

* Add OC-SVM example with sine wave

* Add OC-SVM example with taxi data

* Add IF example with sine wave

* Minor changes

* Minor updates

* Delete old sine_ocsvm_test notebook

* Delete old taxi_ocsvm_test notebook

* Delete old sine_if notebook

* Add IF example with sine wave

* Add OC-SVM example with sine wave

* Add OC-SVM example with taxi data

* Delete old taxi_ocsvm notebook

* Add OC-SVM example with taxi data

* sine wave comparison

* table1 notebook

* rrcf notebook

* renamed rrcf

* taxi data if

* Fix shingle bug; clean up classification example

* Set theme jekyll-theme-minimal

* Add index labels

* Update batch image

* Update README.md

* Update README.md

* Update README.md

* Create _config.yml

* Create default.html

* Create index.md

* Update index.md

* Create nav.html

* Create nav.yml

* Create tree-construction.html

* Rename tree-construction.html to tree-construction.md

* Create insert-and-delete.md

* Create anomaly-scoring.md

* Create batch.md

* Create streaming.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update insert-and-delete.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update insert-and-delete.md

* Update anomaly-scoring.md

* Create related-work.md

* Update nav.yml

* Update related-work.md

* Create random-cut-tree.md

* Update nav.yml

* Create modifying-rctree.md

* Update nav.yml

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update random-cut-tree.md

* Update modifying-rctree.md

* Update random-cut-tree.md

* Create scoring-rctree.md

* Update nav.yml

* Update scoring-rctree.md

* Update README.md

* Update README.md

* Update index.md

* Update index.md

* Update index.md

* Create paper.md

* Create paper.bib

* Add files via upload

* Update paper.md

* Update README.md

* Delete figure_1.png

* Add files via upload

* Update paper.md

* Add files via upload

* Create taxi.md

* Update nav.yml

* Update batch.md

* Update streaming.md

* Update streaming.md

* Update streaming.md

* Update streaming.md

* Update taxi.md

* Update related-work.md

* Update related-work.md

* Update related-work.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update anomaly-scoring.md

* Update related-work.md

* Update tree-construction.md

* Update insert-and-delete.md

* Update taxi.md

* Update README.md

* Update batch.md

* Update streaming.md

* Updates to authors

* Update paper

* updated abhi orcid

* Create rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update rctree-api.md

* Update nav.yml

* Update rctree-api.md

* Update rctree-api.md

* Update docstrings

* Update rctree-api.md

* Update rctree-api.md

* Update anomaly-scoring.md

* Update random-cut-tree.md

* Update rctree-api.md

* Update related-work.md

* Update rctree-api.md

* Update setup.py

* Update __init__.py

* Update paper.md

* Update paper.md

* JOSS review updates (#48)

* Update README.md

* Update paper.bib

* Update paper.md

* Update paper.md

* JOSS review suggested changes

* Add license badge

* Move installation instructions; list dependencies

* Add coveralls support

* Add classification and comparison to docs

* Move notebook material into documentation

* Add example data

* Update data locations in docs

* Fix error with coveralls build

* Add coveralls badge

* Add init for pytest

* Increase test coverage

* Add version numbers to dependencies

* Update README.md

* Update index.md

* Add caveats documentation

* Minor edit to caveats

* Fix spacing in comparisons documentation

* Spacing updates to docs

* Update version post-JOSS review
mdbartos authored Mar 26, 2019
1 parent e73d830 commit f58e330
Showing 8 changed files with 219 additions and 64 deletions.
46 changes: 40 additions & 6 deletions README.md
@@ -39,14 +39,16 @@ Currently, only Python 3 is supported.

The following dependencies are *required* to install and use `rrcf`:

- [numpy](http://www.numpy.org/)
- [numpy](http://www.numpy.org/) (>= 1.15)

The following *optional* dependencies are required to run the examples shown in the documentation:

- [pandas](https://pandas.pydata.org/)
- [scipy](https://www.scipy.org/)
- [scikit-learn](https://scikit-learn.org/stable/)
- [matplotlib](https://matplotlib.org/)
- [pandas](https://pandas.pydata.org/) (>= 0.23)
- [scipy](https://www.scipy.org/) (>= 1.2)
- [scikit-learn](https://scikit-learn.org/stable/) (>= 0.20)
- [matplotlib](https://matplotlib.org/) (>= 3.0)

The listed version numbers have been tested and are known to work; older versions may also work, but have not been tested.
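
To confirm which versions are installed in your environment, a quick check along these lines may help (a minimal sketch; it only assumes the packages above are importable):

```python
# Print the installed version of each dependency used by the examples
import numpy, pandas, scipy, sklearn, matplotlib

for module in (numpy, pandas, scipy, sklearn, matplotlib):
    print(module.__name__, module.__version__)
```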

## Robust random cut trees

@@ -234,4 +236,36 @@ for index, point in enumerate(points):

## Contributing

To contribute, submit a pull request to the `dev` branch.
We welcome contributions to the `rrcf` repo. To contribute, submit a [pull request](https://help.github.com/en/articles/about-pull-requests) to the `dev` branch.

#### Types of contributions

Some suggested types of contributions include:

- Bug fixes
- Documentation improvements
- Performance enhancements
- Extensions to the algorithm

Check the issue tracker for any specific issues that need help. If you encounter a problem using `rrcf`, or have an idea for an extension, feel free to raise an issue.

#### Guidelines for contributors

Please consider the following guidelines when contributing to the codebase:

- Ensure that any new methods, functions, or classes include docstrings. Docstrings should include a description of the code, as well as descriptions of the inputs (arguments) and outputs (returns). Providing an example use case is recommended (see existing methods for examples, and the sketch after this list).
- Write unit tests for any new code and ensure that all tests are passing with no warnings. Please ensure that overall code coverage does not drop below 80%.
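
For illustration, a minimal sketch of this docstring format is shown below; the `shingle_mean` helper is hypothetical and not part of the `rrcf` API:

```python
import numpy as np

def shingle_mean(points, size):
    """
    Compute the mean of each shingle (rolling window) over a sequence.
    (Hypothetical helper, shown only to illustrate the docstring format.)

    Parameters
    ----------
    points : numpy.ndarray
        One-dimensional array of observations.
    size : int
        Length of each shingle.

    Returns
    -------
    means : numpy.ndarray
        Array containing the mean of each window.

    Example
    -------
    >>> shingle_mean(np.arange(6), size=3)
    array([1., 2., 3., 4.])
    """
    windows = np.array([points[i:i + size]
                        for i in range(len(points) - size + 1)])
    return windows.mean(axis=1)
```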

#### Running unit tests

To run unit tests, first ensure that `pytest` and `pytest-cov` are installed:

```
$ pip install pytest pytest-cov
```

To run the tests, navigate to the root directory of the repo and run:

```
$ pytest --cov=rrcf/
```
2 changes: 2 additions & 0 deletions docs/_data/nav.yml
@@ -19,6 +19,8 @@ toc:
url: /rrcf/scoring-rctree.html
- page: API documentation
url: /rrcf/rctree-api.html
- page: Caveats and gotchas
url: /rrcf/caveats.html
- title: Examples
subfolderitems:
- page: Batch detection
27 changes: 27 additions & 0 deletions docs/caveats.md
@@ -0,0 +1,27 @@
# Caveats and gotchas

## Scaling of dimensions

The RRCF algorithm considers the relative scale of each dimension when constructing robust random cut trees. This means that dimensions with less variability (on an absolute scale) will affect the outlier score of a point less than dimensions with higher variability.

This consideration is important to remember when each dimension represents a different type of quantity or is measured in different units. Consider, for example, the following dataset.

| Person | Height (in) | Weight (lb) | Age (yr) |
| ------------| ------------ | ------------- | ----------- |
| Alice | 61 | 105 | 34 |
| Bob | 70 | 300 | 50 |
| Timmy | 48 | 70 | 10 |
| Nosferatu | 75 | 180 | 170 |

In this case, `Weight` will influence the outlier score most, because the range between the maximum and minimum values is largest (300 - 70 = 230). However, looking at the table, age seems like the most intuitive category for determining the outlier (in this case, Nosferatu is more than three times as old as the second-oldest person).

In cases where each column is measured in different units, or measures a different type of quantity, it may be necessary to scale each column before constructing the random cut tree. For example, min-max scaling each column between zero and one yields:

| Person | Height (-) | Weight (-) | Age (-) |
| ------------| ------------ | ------------- | ----------- |
| Alice | 0.48 | 0.15 | 0.15 |
| Bob | 0.81 | 1.0 | 0.25 |
| Timmy | 0.0 | 0.0 | 0.0 |
| Nosferatu | 1.0 | 0.48 | 1.0 |

Other scaling methods may suit other datasets better (for instance, scaling each dimension to a mean of zero and a standard deviation of one). The user should experiment with different scalings to determine the method that works best for the task at hand.
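
As a rough sketch of this preprocessing step (assuming `numpy` and `rrcf` are available, and reusing the hypothetical table values above), min-max scaling before tree construction might look like:

```python
import numpy as np
import rrcf

# Columns: Height (in), Weight (lb), Age (yr) -- values from the table above
X = np.array([[61., 105.,  34.],
              [70., 300.,  50.],
              [48.,  70.,  10.],
              [75., 180., 170.]])

# Min-max scale each column to [0, 1] so that no single
# dimension dominates the random cuts
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Build a random cut tree on the scaled data and score the last row
# (assumes the default integer index labels 0..n-1)
tree = rrcf.RCTree(X_scaled)
print(tree.codisp(3))
```

With scaled inputs, the unusually large `Age` value contributes to the cuts on the same footing as `Height` and `Weight`.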
69 changes: 44 additions & 25 deletions docs/classification.md
@@ -70,7 +70,8 @@ avg_codisp /= num_trees

```python
predictions = np.argmin(avg_codisp, axis=1)
test_error = 1 - ((predictions == labels).sum()/num_points)
test_error = 1 - ((predictions == labels).sum()
/num_points)
print("Test error: {:.1f}%".format(100*test_error))
```

@@ -83,13 +84,19 @@ Test error: 0.0%
```python
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_0[:,0], X_0[:,1], X_0[:,2], c='0.5', alpha=0.3,
ax.scatter(X_0[:,0], X_0[:,1], X_0[:,2],
c='0.5', alpha=0.3,
label='Training data')
ax.scatter(X_1[:,0], X_1[:,1], X_1[:,2], c='0.5', alpha=0.3)
ax.scatter(x[predictions == 0][:,0], x[predictions == 0][:,1],
x[predictions == 0][:,2], c='b', label='Class 0')
ax.scatter(x[predictions == 1][:,0], x[predictions == 1][:,1],
x[predictions == 1][:,2], c='r', label='Class 1')
ax.scatter(X_1[:,0], X_1[:,1], X_1[:,2],
c='0.5', alpha=0.3)
ax.scatter(x[predictions == 0][:,0],
x[predictions == 0][:,1],
x[predictions == 0][:,2],
c='b', label='Class 0')
ax.scatter(x[predictions == 1][:,0],
x[predictions == 1][:,1],
x[predictions == 1][:,2],
c='r', label='Class 1')
plt.title('Classification results', size=14)
plt.legend(frameon=True)
plt.tight_layout()
@@ -110,10 +117,10 @@ x = nuc['x'].astype(float).T
y = nuc['y'].astype(float).T
y = pd.Series({-1:0, 1:1})[y.ravel()].values

plt.scatter(x[y == 0][:,0], x[y == 0][:,1], c='b', alpha=0.3,
label='Class 0')
plt.scatter(x[y == 1][:,0], x[y == 1][:,1], c='r', alpha=0.3,
label='Class 1')
plt.scatter(x[y == 0][:,0], x[y == 0][:,1],
c='b', alpha=0.3, label='Class 0')
plt.scatter(x[y == 1][:,0], x[y == 1][:,1],
c='r', alpha=0.3, label='Class 1')
plt.title('Original labeled data', size=14)
plt.xlabel('Total energy')
plt.ylabel('Tail energy')
@@ -134,8 +141,10 @@ d = 2
num_trees = 60

# Take random sample
X_0 = x[np.random.choice(np.flatnonzero(y.ravel() == 0), size=n)]
X_1 = x[np.random.choice(np.flatnonzero(y.ravel() == 1), size=n)]
X_0 = x[np.random.choice(np.flatnonzero(y.ravel() == 0),
size=n)]
X_1 = x[np.random.choice(np.flatnonzero(y.ravel() == 1),
size=n)]

# Create random cut forests
forest_0 = []
@@ -177,10 +186,13 @@ Test error: 9.0%

```python
plt.scatter(X_0[:,0], X_0[:,1], c='0.5', alpha=0.1)
plt.scatter(X_1[:,0], X_1[:,1], c='0.5', alpha=0.1, label='Training data')
plt.scatter(x[ix][predictions == 0][:,0], x[ix][predictions == 0][:,1],
plt.scatter(X_1[:,0], X_1[:,1], c='0.5', alpha=0.1,
label='Training data')
plt.scatter(x[ix][predictions == 0][:,0],
x[ix][predictions == 0][:,1],
c='b', alpha=0.4, label='Class 0')
plt.scatter(x[ix][predictions == 1][:,0], x[ix][predictions == 1][:,1],
plt.scatter(x[ix][predictions == 1][:,0],
x[ix][predictions == 1][:,1],
c='r', alpha=0.4, label='Class 1')
plt.title('Classified points', size=14)
plt.xlabel('Total energy')
@@ -207,14 +219,18 @@ for _ in range(num_trees):
forest_0.append(tree_0)
forest_1.append(tree_1)

points = np.vstack(np.dstack(np.meshgrid(np.linspace(0, 8, 100),
np.linspace(0, 1.4, 100))))
points = np.vstack(np.dstack(np.meshgrid(
np.linspace(0, 8, 100),
np.linspace(0, 1.4, 100))))

avg_codisp = np.zeros((nn, d))

for index in range(nn):
for tree_0, tree_1 in zip(forest_0, forest_1):
tree_0.insert_point(points[index], index=n + index)
tree_1.insert_point(points[index], index=n + index)
tree_0.insert_point(points[index],
index=n + index)
tree_1.insert_point(points[index],
index=n + index)
avg_codisp[index,0] += tree_0.codisp(n + index)
avg_codisp[index,1] += tree_1.codisp(n + index)
tree_0.forget_point(n + index)
@@ -227,9 +243,10 @@ avg_codisp /= num_trees

```python
fig, ax = plt.subplots(figsize=(10,6))
plt.imshow(-np.log(avg_codisp[:,1] / avg_codisp[:,0]).reshape(100, 100),
cmap='seismic', extent=(0, 8, 0, 1.4), origin='lower',
aspect='auto')
plt.imshow(-np.log(avg_codisp[:,1] /
avg_codisp[:,0]).reshape(100, 100),
cmap='seismic', extent=(0, 8, 0, 1.4),
origin='lower', aspect='auto')
plt.colorbar(label='Log ratio of Class 1 Codisp to Class 0 Codisp')
plt.grid('off')
plt.title('Decision regions', size=16)
@@ -245,12 +262,14 @@ plt.tight_layout()

```python
fig, ax = plt.subplots(figsize=(10,6))
plt.imshow(np.log(np.min(avg_codisp, axis=1)).reshape(100, 100),
plt.imshow(np.log(np.min(avg_codisp,
axis=1)).reshape(100, 100),
extent=(0, 8, 0, 1.4), origin='lower',
aspect='auto', cmap='cubehelix_r')
plt.colorbar(label='$\log(\min(CoDisp(x^{(0)}), CoDisp(x^{(1)})))$')
plt.grid('off')
plt.title('Likelihood of belonging to neither class', size=14)
plt.title('Likelihood of belonging to neither class',
size=14)
plt.xlabel('Total energy')
plt.ylabel('Tail energy')
plt.tight_layout()
68 changes: 43 additions & 25 deletions docs/comparisons.md
@@ -34,30 +34,42 @@ n_inliers = n_samples - n_outliers

# Outlier detectors from sklearn plot
anomaly_algorithms = [
("Robust covariance", EllipticEnvelope(contamination=outliers_fraction)),
("One-Class SVM", svm.OneClassSVM(nu=outliers_fraction,
kernel="rbf",
gamma=0.1)),
("Isolation Forest", IsolationForest(contamination=outliers_fraction,
behaviour='new')),
("Local Outlier Factor", LocalOutlierFactor(n_neighbors=35,
contamination=outliers_fraction))]
("Robust covariance",
EllipticEnvelope(contamination=outliers_fraction)),
("One-Class SVM",
svm.OneClassSVM(nu=outliers_fraction,
kernel="rbf",
gamma=0.1)),
("Isolation Forest",
IsolationForest(contamination=outliers_fraction,
behaviour='new')),
("Local Outlier Factor",
LocalOutlierFactor(n_neighbors=35,
contamination=outliers_fraction))]

# Define datasets
blobs_params = dict(random_state=0, n_samples=n_inliers, n_features=2)
blobs_params = dict(random_state=0,
n_samples=n_inliers,
n_features=2)
datasets = [
make_blobs(centers=[[0, 0], [0, 0]], cluster_std=0.5,**blobs_params)[0],
make_blobs(centers=[[2, 2], [-2, -2]], cluster_std=[0.5, 0.5],**blobs_params)[0],
make_blobs(centers=[[2, 2], [-2, -2]], cluster_std=[1.5, .3],**blobs_params)[0],
4. * (make_moons(n_samples=n_samples, noise=.05, random_state=0)[0]
make_blobs(centers=[[0, 0], [0, 0]],
cluster_std=0.5,**blobs_params)[0],
make_blobs(centers=[[2, 2], [-2, -2]],
cluster_std=[0.5, 0.5],**blobs_params)[0],
make_blobs(centers=[[2, 2], [-2, -2]],
cluster_std=[1.5, .3],**blobs_params)[0],
4. * (make_moons(n_samples=n_samples,
noise=.05, random_state=0)[0]
- np.array([0.5, 0.25])),
14. * (np.random.RandomState(42).rand(n_samples, 2) - 0.5)]
14. * (np.random.RandomState(42).rand(n_samples, 2)
- 0.5)]

# Add outliers to the data sets
outliers = [] # record keeping
data = []
for i in datasets:
out = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
out = rng.uniform(low=-6, high=6,
size=(n_outliers, 2))
outliers.append(out)
data.append(np.concatenate([i, out], axis=0))

@@ -75,10 +87,12 @@ for d in range(len(data)):
tr1 = time.time()
while len(forest) < num_trees:
# Select random subsets of points uniformly from point set
ixs = np.random.choice(n, size=(n // tree_size, tree_size),
ixs = np.random.choice(n,
size=(n // tree_size, tree_size),
replace=False)
# Add sampled trees to forest
trees = [rrcf.RCTree(data[d][ix], index_labels=ix) for ix in ixs]
trees = [rrcf.RCTree(data[d][ix],
index_labels=ix) for ix in ixs]
forest.extend(trees)

# Compute average CoDisp
@@ -99,20 +113,23 @@ for d in range(len(data)):
t0 = time.time()
algorithm.fit(data[d])
t1 = time.time()
plt.subplot(5, len(anomaly_algorithms) + 1, plot_num)
plt.subplot(5, len(anomaly_algorithms) + 1,
plot_num)
if d == 0:
plt.title(name, size=16)

# fit the data and tag outliers
if name == "Local Outlier Factor":
y_pred = algorithm.fit_predict(data[d])
else:
y_pred = algorithm.fit(data[d]).predict(data[d])
y_pred = (algorithm.fit(data[d])
.predict(data[d]))

colors = np.array(['#377eb8', '#ff7f00'])
plt.scatter(data[d][:, 0], data[d][:, 1], s=10,
color=colors[(y_pred + 1) // 2])
plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
plt.text(.99, .01,
('%.2fs' % (t1 - t0)).lstrip('0'),
transform=plt.gca().transAxes, size=15,
horizontalalignment='right')
plot_num += 1
@@ -122,16 +139,17 @@ for d in range(len(data)):
mask = np.percentile(avg_cod, 85)
avg_cod[avg_cod < mask] = 1
avg_cod[avg_cod > mask] = 0
c = ['#377eb8' if i == 0 else '#ff7f00' for i in avg_cod]
c = ['#377eb8' if i == 0 else '#ff7f00'
for i in avg_cod]
plt.scatter(data[d][:,0], data[d][:,1], s=10, c=c)
if d == 0:
plt.title("RRCF", size=16)

plt.text(.99, .01, ('%.2fs' % (tr2 - tr1)).lstrip('0'),
transform=plt.gca().transAxes, size=15,
horizontalalignment='right')
plt.text(.99, .01,
('%.2fs' % (tr2 - tr1)).lstrip('0'),
transform=plt.gca().transAxes, size=15,
horizontalalignment='right')
plot_num += 1
plt.savefig('method_comparison.png', bbox_inches='tight')
```

![Comparison](https://s3.us-east-2.amazonaws.com/mdbartos-img/rrcf/method_comparison.png)