Merge pull request #64 from PythonPredictions/develop

develop to master - new version
PythonPredictions · Jul 12, 2021 · 4da81c6
2 parents 6ef6723 + 48d8a9a
commit 4da81c6
Showing 22 changed files with 645 additions and 295 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,32 @@
+---
+name: Bug report
+about: Create a report to help us improve
+---
+
+<!-- Please search existing issues to avoid creating duplicates. -->
+
+# Bug Report
+
+Bug: X does not work when I do Y
+
+## Description
+
+Info about the bug goes here.
+
+### Steps to Reproduce
+
+1. Step 1
+2. Step 2
+3. ...
+
+### Expected Result
+
+I was expecting ...
+
+You may write the expected result or add a screenshot.
+
+### Actual Results
+
+I actually got ...
+
+Would be awesome to link screenshots here and/or error messages received.
diff --git a/.github/ISSUE_TEMPLATE/issue.md b/.github/ISSUE_TEMPLATE/issue.md
@@ -0,0 +1,14 @@
+---
+name: Task
+about: A small issue. It will usually be labeled as `good first issue` or `enhancement`.
+---
+
+<!-- Issue title should mirror the Task Title. -->
+
+# Task Title
+
+Task: I am an Issue
+
+## Task Description
+
+This issue will...
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,16 @@
+# Story Title
+
+[This is the Issue Title](https://github.com/username/repository-name/issues/1)
+
+## Changes made
+
+- made this
+- did that
+
+## How does the solution address the problem
+
+This PR will...
+
+## Linked issues
+
+Resolves #1
diff --git a/.github/workflows/development_CI.yaml b/.github/workflows/development_CI.yaml
@@ -0,0 +1,38 @@
+# Runs CI on pushes and pull requests targeting the develop branch
+# Currently runs pytest only; the pylint step below is disabled
+
+name: CI_develop_action
+
+on:
+  push:
+    branches: [ develop ]
+  pull_request:
+    branches: [ develop ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.8
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install -r requirements.txt
+        python -m pip install pylint pytest pytest-mock pytest-cov
+
+    - name: Test with pytest
+      run: |
+        pytest --cov=cobra tests/
+        
+    # Disabled until the code is refactored to pass pylint
+    #- name: Lint check with pylint
+    #  run: |
+    #    pylint cobra
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,4 @@
-#Ignoired directories in root folder
+#Ignored directories in root folder
 
 
 # Byte-compiled / optimized / DLL files
@@ -109,3 +109,4 @@ ENV/
 # Other ignore files
 *.pptx
 *.ppt
+.idea/
diff --git a/README.rst b/README.rst
@@ -1,87 +1,89 @@
-=====
-cobra
-=====
-
-**cobra** is a Python package to build predictive models using logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at Python Predictions in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.
-
-
-Main Features
-=============
-
-- Prepare a given pandas DataFrame for predictive modelling:
-
-   - partition into train/selection/validation sets
-   - create bins from continuous variables
-   - regroup categorical variables based on statistical significance
-   - replace missing values and
-   - add columns with incidence rate per category/bin
-
-- Perform univariate feature selection based on AUC
-- Compute correlation matrix of predictors
-- Find the suitable variables using forward feature selection
-- Evaluate model performance and visualize the results
-
-Getting started
-===============
-
-These instructions will get you a copy of the project up and running on your local machine for usage, development and testing purposes.
-
-Requirements
-------------
-
-This package requires the usual Python packages for data science:
-
-- numpy (>=1.19.4)
-- pandas (>=1.1.5)
-- scipy (>=1.5.4)
-- scikit-learn (>=0.23.1)
-- matplotlib (>=3.3.3)
-- seaborn (>=0.11.0)
-
-
-These packages, along with their versions are listed in ``requirements.txt`` and can be installed using ``pip``:    ::
-
-
-  pip install -r requirements.txt
-
-
-**Note**: if you want to install cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with cobra itself.
-
-Installation
-------------
-
-The easiest way to install cobra is using ``pip``   ::
-
-  pip install -U pythonpredictions-cobra
-
-Contributing to cobra
-=====================
-
-We'd love you to contribute to the development of cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
-We use GitHub issue to track all bugs and feature requests. Feel free to open an issue in case you found a bug or in case you wish to see a new feature added.
-
-How to contribute code
-----------------------
-
-The preferred way to contribute to cobra is to fork the main repository on GitHub, then submit a "pull request" (PR). The first step is to get a local development copy by installing cobra from source through the following steps:
-
-- Fork the `project repository <https://github.com/PythonPredictions/cobra>`_. For more details on how to fork a repository see `this guide <https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo>`__
-- Clone your fork of cobra's repo.
-- Open a shell and navigate to the folder where this repo was cloned in.
-- Once you are in the folder, execute ``pip install --editable .``.
-- Create a *feature branch* to do your development.
-- Once your are finished developing, you can create a *pull request* from your fork (see `this guide <https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork>`__ for detailed instructions).
-
-**Notes**
-
-- Make sure to follow the *PEP 8* styleguide if you make any changes to cobra. You should also write or modify unit test for your changes.
-- To avoid duplicating work, it is highly recommended that you search through the issue tracker and/or the PR list. If in doubt, you can always reach out to us through email (cobra@pythonpredictions.com)
-
-Help and Support
-================
-
-Documentation
--------------
-
-- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_
-- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra.io/tutorial.html>`_
+
+
+.. image:: https://img.shields.io/pypi/v/pythonpredictions-cobra.svg
+    :target: https://pypi.org/project/pythonpredictions-cobra/
+.. image:: https://img.shields.io/pypi/dm/pythonpredictions-cobra.svg
+    :target: https://pypistats.org/packages/pythonpredictions-cobra
+.. image:: https://github.com/PythonPredictions/cobra/actions/workflows/development_CI.yaml/badge.svg?branch=develop
+    :target: https://github.com/PythonPredictions/cobra/actions/workflows/development_CI.yaml
+
+------------------------------------------------------------------------------------------------------------------------------------ 
+
+=====
+cobra
+=====
+.. image:: material/logo.png
+    :width: 300
+
+**cobra** is a Python package to build predictive models using linear/logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at Python Predictions in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.
+
+Main Features
+=============
+
+- Prepare a given pandas DataFrame for predictive modelling:
+
+   - partition into train/selection/validation sets
+   - create bins from continuous variables
+   - regroup categorical variables based on statistical significance
+   - replace missing values and
+   - add columns with incidence rate per category/bin
+
+- Perform univariate feature selection based on AUC
+- Compute correlation matrix of predictors
+- Find the suitable variables using forward feature selection
+- Evaluate model performance and visualize the results
+
+Getting started
+===============
+
+These instructions will get you a copy of the project up and running on your local machine for usage, development and testing purposes.
+
+Requirements
+------------
+
+This package requires the usual Python packages for data science:
+
+- numpy (>=1.19.4)
+- pandas (>=1.1.5)
+- scipy (>=1.5.4)
+- scikit-learn (>=0.23.1)
+- matplotlib (>=3.3.3)
+- seaborn (>=0.11.0)
+
+
+These packages, along with their versions are listed in ``requirements.txt`` and can be installed using ``pip``:    ::
+
+
+  pip install -r requirements.txt
+
+
+**Note**: if you want to install cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with cobra itself.
+
+Installation
+------------
+
+The easiest way to install cobra is using ``pip``:    ::
+
+  pip install -U pythonpredictions-cobra
+
+Contributing to cobra
+=====================
+
+We'd love you to contribute to the development of cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
+We use GitHub issues to track all bugs and feature requests. Feel free to open an issue if you find a bug or wish to see a new feature added.
+
+For more details, check our `wiki <https://github.com/PythonPredictions/cobra/wiki/Contributing-guidelines-&-workflows>`_.
+
+Help and Support
+================
+
+Documentation
+-------------
+
+- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_
+- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra.io/tutorial.html>`_
+
+Outreach
+--------
+
+- Check out the Data Science Leuven Meetup `talk <https://www.youtube.com/watch?v=w7ceZZqMEaA&feature=youtu.be>`_ by one of the core developers (second presentation)
diff --git a/cobra/evaluation/evaluator.py b/cobra/evaluation/evaluator.py
@@ -35,15 +35,20 @@ class Evaluator():
     probability_cutoff : float
         probability cut off to convert probability scores to a binary score
     roc_curve : dict
-        map containing true-positive-rate, false-positve-rate at various
+        map containing true-positive-rate, false-positive-rate at various
         thresholds (also incl.)
+    n_bins : int, optional
+        number of bins used to compute the lift curve
+        (by default 10, i.e. deciles)
     """
 
     def __init__(self, probability_cutoff: float=None,
-                 lift_at: float=0.05):
+                 lift_at: float=0.05,
+                 n_bins: int = 10):
 
         self.lift_at = lift_at
         self.probability_cutoff = probability_cutoff
+        self.n_bins = n_bins
 
         # Placeholder to store fitted output
         self.scalar_metrics = None
@@ -85,7 +90,7 @@ def fit(self, y_true: np.ndarray, y_pred: np.ndarray):
 
         self.roc_curve = {"fpr": fpr, "tpr": tpr, "thresholds": thresholds}
         self.confusion_matrix = confusion_matrix(y_true, y_pred_b)
-        self.lift_curve = Evaluator._compute_lift_per_decile(y_true, y_pred)
+        self.lift_curve = Evaluator._compute_lift_per_bin(y_true, y_pred, self.n_bins)
         self.cumulative_gains = Evaluator._compute_cumulative_gains(y_true,
                                                                     y_pred)
 
@@ -199,8 +204,7 @@ def plot_confusion_matrix(self, path: str=None, dim: tuple=(12, 8),
 
         plt.show()
 
-    def plot_cumulative_response_curve(self, path: str=None,
-                                       dim: tuple=(12, 8)):
+    def plot_cumulative_response_curve(self, path: str=None, dim: tuple=(12, 8)):
         """Plot cumulative response curve
 
         Parameters
@@ -430,17 +434,21 @@ def _compute_cumulative_gains(y_true: np.ndarray,
         return percentages, gains
 
     @staticmethod
-    def _compute_lift_per_decile(y_true: np.ndarray,
-                                 y_pred: np.ndarray) -> tuple:
-        """Compute lift of the model per decile, returns x-labels, lifts and
-        the target incidence to create cummulative response curves
+    def _compute_lift_per_bin(y_true: np.ndarray,
+                              y_pred: np.ndarray,
+                              n_bins: int = 10) -> tuple:
+        """Compute lift of the model for a given number of bins, returns x-labels,
+        lifts and the target incidence to create cumulative response curves
 
         Parameters
         ----------
         y_true : np.ndarray
             True binary target data labels
         y_pred : np.ndarray
             Target scores of the model
+        n_bins : int, optional
+            number of bins used to compute the lift curve
+            (by default 10, i.e. deciles)
 
         Returns
         -------
@@ -451,7 +459,7 @@ def _compute_lift_per_decile(y_true: np.ndarray,
         lifts = [Evaluator._compute_lift(y_true=y_true,
                                          y_pred=y_pred,
                                          lift_at=perc_lift)
-                 for perc_lift in np.arange(0.1, 1.1, 0.1)]
+                 for perc_lift in np.linspace(1/n_bins, 1, num=n_bins, endpoint=True)]
 
         x_labels = [len(lifts)-x for x in np.arange(0, len(lifts), 1)]
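
Note on the `n_bins` change: replacing the hard-coded `np.arange(0.1, 1.1, 0.1)` with `np.linspace(1/n_bins, 1, num=n_bins, endpoint=True)` yields the cumulative cut-offs 1/n_bins, 2/n_bins, ..., 1, so `n_bins=10` should reproduce the old decile behaviour up to floating-point rounding. The sketch below illustrates the idea in plain Python, without numpy. The helpers `bin_cutoffs` and `lift_at` are hypothetical names introduced only for this illustration, and `lift_at` uses the common textbook definition of lift (incidence among the top-scored fraction divided by overall incidence), which is an assumption and not a copy of `Evaluator._compute_lift`.

```python
def bin_cutoffs(n_bins: int = 10) -> list:
    """Cumulative fractions 1/n_bins, 2/n_bins, ..., 1.

    Hypothetical helper mirroring the values that
    np.linspace(1/n_bins, 1, num=n_bins) is expected to produce.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be a positive integer")
    return [(i + 1) / n_bins for i in range(n_bins)]


def lift_at(y_true: list, y_pred: list, fraction: float) -> float:
    """Lift in the top `fraction` of observations, ranked by score.

    Assumed definition (not taken from cobra): incidence among the
    highest-scored observations divided by the overall incidence.
    Assumes y_true contains at least one positive label.
    """
    n_top = max(1, round(len(y_true) * fraction))
    # Sort (score, label) pairs by score, highest first
    ranked = sorted(zip(y_pred, y_true), key=lambda pair: pair[0], reverse=True)
    top_incidence = sum(label for _, label in ranked[:n_top]) / n_top
    overall_incidence = sum(y_true) / len(y_true)
    return top_incidence / overall_incidence


# Toy data: 10 observations, 3 positives
y_true = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

# One cumulative lift value per bin, here for quintiles
lifts = [lift_at(y_true, y_pred, cutoff) for cutoff in bin_cutoffs(5)]
```

With the full population selected (cut-off 1.0) the lift is 1 by construction, which makes a quick sanity check for any choice of `n_bins`.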