Bump version of Python, torch and pytorch-lightning #58

Closed
wants to merge 5 commits into from
41 changes: 19 additions & 22 deletions README.md
@@ -1,26 +1,24 @@
# PyCave
# TorchGMM

![PyPi](https://img.shields.io/pypi/v/pycave?label=version)
![License](https://img.shields.io/pypi/l/pycave)
<!-- ![PyPi](https://img.shields.io/pypi/v/torchgmm?label=version)
![License](https://img.shields.io/pypi/l/torchgmm) -->

PyCave allows you to run traditional machine learning models on CPU, GPU, and even on multiple
nodes. All models are implemented in [PyTorch](https://pytorch.org/) and provide an `Estimator` API
TorchGMM allows to run Gaussian Mixture Models on single or multiple CPUs/GPUs.
The repository is a fork from [PyCave](https://github.com/borchero/pycave) and [LightKit](https://github.com/borchero/lightkit), two amazing packages developed by [Olivier Borchert](https://github.com/borchero) that are not being maintained anymore.
While PyCave implements additional models such as Markov Chains, TorchGMM focuses only on Gaussian Mixture Models.

The models are implemented in [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/), and provide an `Estimator` API
that is fully compatible with [scikit-learn](https://scikit-learn.org/stable/).

For Gaussian mixture model, PyCave allows for 100x speed ups when using a GPU and enables to train
For Gaussian mixture model, TorchGMM allows for 100x speed ups when using a GPU and enables to train
on markedly larger datasets via mini-batch training. The full suite of benchmarks run to compare
PyCave models against scikit-learn models is available on the
TorchGMM models against scikit-learn models is available on the
[documentation website](https://pycave.borchero.com/sites/benchmark.html).

_PyCave version 3 is a complete rewrite of PyCave which is tested much more rigorously, depends on
well-maintained libraries and is tuned for better performance. While you are, thus, highly
encouraged to upgrade, refer to [pycave-v2.borchero.com](https://pycave-v2.borchero.com) for
documentation on PyCave 2._

## Features

- Support for GPU and multi-node training by implementing models in PyTorch and relying on
[PyTorch Lightning](https://www.pytorchlightning.ai/)
[PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/)
- Mini-batch training for all models such that they can be used on huge datasets
- Well-structured implementation of models

@@ -31,21 +29,21 @@ documentation on PyCave 2._

## Installation

PyCave is available via `pip`:
TorchGMM is available via `pip`:

```bash
pip install pycave
pip install torchgmm
```

If you are using [Poetry](https://python-poetry.org/):

```bash
poetry add pycave
poetry add torchgmm
```

## Usage

If you've ever used scikit-learn, you'll feel right at home when using PyCave. First, let's create
If you've ever used scikit-learn, you'll feel right at home when using TorchGMM. First, let's create
some artificial data to work with:

```python
@@ -62,7 +60,7 @@ This dataset consists of three clusters with 8-dimensional datapoints. If you wa
model, to find the clusters' centroids, it's as easy as:

```python
from pycave.clustering import KMeans
from torchgmm.clustering import KMeans

estimator = KMeans(3)
estimator.fit(X)
@@ -79,7 +77,7 @@ and which methods are available.

### GPU and Multi-Node training

For GPU- and multi-node training, PyCave leverages PyTorch Lightning. The hardware that training
For GPU- and multi-node training, TorchGMM leverages PyTorch Lightning. The hardware that training
runs on is determined by the
[Trainer](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.html#pytorch_lightning.trainer.trainer.Trainer)
class. It's
@@ -104,12 +102,11 @@ In fact, **you do not need to change anything else in your code**.

### Implemented Models

Currently, PyCave implements three different models:
Currently, TorchGMM implements two different models:

- [GaussianMixture](https://pycave.borchero.com/sites/generated/bayes/gmm/pycave.bayes.GaussianMixture.html)
- [MarkovChain](https://pycave.borchero.com/sites/generated/bayes/markov_chain/pycave.bayes.MarkovChain.html)
- [K-Means](https://pycave.borchero.com/sites/generated/clustering/kmeans/pycave.clustering.KMeans.html)

## License

PyCave is licensed under the [MIT License](https://github.com/borchero/pycave/blob/main/LICENSE).
TorchGMM is licensed under the [MIT License](https://github.com/marcovarrone/torchgmm/blob/main/LICENSE).
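A note on the "GPU and Multi-Node training" section of the README: the estimator docstrings in this PR describe a `trainer_params` argument whose values are used to initialize the PyTorch Lightning `Trainer`. Since the new constraint admits pytorch-lightning 2.x, where the `gpus` argument was removed in favor of `accelerator` and `devices`, a small helper along the following lines may be handy. This is only a sketch: `make_trainer_params` is a hypothetical name that is not part of TorchGMM, and the commented usage assumes that `GaussianMixture` accepts `num_components` and `trainer_params` keyword arguments as in PyCave.

```python
from typing import Any, Dict


def make_trainer_params(num_gpus: int = 0, num_nodes: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for a PyTorch Lightning ``Trainer``.

    Hypothetical helper (not part of TorchGMM). It only uses the
    ``accelerator``, ``devices`` and ``num_nodes`` Trainer arguments.
    """
    if num_gpus < 0 or num_nodes < 1:
        raise ValueError("num_gpus must be >= 0 and num_nodes must be >= 1")
    if num_gpus > 0:
        # Train on the requested number of GPUs per node.
        return {"accelerator": "gpu", "devices": num_gpus, "num_nodes": num_nodes}
    # Fall back to a single CPU process per node.
    return {"accelerator": "cpu", "devices": 1, "num_nodes": num_nodes}


# Assumed usage, left as a comment because it requires torchgmm to be installed:
# from torchgmm.bayes import GaussianMixture
# estimator = GaussianMixture(
#     num_components=3, trainer_params=make_trainer_params(num_gpus=1)
# )
```

Keeping the hardware selection in one dictionary means the rest of the training code can stay unchanged when moving between CPU and GPU.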
3,002 changes: 1,423 additions & 1,579 deletions poetry.lock

Large diffs are not rendered by default.

15 changes: 7 additions & 8 deletions pyproject.toml
@@ -4,21 +4,20 @@ classifiers = [
"Development Status :: 5 - Production/Stable",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]
description = "Traditional Machine Learning Models in PyTorch."
documentation = "https://pycave.borchero.com"
description = "Gaussian Mixture Models in PyTorch."
license = "MIT"
name = "pycave"
name = "torchgmm"
readme = "README.md"
repository = "https://github.com/borchero/pycave"
repository = "https://github.com/marcovarrone/torchgmm"
version = "0.0.0"

[tool.poetry.dependencies]
lightkit = "^0.5.0"
numpy = "^1.20.3"
python = ">=3.8,<3.11"
pytorch-lightning = "^1.6.0"
torch = "^1.8.0"
torchmetrics = ">=0.6,<0.12"
pytorch-lightning = "<2.2"
torch = "<2.2.0"
torchmetrics = ">=0.6,<1.4.0"

[tool.poetry.group.pre-commit.dependencies]
black = "^22.12.0"
@@ -76,7 +75,7 @@ target-version = ["py38", "py39", "py310"]
[tool.isort]
force_alphabetical_sort_within_sections = true
include_trailing_comma = true
known_first_party = "pycave,tests"
known_first_party = "torchgmm,tests"
line_length = 99
lines_between_sections = 0
profile = "black"
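To make the dependency change in the `pyproject.toml` diff above easier to review, the sketch below expands Poetry caret constraints into explicit bounds. It shows that the old `^1.6.0` and `^1.8.0` constraints excluded every 2.x release, whereas the new `<2.2` and `<2.2.0` constraints admit 2.0 and 2.1 but no longer state a lower bound. The function name `caret_bounds` is hypothetical, and the sketch only handles full `major.minor.patch` versions.

```python
from typing import Tuple


def caret_bounds(constraint: str) -> Tuple[str, str]:
    """Expand a Poetry caret constraint into (inclusive lower, exclusive upper).

    Hypothetical helper for illustration; it only supports constraints of the
    form "^MAJOR.MINOR.PATCH".
    """
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = constraint[1:].split(".")
    if len(parts) != 3:
        raise ValueError("only full MAJOR.MINOR.PATCH versions are supported")
    major, minor, patch = (int(p) for p in parts)
    lower = f"{major}.{minor}.{patch}"
    # A caret allows changes that keep the left-most non-zero component fixed.
    if major > 0:
        upper = f"{major + 1}.0.0"
    elif minor > 0:
        upper = f"0.{minor + 1}.0"
    else:
        upper = f"0.0.{patch + 1}"
    return lower, upper


for name, spec in [("pytorch-lightning", "^1.6.0"), ("torch", "^1.8.0"), ("lightkit", "^0.5.0")]:
    low, high = caret_bounds(spec)
    print(f"{name}: {spec} means >={low},<{high}")
```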
4 changes: 2 additions & 2 deletions tests/_data/gmm.py
@@ -1,8 +1,8 @@
# pylint: disable=missing-function-docstring
from typing import Tuple
import torch
from pycave.bayes.core import CovarianceType
from pycave.bayes.gmm import GaussianMixtureModel, GaussianMixtureModelConfig
from torchgmm.bayes.core import CovarianceType
from torchgmm.bayes.gmm import GaussianMixtureModel, GaussianMixtureModelConfig


def sample_gmm(
2 changes: 1 addition & 1 deletion tests/bayes/core/benchmark_log_normal.py
@@ -5,7 +5,7 @@
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky # type: ignore
from sklearn.mixture._gaussian_mixture import _estimate_log_gaussian_prob # type: ignore
from torch.distributions import MultivariateNormal
from pycave.bayes.core import cholesky_precision, log_normal
from torchgmm.bayes.core import cholesky_precision, log_normal


def test_log_normal_spherical(benchmark: BenchmarkFixture):
2 changes: 1 addition & 1 deletion tests/bayes/core/benchmark_precision_cholesky.py
@@ -3,7 +3,7 @@
import torch
from pytest_benchmark.fixture import BenchmarkFixture # type: ignore
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky # type: ignore
from pycave.bayes.core import cholesky_precision
from torchgmm.bayes.core import cholesky_precision


def test_cholesky_precision_spherical(benchmark: BenchmarkFixture):
4 changes: 2 additions & 2 deletions tests/bayes/core/test_normal.py
@@ -4,8 +4,8 @@
from sklearn.mixture._gaussian_mixture import _compute_log_det_cholesky # type: ignore
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky # type: ignore
from torch.distributions import MultivariateNormal
from pycave.bayes.core import cholesky_precision, covariance, log_normal, sample_normal
from pycave.bayes.core._jit import _cholesky_logdet # type: ignore
from torchgmm.bayes.core import cholesky_precision, covariance, log_normal, sample_normal
from torchgmm.bayes.core._jit import _cholesky_logdet # type: ignore
from tests._data.normal import (
sample_data,
sample_diag_covars,
8 changes: 4 additions & 4 deletions tests/bayes/gmm/benchmark_gmm_estimator.py
@@ -5,8 +5,8 @@
import torch
from pytest_benchmark.fixture import BenchmarkFixture # type: ignore
from sklearn.mixture import GaussianMixture as SklearnGaussianMixture # type: ignore
from pycave.bayes import GaussianMixture
from pycave.bayes.core.types import CovarianceType
from torchgmm.bayes import GaussianMixture
from torchgmm.bayes.core.types import CovarianceType
from tests._data.gmm import sample_gmm


@@ -64,7 +64,7 @@ def test_sklearn(
(1000000, 64, 64, "diag", 100000),
],
)
def test_pycave(
def test_torchgmm(
benchmark: BenchmarkFixture,
num_datapoints: int,
num_features: int,
@@ -107,7 +107,7 @@ def test_pycave(
(1000000, 64, 64, "tied", 100000),
],
)
def test_pycave_gpu(
def test_torchgmm_gpu(
benchmark: BenchmarkFixture,
num_datapoints: int,
num_features: int,
4 changes: 2 additions & 2 deletions tests/bayes/gmm/test_gmm_estimator.py
@@ -4,8 +4,8 @@
import pytest
import torch
from sklearn.mixture import GaussianMixture as SklearnGaussianMixture # type: ignore
from pycave.bayes import GaussianMixture
from pycave.bayes.core import CovarianceType
from torchgmm.bayes import GaussianMixture
from torchgmm.bayes.core import CovarianceType
from tests._data.gmm import sample_gmm


4 changes: 2 additions & 2 deletions tests/bayes/gmm/test_gmm_metrics.py
@@ -3,8 +3,8 @@
import numpy as np
import sklearn.mixture._gaussian_mixture as skgmm # type: ignore
import torch
from pycave.bayes.core import CovarianceType
from pycave.bayes.gmm.metrics import CovarianceAggregator, MeanAggregator, PriorAggregator
from torchgmm.bayes.core import CovarianceType
from torchgmm.bayes.gmm.metrics import CovarianceAggregator, MeanAggregator, PriorAggregator


def test_prior_aggregator():
2 changes: 1 addition & 1 deletion tests/bayes/gmm/test_gmm_model.py
@@ -1,6 +1,6 @@
# pylint: disable=missing-function-docstring
from torch import jit
from pycave.bayes.gmm import GaussianMixtureModel, GaussianMixtureModelConfig
from torchgmm.bayes.gmm import GaussianMixtureModel, GaussianMixtureModelConfig


def test_compile():
2 changes: 1 addition & 1 deletion tests/bayes/markov_chain/test_markov_chain_estimator.py
@@ -3,7 +3,7 @@
from typing import Tuple
import pytest
import torch
from pycave.bayes import MarkovChain
from torchgmm.bayes import MarkovChain


def test_fit_automatic_config():
2 changes: 1 addition & 1 deletion tests/bayes/markov_chain/test_markov_chain_model.py
@@ -3,7 +3,7 @@
import torch
from torch import jit
from torch.nn.utils.rnn import pack_padded_sequence
from pycave.bayes.markov_chain import MarkovChainModel, MarkovChainModelConfig
from torchgmm.bayes.markov_chain import MarkovChainModel, MarkovChainModelConfig


def test_compile():
8 changes: 4 additions & 4 deletions tests/clustering/kmeans/benchmark_kmeans_estimator.py
@@ -5,8 +5,8 @@
import torch
from pytest_benchmark.fixture import BenchmarkFixture # type: ignore
from sklearn.cluster import KMeans as SklearnKMeans # type: ignore
from pycave.clustering import KMeans
from pycave.clustering.kmeans.types import KMeansInitStrategy
from torchgmm.clustering import KMeans
from torchgmm.clustering.kmeans.types import KMeansInitStrategy
from tests._data.gmm import sample_gmm


@@ -61,7 +61,7 @@ def test_sklearn(
(1000000, 100000, 64, 64, "random"),
],
)
def test_pycave(
def test_torchgmm(
benchmark: BenchmarkFixture,
num_datapoints: int,
batch_size: Optional[int],
@@ -101,7 +101,7 @@ def test_pycave(
(10000000, 1000000, 128, 128, "random"),
],
)
def test_pycave_gpu(
def test_torchgmm_gpu(
benchmark: BenchmarkFixture,
num_datapoints: int,
batch_size: Optional[int],
2 changes: 1 addition & 1 deletion tests/clustering/kmeans/test_kmeans_estimator.py
@@ -4,7 +4,7 @@
import pytest
import torch
from sklearn.cluster import KMeans as SklearnKMeans # type: ignore
from pycave.clustering import KMeans
from torchgmm.clustering import KMeans
from tests._data.gmm import sample_gmm


2 changes: 1 addition & 1 deletion tests/clustering/kmeans/test_kmeans_model.py
@@ -1,7 +1,7 @@
# pylint: disable=missing-function-docstring
import torch
from torch import jit
from pycave.clustering.kmeans import KMeansModel, KMeansModelConfig
from torchgmm.clustering.kmeans import KMeansModel, KMeansModelConfig


def test_compile():
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
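Most of the test and package changes in this PR are mechanical rewrites of `pycave` imports to `torchgmm`. A rename of this kind could be scripted roughly as follows. This is a sketch under stated assumptions, not necessarily how the author produced the diff: `rename_imports` is a hypothetical helper, and it only touches `import` and `from ... import` statements, so docstring references such as `:class:` roles and the `known_first_party` setting would still need separate edits.

```python
import re


def rename_imports(source: str, old: str = "pycave", new: str = "torchgmm") -> str:
    """Rewrite top-level package names in import statements.

    Hypothetical helper: replaces ``old`` with ``new`` only when it is the
    first component of an imported module path.
    """
    pattern = re.compile(
        # Group 1 keeps the leading "from " / "import " (with indentation).
        r"^([ \t]*(?:from|import)[ \t]+)" + re.escape(old) + r"(?=[.\s]|$)",
        re.MULTILINE,
    )
    return pattern.sub(lambda match: match.group(1) + new, source)


example = (
    "import torch\n"
    "from pycave.bayes.core import CovarianceType\n"
    "from pycave.clustering import KMeans\n"
    "from tests._data.gmm import sample_gmm\n"
)
print(rename_imports(example))
```

The lookahead keeps unrelated packages whose names merely start with the old name (for example a hypothetical `pycave_extra`) untouched.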
@@ -5,8 +5,8 @@
from lightkit import ConfigurableBaseEstimator
from lightkit.data import collate_tensor, DataLoader, dataset_from_tensors, TensorLike
from lightkit.estimator import PredictorMixin
from pycave.bayes.core import CovarianceType
from pycave.clustering import KMeans
from torchgmm.bayes.core import CovarianceType
from torchgmm.clustering import KMeans
from .lightning_module import (
GaussianMixtureKmeansInitLightningModule,
GaussianMixtureLightningModule,
@@ -30,7 +30,7 @@ class GaussianMixture(
`Wikipedia <https://en.wikipedia.org/wiki/Mixture_model>`_.

See also:
.. currentmodule:: pycave.bayes.gmm
.. currentmodule:: torchgmm.bayes.gmm
.. autosummary::
:nosignatures:
:template: classes/pytorch_module.rst
@@ -80,7 +80,7 @@ def __init__(
num_workers: The number of workers to use for loading the data. Only used if a PyTorch
dataset is passed to :meth:`fit` or related methods.
trainer_params: Initialization parameters to use when initializing a PyTorch Lightning
trainer. By default, it disables various stdout logs unless PyCave is configured to
trainer. By default, it disables various stdout logs unless TorchGMM is configured to
do verbose logging. Checkpointing and logging are disabled regardless of the log
level. This estimator further sets the following overridable defaults:

@@ -3,8 +3,8 @@
import torch
from pytorch_lightning.callbacks import EarlyStopping
from torchmetrics import MeanMetric
from pycave.bayes.core import cholesky_precision
from pycave.utils import NonparametricLightningModule
from torchgmm.bayes.core import cholesky_precision
from torchgmm.utils import NonparametricLightningModule
from .metrics import CovarianceAggregator, MeanAggregator, PriorAggregator
from .model import GaussianMixtureModel

@@ -1,7 +1,7 @@
from typing import Any, Callable, Optional
import torch
from torchmetrics import Metric
from pycave.bayes.core import covariance_shape, CovarianceType
from torchgmm.bayes.core import covariance_shape, CovarianceType


class PriorAggregator(Metric):
4 changes: 2 additions & 2 deletions pycave/bayes/gmm/model.py → torchgmm/bayes/gmm/model.py
@@ -4,8 +4,8 @@
import torch
from lightkit.nn import Configurable
from torch import jit, nn
from pycave.bayes.core import covariance, covariance_shape, CovarianceType
from pycave.bayes.core._jit import jit_log_normal, jit_sample_normal
from torchgmm.bayes.core import covariance, covariance_shape, CovarianceType
from torchgmm.bayes.core._jit import jit_log_normal, jit_sample_normal


@dataclass
@@ -7,7 +7,7 @@

- **random**: Samples responsibilities of datapoints at random and subsequently initializes means
and covariances from these.
- **kmeans**: Runs K-Means via :class:`pycave.clustering.KMeans` and uses the centroids as the
- **kmeans**: Runs K-Means via :class:`torchgmm.clustering.KMeans` and uses the centroids as the
initial component means. For computing the covariances, responsibilities are given as the
one-hot cluster assignments.
- **kmeans++**: Runs only the K-Means++ initialization procedure to sample means in a smart
@@ -21,7 +21,7 @@ class MarkovChain(ConfigurableBaseEstimator[MarkovChainModel]): # type: ignore
available on `Wikipedia <https://en.wikipedia.org/wiki/Markov_chain>`_.

See also:
.. currentmodule:: pycave.bayes.markov_chain
.. currentmodule:: torchgmm.bayes.markov_chain
.. autosummary::
:nosignatures:
:template: classes/pytorch_module.rst
@@ -54,7 +54,7 @@ def __init__(
num_workers: The number of workers to use for loading the data. Only used if a PyTorch
dataset is passed to :meth:`fit` or related methods.
trainer_params: Initialization parameters to use when initializing a PyTorch Lightning
trainer. By default, it disables various stdout logs unless PyCave is configured to
trainer. By default, it disables various stdout logs unless TorchGMM is configured to
do verbose logging. Checkpointing and logging are disabled regardless of the log
level. This estimator further enforces the following parameters:

@@ -1,8 +1,8 @@
import torch
from torch.nn.utils.rnn import PackedSequence
from torchmetrics import MeanMetric
from pycave.bayes.markov_chain.metrics import StateCountAggregator
from pycave.utils import NonparametricLightningModule
from torchgmm.bayes.markov_chain.metrics import StateCountAggregator
from torchgmm.utils import NonparametricLightningModule
from .model import MarkovChainModel


File renamed without changes.