docs: ported docs
nkapila6 committed Aug 14, 2024
1 parent 8885a58 commit fdb3889
Showing 28 changed files with 1,351 additions and 270 deletions.
148 changes: 0 additions & 148 deletions docs/conf.py

This file was deleted.

28 changes: 9 additions & 19 deletions docs/source/intro.md → docs/docs/about.md
## Overview
mlrose is a Python package for applying some of the most common randomized optimization and search algorithms to a range of different optimization problems, over both discrete- and continuous-valued parameter spaces.

### Project Background
mlrose was initially developed to support students of Georgia Tech’s OMSCS/OMSA offering of CS 7641: Machine Learning.

It includes implementations of all randomized optimization algorithms taught in this course, as well as functionality to apply these algorithms to integer-string optimization problems, such as N-Queens and the Knapsack problem; continuous-valued optimization problems, such as the neural network weight problem; and tour optimization problems, such as the Travelling Salesperson problem. It also has the flexibility to solve user-defined optimization problems.

At the time of development, no single Python package collected all of this functionality together in one location.

### Main Features
**Randomized Optimization Algorithms**

* Implementations of: hill climbing, randomized hill climbing, simulated annealing, genetic algorithm and (discrete) MIMIC;
* Solve both maximization and minimization problems;
* Define the algorithm’s initial state or start from a random state;
* Define your own simulated annealing decay schedule or use one of three pre-defined, customizable decay schedules: geometric decay, arithmetic decay or exponential decay.

**Problem Types**

* Solve discrete-value (bit-string and integer-string), continuous-value and tour optimization (travelling salesperson) problems;
* Define your own fitness function for optimization or use a pre-defined function;
* Pre-defined fitness functions exist for solving the: One Max, Flip Flop, Four Peaks, Six Peaks, Continuous Peaks, Knapsack, Travelling Salesperson, N-Queens and Max-K Color optimization problems.

**Machine Learning Weight Optimization**

* Optimize the weights of neural networks, linear regression models and logistic regression models using randomized hill climbing, simulated annealing, the genetic algorithm or gradient descent;
* Supports classification and regression neural networks.
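
For example, the weights of a small classifier can be fitted with randomized hill climbing instead of backpropagation. The sketch below is illustrative only, assuming the `mlrose_ky` import name and using scikit-learn for data loading and scaling:

```python
import mlrose_ky as mlrose
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Binary classification data, scaled to [0, 1] so the weight search behaves well.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# A one-hidden-layer network whose weights are found by randomized hill climbing.
nn = mlrose.NeuralNetwork(hidden_nodes=[4], activation='relu',
                          algorithm='random_hill_climb', max_iters=1000,
                          learning_rate=0.1, early_stopping=True,
                          max_attempts=100, random_state=1)
nn.fit(X_train, y_train)

# ravel() flattens the (n, 1) prediction array before scoring.
print(accuracy_score(y_test, nn.predict(X_test).ravel()))
```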

<a id="install"></a>

### Installation

mlrose-ky was written in Python 3 and requires NumPy, SciPy and Scikit-Learn (sklearn).

The latest released version is available at the [Python package index](https://pypi.org/project/mlrose-ky/) and can be installed using pip:

```
pip install mlrose-ky
```
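
Once installed, a problem can be defined and solved in a few lines. A minimal sketch, assuming the package imports as `mlrose_ky` and using the pre-defined One Max fitness function:

```python
import mlrose_ky as mlrose

# A 10-bit One Max problem: maximize the number of ones in the bit string.
fitness = mlrose.OneMax()
problem = mlrose.DiscreteOpt(length=10, fitness_fn=fitness, maximize=True, max_val=2)

# Solve with simulated annealing; the seed makes the run reproducible.
best_state, best_fitness = mlrose.simulated_annealing(problem, max_attempts=10,
                                                      max_iters=1000, random_state=1)
print(best_state, best_fitness)  # expect a vector of ones and a fitness of 10.0
```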

### Licensing, Authors, Acknowledgements
mlrose was written by Genevieve Hayes and is distributed under the [3-Clause BSD license](https://github.com/gkhayes/mlrose/blob/master/LICENSE). The source code is maintained in a [GitHub repository](https://github.com/gkhayes/mlrose).

You can cite mlrose in research publications and reports as follows:

* Hayes, G. (2019). *mlrose: Machine Learning, Randomized Optimization and SEarch package for Python*. [https://github.com/gkhayes/mlrose](https://github.com/gkhayes/mlrose). Accessed: *day month year*.

BibTeX entry:

```bibtex
@misc{Hayes19,
  author = {Hayes, G},
  title  = {{mlrose: Machine Learning, Randomized Optimization and SEarch package for Python}},
  year   = 2019,
  howpublished = {\url{https://github.com/gkhayes/mlrose}},
  note   = {Accessed: day month year}
}
```
143 changes: 143 additions & 0 deletions docs/docs/algorithms.md
## Algorithms
Functions to implement the randomized optimization and search algorithms.

> [!NOTE] Recommendation
> The functions below are implemented within mlrose-ky; however, it is highly recommended to use the [Runners](/runners/) for assignments.

### Hill Climbing

> [!INFO] Function
> `hill_climb`(_problem_, _max\_iters=inf_, _restarts=0_, _init\_state=None_, _curve=False_, _random\_state=None_)

Use standard hill climbing to find the optimum for a given optimization problem.

**Parameters:**

* **problem** (_optimization object_) – Object containing fitness function optimization problem to be solved. For example, `DiscreteOpt()`, `ContinuousOpt()` or `TSPOpt()`.
* **max\_iters** (_int, default: np.inf_) – Maximum number of iterations of the algorithm for each restart.
* **restarts** (_int, default: 0_) – Number of random restarts.
* **init\_state** (_array, default: None_) – 1-D Numpy array containing starting state for algorithm. If `None`, then a random state is used.
* **curve** (_bool, default: False_) – Boolean to keep fitness values for a curve. If `False`, then no curve is stored. If `True`, then a history of fitness values is provided as a third return value.
* **random\_state** (_int, default: None_) – If random\_state is a positive integer, random\_state is the seed used by np.random.seed(); otherwise, the random seed is not set.

**Returns:**

* **best\_state** (_array_) – Numpy array containing state that optimizes the fitness function.
* **best\_fitness** (_float_) – Value of fitness function at best state.
* **fitness\_curve** (_array_) – Numpy array containing the fitness at every iteration. Only returned if input argument `curve` is `True`.
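
As an illustration, a minimal sketch applying `hill_climb` with restarts to the pre-defined Four Peaks problem (the `mlrose_ky` import name is assumed):

```python
import mlrose_ky as mlrose

problem = mlrose.DiscreteOpt(length=12, fitness_fn=mlrose.FourPeaks(t_pct=0.1),
                             maximize=True, max_val=2)

# Plain hill climbing stops at the first local optimum, so allow a few restarts.
best_state, best_fitness = mlrose.hill_climb(problem, restarts=5, random_state=1)
print(best_state, best_fitness)
```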

#### References

Russell, S. and P. Norvig (2010). _Artificial Intelligence: A Modern Approach_, 3rd edition. Prentice Hall, New Jersey, USA.

### Random Hill Climbing

> [!INFO] Function
> `random_hill_climb`(_problem_, _max\_attempts=10_, _max\_iters=inf_, _restarts=0_, _init\_state=None_, _curve=False_, _random\_state=None_)

Use randomized hill climbing to find the optimum for a given optimization problem.

**Parameters**:

* **problem** (_optimization object_) – Object containing fitness function optimization problem to be solved. For example, `DiscreteOpt()`, `ContinuousOpt()` or `TSPOpt()`.
* **max\_attempts** (_int, default: 10_) – Maximum number of attempts to find a better neighbor at each step.
* **max\_iters** (_int, default: np.inf_) – Maximum number of iterations of the algorithm.
* **restarts** (_int, default: 0_) – Number of random restarts.
* **init\_state** (_array, default: None_) – 1-D Numpy array containing starting state for algorithm. If `None`, then a random state is used.
* **curve** (_bool, default: False_) – Boolean to keep fitness values for a curve. If `False`, then no curve is stored. If `True`, then a history of fitness values is provided as a third return value.
* **random\_state** (_int, default: None_) – If random\_state is a positive integer, random\_state is the seed used by np.random.seed(); otherwise, the random seed is not set.

**Returns**:

* **best\_state** (_array_) – Numpy array containing state that optimizes the fitness function.
* **best\_fitness** (_float_) – Value of fitness function at best state.
* **fitness\_curve** (_array_) – Numpy array containing the fitness at every iteration. Only returned if input argument `curve` is `True`.
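
A minimal sketch, assuming the `mlrose_ky` import name, that also records the fitness curve:

```python
import mlrose_ky as mlrose

problem = mlrose.DiscreteOpt(length=16, fitness_fn=mlrose.FlipFlop(),
                             maximize=True, max_val=2)

# max_attempts bounds how many non-improving neighbors are tolerated per restart.
best_state, best_fitness, fitness_curve = mlrose.random_hill_climb(
    problem, max_attempts=20, restarts=10, curve=True, random_state=1)
print(best_fitness)
print(fitness_curve[:5])  # fitness history for the first five iterations
```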

#### References

Brownlee, J (2011). _Clever Algorithms: Nature-Inspired Programming Recipes_. [http://www.cleveralgorithms.com](http://www.cleveralgorithms.com/).

### Simulated Annealing

> [!INFO] Function
> `simulated_annealing`(_problem_, _schedule=mlrose.GeomDecay()_, _max\_attempts=10_, _max\_iters=inf_, _init\_state=None_, _curve=False_, _random\_state=None_)

Use simulated annealing to find the optimum for a given optimization problem.

**Parameters**:

* **problem** (_optimization object_) – Object containing fitness function optimization problem to be solved. For example, `DiscreteOpt()`, `ContinuousOpt()` or `TSPOpt()`.
* **schedule** (schedule object, default: `mlrose.GeomDecay()`) – Schedule used to determine the value of the temperature parameter.
* **max\_attempts** (_int, default: 10_) – Maximum number of attempts to find a better neighbor at each step.
* **max\_iters** (_int, default: np.inf_) – Maximum number of iterations of the algorithm.
* **init\_state** (_array, default: None_) – 1-D Numpy array containing starting state for algorithm. If `None`, then a random state is used.
* **curve** (_bool, default: False_) – Boolean to keep fitness values for a curve. If `False`, then no curve is stored. If `True`, then a history of fitness values is provided as a third return value.
* **random\_state** (_int, default: None_) – If random\_state is a positive integer, random\_state is the seed used by np.random.seed(); otherwise, the random seed is not set.

**Returns**:

* **best\_state** (_array_) – Numpy array containing state that optimizes the fitness function.
* **best\_fitness** (_float_) – Value of fitness function at best state.
* **fitness\_curve** (_array_) – Numpy array containing the fitness at every iteration. Only returned if input argument `curve` is `True`.
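
A minimal sketch, assuming the `mlrose_ky` import name, that swaps the default schedule for a customized geometric decay:

```python
import mlrose_ky as mlrose

problem = mlrose.DiscreteOpt(length=20, fitness_fn=mlrose.OneMax(),
                             maximize=True, max_val=2)

# Start hotter and cool more slowly than the default schedule.
schedule = mlrose.GeomDecay(init_temp=10.0, decay=0.95, min_temp=0.001)

best_state, best_fitness = mlrose.simulated_annealing(
    problem, schedule=schedule, max_attempts=10, max_iters=2000, random_state=1)
print(best_fitness)
```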

#### References

Russell, S. and P. Norvig (2010). _Artificial Intelligence: A Modern Approach_, 3rd edition. Prentice Hall, New Jersey, USA.

### Genetic Algorithm

> [!INFO] Function
> `genetic_alg`(_problem_, _pop\_size=200_, _mutation\_prob=0.1_, _max\_attempts=10_, _max\_iters=inf_, _curve=False_, _random\_state=None_)

Use a standard genetic algorithm to find the optimum for a given optimization problem.

**Parameters**:

* **problem** (_optimization object_) – Object containing fitness function optimization problem to be solved. For example, `DiscreteOpt()`, `ContinuousOpt()` or `TSPOpt()`.
* **pop\_size** (_int, default: 200_) – Size of population to be used in genetic algorithm.
* **mutation\_prob** (_float, default: 0.1_) – Probability of a mutation at each element of the state vector during reproduction, expressed as a value between 0 and 1.
* **max\_attempts** (_int, default: 10_) – Maximum number of attempts to find a better state at each step.
* **max\_iters** (_int, default: np.inf_) – Maximum number of iterations of the algorithm.
* **curve** (_bool, default: False_) – Boolean to keep fitness values for a curve. If `False`, then no curve is stored. If `True`, then a history of fitness values is provided as a third return value.
* **random\_state** (_int, default: None_) – If random\_state is a positive integer, random\_state is the seed used by np.random.seed(); otherwise, the random seed is not set.

**Returns**:

* **best\_state** (_array_) – Numpy array containing state that optimizes the fitness function.
* **best\_fitness** (_float_) – Value of fitness function at best state.
* **fitness\_curve** (_array_) – Numpy array of arrays containing the fitness of the entire population at every iteration. Only returned if input argument `curve` is `True`.
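
A minimal sketch, assuming the `mlrose_ky` import name, applying the genetic algorithm to a small travelling salesperson instance:

```python
import mlrose_ky as mlrose

# Eight cities given as (x, y) coordinates; TSPOpt builds the tour problem.
coords = [(0, 0), (3, 0), (3, 2), (2, 4), (1, 5), (0, 4), (4, 1), (1, 2)]
problem = mlrose.TSPOpt(length=8, coords=coords, maximize=False)

# Larger populations explore more of the search space at the cost of run time.
best_state, best_fitness = mlrose.genetic_alg(problem, pop_size=200,
                                              mutation_prob=0.1, max_attempts=20,
                                              random_state=1)
print(best_state)    # a permutation of the city indices
print(best_fitness)  # total tour length (minimized)
```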

#### References

Russell, S. and P. Norvig (2010). _Artificial Intelligence: A Modern Approach_, 3rd edition. Prentice Hall, New Jersey, USA.

### MIMIC

> [!INFO] Function
> `mimic`(_problem_, _pop\_size=200_, _keep\_pct=0.2_, _max\_attempts=10_, _max\_iters=inf_, _curve=False_, _random\_state=None_, _fast\_mimic=False_)

Use MIMIC to find the optimum for a given optimization problem.

> [!DANGER] Warning
> MIMIC cannot be used for solving continuous-state optimization problems.

**Parameters**:

* **problem** (_optimization object_) – Object containing fitness function optimization problem to be solved. For example, `DiscreteOpt()` or `TSPOpt()`.
* **pop\_size** (_int, default: 200_) – Size of population to be used in algorithm.
* **keep\_pct** (_float, default: 0.2_) – Proportion of samples to keep at each iteration of the algorithm, expressed as a value between 0 and 1.
* **max\_attempts** (_int, default: 10_) – Maximum number of attempts to find a better neighbor at each step.
* **max\_iters** (_int, default: np.inf_) – Maximum number of iterations of the algorithm.
* **curve** (_bool, default: False_) – Boolean to keep fitness values for a curve. If `False`, then no curve is stored. If `True`, then a history of fitness values is provided as a third return value.
* **random\_state** (_int, default: None_) – If random\_state is a positive integer, random\_state is the seed used by np.random.seed(); otherwise, the random seed is not set.
* **fast\_mimic** (_bool, default: False_) – Activate fast mimic mode to compute the mutual information in vectorized form. Faster speed but requires more memory.

**Returns**:

* **best\_state** (_array_) – Numpy array containing state that optimizes the fitness function.
* **best\_fitness** (_float_) – Value of fitness function at best state.
* **fitness\_curve** (_array_) – Numpy array containing the fitness at every iteration. Only returned if input argument `curve` is `True`.
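
A minimal sketch, assuming the `mlrose_ky` import name, applying MIMIC to a small Knapsack instance (a discrete problem, per the warning above):

```python
import mlrose_ky as mlrose

weights = [10, 5, 2, 8, 15]
values = [1, 2, 3, 4, 5]
problem = mlrose.DiscreteOpt(length=5, fitness_fn=mlrose.Knapsack(weights, values),
                             maximize=True, max_val=2)

# keep_pct controls what fraction of the fittest samples seed the next distribution.
best_state, best_fitness = mlrose.mimic(problem, pop_size=200, keep_pct=0.2,
                                        max_attempts=10, random_state=1)
print(best_state, best_fitness)
```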

#### References

De Bonet, J., C. Isbell, and P. Viola (1997). MIMIC: Finding Optima by Estimating Probability Densities. In _Advances in Neural Information Processing Systems_ (NIPS) 9, pp. 424–430.

