
Commit

update wiki in progress...
xuyuting committed Aug 13, 2024
1 parent a35ba74 commit 4d9541a
Showing 4 changed files with 119 additions and 9 deletions.
3 changes: 0 additions & 3 deletions .gitignore
@@ -85,9 +85,6 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/
4 changes: 3 additions & 1 deletion docs/wiki/1_APO_Workflow_and_CodeStructure.md
@@ -48,7 +48,9 @@ The data structure in a typical APO work flow includes:
See section
[Data](3_Data.md)
and submodule
[`parameters`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/parameters)
[`parameters`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/parameters),
as well as submodule
[`objectives`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/objectives)
for more details.


6 changes: 3 additions & 3 deletions docs/wiki/2_Analysis_and_Visualization.md
@@ -3,9 +3,9 @@

![APO Workflow](https://github.com/MSDLLCpapers/obsidian/blob/main/docs/_static/APO_workflow.png?raw=true)

This section introduces some additional components, such as retrospective analysis and visualization methods, that are not essential steps in a Sequential Model-Based Optimization (SMBO) algorithm but are valuable in real-world applications for several reasons.
This section introduces some additional components, such as retrospective analysis and visualization methods, that are not essential steps in an Algorithmic Process Optimization (APO) algorithm but are valuable in real-world applications for several reasons.

* The technical details involved in SMBO algorithms, such as surrogate models and acquisition functions, may seem complicated to users with non-quantitative background. As a result, the entire workflow of suggesting new experimental conditions may appear to be a black box for users, which hampers the adoption of this this powerful process optimization technique. Various performance metrics and model interpretation methods help to bridge this gap by providing users with a better intuitive understanding of the underlying algorithms and revealing the decision-making processes involved.
* The technical details involved in APO algorithms, such as surrogate models and acquisition functions, may seem complicated to users with a non-quantitative background. As a result, the entire workflow of suggesting new experimental conditions may appear to be a black box for users, which hampers the adoption of this powerful process optimization technique. Various performance metrics and model interpretation methods help to bridge this gap by providing users with a better intuitive understanding of the underlying algorithms and revealing the decision-making processes involved.

* The variable importance analysis and/or model interpretation tools can provide critical insights into the optimization process, aiding in a deeper understanding of the variables that influenced the selection of the optimal solution and of the relationships between input variables, which can be confirmed with additional experiments or by scientific domain experts.

@@ -60,7 +60,7 @@ For multi-objective optimization, the evaluation is subjective to user preference
#### The Overall Optimization Performance Metrics
To monitor the progress of SMBO workflow, we need to define a scalar evaluation metric to summarize performance over all the $N$ data points.
To monitor the progress of the APO workflow, we need to define a scalar evaluation metric that summarizes performance over all $N$ data points.
* Single-objective optimization: The optimal value (either max or min, depending on the target specification) of the measured experimental outcome.
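
As an illustrative aside (not part of the diff above, and using hypothetical outcome values), the single-objective metric described here can be tracked as the running best of the measured outcomes:

```python
import numpy as np

# Hypothetical measured outcomes over five sequential experiments (aim: maximize)
y_measured = np.array([41.2, 55.0, 38.7, 60.3, 48.1])

# Running best after each experiment: a scalar summary of optimization progress
running_best = np.maximum.accumulate(y_measured)
print(running_best)  # -> 41.2, 55.0, 55.0, 60.3, 60.3
```
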
115 changes: 113 additions & 2 deletions docs/wiki/3_Data.md
@@ -1,3 +1,114 @@
# Data
# Data Structure

(TBA...)
## Experimental design space $X_{space}$


### Basic Syntax

Each input variable is defined according to its type and domain.

A continuous variable is specified by its name, followed by lower and upper bounds.

> Param_Continuous('varName', lower_bound, upper_bound)

A discrete variable (categorical or ordinal) is specified by its name, followed by the (ordered) list of possible levels in string format.

> Param_Categorical('varName', ['level 1', 'level 2', 'level 3',...])

An example list of input parameter specifications covering commonly used variable types (continuous, categorical, and ordinal):

```python
from obsidian.parameters import Param_Continuous, Param_Categorical, Param_Ordinal

params = [
    Param_Continuous('Temperature', -10, 30),
    Param_Continuous('Concentration', 10, 150),
    Param_Continuous('Enzyme', 0.01, 0.30),
    Param_Categorical('Variant', ['MRK001', 'MRK002', 'MRK003']),
    Param_Ordinal('StirRate', ['Low', 'Medium', 'High']),
]
```
The $X_{space}$ is then specified as a `ParamSpace` object, initialized from the list of parameters.

```python
from obsidian import ParamSpace
X_space = ParamSpace(params)
```

The `ParamSpace` object can be exported to a dictionary format to facilitate saving and reloading for future use:

```python
X_space_dict = X_space.save_state()
X_space_reload = ParamSpace.load_state(X_space_dict)
```
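
For longer-term storage, the exported dictionary can be written to disk; below is a minimal sketch using the standard library, assuming the dictionary returned by `save_state()` is JSON-serializable:

```python
import json

# Save the exported parameter-space dictionary to a JSON file
with open('X_space.json', 'w') as f:
    json.dump(X_space_dict, f)

# Later: reload the dictionary and rebuild the ParamSpace
with open('X_space.json', 'r') as f:
    X_space_reload = ParamSpace.load_state(json.load(f))
```
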

### Additional Variable Types

* Continuous observational variable

For example, an entire time course is measured during the experiment, and data at all timepoints ranging from 0 to 10 are used for model fitting.
During optimization, however, we are only interested in improving the result at a fixed time point of 6.

```python
from obsidian.parameters import Param_Observational
Param_Observational(name='Time', min=0, max=10, design_point=6)
```


* Discrete numerical variable

```python
from obsidian.parameters import Param_Discrete_Numeric
Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5])
```

* Task variable

...

## Initial experimental conditions, or seed experiments $X_0$

When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.
For example, generate six input conditions $X_0$ from the previously specified $X_{space}$ using the Latin hypercube sampling (LHS) method:

```python
from obsidian.experiment import ExpDesigner

designer = ExpDesigner(X_space, seed=0)
X0 = designer.initialize(m_initial=6, method='LHS')
```


| | Temperature | Concentration | Enzyme | Variant | StirRate |
|---:|--------------:|----------------:|----------:|:----------|:-----------|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High |


The `designer` returns the experimental conditions as a pandas DataFrame, which is the default data format across various `obsidian` functions.
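
Once the seed experiments have been run, the measured outcomes can be appended to this same DataFrame as additional columns; the sketch below uses purely hypothetical `Yield` values for illustration:

```python
import pandas as pd

# Hypothetical measured outcomes for the six seed experiments (illustrative values only)
y0 = pd.Series([41.2, 55.0, 38.7, 60.3, 27.9, 48.1], name='Yield')

# Combine inputs and outcomes into a single table for later model fitting
Z0 = pd.concat([X0, y0], axis=1)
```
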



## Experimental outcome variable(s) $Y$

Each experimental outcome is specified as a `Target`, defined by a name and an optimization aim (`'max'` or `'min'`). For a single objective:

```python
from obsidian import Target
target = Target('Yield', aim='max')
```

For multi-objective optimization, specify a list of `Target` objects:

```python
from obsidian import Target
target = [
Target('Yield', aim='max'),
Target('Cost', aim='min')
]
```

## Use the campaign object to manage data

The campaign class object serves as a convenient portal for accessing all components of the APO workflow, including data management.

...
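
A minimal sketch of how the pieces above could be tied together, assuming that a `Campaign` class is importable from the top-level package, that it takes the parameter space and target(s), and that it exposes data-management methods named `add_data` and `suggest` (these names are assumptions, not confirmed here):

```python
from obsidian import Campaign

# Assumed constructor: the parameter space plus the target(s) defined above
campaign = Campaign(X_space, target)

# Assumed data-management method: register seed experiments and their measured outcomes
# (Z0 is the combined inputs/outcomes table sketched in the previous section)
campaign.add_data(Z0)

# Assumed method for proposing the next batch of experimental conditions
X_suggest, eval_suggest = campaign.suggest(m_batch=2)
```
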
