From 4d9541a221b8fa13b45bef37faa77b48b3716bf7 Mon Sep 17 00:00:00 2001 From: Yuting Xu <12775874+xuyuting@users.noreply.github.com> Date: Tue, 13 Aug 2024 15:31:20 -0400 Subject: [PATCH] update wiki in progress... --- .gitignore | 3 - docs/wiki/1_APO_Workflow_and_CodeStructure.md | 4 +- docs/wiki/2_Analysis_and_Visualization.md | 6 +- docs/wiki/3_Data.md | 115 +++++++++++++++++- 4 files changed, 119 insertions(+), 9 deletions(-) diff --git a/.gitignore b/.gitignore index c2e89ce..a5fae96 100644 --- a/.gitignore +++ b/.gitignore @@ -85,9 +85,6 @@ instance/ # Scrapy stuff: .scrapy -# Sphinx documentation -docs/_build/ - # PyBuilder .pybuilder/ target/ diff --git a/docs/wiki/1_APO_Workflow_and_CodeStructure.md b/docs/wiki/1_APO_Workflow_and_CodeStructure.md index 47f39c2..2b8c3d7 100644 --- a/docs/wiki/1_APO_Workflow_and_CodeStructure.md +++ b/docs/wiki/1_APO_Workflow_and_CodeStructure.md @@ -48,7 +48,9 @@ The data structure in a typical APO work flow includes: See section [Data](3_Data.md) and submodule -[`parameters`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/parameters) +[`parameters`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/parameters), +and submodule +[`objectives`](https://github.com/MSDLLCpapers/obsidian/tree/main/obsidian/objectives) for more details. diff --git a/docs/wiki/2_Analysis_and_Visualization.md b/docs/wiki/2_Analysis_and_Visualization.md index 453bc33..f157d03 100644 --- a/docs/wiki/2_Analysis_and_Visualization.md +++ b/docs/wiki/2_Analysis_and_Visualization.md @@ -3,9 +3,9 @@ ![APO Workflow](https://github.com/MSDLLCpapers/obsidian/blob/main/docs/_static/APO_workflow.png?raw=true) -This section introduces some additional components, such as retrospective analysis and visualization methods, that are not essential steps in a Sequential Model-Based Optimization (SMBO) algorithm but are valuable in real-world applications for several reasons. 
+This section introduces some additional components, such as retrospective analysis and visualization methods, that are not essential steps in an Algorithmic Process Optimization (APO) algorithm but are valuable in real-world applications for several reasons. 
 
-* The technical details involved in SMBO algorithms, such as surrogate models and acquisition functions, may seem complicated to users with non-quantitative background. As a result, the entire workflow of suggesting new experimental conditions may appear to be a black box for users, which hampers the adoption of this this powerful process optimization technique. Various performance metrics and model interpretation methods help to bridge this gap by providing users with a better intuitive understanding of the underlying algorithms and revealing the decision-making processes involved. 
+* The technical details involved in APO algorithms, such as surrogate models and acquisition functions, may seem complicated to users with a non-quantitative background. As a result, the entire workflow of suggesting new experimental conditions may appear to be a black box for users, which hampers the adoption of this powerful process optimization technique. Various performance metrics and model interpretation methods help to bridge this gap by providing users with a better intuitive understanding of the underlying algorithms and revealing the decision-making processes involved. 
 
 * The variable importance analysis and/or model interpretation tools can provide critical insights into the optimization process, aiding in a deeper understanding of the variables that influenced the selection of optimal solution and the relationships between input variables, which could be confirmed with additional experiments or scientific domain experts. 
@@ -60,7 +60,7 @@ For multi-objective optimization, the evaluation is subjective to user preferenc
 
 #### The Overall Optimization Performance Metrics
 
-To monitor the progress of SMBO workflow, we need to define a scalar evaluation metric to summarize performance over all the $N$ data points.
+To monitor the progress of the APO workflow, we need to define a scalar evaluation metric to summarize performance over all the $N$ data points.
 
 * Single-objective optimization: The optimal value (either max or min, depends on target specification) of measured experimental outcome.
 
diff --git a/docs/wiki/3_Data.md b/docs/wiki/3_Data.md
index c8daa1d..b906f5d 100644
--- a/docs/wiki/3_Data.md
+++ b/docs/wiki/3_Data.md
@@ -1,3 +1,114 @@
-# Data
+# Data Structure
 
-(TBA...)
\ No newline at end of file
+## Experimental design space $X_{space}$
+
+
+### Basic Syntax
+
+Each input variable is defined according to its type and domain.
+A continuous variable is specified by its name, followed by lower and upper bounds.
+> Param_Continuous('varName', lower_bound, upper_bound)
+
+A categorical (or ordinal) variable is specified by its name, followed by the (ordered) list of possible levels as strings.
+> Param_Categorical('varName', ['level 1', 'level 2', 'level 3',...])
+
+An example list of input parameter specifications covering commonly used variable types (continuous, categorical, and ordinal):
+
+```python
+from obsidian.parameters import Param_Continuous, Param_Categorical, Param_Ordinal
+
+params = [
+    Param_Continuous('Temperature', -10, 30),
+    Param_Continuous('Concentration', 10, 150),
+    Param_Continuous('Enzyme', 0.01, 0.30),
+    Param_Categorical('Variant', ['MRK001', 'MRK002', 'MRK003']),
+    Param_Ordinal('StirRate', ['Low', 'Medium', 'High']),
+]
+```
+The $X_{space}$ is then specified as a `ParamSpace` object, initialized with the list of parameters. 
+
+```python
+from obsidian import ParamSpace
+X_space = ParamSpace(params)
+```
+
+The `ParamSpace` class object can be exported to a dictionary to facilitate saving and reloading for future use:
+
+```python
+X_space_dict = X_space.save_state()
+X_space_reload = ParamSpace.load_state(X_space_dict)
+```
+
+### Additional Variable Types
+
+* Continuous observational variable
+
+  For example, an entire time course is measured during the experiment, and data at all timepoints from 0 to 10 are used for model fitting; during optimization, however, we are only interested in improving the outcome at a fixed time point of 6.
+
+  ```python
+  from obsidian.parameters import Param_Observational
+  Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
+  ```
+
+* Discrete numerical variable
+
+  ```python
+  from obsidian.parameters import Param_Discrete_Numeric
+  Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5])
+  ```
+
+* Task variable
+
+  ...
+
+## Initial experimental conditions, or seed experiments $X_0$
+
+When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms. 
+For example, generate six input conditions $X_0$ from the previously specified $X_{space}$ using the Latin hypercube sampling (LHS) method:
+
+```python
+from obsidian.experiment import ExpDesigner
+
+designer = ExpDesigner(X_space, seed = 0)
+X0 = designer.initialize(m_initial = 6, method='LHS')
+```
+
+
+|    |   Temperature |   Concentration |    Enzyme | Variant   | StirRate   |
+|---:|--------------:|----------------:|----------:|:----------|:-----------|
+|  0 |      13.3333  |         68.3333 | 0.2275    | MRK003    | High       |
+|  1 |       6.66667 |        115      | 0.0825    | MRK003    | Low        |
+|  2 |      26.6667  |         45      | 0.0341667 | MRK002    | Medium     |
+|  3 |      20       |         91.6667 | 0.275833  | MRK001    | Low        |
+|  4 |      -6.66667 |         21.6667 | 0.179167  | MRK002    | Medium     |
+|  5 |       0       |        138.333  | 0.130833  | MRK001    | High       |
+
+
+The `designer` returns experimental conditions as a pandas `DataFrame`, which is the default data format used by various `obsidian` functions.
+
+
+
+## Experimental outcome variable(s) $Y$
+
+...
+
+For example, a single optimization target:
+
+```python
+from obsidian import Target
+target = Target('Yield', aim='max')
+```
+
+Or a list of targets for multi-objective optimization:
+
+```python
+from obsidian import Target
+target = [
+    Target('Yield', aim='max'),
+    Target('Cost', aim='min')
+]
+```
+
+## Use the campaign object to manage data
+
+The `Campaign` class object serves as a convenient portal for accessing all components of the APO workflow, including data management.
+
+...