finish wiki/3_Data.md
xuyuting committed Aug 14, 2024
1 parent b1c0202 commit 574702d
Showing 1 changed file with 121 additions and 16 deletions.

## Experimental design space $X_{space}$


### Basic Syntax

Each input variable is defined according to its variable type and domain.
```python
from obsidian import ParamSpace
X_space = ParamSpace(params)
```

The `ParamSpace` class object can be exported into dictionary format to facilitate saving to JSON files and reloading for future use:

```python
import json

with open('X_space.json', 'w') as f:
    X_space_dict = X_space.save_state()
    json.dump(X_space_dict, f)

with open('X_space.json', 'r') as f:
    X_space_dict = json.load(f)
    X_space_reload = ParamSpace.load_state(X_space_dict)
```

In addition, the `ParamSpace` class provides various instance methods for input variable transformation; these are called implicitly during optimization and do not need to be accessed directly by the user.

### Additional Variable Types

* Continuous observational variable
```python
from obsidian.parameters import Param_Observational
Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
```


* Discrete numerical variable

```python
Expand All @@ -61,21 +68,28 @@ X_space_reload = ParamSpace.load_state(X_space_dict)

* Task variable

Only one special 'task' categorical variable is allowed for encoding multiple tasks.
A distinct response will be predicted for each task.

```python
from obsidian.parameters import Task
Task('TaskVar', ['Task_A', 'Task_B', 'Task_C', 'Task_D'])
```

## Initial experimental conditions, or seed experiments $X_0$

When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.

For example, generate six input conditions $X_0$ according to the previously specified $X_{space}$ using the Latin hypercube sampling (LHS) method:

```python
from obsidian.experiment import ExpDesigner

designer = ExpDesigner(X_space, seed = 0)
X0 = designer.initialize(m_initial = 6, method='LHS')
print(X0.to_markdown())
```


| | Temperature | Concentration | Enzyme | Variant | StirRate |
|---:|--------------:|----------------:|----------:|:----------|:-----------|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High |
The `designer` returns experimental conditions as a pandas dataframe, the standard data format for experimental datasets throughout this package.
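For intuition, LHS stratifies each variable's range into $m$ equal-width bins and draws exactly one point per bin. A minimal stdlib-only sketch for a single continuous variable (an illustration only, not obsidian's implementation; `lhs_1d` is a made-up helper):

```python
import random

def lhs_1d(m, lo, hi, seed=0):
    """Latin hypercube sampling of one continuous variable:
    one uniform draw inside each of m equal-width bins, then shuffled."""
    rng = random.Random(seed)
    width = (hi - lo) / m
    points = [lo + (i + rng.random()) * width for i in range(m)]
    rng.shuffle(points)
    return points

samples = lhs_1d(6, -10, 30)  # e.g. six stratified 'Temperature' settings
```

For multiple variables, each column is stratified independently and shuffled, so the pairings between columns are randomized while each marginal stays evenly covered.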

## Experimental outcome variable(s) $Y$

### Basic Syntax

Similar to the `ParamSpace` object for input variables, there is a `Target` class that handles the specification and preprocessing of experimental outcome variables.

For each outcome measurement, there are three essential arguments to specify:
* name: Variable name; a required user input.
* f_transform: Transformation function for preprocessing the raw response values, to facilitate the numerical computations during optimization.
    - 'Identity': (default) No transformation
    - 'Standard': Normalization to zero mean and unit standard deviation
    - 'Logit_MinMax': Logit transformation, with the range or scale automatically calculated from the data
    - 'Logit_Percentage': Assuming the input response is a percentage between 0 and 100, applies a logit transformation with scale 1/100.
* aim: Either 'max' (default) or 'min', specifying the desired direction of improvement. Currently only continuous outcome values are handled.
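To make these options concrete, the 'Logit_Percentage' mapping can be sketched in plain Python (a simplified illustration of the math only, not obsidian's implementation):

```python
import math

def logit_percentage(y_pct):
    """Map a percentage response in (0, 100) onto the real line."""
    p = y_pct / 100.0          # apply the 1/100 scale
    return math.log(p / (1.0 - p))

def inverse_logit_percentage(z):
    """Map a transformed value back to the percentage scale."""
    return 100.0 / (1.0 + math.exp(-z))
```

Responses near 0% or 100% are stretched toward $\pm\infty$, which keeps back-transformed model predictions inside the physically meaningful range.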


Depending on the number of outcomes, define either a single `Target` object or a list of them:

```python
from obsidian import Target
target = Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max')

target_multiple = [
Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max'),
Target(name = 'Cost', f_transform = 'Standard', aim='min')
]
```

### Example

To demonstrate the usage of the `Target` class, we simulate a single-task experimental outcome $y_0$ using the previously generated $X_0$ and an analytical function 'shifted_parab'.

```python
from obsidian.experiment import Simulator
from obsidian.experiment.benchmark import shifted_parab

simulator = Simulator(X_space, shifted_parab, name='Yield')
y0 = simulator.simulate(X0)
print(y0.to_markdown())
```

| | Yield |
|---:|--------:|
| 0 | 47.8147 |
| 1 | 62.5599 |
| 2 | 60.7972 |
| 3 | 39.1121 |
| 4 | 83.0833 |
| 5 | 52.2631 |

If $y_0$ is entered manually, it should be a pandas dataframe with the same variable name 'Yield' as specified in the `target` definition.

When the 'transform_f' function is called with 'fit=True' during the optimization workflow, the raw response will be saved as an attribute of the `target` object:
```python
y_transformed = target.transform_f(y0, fit = True)
type(target.f_raw) # torch.Tensor
```
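This fit-then-cache pattern can be illustrated with a stdlib-only toy version of a 'Standard' transform (an analogy only; the `StandardTransform` class below is hypothetical, and obsidian caches tensors rather than lists):

```python
import statistics

class StandardTransform:
    """Toy stand-in for a fit-able response transform."""
    def transform(self, y, fit=False):
        if fit:
            self.raw = list(y)                 # cache raw responses, like Target.f_raw
            self.mean = statistics.fmean(y)
            self.std = statistics.stdev(y)
        return [(v - self.mean) / self.std for v in y]

    def inverse(self, z):
        return [v * self.std + self.mean for v in z]

t = StandardTransform()
y_raw = [47.8, 62.6, 60.8, 39.1, 83.1, 52.3]
y_std = t.transform(y_raw, fit=True)   # fitting stores the raw data and its statistics
```

Because the fitted statistics are kept on the object, the transform can later be inverted to report predictions on the original response scale.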

The `Target` class object, together with the input response 'f_raw' (if it exists), can be exported into dictionary format to facilitate saving to JSON files and reloading for future use:

```python
import json

with open('target.json', 'w') as f:
    target_dict = target.save_state()
    json.dump(target_dict, f)

with open('target.json', 'r') as f:
    target_dict = json.load(f)
    target_reload = Target.load_state(target_dict)
```

## Use campaign object to manage data

The `Campaign` class object acts as the central hub connecting all components of the APO workflow, including data management, the optimizer, and the experimental designer.
It is the recommended approach, offering a more streamlined workflow than using each component separately.



Here is an example of creating a `Campaign` class object and adding the initial dataset to its 'data' attribute:

```python
import pandas as pd

from obsidian.campaign import Campaign

data_Iter0 = pd.concat([X0, y0], axis=1)
my_campaign = Campaign(X_space, target, seed=0)
my_campaign.add_data(data_Iter0)
```

The 'add_data' method appends each new batch of data to a single pandas dataframe, tagging rows with an incrementing integer 'Iteration'. The new data should be a dataframe containing both the input experimental conditions and the target outcomes.
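The bookkeeping behaves roughly like the following stdlib-only sketch (a dict-based analogy; `DataLog` is a made-up class, and `Campaign` itself stores a pandas dataframe):

```python
class DataLog:
    """Toy analogue of Campaign's data bookkeeping."""
    def __init__(self):
        self.data = []        # one flat table of all observations
        self.iteration = 0    # incremented once per added batch

    def add_data(self, batch):
        # batch: rows holding both inputs (X) and outcomes (y)
        for row in batch:
            self.data.append({**row, 'Iteration': self.iteration})
        self.iteration += 1

log = DataLog()
log.add_data([{'Temperature': 13.3, 'Yield': 47.4},
              {'Temperature': 6.7, 'Yield': 61.4}])   # tagged Iteration 0
log.add_data([{'Temperature': 20.0, 'Yield': 55.0}])  # tagged Iteration 1
```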


There are various ways to retrieve data from `Campaign`:

```python
print(my_campaign.data.to_markdown())
```

| Observation ID | Temperature | Concentration | Enzyme | Variant | StirRate | Yield | Iteration |
|-----------------:|--------------:|----------------:|----------:|:----------|:-----------|--------:|------------:|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High | 47.4471 | 0 |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low | 61.3989 | 0 |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium | 63.6213 | 0 |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low | 43.4116 | 0 |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium | 84.5542 | 0 |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High | 51.8577 | 0 |

and access the input conditions and outcome responses individually:
```python
my_campaign.X
my_campaign.y
```
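Conceptually, these attributes just split the combined data table by column. A toy dict-based sketch (the real attributes return pandas dataframes; `split_xy` is a made-up helper):

```python
def split_xy(rows, y_names):
    """Split combined observation rows into inputs (X) and outcomes (y)."""
    X = [{k: v for k, v in r.items() if k not in y_names and k != 'Iteration'}
         for r in rows]
    y = [{k: v for k, v in r.items() if k in y_names} for r in rows]
    return X, y

rows = [{'Temperature': 13.3, 'Variant': 'MRK003', 'Yield': 47.4, 'Iteration': 0}]
X, y = split_xy(rows, {'Yield'})
```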
