finish wiki/3_Data.md
xuyuting committed Aug 14, 2024
1 parent b1c0202 commit 574702d
Showing 1 changed file with 121 additions and 16 deletions.

## Experimental design space $X_{space}$


### Basic Syntax

Each input variable is defined according to its variable type and domain.
```python
from obsidian import ParamSpace
X_space = ParamSpace(params)
```

The `ParamSpace` class object can be exported into dictionary format to facilitate saving to JSON files and reloading for future use:

```python
import json

with open('X_space.json', 'w') as f:
    X_space_dict = X_space.save_state()
    json.dump(X_space_dict, f)

with open('X_space.json', 'r') as f:
    X_space_dict = json.load(f)
    X_space_reload = ParamSpace.load_state(X_space_dict)
```

In addition, the `ParamSpace` class provides various instance methods for input variable transformation; these are called implicitly during optimization and do not need to be accessed directly by the user.

### Additional Variable Types

* Continuous observational variable
```python
from obsidian.parameters import Param_Observational
Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
```


* Discrete numerical variable

```python
Expand All @@ -61,21 +68,28 @@ X_space_reload = ParamSpace.load_state(X_space_dict)

* Task variable

Only one special 'task' categorical variable is allowed for encoding multiple tasks.
A distinct response will be predicted for each task.

```python
from obsidian.parameters import Task
Task('TaskVar', ['Task_A', 'Task_B', 'Task_C', 'Task_D'])
```

## Initial experimental conditions, or seed experiments $X_0$

When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.

For example, generate six input conditions $X_0$ according to the previously specified $X_{space}$ using the Latin hypercube sampling (LHS) method:

```python
from obsidian.experiment import ExpDesigner

designer = ExpDesigner(X_space, seed = 0)
X0 = designer.initialize(m_initial = 6, method='LHS')
print(X0.to_markdown())
```


| | Temperature | Concentration | Enzyme | Variant | StirRate |
|---:|--------------:|----------------:|----------:|:----------|:-----------|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High |
The `designer` returns experimental conditions as a pandas dataframe, the standard data format for experimental datasets throughout this package.
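For intuition, LHS stratifies each variable's range into $m$ equal-width bins and draws exactly one point per bin. A minimal stdlib-only sketch for a single continuous variable (an illustration only, not obsidian's implementation; `lhs_1d` is a made-up helper):

```python
import random

def lhs_1d(m, lo, hi, seed=0):
    """Latin hypercube sampling of one continuous variable:
    one uniform draw inside each of m equal-width bins, then shuffled."""
    rng = random.Random(seed)
    width = (hi - lo) / m
    points = [lo + (i + rng.random()) * width for i in range(m)]
    rng.shuffle(points)
    return points

samples = lhs_1d(6, -10, 30)  # e.g. six stratified 'Temperature' settings
```

For multiple variables, each column is stratified independently and shuffled, so the pairings between columns are randomized while each marginal stays evenly covered.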

## Experimental outcome variable(s) $Y$

### Basic Syntax

Similar to the `ParamSpace` object for input variables, there is a `Target` class that handles the specification and preprocessing of experimental outcome variables.

For each outcome measurement, there are three essential arguments to specify:
* name: Variable name; a required user input.
* f_transform: Transformation function for preprocessing the raw response values, to facilitate the numerical computations during optimization.
    - 'Identity': (default) No transformation
    - 'Standard': Normalization to zero mean and unit standard deviation
    - 'Logit_MinMax': Logit transformation, with the range or scale automatically calculated from the data
    - 'Logit_Percentage': Assuming the input response is a percentage between 0 and 100, applies a logit transformation with scale 1/100.
* aim: Either 'max' (default) or 'min', specifying the desired direction of improvement. Currently only continuous outcome values are handled.
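To make these options concrete, the 'Logit_Percentage' mapping can be sketched in plain Python (a simplified illustration of the math only, not obsidian's implementation):

```python
import math

def logit_percentage(y_pct):
    """Map a percentage response in (0, 100) onto the real line."""
    p = y_pct / 100.0          # apply the 1/100 scale
    return math.log(p / (1.0 - p))

def inverse_logit_percentage(z):
    """Map a transformed value back to the percentage scale."""
    return 100.0 / (1.0 + math.exp(-z))
```

Responses near 0% or 100% are stretched toward $\pm\infty$, which keeps back-transformed model predictions inside the physically meaningful range.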


Depending on the number of outcomes, define either a single `Target` object or a list of them:

```python
from obsidian import Target
target = Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max')

target_multiple = [
Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max'),
Target(name = 'Cost', f_transform = 'Standard', aim='min')
]
```

### Example

To demonstrate the usage of the `Target` class, we simulate a single-task experimental outcome $y_0$ using the previously generated $X_0$ and an analytical function 'shifted_parab'.

```python
from obsidian.experiment import Simulator
from obsidian.experiment.benchmark import shifted_parab

simulator = Simulator(X_space, shifted_parab, name='Yield')
y0 = simulator.simulate(X0)
print(y0.to_markdown())
```

| | Yield |
|---:|--------:|
| 0 | 47.8147 |
| 1 | 62.5599 |
| 2 | 60.7972 |
| 3 | 39.1121 |
| 4 | 83.0833 |
| 5 | 52.2631 |

If $y_0$ is entered manually, it should be a pandas dataframe with the same variable name 'Yield' as specified in the `target` definition.

When the 'transform_f' function is called with 'fit=True' during the optimization workflow, the raw response will be saved as an attribute of the `target` object:
```python
y_transformed = target.transform_f(y0, fit = True)
type(target.f_raw) # torch.Tensor
```
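This fit-then-cache pattern can be illustrated with a stdlib-only toy version of a 'Standard' transform (an analogy only; the `StandardTransform` class below is hypothetical, and obsidian caches tensors rather than lists):

```python
import statistics

class StandardTransform:
    """Toy stand-in for a fit-able response transform."""
    def transform(self, y, fit=False):
        if fit:
            self.raw = list(y)                 # cache raw responses, like Target.f_raw
            self.mean = statistics.fmean(y)
            self.std = statistics.stdev(y)
        return [(v - self.mean) / self.std for v in y]

    def inverse(self, z):
        return [v * self.std + self.mean for v in z]

t = StandardTransform()
y_raw = [47.8, 62.6, 60.8, 39.1, 83.1, 52.3]
y_std = t.transform(y_raw, fit=True)   # fitting stores the raw data and its statistics
```

Because the fitted statistics are kept on the object, the transform can later be inverted to report predictions on the original response scale.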

The `Target` class object, together with the input response 'f_raw' (if it exists), can be exported into dictionary format to facilitate saving to JSON files and reloading for future use:

```python
import json

with open('target.json', 'w') as f:
    target_dict = target.save_state()
    json.dump(target_dict, f)

with open('target.json', 'r') as f:
    target_dict = json.load(f)
    target_reload = Target.load_state(target_dict)
```

## Use campaign object to manage data

The `Campaign` class object acts as the central hub connecting all components of the APO workflow, including data management, the optimizer, and the experimental designer.
It is the recommended approach, offering a more streamlined workflow than using each component separately.



Here is an example of creating a `Campaign` class object and adding the initial dataset to its 'data' attribute:

```python
import pandas as pd

from obsidian.campaign import Campaign

data_Iter0 = pd.concat([X0, y0], axis=1)
my_campaign = Campaign(X_space, target, seed=0)
my_campaign.add_data(data_Iter0)
```

The 'add_data' method appends each new batch of data to a single pandas dataframe, tagging rows with an incrementing integer 'Iteration'. The new data should be a dataframe containing both the input experimental conditions and the target outcomes.
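The bookkeeping behaves roughly like the following stdlib-only sketch (a dict-based analogy; `DataLog` is a made-up class, and `Campaign` itself stores a pandas dataframe):

```python
class DataLog:
    """Toy analogue of Campaign's data bookkeeping."""
    def __init__(self):
        self.data = []        # one flat table of all observations
        self.iteration = 0    # incremented once per added batch

    def add_data(self, batch):
        # batch: rows holding both inputs (X) and outcomes (y)
        for row in batch:
            self.data.append({**row, 'Iteration': self.iteration})
        self.iteration += 1

log = DataLog()
log.add_data([{'Temperature': 13.3, 'Yield': 47.4},
              {'Temperature': 6.7, 'Yield': 61.4}])   # tagged Iteration 0
log.add_data([{'Temperature': 20.0, 'Yield': 55.0}])  # tagged Iteration 1
```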


There are various ways to retrieve data from `Campaign`:

```python
print(my_campaign.data.to_markdown())
```

| Observation ID | Temperature | Concentration | Enzyme | Variant | StirRate | Yield | Iteration |
|-----------------:|--------------:|----------------:|----------:|:----------|:-----------|--------:|------------:|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High | 47.4471 | 0 |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low | 61.3989 | 0 |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium | 63.6213 | 0 |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low | 43.4116 | 0 |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium | 84.5542 | 0 |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High | 51.8577 | 0 |

and access the input conditions and outcome responses individually:
```python
my_campaign.X
my_campaign.y
```
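Conceptually, these attributes just split the combined data table by column. A toy dict-based sketch (the real attributes return pandas dataframes; `split_xy` is a made-up helper):

```python
def split_xy(rows, y_names):
    """Split combined observation rows into inputs (X) and outcomes (y)."""
    X = [{k: v for k, v in r.items() if k not in y_names and k != 'Iteration'}
         for r in rows]
    y = [{k: v for k, v in r.items() if k in y_names} for r in rows]
    return X, y

rows = [{'Temperature': 13.3, 'Variant': 'MRK003', 'Yield': 47.4, 'Iteration': 0}]
X, y = split_xy(rows, {'Yield'})
```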
