[IDEA] Support for simulators and surrogates in the problem format #155

light-weaver · 2024-08-31T12:55:40Z

What is the current behavior?
The current problem format does not support simulators and surrogates.
Describe the solution you'd like
The problem format should be updated to support simulators and surrogates.
The problem JSON should be updated to store extra information about simulators and surrogates. This can be done by adding
the following fields to the problem JSON, specifically in the objectives array, and a new top-level key called simulators:

problem:{
    //...
    objectives: [
        //...
        {
            // Same fields as before, plus the three entries below.
            // All three are nullable, but at least one must be non-null for a valid problem JSON.
            func: stuff,
            // Analytical function. Now nullable. If this is not null, then the objective is an analytical function.
            simulator_file: Path,
            // Path to a Python file with the connection to a simulator. Also nullable. The Python file has certain
            // requirements, listed below.
            surrogates: Array[Path]
            // An array of paths to models saved on disk. Also nullable.
            // Both simulator_file and surrogates can be non-null, e.g., a smart solver can choose to use either a simulator
            // or a surrogate model. A solver can also choose to train new surrogate models and append their paths to this array.
        },
    ],
    simulators: [
        {
            file: Path,
            parameter_options: {
                // parameters to the simulator that are not decision variables, but affect the results.
                // format is similar to decision variables.
            } // nullable
        }
    ]
}
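
The "at least one should not be null" rule above could be checked when the problem JSON is loaded. The following is only a minimal sketch: the class name ObjectiveSources and the name field are hypothetical, and the real problem schema may organize these entries differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ObjectiveSources:
    """Hypothetical container for the three nullable entries of one objective."""

    name: str
    func: Optional[list] = None              # analytical expression, if any
    simulator_file: Optional[Path] = None    # Python file that connects to a simulator
    surrogates: Optional[list[Path]] = None  # surrogate models saved on disk

    def __post_init__(self) -> None:
        # The proposal requires at least one of the three entries to be non-null.
        if self.func is None and self.simulator_file is None and self.surrogates is None:
            raise ValueError(
                f"Objective '{self.name}' needs at least one of func, simulator_file, or surrogates."
            )
```

With this sketch, ObjectiveSources(name="f_1", simulator_file=Path("sim.py")) is accepted, while ObjectiveSources(name="f_2") raises a ValueError.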

Note that a single simulator or surrogate can return multiple objective function values. In such cases, the simulator_file Path
or the surrogate Paths are repeated for all such objectives.
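
Because the same simulator_file can appear in several objectives, an evaluator would probably want to call each simulator only once per evaluation. A possible way to group the objectives is sketched below. The function name group_by_simulator and the assumption that every objective carries a "name" key are illustrative, not part of the proposal.

```python
from collections import defaultdict


def group_by_simulator(objectives: list[dict]) -> dict[str, list[str]]:
    """Map each simulator file to the names of the objectives it produces."""
    groups: dict[str, list[str]] = defaultdict(list)
    for objective in objectives:
        simulator_file = objective.get("simulator_file")
        if simulator_file is not None:
            # Objectives that share a path are assumed to come from the same simulator run.
            groups[str(simulator_file)].append(objective["name"])
    return dict(groups)


objectives = [
    {"name": "f_1", "simulator_file": "sim.py"},
    {"name": "f_2", "simulator_file": "sim.py"},
    {"name": "f_3", "func": "x_1 + x_2", "simulator_file": None},
]
print(group_by_simulator(objectives))  # {'sim.py': ['f_1', 'f_2']}
```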

None of the current parsers will support simulators. A new evaluator must be created that can call the polars parser for
analytic objectives. This evaluator will call the simulator file with the following command (just an example):

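subprocess.run([python_interpreter, file_name, "-d", decision_vars, "-p", parameters], capture_output=True, text=True)  # decision_vars and parameters are JSON strings; an argument list avoids shell quoting issues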

python_interpreter is discovered during runtime. file_name is stored in the problem JSON (problem.objectives.simulator_file).
decision_vars come from the solver. The parameters are set by the analyst while initializing the evaluator.

Finally, while initializing the evaluator, the analyst can also choose to load pre-trained surrogate models from disk.
The evaluator must provide the following methods:

def evaluate_simulator(self, decision_vars: np.ndarray) -> np.ndarray:
    """
    Evaluate the objectives for the given decision variables using the simulator.

    If there is a mix of (mutually exclusive and exhaustive) analytical and simulator objectives, this method will use
    polars to evaluate the analytical objectives and the simulator file to evaluate the simulator objectives, and 
    return the combined results.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m), where n is the number of decision variables and m is the number of samples. Note that there is no need to support TensorVariables in
        this evaluator.

    Returns
    -------
    np.ndarray
        The objective values for the given decision variables. The shape of the array is (k, m), where k is the number of objectives and m is the number of samples.
    """

def evaluate_surrogates(self, decision_vars: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """
    Evaluate the objectives as well as uncertainty predictions for the given decision variables using the surrogate models.

    If there is a mix of (mutually exclusive and exhaustive) analytical and surrogate objectives, this method will use
    polars to evaluate the analytical objectives and the surrogate models to evaluate the surrogate objectives, and
    return the combined results.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m), where n is the number of decision variables and m is the number of samples. Note that there is no need to support TensorVariables in
        this evaluator.

    Returns
    ------- 
    Tuple[np.ndarray, np.ndarray]
        The first array contains the objective values for the given decision variables. Its shape is (k, m), where k is the number of objectives and m is the number of samples. The second array contains the uncertainty predictions for the objectives and has the same shape, (k, m). For objectives that are not surrogates, the uncertainty predictions should be set to 0. For objectives that are surrogates, the uncertainty predictions should be those of the surrogate model (see, e.g., sklearn's predict method for Gaussian process regressors with return_std=True). For objectives that are surrogates but do not have uncertainty predictions, the uncertainty predictions should be set to np.nan. Maybe a warning should be raised in such cases?
    """

def load_surrogates(self, surrogate_paths: Dict[str, Path]) -> None:
    """
    Load the surrogate models from disk and store them within the evaluator.

    This is used during initialization of the evaluator or when the analyst wants to replace the current surrogate models with other models. However, if a new model is trained after the evaluator has been initialized, the problem JSON should be updated with the new model paths and the evaluator should be re-initialized. This can happen with any solver that does model management.

    Parameters
    ----------
    surrogate_paths : Dict[str, Path]
        A dictionary where the keys are the names of the objectives and the values are the paths to the surrogate models saved on disk. The names of the objectives should match the names of the objectives in the problem JSON. This Evaluator class
        must support loading popular file formats. Check documentation of popular libraries like sklearn, pytorch, etc. for more information.
    """

def evaluate(self, decision_vars: np.ndarray) -> np.ndarray:
    """
    Evaluate the objectives for the given decision variables using the simulator OR the surrogate models.

    The analyst can choose to use either the simulator or the surrogate models by setting a flag. If the simulator is chosen, this
    method will call evaluate_simulator and return the results directly. If surrogate models are chosen, this method will call evaluate_surrogates and return the results without the uncertainty predictions. The uncertainty predictions can be obtained by calling evaluate_surrogates directly.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m), where n is the number of decision variables and m is the number of samples. Note that there is no need to support TensorVariables in
        this evaluator.

    Returns
    -------
    np.ndarray
        The objective values for the given decision variables. The shape of the array is (k, m), where k is the number of objectives and m is the number of samples.
    """
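
The uncertainty rules described for evaluate_surrogates (0 for non-surrogate objectives, the model's own prediction where available, NaN plus a possible warning otherwise) might look roughly like the sketch below. Plain lists stand in for one row of the (k, m) array, the helper names are hypothetical, and detecting a missing uncertainty prediction by catching TypeError is an assumption rather than a settled design.

```python
import math
import warnings


def surrogate_uncertainty(model, samples: list[list[float]]) -> list[float]:
    """Return one uncertainty value per sample for a single surrogate objective."""
    try:
        # sklearn's GaussianProcessRegressor.predict accepts return_std=True;
        # models whose predict lacks that keyword raise TypeError here.
        _, std = model.predict(samples, return_std=True)
        return [float(s) for s in std]
    except TypeError:
        warnings.warn("Surrogate provides no uncertainty prediction; using NaN.")
        return [math.nan] * len(samples)


def analytical_uncertainty(samples: list[list[float]]) -> list[float]:
    """Objectives that are not surrogates get an uncertainty of 0."""
    return [0.0] * len(samples)
```

Catching TypeError is broad and could hide unrelated errors inside predict, so a real implementation may prefer to inspect the model type or the signature of predict instead.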

What is the motivation/use case for changing the behavior?
The current problem format does not support simulators and surrogates. This feature is essential for many real-world optimization problems.
Describe alternatives you've considered
See Issue [IDEA] Connecting DESDEO to external software, like simulators #85.
Additional context

At least initially, the Evaluator class does not need to support all three cases (analytical, simulator, and surrogate).
Start with supporting only the simulator case. Then add support for the surrogate case. Note that the simulator and surrogate
cases are mutually exclusive (and exhaustive if there are no analytical objectives). For the foreseeable future, there is
no need to support analytical objectives in this evaluator. However, if it is trivial to use the current polars evaluator
to add support for analytical objectives, then go ahead and do it.

The simulator file should meet the following requirements:

- It should be a Python file.
- When called as a script, it should take the following arguments:
  - -d followed by a JSON string of the decision variables. The JSON string should be a list of lists. Each inner list should contain the decision variables for a single sample. The length of the inner list should be equal to the number of decision variables.
  - -p followed by a JSON string of the parameters. The JSON string should be a dictionary of parameters that are not decision variables but affect the results. The keys should be the names of the parameters and the values should be the values of the parameters. The parameters should be set by the analyst while initializing the evaluator.
- If called without the required arguments, the script should print a help message with the following information:
  - How to use the script.
  - The required arguments.
  - The format of the arguments.
  - Basically, the contents of problem.simulators.parameters, which are also present in the problem JSON file.
  - An example of the arguments.
- If the simulator runs without any issues, it should print the objective values as a JSON string. The stdout will be captured by the evaluator and parsed to get the objective values. If there is an error, the stderr should be captured and the error should be raised by the evaluator.
- This file should evaluate all decision variables before returning the objective values. The analyst is responsible for ensuring that the simulator file is correct and that it can handle the decision variables and parameters correctly. The analyst is also responsible for features such as parallelism.
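
A simulator file that follows these requirements could be structured along the lines of the sketch below. The simulate function and the "scale" parameter are placeholders invented for illustration; a real file would call the actual simulator there.

```python
import argparse
import json
from typing import Optional


def simulate(sample: list[float], parameters: dict) -> list[float]:
    """Placeholder model: replace with the call to the real simulator."""
    scale = parameters.get("scale", 1.0)  # hypothetical parameter
    return [scale * sum(sample), scale * max(sample)]


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Example simulator connection.")
    parser.add_argument("-d", help="JSON list of samples, e.g. [[1.0, 2.0], [3.0, 4.0]]")
    parser.add_argument("-p", help='JSON dictionary of parameters, e.g. {"scale": 2.0}')
    args = parser.parse_args(argv)
    if args.d is None or args.p is None:
        # Called without the required arguments: print the help message.
        parser.print_help()
        return 1
    samples = json.loads(args.d)
    parameters = json.loads(args.p)
    # Evaluate every sample before printing anything.
    results = [simulate(sample, parameters) for sample in samples]
    print(json.dumps(results))
    return 0
```

To run the file as a script, it would end with the usual `if __name__ == "__main__":` guard calling `raise SystemExit(main())`. The guard is left out of the sketch so that main can also be imported and called directly, e.g. in tests.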

As this involves a lot of boilerplate code, a few templates should be provided in the documentation. The only relevant
template for this issue is a template for connecting a simulator that exists as a binary on the local machine.
In the future, more templates can be added for connecting simulators that are hosted on the cloud, etc. For now, focus on supporting just a single simulator file.

As for the surrogates, for now, the analyst is responsible for training the surrogate models and saving them to disk.
For this issue, focus on supporting just sklearn models with a certain file format, though this needs to be expanded in the future.
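
The file format is left open above. As one possibility, assuming the models were saved with Python's pickle module (one of the persistence options that works with sklearn estimators), loading them by objective name could be sketched as follows. This is a standalone function for illustration, not the proposed method itself.

```python
import pickle
from pathlib import Path


def load_surrogates(surrogate_paths: dict[str, Path]) -> dict[str, object]:
    """Load one pickled model per objective name."""
    models = {}
    for objective_name, path in surrogate_paths.items():
        with open(path, "rb") as handle:
            # Unpickling can execute arbitrary code, so only load files from trusted sources.
            models[objective_name] = pickle.load(handle)
    return models
```

The keys of the returned dictionary are the objective names, which should match the names used in the problem JSON.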


light-weaver · 2024-09-02T14:04:16Z

Note: constraints can also come from this simulator script (and from surrogates). This can be accommodated by making similar changes to the constraints field in the problem schema as the objectives field.

light-weaver · 2024-09-03T10:07:01Z

In fact, do the same for extra_funcs. Simulators can have intermediate outputs that can be used as "sanity check" variables. It may be good to have them in the problem model so that they can be stored for later use.

gialmisi · 2024-09-05T12:12:15Z

Added the label 'desdeo2paper'.

light-weaver added the restructuring label (Label for the DESDEO restructuring project) Aug 31, 2024

light-weaver changed the title ~~[IDEA] Suppo~~ [IDEA] Support for simulators and surrogates in the problem format Aug 31, 2024

light-weaver added the enhancement label Aug 31, 2024

light-weaver assigned light-weaver and Matskuu Aug 31, 2024

gialmisi added the desdeo2paper label (Issues that need to be resolved before a paper describing DESDEO 2.0 can be submitted.) Sep 5, 2024

gialmisi mentioned this issue Sep 5, 2024

Implementing a variety of interactive evolutionary multiobjective optimization methods in DESDEO 2.0 #173

Open

7 tasks

Matskuu mentioned this issue Oct 31, 2024

Simulator and surrogate support #185

Merged
