[IDEA] Support for simulators and surrogates in the problem format #155

Open
light-weaver opened this issue Aug 31, 2024 · 3 comments
Labels: desdeo2paper (Issues that need to be resolved before a paper describing DESDEO 2.0 can be submitted), enhancement, restructuring (Label for the DESDEO restructuring project)

Comments

@light-weaver
Member

  • What is the current behavior?
    The current problem format does not support simulators and surrogates.

  • Describe the solution you'd like
    The problem format should be updated to support simulators and surrogates.
    The problem JSON should be updated to store extra information about simulators and surrogates. This can be done by adding
    the following fields to the problem JSON, specifically in the objectives array, and a new top-level key called simulators:

problem:{
    //...
    objectives: [
        //...
        {
            // same stuff as before, plus the following
            // The following three entries are all nullable, but at least one should not be null for a valid problem JSON.
            func: stuff, 
            // Analytical function. Now nullable. If this is not null, then the objective is an analytical function.
            simulator_file: Path 
            // path to a python file with the connection to simulators. Also nullable. The python file has certain
            // requirements, listed below.
            surrogates: Array[Path] 
            // An array of paths to models saved on disk. Also nullable.
            // Both simulator_file and surrogates can be non-null. E.g. a smart solver can choose to use either a simulator
            // or a surrogate model. A solver can also choose to train new surrogate models and append the paths to this array.
        },
    ],
    simulators: [
        {
            file: Path,
            parameter_options: {
                // parameters to the simulator that are not decision variables, but affect the results.
                // format is similar to decision variables.
            } // nullable
        }
    ]
}

Note that a single simulator or surrogate can return multiple objective function values. In such cases, the simulator_file Path
or the surrogate Paths are repeated for all such objectives.
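
The two rules above (at least one of the three entries is non-null, and a shared simulator path is repeated for every objective it computes) can be checked with a few lines of Python. This is only a sketch under assumptions: the helper names `has_evaluation_source` and `group_by_simulator`, the file names, and the placeholder `func` value are hypothetical, and the objectives are assumed to carry a `name` field.

```python
from collections import defaultdict

# Field names follow the proposed schema; everything else here is hypothetical.
SOURCE_FIELDS = ("func", "simulator_file", "surrogates")


def has_evaluation_source(objective: dict) -> bool:
    """Return True if at least one of func, simulator_file, surrogates is non-null."""
    return any(objective.get(field) is not None for field in SOURCE_FIELDS)


def group_by_simulator(objectives: list) -> dict:
    """Map each simulator file to the names of the objectives it computes."""
    groups = defaultdict(list)
    for objective in objectives:
        path = objective.get("simulator_file")
        if path is not None:
            groups[str(path)].append(objective["name"])
    return dict(groups)


# Example objectives: f_1 and f_2 share one simulator, f_3 is analytical.
objectives = [
    {"name": "f_1", "func": None, "simulator_file": "sim.py", "surrogates": None},
    {"name": "f_2", "func": None, "simulator_file": "sim.py", "surrogates": ["f_2_model.pkl"]},
    {"name": "f_3", "func": "x_1 + x_2", "simulator_file": None, "surrogates": None},
]

invalid = [o["name"] for o in objectives if not has_evaluation_source(o)]
simulator_groups = group_by_simulator(objectives)
```

Grouping by path would let an evaluator call each simulator file once and then distribute the returned values to the objectives that reference it.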

None of the current parsers will support simulators. A new evaluator must be created that can call the polars parser for
analytic objectives. This evaluator will call the simulator file with the following command (just an example):

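subprocess.run([python_interpreter, file_name, "-d", decision_vars, "-p", parameters], capture_output=True, text=True) # decision_vars and parameters are JSON strings; passing a list avoids shell quoting issues.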

python_interpreter is discovered during runtime. file_name is stored in the problem JSON (problem.objectives.simulator_file).
decision_vars come from the solver. The parameters are set by the analyst while initializing the evaluator.
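
A slightly fuller sketch of that call, using only the standard library, might look as follows. It assumes that the simulator is run with the same interpreter as the evaluator (`sys.executable`), that both arguments are passed as JSON strings, and that the script prints the objective values as JSON on stdout. The function name `run_simulator` is hypothetical.

```python
import json
import subprocess
import sys


def run_simulator(file_name, decision_vars, parameters=None):
    """Run the simulator script once and return the parsed objective values."""
    command = [
        sys.executable,  # assumption: reuse the interpreter running the evaluator
        str(file_name),
        "-d", json.dumps(decision_vars),
        "-p", json.dumps(parameters or {}),
    ]
    # Passing a list (no shell) avoids quoting problems with the JSON arguments.
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the simulator's stderr to the caller.
        raise RuntimeError(f"Simulator {file_name} failed:\n{result.stderr}")
    return json.loads(result.stdout)
```

Calling `run_simulator("sim.py", [[0.1, 0.2], [0.3, 0.4]], {"scale": 2})` would then evaluate two samples in a single subprocess call.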

Finally, while initializing the evaluator, the analyst can also choose to load pre-trained surrogate models from disk.
The evaluator must provide the following methods:

def evaluate_simulator(self, decision_vars: np.ndarray) -> np.ndarray:
    """
    Evaluate the objectives for the given decision variables using the simulator.

    If there is a mix of (mutually exclusive and exhaustive) analytical and simulator objectives, this method will use
    polars to evaluate the analytical objectives and the simulator file to evaluate the simulator objectives, and 
    return the combined results.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m),
        where n is the number of decision variables and m is the number of samples. Note that there is no need
        to support TensorVariables in this evaluator.

    Returns
    -------
    np.ndarray
        The objective values for the given decision variables. The shape of the array is (k, m), where k is the
        number of objectives and m is the number of samples.
    """

def evaluate_surrogates(self, decision_vars: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """
    Evaluate the objectives as well as uncertainty predictions for the given decision variables using the surrogate models.

    If there is a mix of (mutually exclusive and exhaustive) analytical and surrogate objectives, this method will use
    polars to evaluate the analytical objectives and the surrogate models to evaluate the surrogate objectives, and
    return the combined results.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m),
        where n is the number of decision variables and m is the number of samples. Note that there is no need
        to support TensorVariables in this evaluator.

    Returns
    -------
    Tuple[np.ndarray, np.ndarray]
        The first array contains the objective values for the given decision variables. Its shape is (k, m),
        where k is the number of objectives and m is the number of samples. The second array contains the
        uncertainty predictions for the objectives and has the same shape, (k, m).
        - For objectives that are not surrogates, the uncertainty predictions should be set to 0.
        - For objectives that are surrogates, the uncertainty predictions should be those of the surrogate
          model (see, e.g., sklearn's predict method for Gaussian process regressors with return_std=True).
        - For objectives that are surrogates but do not have uncertainty predictions, the uncertainty
          predictions should be set to np.nan. Maybe a warning should be raised in such cases?
    """

def load_surrogates(self, surrogate_paths: Dict[str, Path]) -> None:
    """
    Load the surrogate models from disk and store them within the evaluator.

    This is used during initialization of the evaluator or when the analyst wants to replace the current surrogate
    models with other models. However, if a new model is trained after initialization of the evaluator, the problem
    JSON should be updated with the new model paths and the evaluator should be re-initialized. This can happen
    with any solver that does model management.

    Parameters
    ----------
    surrogate_paths : Dict[str, Path]
        A dictionary where the keys are the names of the objectives and the values are the paths to the surrogate
        models saved on disk. The names of the objectives should match the names of the objectives in the problem
        JSON. This Evaluator class must support loading popular file formats. Check documentation of popular
        libraries like sklearn, pytorch, etc. for more information.
    """

def evaluate(self, decision_vars: np.ndarray) -> np.ndarray:
    """
    Evaluate the objectives for the given decision variables using the simulator OR the surrogate models.

    The analyst can choose to use either the simulator or the surrogate models by setting a flag. If the simulator is
    chosen, this method will call evaluate_simulator and return the results directly. If surrogate models are chosen,
    this method will call evaluate_surrogates and return the results without the uncertainty predictions. The
    uncertainty predictions can be obtained by calling evaluate_surrogates directly.

    Parameters
    ----------
    decision_vars : np.ndarray
        The decision variables for which the objectives need to be evaluated. The shape of the array is (n, m),
        where n is the number of decision variables and m is the number of samples. Note that there is no need
        to support TensorVariables in this evaluator.

    Returns
    -------
    np.ndarray
        The objective values for the given decision variables. The shape of the array is (k, m), where k is the
        number of objectives and m is the number of samples.
    """

  • What is the motivation/use case for changing the behavior?
    The current problem format does not support simulators and surrogates. This feature is essential for many real-world optimization problems.

  • Describe alternatives you've considered
    See Issue [IDEA] Connecting DESDEO to external software, like simulators #85.

  • Additional context

At least initially, the Evaluator class does not need to support all three cases (analytical, simulator, and surrogate).
Start with supporting only the simulator case. Then add support for the surrogate case. Note that the simulator and surrogate
cases are mutually exclusive (and exhaustive if there are no analytical objectives). For the foreseeable future, there is
no need to support analytical objectives in this evaluator. However, if it is trivial to use the current polars evaluator
to add support for analytical objectives, then go ahead and do it.
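
As a rough illustration of how the flag-based dispatch in evaluate could look once both cases exist, here is a minimal sketch. The class name `SketchEvaluator`, the `use_surrogates` flag, and the placeholder bodies are all hypothetical. Plain nested lists stand in for np.ndarray, with rows as decision variables and columns as samples, matching the (n, m) and (k, m) shapes in the docstrings.

```python
class SketchEvaluator:
    """Hypothetical skeleton; only the dispatch in evaluate() is the point."""

    def __init__(self, use_surrogates=False):
        # Flag set by the analyst when initializing the evaluator.
        self.use_surrogates = use_surrogates

    def evaluate_simulator(self, decision_vars):
        # Placeholder for the call to the simulator file:
        # a single objective, the sum of each sample's variables.
        return [[sum(column) for column in zip(*decision_vars)]]

    def evaluate_surrogates(self, decision_vars):
        # Placeholder for the surrogate models:
        # returns (objective values, uncertainty predictions), both (k, m).
        values = [[float(sum(column)) for column in zip(*decision_vars)]]
        uncertainties = [[0.1 for _ in row] for row in values]
        return values, uncertainties

    def evaluate(self, decision_vars):
        if self.use_surrogates:
            # Drop the uncertainty predictions; callers that need them
            # can call evaluate_surrogates directly.
            values, _uncertainties = self.evaluate_surrogates(decision_vars)
            return values
        return self.evaluate_simulator(decision_vars)
```

In a first, simulator-only version, evaluate_surrogates could simply raise NotImplementedError until the surrogate case is added.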

The simulator file should have the following requirements:

  • It should be a python file.
  • When called as a script, it should take the following arguments:
    • -d followed by a json string of the decision variables. The json string should be a list of lists. Each inner list should contain the decision variables for a single sample. The length of the inner list should be equal to the number of decision variables.
    • -p followed by a json string of the parameters. The json string should be a dictionary of parameters that are not decision variables but affect the results. The keys should be the names of the parameters and the values should be the values of the parameters. The parameters should be set by the analyst while initializing the evaluator.
  • If called without the required arguments, the script should print a help message with the following information:
    • How to use the script.
    • The required arguments.
    • The format of the arguments.
    • Essentially the contents of problem.simulators.parameters, which are also present in the problem JSON file.
    • An example of the arguments.
  • If the simulator runs without any issues, it should print the objective values as a json string. The stdout will be captured by the evaluator and parsed to get the objective values. If there is an error, the stderr should be captured and the error should be raised by the evaluator.
  • This file should evaluate all decision variables before returning the objective values. The analyst is responsible for ensuring that the simulator file is correct and that it can handle the decision variables and parameters correctly. The analyst is also responsible for features such as parallelism, etc.

As this involves a lot of boilerplate code, a few templates should be provided in the documentation. The only relevant
template for this issue is a template for connecting a simulator that exists as a binary on the local machine.
In the future, more templates can be added for connecting simulators that are hosted on the cloud, etc. For now, focus on supporting just a single simulator file.
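
A possible starting point for such a template is sketched below. It is not a definitive template: the `simulate` function is a stand-in for the call to the real simulator (for example, launching a local binary), the `scale` parameter is hypothetical, and treating -p as optional with an empty dictionary as its default is an assumption.

```python
import argparse
import json


def simulate(sample, parameters):
    """Stand-in for the real simulator; returns two objective values per sample."""
    scale = parameters.get("scale", 1.0)  # hypothetical parameter
    return [scale * sum(sample), scale * sum(x * x for x in sample)]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Evaluate objectives for a batch of decision variable samples.",
        epilog="Example: -d '[[0.1, 0.2], [0.3, 0.4]]' -p '{\"scale\": 2}'",
    )
    parser.add_argument("-d", help="JSON list of lists, one inner list per sample")
    parser.add_argument("-p", default="{}", help="JSON dictionary of parameters")
    args = parser.parse_args(argv)

    if args.d is None:
        # Called without the required argument: show usage instead of results.
        parser.print_help()
        return

    decision_vars = json.loads(args.d)
    parameters = json.loads(args.p)

    # Evaluate every sample before printing anything, so that stdout
    # contains exactly one JSON document for the evaluator to parse.
    results = [simulate(sample, parameters) for sample in decision_vars]
    print(json.dumps(results))


if __name__ == "__main__":
    main()
```

Any uncaught exception ends the script with a non-zero exit code and a traceback on stderr, which the evaluator can capture and re-raise.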

As for the surrogates, for now, the analyst is responsible for training the surrogate models and saving them to disk.
For this issue, focus on supporting just sklearn models with a certain file format. Though this needs to be expanded in the future.
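
The uncertainty rules described for evaluate_surrogates could be handled per model along the following lines. This is a sketch under assumptions: the two model classes are hypothetical stand-ins (no sklearn import is needed to run it), and the check relies on the model's predict method accepting a return_std keyword, as sklearn's Gaussian process regressor does.

```python
import inspect
import math
import warnings


class ModelWithStd:
    """Hypothetical stand-in for a model whose predict supports return_std."""

    def predict(self, X, return_std=False):
        mean = [1.0 for _ in X]
        if return_std:
            return mean, [0.1 for _ in X]
        return mean


class ModelWithoutStd:
    """Hypothetical stand-in for a model without uncertainty predictions."""

    def predict(self, X):
        return [2.0 for _ in X]


def predict_with_uncertainty(model, X):
    """Return (means, uncertainties); uncertainties are NaN if unsupported."""
    supports_std = "return_std" in inspect.signature(model.predict).parameters
    if supports_std:
        mean, std = model.predict(X, return_std=True)
        return list(mean), list(std)
    warnings.warn("Surrogate model provides no uncertainty predictions; using NaN.")
    mean = model.predict(X)
    return list(mean), [math.nan] * len(X)
```

For objectives that are not surrogates, the evaluator would fill the corresponding row of the uncertainty array with zeros instead of calling a helper like this.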

@light-weaver light-weaver added the restructuring Label for the DESDEO restructuring project label Aug 31, 2024
@light-weaver light-weaver changed the title from "[IDEA] Suppo" to "[IDEA] Support for simulators and surrogates in the problem format" Aug 31, 2024
@light-weaver
Member Author

Note: constraints can also come from this simulator script (and from surrogates). This can be accommodated by making similar changes to the constraints field in the problem schema as the objectives field.

@light-weaver
Member Author

In fact, do the same for extra_funcs. Simulators can have intermediate outputs that can be used as "sanity check" variables. It may be good to have them in the problem model so that they can be stored for later use.

@gialmisi gialmisi added the desdeo2paper Issues that need to be resolved before a paper describing DESDEO 2.0 can be submitted. label Sep 5, 2024
@gialmisi
Contributor

gialmisi commented Sep 5, 2024

Added the label 'desdeo2paper'.
