Problems related to the creation of a new chemical inverse synthesis planning environment in the LightZero framework #317

shushushulian · 2025-01-09T09:46:19Z

I am trying to turn a molecular inverse synthesis (retrosynthesis) planning problem into a LightZero environment. An overview of the problem: starting from the target molecule $S_0$, a specialized chemical model B proposes multiple reaction rules $[a_1, a_2, ..., a_n]$, each of which converts the target molecule into precursor molecules (one reaction rule may yield one precursor molecule or several). Each precursor molecule is then checked: either it can be purchased, or model B is applied to it again to obtain further reaction rules. The search terminates when the target molecule has been reduced to purchasable molecules through a series of reactions, or when the maximum search depth is reached. It would also be nice for the model to output a few alternative reaction paths for reference.
If this problem is to be turned into a LightZero-compatible environment that AlphaZero or MuZero can solve, how should it be modeled as an MDP? The difficulty is that choosing a reaction rule $a_1$ in one step may produce two molecules, $S_1$ and $S_2$, and a valid synthesis path requires that both $S_1$ and $S_2$ be decomposed into purchasable molecules.
Given this, how do you think I should design my environment so that it fits the LightZero framework well? I look forward to discussing this with you. Thank you!

puyuan1996 · 2025-01-13T15:25:41Z

To adapt the molecular retrosynthesis planning problem into an environment compatible with the LightZero framework and leverage AlphaZero or MuZero to solve it, the design can be approached as follows:

1. Modeling the Problem as an MDP

To transform the retrosynthesis planning problem into a Markov Decision Process (MDP), the following key elements need to be defined:

(1) State

Definition: A state represents the current set of molecules to be decomposed. For example:
- The initial state is the target molecule S₀.
- Subsequent states may consist of a set of molecules {S₁, S₂, ..., Sₘ}.
Characteristics:
- The state can be represented as a tree structure, where the root node is the target molecule, and the leaf nodes are the precursor molecules obtained through decomposition.
- Each molecule is represented using a feature vector (e.g., molecular fingerprints, molecular graphs) to serve as input to the model.
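A minimal sketch of such a state, assuming molecules are stored as SMILES strings. The names `RetroState`, `PURCHASABLE`, and `initial_state` are hypothetical and not part of LightZero. Dropping purchasable molecules from the state, so that it holds only molecules still awaiting decomposition, is one possible design choice rather than a requirement.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical stock list; a real one would come from a supplier catalogue.
PURCHASABLE: FrozenSet[str] = frozenset({"CCO", "CC(=O)O"})


@dataclass(frozen=True)
class RetroState:
    """Molecules still to be decomposed (as SMILES strings) plus search depth."""
    open_molecules: Tuple[str, ...]
    depth: int = 0

    @property
    def solved(self) -> bool:
        # The state is solved once nothing is left to decompose.
        return len(self.open_molecules) == 0


def initial_state(target: str) -> RetroState:
    """Start from the target molecule S0, unless it is already purchasable."""
    if target in PURCHASABLE:
        return RetroState(open_molecules=())
    return RetroState(open_molecules=(target,))


s0 = initial_state("CC(=O)OCC")  # ethyl acetate as an example target
```

Because the dataclass is frozen and uses a tuple, states are hashable, which makes them easy to cache or use as dictionary keys during search.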

(2) Action

Definition: An action corresponds to selecting a reaction rule aᵢ. This rule is applied to a molecule Sⱼ, decomposing it into a set of precursor molecules {P₁, P₂, ..., Pₖ}.
Action Space:
- The action space is dynamic and varies with the state, as the applicable reaction rules depend on the specific molecule being decomposed.
- A chemical model B can be used to generate a list of available reaction rules [a₁, a₂, ..., aₙ] for the current molecule.
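The state-dependent action set could be enumerated roughly as follows. `TOY_MODEL_B` is a made-up lookup table standing in for chemical model B, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Stand-in for chemical model B: maps a molecule to (rule_id, precursors) pairs.
# The entries are made up for illustration and carry no chemical meaning.
TOY_MODEL_B: Dict[str, List[Tuple[str, Tuple[str, ...]]]] = {
    "S0": [("a1", ("S1", "S2")), ("a2", ("S3",))],
    "S1": [("a3", ("S4",))],
}


def applicable_rules(molecule: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return the reaction rules the toy model proposes for one molecule."""
    return TOY_MODEL_B.get(molecule, [])


def legal_actions(open_molecules: Tuple[str, ...]) -> List[Tuple[int, str]]:
    """Enumerate (molecule index, rule id) pairs for every open molecule."""
    actions = []
    for j, mol in enumerate(open_molecules):
        for rule_id, _precursors in applicable_rules(mol):
            actions.append((j, rule_id))
    return actions


actions = legal_actions(("S0", "S1"))
```

A molecule for which the model proposes no rule contributes no actions, so a state whose open molecules all lack rules has an empty action set.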

(3) Reward

Definition: The reward function evaluates the quality of a decomposition path.
- A positive reward is given when all molecules in the state can be purchased or when a successful termination condition is achieved.
- A negative reward is assigned when the search reaches the maximum depth or produces irreducible molecules.
- The reward can also incorporate pathway costs, such as the number of reaction steps or the economic cost.
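As a rough illustration, a terminal reward that combines success or failure with a pathway cost might look like this. The function name, the default magnitudes, and the choice of a linear per-step cost are all assumptions to be tuned for the actual problem.

```python
def terminal_reward(
    all_purchasable: bool,
    num_steps: int,
    success_reward: float = 1.0,
    failure_reward: float = -1.0,
    step_cost: float = 0.1,
) -> float:
    """Reward at the end of an episode.

    all_purchasable: True if every remaining molecule can be bought.
    num_steps:       number of reaction steps used on this pathway.
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    base = success_reward if all_purchasable else failure_reward
    # Longer pathways are penalised linearly in the number of steps.
    return base - step_cost * num_steps


r_success = terminal_reward(True, num_steps=3)
r_failure = terminal_reward(False, num_steps=10)
```

An economic cost could replace `step_cost * num_steps` if per-reaction or per-reagent prices are available.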

(4) State Transition

Definition: State transitions occur when a reaction rule is applied, updating the current molecule set.
- For the current state {S₁, S₂, ..., Sₘ}, selecting a molecule Sⱼ and applying a reaction rule aᵢ leads to a new state:
```
{S₁, ..., Sⱼ₋₁, P₁, P₂, ..., Pₖ, Sⱼ₊₁, ..., Sₘ}
```
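In code, this transition amounts to splicing the precursors into the position of the decomposed molecule. A minimal sketch, with `apply_rule` as a hypothetical helper name and molecules represented as plain strings:

```python
from typing import Sequence, Tuple


def apply_rule(
    molecules: Tuple[str, ...],
    j: int,
    precursors: Sequence[str],
) -> Tuple[str, ...]:
    """Replace the molecule at index j with the precursors of the chosen rule."""
    if not 0 <= j < len(molecules):
        raise IndexError(f"molecule index {j} out of range")
    # Tuples are immutable, so the previous state is left untouched.
    return molecules[:j] + tuple(precursors) + molecules[j + 1:]


state = ("S1", "S2", "S3")
next_state = apply_rule(state, 1, ["P1", "P2"])
```

Returning a new tuple instead of mutating in place keeps earlier states valid, which matters when a tree search revisits them.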

(5) Termination Condition

The process terminates when:
- All molecules in the set can be purchased (successful termination).
- The search reaches the maximum depth, or the decomposition fails (unsuccessful termination).

2. Designing the Environment for the LightZero Framework

The LightZero framework supports training with AlphaZero and MuZero. To integrate the retrosynthesis problem, the MDP must be implemented as an interactive environment. Below are the design suggestions:

(1) Environment Interface

The environment should implement LightZero's standard interface (you can start from this env), with the following core functions:

- reset(): Initializes the environment and returns the initial state (the target molecule S₀).
- step(action): Executes an action, returning the new state, reward, termination flag, and additional information.
- render(): Visualizes the current state (e.g., the decomposition tree).
- get_legal_actions(state): Returns the list of valid actions for the current state.
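The shape of such an environment can be sketched in plain Python. This skeleton deliberately does not subclass any LightZero class: the real base class, timestep type, and observation format should be copied from an existing LightZero environment. `RetroEnv`, `RULES`, and `STOCK` are hypothetical names, the lookup table stands in for model B, and `get_legal_actions` here reads the environment's own current state instead of taking a state argument.

```python
from typing import Dict, List, Tuple

# Toy stand-ins (assumptions): model B as a lookup table and a tiny stock list.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "S0": [("S1", "S2"), ("S3",)],
    "S1": [("B1",)],
    "S2": [("B2", "B3")],
}
STOCK = {"B1", "B2", "B3"}


class RetroEnv:
    """Plain-Python skeleton; not a subclass of any LightZero class."""

    def __init__(self, target: str, max_depth: int = 10) -> None:
        self.target = target
        self.max_depth = max_depth
        self.open: Tuple[str, ...] = ()
        self.depth = 0

    def reset(self) -> Tuple[str, ...]:
        self.open = (self.target,)
        self.depth = 0
        return self.open

    def get_legal_actions(self) -> List[Tuple[int, int]]:
        """All (molecule index, rule index) pairs valid in the current state."""
        return [
            (j, r)
            for j, mol in enumerate(self.open)
            for r in range(len(RULES.get(mol, [])))
        ]

    def step(self, action: Tuple[int, int]):
        j, r = action
        precursors = RULES[self.open[j]][r]
        # Purchasable precursors are leaves and drop out of the open set.
        new = tuple(p for p in precursors if p not in STOCK)
        self.open = self.open[:j] + new + self.open[j + 1:]
        self.depth += 1

        solved = len(self.open) == 0
        stuck = not solved and not self.get_legal_actions()
        too_deep = not solved and self.depth >= self.max_depth
        done = solved or stuck or too_deep
        reward = 1.0 if solved else (-1.0 if done else 0.0)
        info = {"solved": solved, "depth": self.depth}
        return self.open, reward, done, info


env = RetroEnv("S0")
env.reset()
obs, reward, done, info = env.step((0, 0))  # S0 -> S1 + S2
obs, reward, done, info = env.step((0, 0))  # S1 -> B1 (purchasable)
obs, reward, done, info = env.step((0, 0))  # S2 -> B2 + B3 (purchasable)
```

After the three steps above, every remaining molecule is in the stock list, so the episode ends with a positive reward.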

(2) State Representation

Use Graph Neural Networks (GNNs) or other molecular feature extraction methods to represent molecules as feature vectors.
The state can include the entire decomposition tree, tracking both the current molecule set and the decomposition history.
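To show the data flow from a variable-size molecule set to a fixed-size network input, here is a toy encoder that uses only the standard library. The hashed character n-gram "fingerprint" is a placeholder and carries no chemical meaning; in practice it would be replaced by a real molecular fingerprint or a GNN embedding. All function names are assumptions.

```python
import hashlib
from typing import Iterable, List


def toy_fingerprint(smiles: str, n_bits: int = 64, n: int = 2) -> List[int]:
    """Hash character n-grams of a SMILES string into a fixed-length bit vector.

    This is only a placeholder for a real fingerprint or a GNN embedding.
    """
    bits = [0] * n_bits
    for i in range(max(len(smiles) - n + 1, 1)):
        gram = smiles[i:i + n]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def encode_state(molecules: Iterable[str], n_bits: int = 64) -> List[int]:
    """Sum-pool per-molecule fingerprints so the vector size does not depend
    on how many molecules the state holds."""
    pooled = [0] * n_bits
    for smi in molecules:
        for k, bit in enumerate(toy_fingerprint(smi, n_bits)):
            pooled[k] += bit
    return pooled


vec = encode_state(["CCO", "CC(=O)O"])
```

Sum pooling makes the encoding independent of molecule order, which matches treating the state as a set. It discards the tree structure, so a model that needs the decomposition history would require a richer encoding.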

(3) Dynamic Action Space

The action space is dynamic and varies with the state. For each molecule, the chemical model B generates a list of applicable reaction rules.
Actions can be encoded as tuples (molecule index, reaction rule index).
To reduce computational overhead:
- Cache the outputs of the chemical model B to avoid redundant calculations.
- Use pretrained molecular models to predict valid reaction rules efficiently.
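One way to reconcile a dynamic action set with a fixed-size policy output is to flatten (molecule index, rule index) into a single integer and mask out invalid entries. The sketch below assumes fixed caps `MAX_MOLECULES` and `MAX_RULES`, uses a hypothetical `model_b_rules` stub in place of model B, and memoises it with `functools.lru_cache`.

```python
from functools import lru_cache
from typing import List, Tuple

MAX_MOLECULES = 4   # assumed cap on open molecules per state
MAX_RULES = 3       # assumed cap on rules kept per molecule

CALLS = {"model_b": 0}  # counts how often the (expensive) model really runs


@lru_cache(maxsize=None)
def model_b_rules(molecule: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for chemical model B; results are memoised."""
    CALLS["model_b"] += 1
    toy = {"S1": ("a1", "a2"), "S2": ("a3",)}
    return toy.get(molecule, ())[:MAX_RULES]


def encode(mol_idx: int, rule_idx: int) -> int:
    """Flatten (molecule index, rule index) into one discrete action id."""
    return mol_idx * MAX_RULES + rule_idx


def decode(action: int) -> Tuple[int, int]:
    """Inverse of encode."""
    return divmod(action, MAX_RULES)


def action_mask(open_molecules: Tuple[str, ...]) -> List[int]:
    """1 marks a legal flat action, 0 an illegal one."""
    mask = [0] * (MAX_MOLECULES * MAX_RULES)
    for j, mol in enumerate(open_molecules[:MAX_MOLECULES]):
        for r in range(len(model_b_rules(mol))):
            mask[encode(j, r)] = 1
    return mask


mask = action_mask(("S1", "S2"))
mask_again = action_mask(("S1", "S2"))  # served from the cache
```

The caps trade completeness for a bounded action space: molecules or rules beyond the caps are simply not selectable in that state.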

(4) State Transition Logic

When a reaction rule is applied, the chemical model B generates precursor molecules, and the state is updated accordingly.
If the precursor molecules are known to be purchasable, they are marked as "leaf nodes."
Potential challenges:
- State-space explosion: The depth and breadth of the search tree can grow rapidly.
- Solutions:
  - Limit the maximum search depth and terminate the search if the limit is reached.
  - Introduce heuristic methods (e.g., prioritizing high-probability reaction rules) to reduce invalid searches.
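The rule-prioritization heuristic could be as simple as keeping the few most probable rules per molecule. This sketch assumes model B returns a probability for each rule; `prune_rules`, `top_k`, and `min_prob` are illustrative names and values.

```python
from typing import List, Tuple


def prune_rules(
    scored_rules: List[Tuple[str, float]],
    top_k: int = 2,
    min_prob: float = 0.05,
) -> List[Tuple[str, float]]:
    """Keep at most top_k rules whose predicted probability is >= min_prob."""
    kept = [(rule, p) for rule, p in scored_rules if p >= min_prob]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Hypothetical output of model B: (rule id, predicted probability).
candidates = [("a1", 0.60), ("a2", 0.02), ("a3", 0.25), ("a4", 0.13)]
pruned = prune_rules(candidates)
```

Pruning bounds the branching factor at each molecule, at the risk of discarding a low-probability rule that would have led to a valid route.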

(5) Reward Design

The reward function should balance pathway efficiency and search cost. For example:
- Assign a positive reward (+R) if all molecules can be purchased.
- Assign a negative reward (-R) if the maximum search depth is reached without success.
- Penalize each reaction step with a small negative reward (e.g., -0.1) to encourage shorter pathways.
Additional considerations:
- In some cases, decomposing a molecule generates multiple new molecules (e.g., S₁ and S₂), and the solution depends on successfully decomposing all of them.
- Solutions:
  - Use a decomposition tree to track the status of each molecule.
  - Process molecules iteratively or recursively until all are decomposed into purchasable molecules or the termination condition is met.
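The requirement that all precursors of a chosen reaction be solved, while a molecule needs only one working reaction, is commonly described as an AND-OR tree. A minimal sketch of the status tracking, with hypothetical class names:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReactionNode:
    """AND node: one rule application; needs every precursor solved."""
    rule: str
    precursors: List["MoleculeNode"] = field(default_factory=list)

    def solved(self) -> bool:
        # A reaction with no recorded precursors is treated as unsolved.
        return bool(self.precursors) and all(p.solved() for p in self.precursors)


@dataclass
class MoleculeNode:
    """OR node: a molecule; needs at least one solved reaction."""
    name: str
    purchasable: bool = False
    reactions: List[ReactionNode] = field(default_factory=list)

    def solved(self) -> bool:
        if self.purchasable:
            return True
        return any(r.solved() for r in self.reactions)


# S0 --a1--> S1 + S2, where S1 is purchasable and S2 is not (yet) decomposed.
s1 = MoleculeNode("S1", purchasable=True)
s2 = MoleculeNode("S2")
s0 = MoleculeNode("S0", reactions=[ReactionNode("a1", [s1, s2])])
before = s0.solved()

# Decompose S2 into a purchasable building block B.
s2.reactions.append(ReactionNode("a2", [MoleculeNode("B", purchasable=True)]))
after = s0.solved()
```

The recursive `solved` check is fine for small trees; a large search would more likely propagate status updates upward when a node changes.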

3. Generating Multiple Reference Pathways

The LightZero framework can generate multiple reference pathways by:

- Running the Monte Carlo Tree Search (MCTS) multiple times during inference and recording different search results.
- Scoring each pathway (e.g., based on pathway length or decomposition quality) and returning the top-ranked pathways as references.
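The scoring and ranking step can be sketched independently of how the routes are produced. Here a route is assumed to be an ordered tuple of rule ids, pathway length is the only score, and `top_routes` is a hypothetical helper.

```python
import heapq
from typing import List, Tuple

Route = Tuple[str, ...]  # a pathway as an ordered tuple of rule ids


def top_routes(routes: List[Route], k: int = 3) -> List[Route]:
    """Deduplicate routes and return the k shortest (fewest reaction steps)."""
    unique = set(routes)
    # Sort key: length first, then the route itself to make ties deterministic.
    return heapq.nsmallest(k, unique, key=lambda r: (len(r), r))


# Hypothetical routes collected from several independent search runs.
found = [("a1", "a3", "a4"), ("a2",), ("a1", "a3", "a4"), ("a2", "a5")]
best = top_routes(found, k=2)
```

Any other score, such as an estimated economic cost, can replace `len(r)` in the key without changing the rest.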

Feel free to ask any detailed questions about adapting the environment and integrating algorithms in LightZero. We are eagerly looking forward to your contributions! (Part of the answer was assisted by chatgpt-4o.)

shushushulian · 2025-01-15T07:19:29Z

Thank you very much for your enthusiastic answer! I am in the process of integrating my environment into the LightZero framework. LightZero is a clean, efficient, and very useful algorithmic framework that has helped my learning. When I am done, I will be happy to share this environment with you.

puyuan1996 added the "discussion" label (Discussion of a typical issue or concept) · Jan 13, 2025
