Adjust section Exploring the Scrabble environment
alexhernandezgarcia committed Jun 25, 2024
1 parent 4707510 commit 1c43251
Showing 1 changed file with 59 additions and 40 deletions.
99 changes: 59 additions & 40 deletions README.md
@@ -109,79 +109,98 @@ python main.py +experiments=grid/corners logger.do.online=True

Finally, also note that by default, PyTorch will operate on the CPU because we have not observed performance improvements by running on the GPU. You may run on GPU with `device=cuda`.

## Exploring the Scrabble Environment
## Exploring the Scrabble environment

To better understand the GFlowNet components, let us explore the Scrabble environment in more detail below.
To better understand the functionality and implementation of GFlowNet environments, let us explore the Scrabble environment in more detail.

When initializing any GFlowNet agent, it's useful to explore the properties of the environment. The library offers various functionalities for this purpose. Below are some detailed examples, among others:
1. Instantiating a Scrabble environment

1. Checking the Initial State
```python
from gflownet.envs.scrabble import Scrabble
env = Scrabble()
```

2. Checking the initial (source) state

You can observe the initial state of the environment. For Scrabble environment, this would be an empty board or sequence:
Every environment has a `state` attribute, which gets updated as actions are performed. The initial state corresponds to the `source` state:

```python
env.state
>>> [0, 0, 0, 0, 0, 0, 0]
env.equal(env.state, env.source)
>>> True
```

2. Exploring the Action Space
In the Scrabble environment, the state is represented by a list of letter indices, padded by 0's up to the maximum word length (7 by default).
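
The padded representation described above can be sketched in plain Python. This is only an illustration and not the library's implementation: the helper `word_to_state` and the `LETTER_TO_INDEX` mapping are hypothetical names, and the mapping assumes that letters are indexed alphabetically from 1 to 26.

```python
import string

MAX_LENGTH = 7  # default maximum word length mentioned above
PAD_INDEX = 0   # value used to pad unused positions

# Hypothetical mapping for illustration: 'A' -> 1, ..., 'Z' -> 26
LETTER_TO_INDEX = {
    letter: idx for idx, letter in enumerate(string.ascii_uppercase, start=1)
}


def word_to_state(word, max_length=MAX_LENGTH):
    """Encode a word as a list of letter indices padded with zeros."""
    if len(word) > max_length:
        raise ValueError(f"Word is longer than {max_length} letters")
    indices = [LETTER_TO_INDEX[char] for char in word.upper()]
    return indices + [PAD_INDEX] * (max_length - len(indices))


print(word_to_state("AX"))  # [1, 24, 0, 0, 0, 0, 0]
```

An empty word encodes to seven zeros, which matches the source state shown earlier.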

3. Checking the action space

The actions of every environment are represented by tuples, and the set of all possible actions makes the action space:

```python
env.get_action_space()
env.action_space
>>> [(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,), (11,), (12,), (13,), (14,), (15,), (16,), (17,), (18,), (19,), (20,), (21,), (22,), (23,), (24,), (25,), (26,), (-1,)]
```
For Scrabble environment, the action space is all english alphabet letters indexed from 1 to 26. The action (-1,) represents the end-of-sequence (EOS) action, indicating the termination of word formation.

3. Taking a Random Step
In the Scrabble environment, the action to append a letter from the English alphabet is represented by a single-element tuple containing the letter index, from 1 to 26. The action space also contains `(-1,)`, which represents the end-of-sequence (EOS) action, indicating the termination of word formation.

```python
new_state, action_taken, valid = env.step_random()
print("New State:", new_state)
print("Action Taken:", action_taken)
print("Action Valid:", valid)

>>> New State: [24, 0, 0, 0, 0, 0, 0]
>>> Action Taken: (24,)
>>> Action Valid: True
env.eos
>>> (-1,)
```
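
To make the action tuples easier to read, they can be translated back into letters. The snippet below is a minimal sketch that does not use the library: `action_to_readable` is a hypothetical helper, and it assumes the alphabetical indexing (1 for 'A', 26 for 'Z') and the EOS action `(-1,)` described above.

```python
import string

EOS = (-1,)  # end-of-sequence action, as described above


def action_to_readable(action):
    """Return the letter of a single-element action tuple, or 'EOS'."""
    if action == EOS:
        return "EOS"
    (index,) = action  # unpack the single element of the tuple
    if not 1 <= index <= 26:
        raise ValueError(f"Unknown action: {action}")
    return string.ascii_uppercase[index - 1]


actions = [(1,), (24,), (-1,)]
print([action_to_readable(a) for a in actions])  # ['A', 'X', 'EOS']
```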

This function randomly selects a valid action (adding a letter or ending the sequence) and applies it to the environment. The output shows the new state, the action taken, and whether the action was valid.
4. Performing a step

4. Performing a Specific Action
We can apply one action from the action space to perform a state transition via the `step()` method:

```python
action = (1,) # Action to add 'A'
new_state, performed_action, is_valid = env.step(action)
print("Updated State:", new_state)
print("Performed Action:", performed_action)
print("Was the Action Valid:", is_valid)
>>> Updated State: [24, 1, 0, 0, 0, 0, 0]
>>> Performed Action: (1,)
>>> Was the Action Valid: True
print("Updated state:", new_state)
print("Performed action:", performed_action)
print("Action was valid:", is_valid)
>>> Updated state: [1, 0, 0, 0, 0, 0, 0]
>>> Performed action: (1,)
>>> Action was valid: True
env.equal(env.state, new_state)
>>> True
```

5. Displaying the State as a human readable
This function randomly selects a valid action (adding a letter or ending the sequence) and applies it to the environment. The output shows the new state, the action taken, and whether the action was valid.

5. Performing a random step

We can also use the method `step_random()` to perform a randomly sampled action:

```python
env.state2readable(env.state)
>>> 'X A'
new_state, performed_action, is_valid = env.step_random()
print("Updated state:", new_state)
print("Performed action:", performed_action)
print("Action was valid:", is_valid)
>>> Updated state: [1, 24, 0, 0, 0, 0, 0]
>>> Performed action: (24,)
>>> Action was valid: True
```

6. Interpreting Actions as a human readable
6. Unfolding a full random trajectory

Similarly, we can also unfold a complete random trajectory, that is, a sequence of actions terminated by the EOS action:

```python
print("Action Meaning:", env.idx2token[action[0]])
>>> Action Meaning: A
final_state, trajectory_actions = env.trajectory_random()
print("Final state:", final_state)
print("Sequence of actions:", trajectory_actions)
print("Trajectory is done:", env.done)
>>> Final state: [1, 24, 10, 6, 4, 21, 21]
>>> Sequence of actions: [(1,), (24,), (10,), (6,), (4,), (21,), (21,), (-1,)]
>>> Trajectory is done: True
```
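
The logic of a random trajectory can be approximated with a small stand-alone sketch. This is not how the library implements `trajectory_random()`; the function `random_trajectory` is hypothetical, and it assumes that EOS may be sampled at any step and becomes the only valid action once the word reaches the maximum length.

```python
import random

MAX_LENGTH = 7
EOS = (-1,)
ACTION_SPACE = [(idx,) for idx in range(1, 27)] + [EOS]


def random_trajectory(rng, max_length=MAX_LENGTH):
    """Sample uniformly random actions until EOS is performed."""
    state = [0] * max_length
    actions = []
    n_letters = 0
    while True:
        # Assumption: once the word is full, EOS is the only valid action
        valid_actions = [EOS] if n_letters == max_length else ACTION_SPACE
        action = rng.choice(valid_actions)
        actions.append(action)
        if action == EOS:
            return state, actions
        state[n_letters] = action[0]
        n_letters += 1


final_state, trajectory_actions = random_trajectory(random.Random(0))
print("Final state:", final_state)
print("Sequence of actions:", trajectory_actions)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.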

7. Sampling a Random Trajectory
7. Displaying the state as a human-readable string

```python
new_state, action_sequence = env.trajectory_random()
print("New State:", new_state)
print("Action Sequence:", action_sequence)

>>> New State: [16, 16, 17, 20, 11, 16, 0]
>>> Action Sequence: [(16,), (16,), (17,), (20,), (11,), (16,), (-1,)]
env.state2readable()
>>> 'A X J F D U U'
```

8. Reset environment
@@ -192,9 +211,9 @@ env.state
>>> [0, 0, 0, 0, 0, 0, 0]
```

So far, we've discussed how to manually set actions or use random actions in the GFlowNet environment. This approach is useful for testing or understanding the basic mechanics of the environment. However, in practice, the goal of a GFlowNet agent is to learn from its experiences to take increasingly effective actions that are driven by a learned policy.
So far, we've seen how to manually set actions or use random actions in the GFlowNet environment. This approach is useful for testing or understanding the basic mechanics of the environment. However, in practice, the goal of a GFlowNet agent is to adjust the parameters of the policy model to sample actions that result in trajectories with likelihoods proportional to the reward.

As the agent interacts with the environment, it collects data about the outcomes of its actions. This data is used to train a policy network, which models the probability distribution of possible actions given the current state. Over time, the policy network learns to favor actions that lead to more successful outcomes with higher reward, optimizing the agent's performance.
As the agent interacts with the environment, it collects data about the outcomes of its actions. This data is used to train the policy networks, which model the probability of state transitions given the current state.
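
As a rough illustration of what "proportional to the reward" means, the sketch below normalizes a set of rewards into target sampling probabilities. The words and reward values are hypothetical and chosen only for this example; they are not taken from the library or from any experiment.

```python
# Hypothetical rewards for three terminal states (illustrative values only)
rewards = {"CAT": 5.0, "DOG": 3.0, "XQZ": 2.0}

# A sampler that is proportional to the reward would draw each word
# with probability reward / (sum of all rewards)
total_reward = sum(rewards.values())
target_probs = {word: reward / total_reward for word, reward in rewards.items()}

for word, prob in target_probs.items():
    print(f"{word}: {prob:.2f}")
# CAT: 0.50
# DOG: 0.30
# XQZ: 0.20
```

In this toy setting, a well-trained agent would sample "CAT" about half of the time instead of always returning the single highest-reward word.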

9. Sample a batch of trajectories from a trained agent
