
Experiments and plotting with TensorBoard


Experiments

What are they?

In Marlo, the term "experiment" refers to a suite of operations performed automatically on a given agent/environment combination with a given set of parameters, for purposes such as training, evaluating or testing agents. A set of training-related experiments ships with Marlo, featuring simple episodic runs of a given environment with a given agent. Using these built-in experiments is entirely optional; how they work is briefly covered below, and you are welcome, and in fact encouraged, to provide your own experiments and routines for your agents.

Usage

Using our experiments is simple: import the experiments module and call one of the training functions:

import marlo
from marlo import experiments

Using an experiment (excerpt from the guide on the PPO agent):

# Start training/evaluation
experiments.train_agent_with_evaluation(
    agent=agent,
    env=env,
    eval_env=env,
    outdir=outdir,
    steps=steps,
    eval_n_runs=eval_n_runs,
    eval_interval=eval_interval,
    max_episode_len=timestep_limit,
    step_hooks=[
        lr_decay_hook,
        clip_eps_decay_hook,
    ],
)
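
The two step hooks passed above come from the PPO guide. One way they might be constructed is with ChainerRL's LinearInterpolationHook; the sketch below is an assumption modelled on the ChainerRL PPO example - it presumes an Adam-based optimizer (whose learning rate is agent.optimizer.alpha), a PPO agent exposing clip_eps, and placeholder start values:

from chainerrl.experiments import LinearInterpolationHook

# Anneal the optimizer's learning rate (Adam's alpha) linearly to 0 over `steps`
def lr_setter(env, agent, value):
    agent.optimizer.alpha = value

# Anneal PPO's clipping parameter linearly towards 0 over `steps`
def clip_eps_setter(env, agent, value):
    agent.clip_eps = max(value, 1e-8)

lr_decay_hook = LinearInterpolationHook(steps, 3e-4, 0, lr_setter)
clip_eps_decay_hook = LinearInterpolationHook(steps, 0.2, 0, clip_eps_setter)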

How it works

The main function used in our train_agent experiment set is, unsurprisingly, the train_agent function. It takes a set of parameters: the agent, the environment, the maximum number of steps, the results directory, the maximum episode length, the step offset, an evaluator to be used for evaluation (if one is provided), a score used to measure success, hooks to run per step (if any are provided, as is the case for PPO), the maximum number of resets, and a logger object (if one exists).
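
Called directly, that looks roughly like the sketch below. The keyword names are an assumption modelled on ChainerRL's train_agent and may differ slightly in Marlo's version; the values shown are placeholders:

# Rough sketch of a direct call to the underlying train_agent function.
# Keyword names are assumed to mirror ChainerRL's train_agent; check the
# Marlo source for the exact signature (e.g. how the reset limit is passed).
experiments.train_agent(
    agent=agent,
    env=env,
    steps=steps,                     # maximum number of environment steps
    outdir=outdir,                   # directory for results and logs
    max_episode_len=timestep_limit,  # maximum episode length
    step_offset=0,
    evaluator=None,                  # no evaluation in this sketch
    step_hooks=[],                   # e.g. the PPO decay hooks shown earlier
    logger=None,
)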

The underlying functionality is simple. At the beginning of training, the environment is reset:

    # Reset the environment and the per-episode bookkeeping
    obs = env.reset()
    num_resets -= 1
    r = 0
    t = step_offset
    if hasattr(agent, 't'):
        agent.t = step_offset

    episode_r = 0    # cumulative reward of the current episode
    episode_idx = 0  # number of completed episodes
    episode_len = 0  # steps taken in the current episode

Following this, a main experiment loop is entered, which runs the experiment until an exit condition is met:

        while t < steps or num_resets > 0:
            # Use the agent's act_and_train function
            action = agent.act_and_train(obs, r)

            # Step the environment and save the returned values
            obs, r, done, info = env.step(action)

            # Advance the timestep, the episode reward and the episode length
            t += 1
            episode_r += r
            episode_len += 1

            # Stopping conditions, calling evaluation where necessary, field resetting...
            # ...

When the environment signals that the episode is done, when the episode exceeds the maximum episode length, or when the maximum number of steps has been reached, the episode ends. If an evaluator is present, an evaluation is performed and the results are logged to the output directory:

if evaluator is not None and num_resets > evaluator.n_runs:
    evaluated, score = evaluator.evaluate_if_necessary(t=t, episodes=episode_idx + 1)

This script is truncated here for brevity; the full version can be found in the source code.

Tensorboard

For Marlo, we decided to use TensorBoard for plotting and displaying data related to the agent's training. However, TensorBoard is a tool designed for use with TensorFlow, so you might expect it to be awkward to use in a ChainerRL setting such as ours. It turns out to be rather straightforward!

For this, we use neka-nat's Tensorboard-Chainer, which is an extension of TensorboardX, itself a multi-framework extension of TensorBoard.

Using Tensorboard-Chainer is simple; you first have to import the required classes from the library, then initialize your logger:

import time

from tb_chainer import utils, SummaryWriter

# Tag the log directory with a timestamp and a (truncated) agent class name
timestr = time.strftime("%Y%m%d-%H%M%S")
agentClassName = agent.__class__.__name__[:10]
writer = SummaryWriter(r"tensorboard/tensorBoard_exp_"+timestr+"_"+agentClassName)

In this case, we've used a SummaryWriter with a directory name as its parameter. This bit of code is excerpted from the experiments above, which leads us to our next point.

How do WE use Tensorboard for you?

Why, exactly as stated above!

Our basic experiments ship with a very simple Tensorboard-Chainer integration. At the time of writing, the train_agent experiment provides a scalar graph of the final reward of each episode, as well as a graph of each episode's cumulative reward. This is very simple to implement in your own custom experiments, and looks a little something like this: writer.add_scalar('last reward in episode', r, t) and writer.add_scalar('reward', episode_r, t).
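
In context, inside a training loop like the one shown earlier, that might look roughly like the following sketch. The placement is an assumption - writer is the SummaryWriter initialized above, and exactly where you log relative to your episode-end handling is up to you:

# Sketch: logging scalars from within the training loop shown earlier.
# `writer` is the SummaryWriter initialized above; `r`, `episode_r` and `t`
# are the per-step reward, cumulative episode reward and global step count.
while t < steps or num_resets > 0:
    action = agent.act_and_train(obs, r)
    obs, r, done, info = env.step(action)
    t += 1
    episode_r += r
    episode_len += 1

    if done or episode_len == max_episode_len or t == steps:
        # One point per finished episode, indexed by the global step
        writer.add_scalar('last reward in episode', r, t)
        writer.add_scalar('reward', episode_r, t)

# Flush and close the writer once training ends
writer.close()

The resulting plots can then be viewed by pointing TensorBoard at the log directory, e.g. by running tensorboard --logdir tensorboard/ from the command line (assuming TensorBoard itself is installed).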

Yes, really - it's that straightforward.

How can YOU use Tensorboard?

On top of the exemplified scalars above, you can also:

  • Add an image: writer.add_image('Image', x, t)
  • Add text: writer.add_text('Text', 'text logged at step:'+str(t), t)
  • Add audio: writer.add_audio('Audio', x, t)
  • Add a histogram: writer.add_histogram(name, chainer.cuda.to_cpu(param.data), t) - a full loop over model parameters is sketched after this list
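
The histogram call in the last item is most useful when looped over a model's parameters. A minimal sketch, assuming your agent exposes a Chainer link as agent.model (adapt the attribute name to your agent):

import chainer

# Sketch: log a histogram of every parameter of the agent's model at step t.
# `agent.model` is an assumption - use whichever Chainer link your agent exposes.
for name, param in agent.model.namedparams():
    writer.add_histogram(name, chainer.cuda.to_cpu(param.data), t)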

But the main use of this library that we thought you'd be interested in is drawing graphs with named entities. For the sake of brevity, we will not paste the code for that here; however, neka-nat provides a good example of named layers under a named scope in their project repository, which we recommend following should you be interested in pursuing this further.