Fix logic with training resumption (#404)
* Fix logic with training resumption

* fix
fhieber authored May 19, 2018
1 parent 948e4ec commit 3f8cb0b
Showing 3 changed files with 6 additions and 7 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -11,6 +11,12 @@ Note that Sockeye has checks in place to not translate with an old model that wa
Each version section may have have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.

## [1.18.13]

### Fixed
- Fixed two bugs with training resumption:
1. removed overly strict assertion in the data iterator for model states before the first checkpoint.
2. removed deletion of Tensorboard log directory.

### Added
- Added support for config files. Command line parameters have precedence over the values read from the config file.
Minimal working example:
@@ -22,7 +28,6 @@ Each version section may have have subsections for: _Added_, _Changed_, _Removed
validation_source: valid.source.txt
validation_target: valid.target.txt
```
### Changed
The full set of arguments is serialized to `out/args.yaml` at the beginning of training (before json was used).

3 changes: 0 additions & 3 deletions sockeye/data_io.py
@@ -1537,9 +1537,6 @@ def load_state(self, fname: str):
 inverse_data_permutations = np.load(fp)
 data_permutations = np.load(fp)

-# Because of how checkpointing is done (pre-fetching the next batch in
-# each iteration), curr_idx should always be >= 1
-assert self.curr_batch_index >= 1
 # Right after loading the iterator state, next() should be called
 self.curr_batch_index -= 1
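
To illustrate why the removed assertion was too strict, here is a minimal sketch. It is not Sockeye's iterator: `ToyIter`, its `strict` flag, and `can_resume` are hypothetical names, and only the `curr_batch_index` bookkeeping mirrors the hunk above. A state saved before any batch has been consumed carries index 0, which the old check rejects.

```python
import os
import pickle
import tempfile


class ToyIter:
    """Hypothetical stand-in for a checkpointable data iterator."""

    def __init__(self):
        self.curr_batch_index = 0

    def save_state(self, fname: str):
        with open(fname, "wb") as fp:
            pickle.dump(self.curr_batch_index, fp)

    def load_state(self, fname: str, strict: bool = False):
        with open(fname, "rb") as fp:
            self.curr_batch_index = pickle.load(fp)
        if strict:
            # Old behaviour: reject states saved before any batch was consumed.
            assert self.curr_batch_index >= 1
        # Same adjustment as in the hunk above.
        self.curr_batch_index -= 1


def can_resume(strict: bool) -> bool:
    """Save a fresh iterator state (index 0) and try to load it again."""
    with tempfile.TemporaryDirectory() as tmp:
        fname = os.path.join(tmp, "iter.state")
        ToyIter().save_state(fname)
        try:
            ToyIter().load_state(fname, strict=strict)
        except AssertionError:
            return False
        return True
```

In this sketch, `can_resume(strict=True)` fails on the fresh state while `can_resume(strict=False)` loads it, which corresponds to the first fix listed in the CHANGELOG entry.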

3 changes: 0 additions & 3 deletions sockeye/training.py
@@ -1017,9 +1017,6 @@ def __init__(self,
 try:
     import mxboard
     logger.info("Logging training events for Tensorboard at '%s'", self.logdir)
-    if os.path.exists(self.logdir):
-        logger.info("Deleting existing Tensorboard log directory '%s'", self.logdir)
-        shutil.rmtree(self.logdir)
     self.sw = mxboard.SummaryWriter(logdir=self.logdir, flush_secs=60, verbose=False)
 except ImportError:
     logger.info("mxboard not found. Consider 'pip install mxboard' to log events to Tensorboard.")
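
The effect of this hunk on a resumed run can be sketched with the standard library alone. This is an illustration under stated assumptions, not Sockeye or mxboard code: `prepare_logdir`, `events_after_resume`, and the `events.run1` file name are hypothetical, and a plain text file stands in for a Tensorboard event file.

```python
import os
import shutil
import tempfile


def prepare_logdir(logdir: str, delete_existing: bool) -> None:
    """Hypothetical helper: get a log directory ready for an event writer."""
    if delete_existing and os.path.exists(logdir):
        # Behaviour removed by this commit: wipes events from earlier runs.
        shutil.rmtree(logdir)
    os.makedirs(logdir, exist_ok=True)


def events_after_resume(delete_existing: bool) -> list:
    """Simulate a first run writing one event file, then a resumed run."""
    with tempfile.TemporaryDirectory() as tmp:
        logdir = os.path.join(tmp, "tensorboard")
        prepare_logdir(logdir, delete_existing)
        with open(os.path.join(logdir, "events.run1"), "w") as fp:
            fp.write("first run\n")
        # Training is interrupted and resumed with the same output folder.
        prepare_logdir(logdir, delete_existing)
        return sorted(os.listdir(logdir))
```

With `delete_existing=True` the resumed run starts from an empty directory and the earlier events are gone; with `delete_existing=False` the file from the first run is still there, which matches the second fix in the CHANGELOG entry.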