Releases · awslabs/sockeye
2.3.22
[2.3.22]
Fixed
- The previous commit introduced a regression in vocabulary creation. The result was that the vocabulary was built from the input characters rather than from tokens.
[2.3.21]
Added
- Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.
[2.3.20]
Added
- Added debug logging for `restrict_lexicon` lookups.
[2.3.19]
Changed
- When training only the decoder (`--fixed-param-strategy all_except_decoder`), disable autograd for the encoder and embeddings to save memory.
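For illustration, a decoder-only training run with this strategy might look like the sketch below (data paths and the output directory are placeholders; the standard `sockeye.train` data arguments are assumed):

```bash
# Hedged example: train while keeping all parameters except the decoder fixed.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --fixed-param-strategy all_except_decoder \
    --output decoder_only_model
```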
[2.3.18]
Changed
- Updated Docker builds and documentation. See sockeye_contrib/docker.
2.3.17
[2.3.17]
Added
- Added an alternative, faster implementation of greedy search. The `--greedy` flag to `sockeye.translate` enables it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.
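A minimal invocation might look like the following sketch (model directory and files are placeholders; `--models` plus stdin/stdout redirection are the usual translate conventions):

```bash
# Hedged example: translate with the faster greedy search implementation.
python -m sockeye.translate --models model_dir --greedy < input.txt > output.txt
```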
[2.3.16]
Added
- Added option `--transformer-feed-forward-use-glu` to use Gated Linear Units in transformer feed-forward networks (Dauphin et al., 2016; Shazeer, 2020).
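A GLU gates one linear projection with the sigmoid of another, roughly (xW + b) ⊗ σ(xV + c) in place of a single activated projection. A hedged training sketch with the flag enabled (standard data arguments assumed; the flag is presumed to be a boolean switch):

```bash
# Hedged example: enable GLU in the transformer feed-forward blocks.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --transformer-feed-forward-use-glu \
    --output glu_model
```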
[2.3.15]
Changed
- Optimization: the `Decoder` class is now a complete `HybridBlock` (no `forward` method).
2.3.14
[2.3.14]
Changed
- Updated to MXNet 1.8.0.
- Removed dependency support for CUDA 9.2 (no longer supported by MXNet 1.8).
- Added dependency support for CUDA 11.0 and 11.2.
- Updated the Python requirement to 3.7 and later (removed the backported `dataclasses` requirement).
[2.3.13]
Added
- Target factors are now also collected for nbest translations (and stored in the JSON output handler).
[2.3.12]
Added
- Added a `--config` option to the `prepare_data` CLI to allow setting command-line flags via a YAML config.
- Flags for the `prepare_data` CLI are now stored in the output folder under `args.yaml` (equivalent to the behavior of `sockeye_train`).
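A hedged sketch of the new option (the YAML key names are an assumption, presumed to mirror the long flag names; file paths are placeholders):

```bash
# Hypothetical config file; keys are assumed to correspond to prepare_data flags.
cat > prepare_args.yaml <<'EOF'
source: train.src
target: train.trg
output: prepared_data
EOF

python -m sockeye.prepare_data --config prepare_args.yaml
```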
[2.3.11]
Added
- Added option `prevent_unk` to avoid generating the `<unk>` token in beam search.
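Assuming this option is exposed on the translate CLI as `--prevent-unk` (an assumption based on the option name; check `sockeye.translate --help` for the exact flag), usage might look like:

```bash
# Hedged example: suppress <unk> during beam search (flag name assumed).
python -m sockeye.translate --models model_dir --prevent-unk < input.txt > output.txt
```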
2.3.10
[2.3.10]
Changed
- Make sure that the top N best params files are retained, even if N > `--keep-last-params`. This ensures that model averaging will not be crippled when keeping only a few params files during training, which can result in significant savings of disk space.
[2.3.9]
Added
- Added scripts for processing Sockeye benchmark output (`--output-type benchmark`):
  - `benchmark_to_output.py` extracts translations.
  - `benchmark_to_percentiles.py` computes percentiles.
2.3.8
[2.3.8]
Fixed
- Fixed the problem identified in issue #925 that caused learning rate warmup to fail in some instances when doing continued training.
[2.3.7]
Changed
- Use the `dataclasses` module to simplify Config classes. No functional change.
[2.3.6]
Fixed
- Fixes the problem identified in issue #890, where the `lr_scheduler` does not behave as expected when continuing training. The problem is that the `lr_scheduler` is kept as part of the optimizer, but the optimizer is not saved when saving state. Therefore, every time training is restarted, a new `lr_scheduler` is created with initial parameter settings. Fixed by saving and restoring the `lr_scheduler` separately.
[2.3.5]
Fixed
- Fixed issue with `LearningRateSchedulerPlateauReduce.__repr__` printing out `num_not_improved` instead of `reduce_num_not_improved`.
[2.3.4]
Fixed
- Fixed issue with dtype mismatch in beam search when translating with `--dtype float16`.
[2.3.3]
Changed
- Upgraded the SacreBLEU dependency of Sockeye to a newer version (`1.4.14`).
2.3.2
[2.3.2]
Fixed
- Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.
[2.3.1]
Fixed
- Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.
[2.3.0]
Added
- Added support for target factors. If provided with additional target-side tokens/features (token-parallel to the regular target side) at training time, the model can now learn to predict these in a multi-task setting. You can provide target factor data similarly to source factors: `--target-factors <factor_file1> [<factor_fileN>]`. During training, Sockeye optimizes one loss per factor in a multi-task setting. The weight of the losses can be controlled by `--target-factors-weight`. At inference, target factors are decoded greedily; they do not participate in beam search. The predicted factor at each time step is the argmax over its separate output layer distribution. To receive the target factor predictions at inference time, use `--output-type translation_with_factors`.
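A hedged end-to-end sketch of these flags (file paths, the weight value, and the standard data arguments are placeholders; only the flags named above are taken from this entry):

```bash
# Hedged example: train with one target factor file and weight its loss.
# (Validation factor files may also be required; omitted here for brevity.)
python -m sockeye.train \
    --source train.src --target train.trg \
    --target-factors train.factor1.trg \
    --target-factors-weight 0.5 \
    --validation-source dev.src --validation-target dev.trg \
    --output factored_model

# Hedged example: request target factor predictions at inference time.
python -m sockeye.translate --models factored_model \
    --output-type translation_with_factors < input.txt > output.txt
```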
Changed
- `load_model(s)` now returns a list of target vocabs.
- Default source factor combination changed to `sum` (was `concat` before).
- The `SockeyeModel` class has three new properties: `num_target_factors`, `target_factor_configs`, and `factor_output_layers`.
2.2.8
[2.2.8]
Changed
- Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.
[2.2.7]
Added
- Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.
[2.2.6]
Fixed
- Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.
[2.2.5]
Fixed
- Enforce sentence batching for the sockeye score tool; set default batch size to 56.
[2.2.4]
Changed
- Use softmax with length in DotAttentionCell.
- Use `contrib.arange_like` in the AutoRegressiveBias block to reduce the number of ops.
[2.2.3]
Added
- Log the absolute number of `<unk>` tokens in source and target data.
[2.2.2]
Fixed
- Fix: Guard against null division for small batch sizes.
[2.2.1]
Fixed
- Fixes a corner-case bug by which the beam decoder could wrongly return a best hypothesis with a score of negative infinity.
2.2.0
[2.2.0]
Changed
- Replaced multi-head attention with `interleaved_matmul_encdec` operators, which removes previously needed transposes and improves performance.
- Beam search states and model layers now assume time-major format.
[2.1.26]
Fixed
- Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.
[2.1.25]
Changed
- Reverting PR #772 as it causes issues with `amp`.
[2.1.24]
Changed
- Make sure to write a final checkpoint when stopping with `--max-updates`, `--max-samples`, or `--max-num-epochs`.
[2.1.23]
Changed
- Updated to MXNet 1.7.0.
- Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).
[2.1.22]
Added
- Re-introduced the `--softmax-temperature` flag for `sockeye.score` and `sockeye.translate`.
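A hedged usage sketch (the value is illustrative; assuming the common convention that logits are divided by the temperature, values above 1.0 flatten the output distribution and values below 1.0 sharpen it):

```bash
# Hedged example: translate with a softened output distribution.
python -m sockeye.translate --models model_dir --softmax-temperature 1.5 < input.txt > output.txt
```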
2.1.21
[2.1.21]
Added
- Added an optional ability to cache the encoder outputs of a model.
[2.1.20]
Fixed
- Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).
[2.1.19]
Fixed
- When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.
[2.1.18]
Fixed
- Fixed a bug where sampling translation fails because an array is created in the wrong context.
2.1.17
[2.1.17]
Added
- Added `layers.SSRU`, which implements a Simpler Simple Recurrent Unit as described in Kim et al., "From Research to Production and Back: Ludicrously Fast Neural Machine Translation", WNGT 2019.
- Added the `ssru_transformer` option to `--decoder`, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.
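A hedged training sketch selecting the SSRU decoder (standard data arguments assumed; everything other than `--decoder ssru_transformer` is a placeholder):

```bash
# Hedged example: use SSRUs instead of decoder-side self-attention.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --decoder ssru_transformer \
    --output ssru_model
```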
Changed
- Reduced the number of arguments for `MultiHeadSelfAttention.hybrid_forward()`. `previous_keys` and `previous_values` should now be input together as `previous_states`, a list containing two symbols.