Releases · awslabs/sockeye
2.3.22
[2.3.22]
Fixed
- The previous commit introduced a regression in vocabulary creation. The result was that the vocabulary was built from the input characters rather than from tokens.
[2.3.21]
Added
- Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.
[2.3.20]
Added
- Added debug logging for `restrict_lexicon` lookups.
[2.3.19]
Changed
- When training only the decoder (`--fixed-param-strategy all_except_decoder`), disable autograd for the encoder and embeddings to save memory.
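For illustration, a decoder-only training run with this strategy might look like the sketch below (data paths and the output directory are placeholders; the standard `sockeye.train` data arguments are assumed):

```bash
# Hedged example: train while keeping all parameters except the decoder fixed.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --fixed-param-strategy all_except_decoder \
    --output decoder_only_model
```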
[2.3.18]
Changed
- Updated Docker builds and documentation. See sockeye_contrib/docker.
2.3.17
[2.3.17]
Added
- Added an alternative, faster implementation of greedy search. The `--greedy` flag to `sockeye.translate` enables it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.
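A minimal invocation might look like the following sketch (model directory and files are placeholders; `--models` plus stdin/stdout redirection are the usual translate conventions):

```bash
# Hedged example: translate with the faster greedy search implementation.
python -m sockeye.translate --models model_dir --greedy < input.txt > output.txt
```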
[2.3.16]
Added
- Added option `--transformer-feed-forward-use-glu` to use Gated Linear Units in transformer feed-forward networks (Dauphin et al., 2016; Shazeer, 2020).
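A GLU gates one linear projection with the sigmoid of another, roughly (xW + b) ⊗ σ(xV + c) in place of a single activated projection. A hedged training sketch with the flag enabled (standard data arguments assumed; the flag is presumed to be a boolean switch):

```bash
# Hedged example: enable GLU in the transformer feed-forward blocks.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --transformer-feed-forward-use-glu \
    --output glu_model
```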
[2.3.15]
Changed
- Optimization: the `Decoder` class is now a complete `HybridBlock` (no `forward` method).
2.3.14
[2.3.14]
Changed
- Updated to MXNet 1.8.0.
- Removed dependency support for CUDA 9.2 (no longer supported by MXNet 1.8).
- Added dependency support for CUDA 11.0 and 11.2.
- Updated the Python requirement to 3.7 and later (removed the backported `dataclasses` requirement).
[2.3.13]
Added
- Target factors are now also collected for nbest translations (and stored in the JSON output handler).
[2.3.12]
Added
- Added a `--config` option to the `prepare_data` CLI to allow setting command-line flags via a YAML config.
- Flags for the `prepare_data` CLI are now stored in the output folder under `args.yaml` (equivalent to the behavior of `sockeye_train`).
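A hedged sketch of the new option (the YAML key names are an assumption, presumed to mirror the long flag names; file paths are placeholders):

```bash
# Hypothetical config file; keys are assumed to correspond to prepare_data flags.
cat > prepare_args.yaml <<'EOF'
source: train.src
target: train.trg
output: prepared_data
EOF

python -m sockeye.prepare_data --config prepare_args.yaml
```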
[2.3.11]
Added
- Added option `prevent_unk` to avoid generating the `<unk>` token in beam search.
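Assuming this option is exposed on the translate CLI as `--prevent-unk` (an assumption based on the option name; check `sockeye.translate --help` for the exact flag), usage might look like:

```bash
# Hedged example: suppress <unk> during beam search (flag name assumed).
python -m sockeye.translate --models model_dir --prevent-unk < input.txt > output.txt
```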
2.3.10
[2.3.10]
Changed
- Make sure that the top N best params files are retained, even if N > `--keep-last-params`. This ensures that model averaging will not be crippled when keeping only a few params files during training, which can result in significant savings of disk space.
[2.3.9]
Added
- Added scripts for processing Sockeye benchmark output (`--output-type benchmark`):
  - `benchmark_to_output.py` extracts translations.
  - `benchmark_to_percentiles.py` computes percentiles.
2.3.8
[2.3.8]
Fixed
- Fixed the problem identified in issue #925 that caused learning rate warmup to fail in some instances when doing continued training.
[2.3.7]
Changed
- Use the `dataclasses` module to simplify Config classes. No functional change.
[2.3.6]
Fixed
- Fixes the problem identified in issue #890, where the `lr_scheduler` does not behave as expected when continuing training. The problem is that the `lr_scheduler` is kept as part of the optimizer, but the optimizer is not saved when saving state. Therefore, every time training is restarted, a new `lr_scheduler` is created with initial parameter settings. Fixed by saving and restoring the `lr_scheduler` separately.
[2.3.5]
Fixed
- Fixed issue with `LearningRateSchedulerPlateauReduce.__repr__` printing out `num_not_improved` instead of `reduce_num_not_improved`.
[2.3.4]
Fixed
- Fixed issue with dtype mismatch in beam search when translating with `--dtype float16`.
[2.3.3]
Changed
- Upgraded the SacreBLEU dependency of Sockeye to a newer version (`1.4.14`).
2.3.2
[2.3.2]
Fixed
- Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.
[2.3.1]
Fixed
- Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.
[2.3.0]
Added
- Added support for target factors. If provided with additional target-side tokens/features (token-parallel to the regular target side) at training time, the model can now learn to predict these in a multi-task setting. You can provide target factor data similarly to source factors: `--target-factors <factor_file1> [<factor_fileN>]`. During training, Sockeye optimizes one loss per factor in a multi-task setting. The weight of the losses can be controlled by `--target-factors-weight`. At inference, target factors are decoded greedily; they do not participate in beam search. The predicted factor at each time step is the argmax over its separate output layer distribution. To receive the target factor predictions at inference time, use `--output-type translation_with_factors`.
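A hedged end-to-end sketch of these flags (file paths, the weight value, and the standard data arguments are placeholders; only the flags named above are taken from this entry):

```bash
# Hedged example: train with one target factor file and weight its loss.
# (Validation factor files may also be required; omitted here for brevity.)
python -m sockeye.train \
    --source train.src --target train.trg \
    --target-factors train.factor1.trg \
    --target-factors-weight 0.5 \
    --validation-source dev.src --validation-target dev.trg \
    --output factored_model

# Hedged example: request target factor predictions at inference time.
python -m sockeye.translate --models factored_model \
    --output-type translation_with_factors < input.txt > output.txt
```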
Changed
- `load_model(s)` now returns a list of target vocabs.
- Default source factor combination changed to `sum` (was `concat` before).
- The `SockeyeModel` class has three new properties: `num_target_factors`, `target_factor_configs`, and `factor_output_layers`.
2.2.8
[2.2.8]
Changed
- Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.
[2.2.7]
Added
- Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.
[2.2.6]
Fixed
- Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.
[2.2.5]
Fixed
- Enforce sentence batching for the sockeye score tool; set default batch size to 56.
[2.2.4]
Changed
- Use softmax with length in DotAttentionCell.
- Use `contrib.arange_like` in the AutoRegressiveBias block to reduce the number of ops.
[2.2.3]
Added
- Log the absolute number of `<unk>` tokens in source and target data.
[2.2.2]
Fixed
- Fix: Guard against null division for small batch sizes.
[2.2.1]
Fixed
- Fixes a corner-case bug by which the beam decoder could wrongly return a best hypothesis with a score of negative infinity.
2.2.0
[2.2.0]
Changed
- Replaced multi-head attention with `interleaved_matmul_encdec` operators, which removes previously needed transposes and improves performance.
- Beam search states and model layers now assume time-major format.
[2.1.26]
Fixed
- Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.
[2.1.25]
Changed
- Reverting PR #772 as it causes issues with `amp`.
[2.1.24]
Changed
- Make sure to write a final checkpoint when stopping with `--max-updates`, `--max-samples`, or `--max-num-epochs`.
[2.1.23]
Changed
- Updated to MXNet 1.7.0.
- Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).
[2.1.22]
Added
- Re-introduced the `--softmax-temperature` flag for `sockeye.score` and `sockeye.translate`.
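A hedged usage sketch (the value is illustrative; assuming the common convention that logits are divided by the temperature, values above 1.0 flatten the output distribution and values below 1.0 sharpen it):

```bash
# Hedged example: translate with a softened output distribution.
python -m sockeye.translate --models model_dir --softmax-temperature 1.5 < input.txt > output.txt
```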
2.1.21
[2.1.21]
Added
- Added an optional ability to cache the encoder outputs of a model.
[2.1.20]
Fixed
- Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).
[2.1.19]
Fixed
- When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.
[2.1.18]
Fixed
- Fixed a bug where sampling translation fails because an array is created in the wrong context.
2.1.17
[2.1.17]
Added
- Added `layers.SSRU`, which implements a Simpler Simple Recurrent Unit as described in Kim et al., "From Research to Production and Back: Ludicrously Fast Neural Machine Translation", WNGT 2019.
- Added the `ssru_transformer` option to `--decoder`, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.
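A hedged training sketch selecting the SSRU decoder (standard data arguments assumed; everything other than `--decoder ssru_transformer` is a placeholder):

```bash
# Hedged example: use SSRUs instead of decoder-side self-attention.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --decoder ssru_transformer \
    --output ssru_model
```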
Changed
- Reduced the number of arguments for `MultiHeadSelfAttention.hybrid_forward()`. `previous_keys` and `previous_values` should now be input together as `previous_states`, a list containing two symbols.