
Add support to dropout. #29

Open

aliciafmachado wants to merge 6 commits into main
Conversation

@aliciafmachado aliciafmachado commented Sep 8, 2024

  • Add dropout.
  • Add basic tests for dropout and trainer with dropout enabled.
  • Add example in app.
  • Add evalMode flag so that dropout can be disabled during eval.

Intended to resolve Issue: #1

@aliciafmachado aliciafmachado changed the title from "Add more tests to dropout and pass flag to computeTransformer to disable dropout during evaluation." to "Add support to dropout." on Sep 8, 2024
@aliciafmachado aliciafmachado left a comment

There are some commits going back and forth on some things that I did not fully understand at first, so feel free to squash the commits before merging to main to avoid confusion. Otherwise, I can recreate the pull request and fix the commit history.

I also have a few questions / discussion topics:

  1. I added support for dropout, but we need something to manage random seeds so that we can seed properly. Should we create an issue for that?
  2. I tried to add dropout following the T5 architecture, but I decided not to add it after the FF layer or on the output. For the FF, I don't think it makes sense since we have a single layer and we already apply dropout before the residual connection after the FF network. For the output, I don't see any additional computation after leaving the stack, so another dropout there would only add noise (I also did not see an extra dropout on the output in the haiku implementation linked in the issue requesting dropout). A sketch of this placement follows below.
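A minimal sketch of the placement described in point 2, written with plain tfjs tensors rather than the repo's GTensor types (the helper name and its arguments are illustrative, not code from this PR): dropout is applied to a sub-layer's output before the residual add and layer norm, and no extra dropout is appended to the stack output.

```ts
import * as tf from '@tensorflow/tfjs';

// Illustrative only: dropout sits between the sub-layer (e.g. the FF network)
// and the residual connection; layer norm would follow the residual add.
function subLayerWithDropout(
  x: tf.Tensor,
  subLayer: (t: tf.Tensor) => tf.Tensor,
  dropoutRate: number
): tf.Tensor {
  const y = subLayer(x);
  // tf.dropout is inverted dropout: kept activations are scaled by 1 / (1 - rate).
  const yDropped = dropoutRate > 0 ? tf.dropout(y, dropoutRate) : y;
  return tf.add(x, yDropped);
}
```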

@aliciafmachado aliciafmachado marked this pull request as ready for review September 8, 2024 16:06
@iislucas iislucas left a comment

Looks great, a few small things.

```diff
@@ -225,13 +229,20 @@ function gelu(x: tf.Tensor) {
 export function computeAttnHead(
   spec: AttnHeadComputeSpec,
   params: AttnHeadParams<TensorKind>,
-  seqInput: GTensor<'batch' | 'pos' | 'inputRep'>
+  seqInput: GTensor<'batch' | 'pos' | 'inputRep'>,
+  evalMode: boolean = false
```
Let's drop the evalMode flag, and just depend on the spec having dropoutRate set differently at eval vs. inference time.
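A sketch of what that could look like at the call sites (the spec field names are taken from the diff and the test snippet later in this PR; `params` and `seqInput` are assumed to be in scope as in the hunk above):

```ts
// No evalMode flag: the same compute spec shape is used, differing only in dropoutRate.
const trainSpec: AttnHeadComputeSpec = { residuals: true, dropoutRate: 0.1 };
const evalSpec: AttnHeadComputeSpec = { residuals: true, dropoutRate: 0.0 };

// Training forward pass: dropout active.
const trainOut = computeAttnHead(trainSpec, params, seqInput);
// Eval forward pass: a dropoutRate of 0 makes dropout a no-op.
const evalOut = computeAttnHead(evalSpec, params, seqInput);
```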

```ts
export function dropout<G extends string, D extends G>(
  dropoutRate: number,
  g: GTensor<G>,
  deterministic: boolean,
```
Let's remove deterministic, and just check if the rate is 0.
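A minimal sketch of the suggested shape, assuming GTensor exposes applyPointWiseTfFn as used in the FF code below; the optional numeric seed parameter is an illustration, not this PR's API:

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch only: without a `deterministic` flag, a dropoutRate of 0 just returns
// the input unchanged, which is what eval-time callers would pass.
export function dropout<G extends string>(
  dropoutRate: number,
  g: GTensor<G>,
  seed?: number
): GTensor<G> {
  if (dropoutRate === 0) {
    return g;
  }
  // tf.dropout implements inverted dropout: kept values are scaled by 1 / (1 - rate).
  return g.applyPointWiseTfFn((x: tf.Tensor) => tf.dropout(x, dropoutRate, undefined, seed));
}
```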

```ts
let unNormedSeqOuput = inputToFF
  .contract(ff.w, ['inputRepToFF'])
  .pointwiseAdd(ff.bIn)
  .applyPointWiseTfFn(gelu)
  .pointwiseAdd(ff.bOut);

// Dropout before layer norm and residual connection.
let unNormedSeqOuputAfterDropout = unNormedSeqOuput;
```
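Presumably the placeholder above is then replaced by a conditional dropout call roughly along these lines (a sketch based on the dropout signature quoted earlier, not the actual diff):

```ts
// Sketch only: drop out the FF output before the layer norm and residual connection.
if (spec.dropoutRate > 0 && !evalMode) {
  unNormedSeqOuputAfterDropout = dropout(
    spec.dropoutRate, unNormedSeqOuput, /* deterministic= */ evalMode);
}
```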
Let's use this: https://github.com/Shivanandroy/simpleT5 as the reference for where to put it for T5. And maybe name this function computeT5AttnHead; then later we can make a GPT-2 one.

@iislucas: sg!

```ts
const layerSpec: transformer.TransformerParamLayerSpec = {
  nHeads: 1,
  hasPosEncoding: true,
  computeSpec: { residuals: true, dropoutRate: 0.1 },
```
@iislucas iislucas Sep 12, 2024
Maybe also add a test for a dropout rate of 1, and check that the loss doesn't decrease.
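A rough sketch of such a test (jasmine-style, mirroring the spec snippet above; computeLossOverSteps is a hypothetical helper standing in for whatever the existing trainer tests use to run a few steps and record the loss):

```ts
it('does not reduce the loss when dropoutRate is 1', async () => {
  const layerSpec: transformer.TransformerParamLayerSpec = {
    nHeads: 1,
    hasPosEncoding: true,
    computeSpec: { residuals: true, dropoutRate: 1 },
  };
  // Hypothetical helper: runs n training steps and returns the loss at each step.
  const losses = await computeLossOverSteps(layerSpec, 10);
  // With every activation dropped, the model receives no signal, so the loss
  // should not go down.
  expect(losses[losses.length - 1]).toBeGreaterThanOrEqual(losses[0]);
});
```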

@aliciafmachado: Done.

@aliciafmachado
Will rebase once #36 is submitted, and then pass a generator so that the dropout is reproducible; then you can take a second look, @iislucas.
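Illustration of the reproducibility goal, using the hypothetical seed parameter from the earlier dropout sketch (whether the final API takes a numeric seed or a generator object depends on #36):

```ts
// Same inputs and same seed should produce the same dropout mask,
// making training runs reproducible.
const a = dropout(0.1, g, /* seed= */ 42);
const b = dropout(0.1, g, /* seed= */ 42);
// a and b are expected to be identical.
```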
