Copy task #6

Open · tristandeleu opened this issue Sep 22, 2015 · 2 comments

@tristandeleu (Collaborator)

Copy task

I will gather all the progress on the Copy task in this issue and will likely update it regularly (hopefully), so you may want to unsubscribe if you don't want to get all the spam.

@tristandeleu (Collaborator, Author)

Training the NTM on short sequences

I have trained the NTM on sequences of length <= 5. The model converges after ~130k iterations, and successfully copies test sequences up to length 5. Here is an example on a test sample of length 5:
[Image: copy-5-success]
The weight vectors show the addresses where the heads read and write in the memory (the memory has 128 locations) as the input sequence is processed. As we can see, the NTM first writes a representation of the input to memory (left of the red marker in Write Weights) and then reads from the same addresses (right of the red marker in Read Weights).
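
For reference, here is a minimal sketch of how copy-task samples of this kind can be generated (the function name `generate_copy_sample` and the use of NumPy are only for illustration, not the exact code in this repository):

```python
import numpy as np

def generate_copy_sample(length, size=8, rng=np.random):
    """Sketch of a copy-task sample: a random binary sequence, then a
    delimiter flag (the red marker in the plots), then blanks during
    which the NTM must reproduce the sequence as its output."""
    sequence = rng.binomial(1, 0.5, size=(length, size)).astype(np.float32)

    inputs = np.zeros((2 * length + 1, size + 1), dtype=np.float32)
    inputs[:length, :size] = sequence
    inputs[length, size] = 1.0  # delimiter channel

    targets = np.zeros((2 * length + 1, size), dtype=np.float32)
    targets[length + 1:] = sequence
    return inputs, targets

# A test sample of length 5, as in the figure above
x, y = generate_copy_sample(5)
```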

It does not generalize to longer sequences yet, though. When I test on longer sequences, it sometimes repeats the last input vector multiple times (this is also clearly visible in the Read Weights):
[Image: copy-5-repeat]
But most of the time it fails and instead reads a random location in the memory (thus producing noise as output):
[Image: copy-5-repeat2]

Parameters of the experiment

Same parameter settings as in #4.

@tristandeleu (Collaborator, Author)

Training on short sequences & generalization (cheating)

The successful example from #6 (comment) shows that the head writes to random locations in the memory, but is still able to retrieve these locations afterwards. A more natural behavior (and what the results from DeepMind show) would be to write to adjacent addresses at every time step; the fact that this does not happen here suggests that either:

  • The convolutional shift does not work as intended, as it is supposed to shift by at most one location (left or right) over the memory. However, some early unit tests show that the shift works correctly.
  • The content addressing and/or the gating mechanism (equations 5 & 7) take too much credit, even though they are not required for this task (see the sketch just after this list).
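
To make the suspects explicit, here is a NumPy sketch of the full addressing pipeline as described in the paper (equations 5-9). The actual implementation is in Theano, so this is only an illustration of the mechanism, not the repository code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def address(memory, key, beta, g, shift, gamma, w_prev):
    # Content addressing (eq. 5-6): cosine similarity between the key and
    # each memory row, sharpened by beta and normalized with a softmax.
    similarity = np.dot(memory, key) / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w_c = softmax(beta * similarity)

    # Gating (eq. 7): interpolation between the content-based weighting
    # and the previous weighting.
    w_g = g * w_c + (1.0 - g) * w_prev

    # Convolutional shift (eq. 8): circular convolution of w_g with a
    # distribution `shift` over the offsets {-1, 0, +1}.
    w_s = np.zeros_like(w_g)
    for offset, s in zip((-1, 0, 1), shift):
        w_s += s * np.roll(w_g, offset)

    # Sharpening (eq. 9).
    w = w_s ** gamma
    return w / w.sum()
```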

I decided to run a test where I simply skip the content addressing and gating, to see if the model would then exhibit the intended behavior (writing to consecutive addresses). It turns out this works really well (when it converges, see below) and shows impressive generalization. In this example I still trained the NTM on sequences of length <= 5, but it perfectly predicts sequences of length up to 50 (over a few test examples). One run even showed perfect generalization with sequences of length up to 128 (the memory has only 128 addresses).
[Image: copy-20-cheat]

Parameters of the experiment

Overall the same settings as in #4, but without any content addressing or gating (in other words, w_g = w_tm1 for both heads). The weights were also initialized as [1, 0, ..., 0] (instead of EquiProba) to force the head to write the first vector to a single location in the memory instead of spreading the information over the whole memory.
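
Concretely, the change amounts to replacing the gating step in the sketch above and changing the initial weighting (again only an illustration, not the repository code):

```python
import numpy as np

num_addresses = 128

# No content addressing and no gating: the pre-shift weighting is simply
# the previous weighting (w_g = w_tm1) for both heads.
def gate(w_c, g, w_prev):
    return w_prev  # instead of g * w_c + (1.0 - g) * w_prev

# Initial weighting: one-hot on the first address instead of uniform
# (EquiProba), so the first vector is written to a single memory location.
w_init = np.zeros(num_addresses)
w_init[0] = 1.0
```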

Issues

It seems that some of the parameters modified for this experiment made the training unstable, and NaNs appear after a few hundred iterations. I have to investigate this further.
Fixed in f09289e: the power function was non-differentiable with respect to the exponent at 0, which can happen with hard addressing.
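
The problem is easy to reproduce: the gradient of the sharpening step with respect to gamma involves log(w), which is undefined numerically when a component of the weighting is exactly 0 (hard addressing). Below is a small sketch of the issue and one possible workaround; the epsilon clipping is only an illustration, not necessarily what f09289e does:

```python
import numpy as np

gamma = 3.0
w_s = np.array([1.0, 0.0, 0.0, 0.0])  # hard addressing: exact zeros

# d/dgamma (w ** gamma) = w ** gamma * log(w); numerically this gives
# 0 * (-inf) = NaN where w = 0, so the backward pass of the sharpening
# step produces NaNs.
grad_wrt_gamma = w_s ** gamma * np.log(w_s)  # contains NaN entries

# One possible workaround: keep the weighting away from exact zeros
# before sharpening.
eps = 1e-6
w = (w_s + eps) ** gamma
w /= w.sum()
```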
