Copy task #6

Open · tristandeleu opened this issue Sep 22, 2015 · 2 comments

@tristandeleu (Collaborator)

Copy task

I will gather all the progress on the Copy task in this issue and will likely update it regularly (hopefully), so you may want to unsubscribe if you don't want to get all the spam.

@tristandeleu (Collaborator, Author)

Training the NTM on short sequences

I have trained the NTM on sequences of length <= 5. The model converges after ~130k iterations, and successfully copies test sequences up to length 5. Here is an example on a test sample of length 5:
[Image: copy-5-success]
The weight vectors show the addresses where the heads read and write in the memory (the memory has 128 locations) as the input sequence is processed. As we can see, the NTM first writes a representation of the input to memory (left of the red marker in Write Weights) and then reads from the same addresses (right of the red marker in Read Weights).
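
For reference, here is a minimal sketch of how copy-task samples of this kind can be generated (the function name `generate_copy_sample` and the use of NumPy are only for illustration, not the exact code in this repository):

```python
import numpy as np

def generate_copy_sample(length, size=8, rng=np.random):
    """Sketch of a copy-task sample: a random binary sequence, then a
    delimiter flag (the red marker in the plots), then blanks during
    which the NTM must reproduce the sequence as its output."""
    sequence = rng.binomial(1, 0.5, size=(length, size)).astype(np.float32)

    inputs = np.zeros((2 * length + 1, size + 1), dtype=np.float32)
    inputs[:length, :size] = sequence
    inputs[length, size] = 1.0  # delimiter channel

    targets = np.zeros((2 * length + 1, size), dtype=np.float32)
    targets[length + 1:] = sequence
    return inputs, targets

# A test sample of length 5, as in the figure above
x, y = generate_copy_sample(5)
```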

It does not generalize to longer sequences yet, though. When I test on longer sequences, it sometimes repeats the last input vector multiple times (this is also clearly visible in the Read Weights):
[Image: copy-5-repeat]
But most of the time it fails and instead reads a random location in the memory (thus producing noise as output):
[Image: copy-5-repeat2]

Parameters of the experiment

Same parameter settings as in #4.

@tristandeleu (Collaborator, Author)

Training on short sequences & generalization (cheating)

The successful example from #6 (comment) shows that the head writes to random locations in the memory, but is still able to retrieve these locations afterwards. A more natural behavior (and what the results from DeepMind show) would be to write to adjacent addresses at every time step; the fact that this does not happen here suggests that either:

  • The convolutional shift does not work as intended, as it is supposed to shift by at most one location (left or right) over the memory. However, some early unit tests show that the shift works correctly.
  • The content addressing and/or the gating mechanism (equations 5 & 7) take too much credit, even though they are not required for this task (see the sketch just after this list).
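
To make the suspects explicit, here is a NumPy sketch of the full addressing pipeline as described in the paper (equations 5-9). The actual implementation is in Theano, so this is only an illustration of the mechanism, not the repository code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def address(memory, key, beta, g, shift, gamma, w_prev):
    # Content addressing (eq. 5-6): cosine similarity between the key and
    # each memory row, sharpened by beta and normalized with a softmax.
    similarity = np.dot(memory, key) / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w_c = softmax(beta * similarity)

    # Gating (eq. 7): interpolation between the content-based weighting
    # and the previous weighting.
    w_g = g * w_c + (1.0 - g) * w_prev

    # Convolutional shift (eq. 8): circular convolution of w_g with a
    # distribution `shift` over the offsets {-1, 0, +1}.
    w_s = np.zeros_like(w_g)
    for offset, s in zip((-1, 0, 1), shift):
        w_s += s * np.roll(w_g, offset)

    # Sharpening (eq. 9).
    w = w_s ** gamma
    return w / w.sum()
```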

I decided to run a test where I simply skip the content addressing and gating, to see if the model would then exhibit the intended behavior (writing to consecutive addresses). It turns out this works really well (when it converges, see below) and shows impressive generalization. In this example I still trained the NTM on sequences of length <= 5, but it perfectly predicts sequences of length up to 50 (over a few test examples). One run even showed perfect generalization with sequences of length up to 128 (the memory has only 128 addresses).
[Image: copy-20-cheat]

Parameters of the experiment

Overall the same settings as in #4, but without any content addressing or gating (in other words, w_g = w_tm1 for both heads). The weights were also initialized as [1, 0, ..., 0] (instead of EquiProba) to force the head to write the first vector to a single location in the memory instead of spreading the information over the whole memory.
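
Concretely, the change amounts to replacing the gating step in the sketch above and changing the initial weighting (again only an illustration, not the repository code):

```python
import numpy as np

num_addresses = 128

# No content addressing and no gating: the pre-shift weighting is simply
# the previous weighting (w_g = w_tm1) for both heads.
def gate(w_c, g, w_prev):
    return w_prev  # instead of g * w_c + (1.0 - g) * w_prev

# Initial weighting: one-hot on the first address instead of uniform
# (EquiProba), so the first vector is written to a single memory location.
w_init = np.zeros(num_addresses)
w_init[0] = 1.0
```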

Issues

It seems that some of the parameters modified for this experiment made the training unstable, and NaNs appear after a few hundred iterations. I have to investigate this further.
Fixed in f09289e: the power function was non-differentiable with respect to the exponent at 0, which can happen with hard addressing.
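
The problem is easy to reproduce: the gradient of the sharpening step with respect to gamma involves log(w), which is undefined numerically when a component of the weighting is exactly 0 (hard addressing). Below is a small sketch of the issue and one possible workaround; the epsilon clipping is only an illustration, not necessarily what f09289e does:

```python
import numpy as np

gamma = 3.0
w_s = np.array([1.0, 0.0, 0.0, 0.0])  # hard addressing: exact zeros

# d/dgamma (w ** gamma) = w ** gamma * log(w); numerically this gives
# 0 * (-inf) = NaN where w = 0, so the backward pass of the sharpening
# step produces NaNs.
grad_wrt_gamma = w_s ** gamma * np.log(w_s)  # contains NaN entries

# One possible workaround: keep the weighting away from exact zeros
# before sharpening.
eps = 1e-6
w = (w_s + eps) ** gamma
w /= w.sum()
```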
