WHAT IS CLAIMED IS:

1. A system comprising:

a recurrent neural network implemented by one or more computers,

wherein the recurrent neural network is configured to receive a respective neural network input at each of a plurality of time steps and to generate a respective neural network output at each of the plurality of time steps,

wherein the recurrent neural network includes an associative long short-term memory (LSTM) layer,

wherein the associative LSTM layer is configured to maintain N copies of an internal state for the associative LSTM layer, N being an integer greater than one, and wherein the associative LSTM layer is further configured to, at each of the plurality of time steps:

receive a layer input for the time step,

update each of the N copies of the internal state using the layer input for the time step and a layer output generated by the associative LSTM layer for a preceding time step, and

generate a layer output for the time step using the N updated copies of the internal state.

2. The system of claim 1, wherein updating each of the N copies of the internal state comprises:

determining a cell state update for the time step from the layer input at the time step and optionally the layer output for the preceding time step;

determining, for each of the N copies of the internal state, a corresponding transformed input key from the layer input at the time step and the layer output for the preceding time step; and

for each of the N copies of the internal state, determining the updated copy of the internal state from the copy of the internal state, the cell state update, and the corresponding transformed input key.

3. The system of claim 2, wherein determining, for each of the N copies of the internal state, a corresponding transformed input key from the layer input at the time step and the layer output for the preceding time step comprises:

determining an input key from the layer input at the time step and the layer output for the preceding time step; and

for each of the N copies of the internal state, determining the corresponding transformed input key for the copy by permuting the input key with a respective permutation matrix that is specific to the copy.
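By way of illustration only, the per-copy key transformation of claim 3 can be sketched as follows. The sketch assumes each permutation matrix is represented by an index list, so that applying the matrix P to a vector v gives (Pv)[i] = v[perm[i]], and that the permutations are drawn at random once and then held fixed. The helper names make_permutations, permute and transformed_keys are hypothetical and do not appear in the claims.

```python
import random


def make_permutations(n_copies, dim, seed=0):
    """Hypothetical helper: one fixed random permutation of range(dim) per copy."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_copies):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def permute(vec, perm):
    """Apply the permutation matrix encoded by `perm`: result[i] = vec[perm[i]]."""
    return [vec[j] for j in perm]


def transformed_keys(key, perms):
    """Derive one transformed key per internal-state copy from a single shared key."""
    return [permute(key, perm) for perm in perms]
```

Because every copy receives a differently permuted version of the same key, the N copies store the same content under different key arrangements. This is what later allows the copies to be averaged.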

4. The system of claim 2 or 3, wherein updating each of the N copies of the internal state further comprises:

determining an input gate from the layer input at the time step and the layer output for the preceding time step; and

determining a forget gate from the layer input at the time step and the layer output for the preceding time step.

5. The system of claim 4, wherein determining the updated copy of the internal state from the copy of the internal state, the cell state update, and the corresponding transformed input key comprises:

applying the forget gate to the copy of the internal state to generate an initial updated copy;

applying the input gate to the cell state update to generate a final cell state update;

applying the corresponding transformed input key to the final cell state update to generate a rotated cell state update; and

combining the initial updated copy and the rotated cell state update to generate the updated copy of the internal state.
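As a non-authoritative sketch of the four steps of claim 5, the function below assumes that the internal-state copy, the cell state update and the transformed input key are complex-valued vectors (Python lists of complex numbers), that the gates are real vectors, and that "applying" a gate or key means elementwise multiplication. With a unit-modulus key, multiplying by the key rotates each complex component, which is one reading of "rotated cell state update"; the claims themselves do not fix this representation. The name update_copy is hypothetical.

```python
def update_copy(copy, forget_gate, input_gate, cell_update, key):
    """Hypothetical sketch of claim 5: copy' = f * copy + key * (i * u), elementwise."""
    # Apply the forget gate to the copy to get the initial updated copy.
    initial = [f * c for f, c in zip(forget_gate, copy)]
    # Apply the input gate to the cell state update to get the final cell state update.
    gated = [i * u for i, u in zip(input_gate, cell_update)]
    # Apply the transformed input key (complex product) to get the rotated update.
    rotated = [k * g for k, g in zip(key, gated)]
    # Combine by addition to obtain the updated copy of the internal state.
    return [a + b for a, b in zip(initial, rotated)]
```

For example, starting from a zero copy with both gates fully open, an update of [1, 2] bound with the key [1j, -1] yields [1j, -2], i.e. the first component is rotated by 90 degrees and the second by 180 degrees.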

6. The system of any preceding claim, wherein generating the layer output for the time step comprises:

determining, for each of the N copies of the internal state, a corresponding transformed output key from the layer input at the time step and the layer output for the preceding time step;

modifying, for each of the N copies of the internal state, the updated copy of the internal state using the corresponding transformed output key;

combining the N modified copies to generate a combined internal state for the time step; and

determining the layer output from the combined internal state for the time step.

7. The system of claim 6, wherein combining the N modified copies comprises determining the average of the N modified copies.
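Claims 6 and 7 can be illustrated, under the same complex-valued assumptions, by modifying each updated copy with its transformed output key through an elementwise complex product and then averaging the N modified copies. The function name read_combined is hypothetical. The usage below further assumes, purely for illustration, that each output key is the complex conjugate of the input key used when writing, so that the product undoes the earlier rotation.

```python
def read_combined(copies, output_keys):
    """Hypothetical sketch of claims 6-7: key each copy, then average over the N copies."""
    n = len(copies)
    dim = len(copies[0])
    # Modify each updated copy with its own transformed output key.
    modified = [[k * c for k, c in zip(key, copy)] for key, copy in zip(output_keys, copies)]
    # Combine the N modified copies by taking their elementwise average.
    return [sum(m[j] for m in modified) / n for j in range(dim)]
```

If the value 3 was written into two copies under the keys 1j and -1, reading with the conjugate keys -1j and -1 recovers 3 from each copy, and the average is again 3. When several items share the same copies, averaging over differently permuted copies is intended to reduce the interference between them.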

8. The system of claim 6, wherein determining, for each of the N copies of the internal state, a corresponding transformed output key from the layer input at the time step and the layer output for the preceding time step comprises:

determining an output key from the layer input at the time step and the layer output for the preceding time step; and

for each of the N copies of the internal state, determining the corresponding transformed output key for the copy by permuting the output key with a respective permutation matrix that is specific to the copy.

9. The system of claim 6 or 7, wherein generating the layer output for the time step further comprises:

determining an output gate from the layer input at the time step and the layer output for the preceding time step, and wherein determining the layer output from the combined internal state for the time step comprises:

applying an activation function to the combined internal state to determine an initial layer output; and

applying the output gate to the initial layer output to determine the layer output for the time step.
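The following is a minimal end-to-end sketch of one time step of the layer of claims 1 to 9, written only to show how the pieces fit together. It makes several assumptions that the claims leave open, all of which are hypothetical: gates, cell state update and keys are produced by bias-free linear projections of the concatenated layer input and previous layer output (split into real and imaginary parts); the weights are random rather than trained; the activation function of claim 9 is a "bound" function that rescales each complex component to modulus at most 1; and the class and method names are invented for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bound(z):
    """Hypothetical activation: shrink a complex number so its modulus is at most 1."""
    return z / max(1.0, abs(z))


def linear(weights, inputs):
    """Bias-free linear projection; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


class AssociativeLSTMSketch:
    """Hypothetical single-layer sketch holding N complex-valued copies of the internal state."""

    def __init__(self, input_dim, dim, n_copies, seed=0):
        rng = random.Random(seed)
        width = input_dim + 2 * dim  # layer input plus real/imag parts of the previous output

        def rand_rows(rows):
            return [[rng.gauss(0.0, 0.1) for _ in range(width)] for _ in range(rows)]

        self.w_gates = rand_rows(3 * dim)    # forget, input and output gates (real-valued)
        self.w_complex = rand_rows(6 * dim)  # cell update, input key, output key (real/imag halves)
        self.perms = []
        for _ in range(n_copies):            # one fixed permutation per copy
            perm = list(range(dim))
            rng.shuffle(perm)
            self.perms.append(perm)
        self.dim, self.n = dim, n_copies
        self.copies = [[0j] * dim for _ in range(n_copies)]
        self.h = [0j] * dim                  # layer output for the preceding time step

    def _complex_block(self, values, k):
        """Assemble the k-th complex vector from its real and imaginary halves."""
        d = self.dim
        base = 2 * d * k
        return [complex(values[base + j], values[base + d + j]) for j in range(d)]

    def step(self, x):
        d = self.dim
        z = list(x) + [v.real for v in self.h] + [v.imag for v in self.h]
        gates = [sigmoid(v) for v in linear(self.w_gates, z)]
        f, i, o = gates[:d], gates[d:2 * d], gates[2 * d:]
        raw = linear(self.w_complex, z)
        u, r_in, r_out = (self._complex_block(raw, k) for k in range(3))
        u = [bound(v) for v in u]                    # cell state update
        total = [0j] * d
        for s, perm in enumerate(self.perms):
            k_in = [bound(r_in[p]) for p in perm]    # transformed input key for copy s
            k_out = [bound(r_out[p]) for p in perm]  # transformed output key for copy s
            new = [f[j] * self.copies[s][j] + k_in[j] * (i[j] * u[j]) for j in range(d)]
            self.copies[s] = new                     # updated copy of the internal state
            for j in range(d):
                total[j] += k_out[j] * new[j]        # modified copy, accumulated
        combined = [t / self.n for t in total]       # average of the N modified copies
        self.h = [o[j] * bound(combined[j]) for j in range(d)]  # activation, then output gate
        return self.h
```

For example, AssociativeLSTMSketch(input_dim=3, dim=4, n_copies=2).step([0.1, -0.2, 0.3]) returns a list of four complex values, each with modulus at most 1 because the output gate lies in (0, 1) and the bound activation caps the modulus. Feeding the returned list back in through self.h is what makes the layer recurrent.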

10. A method comprising the operations that the associative LSTM layer of any one of claims 1-9 is configured to perform.

11. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to implement the system of any one of claims 1-9.