Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory (LSTM) layers with compressed gating functions. One of the systems includes a first LSTM layer having gates that are configured to, for each of multiple time steps, generate a respective intermediate gate output vector by multiplying a gate input vector by a gate parameter matrix. The gate parameter matrix for at least one of the gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix. By including the compressed LSTM layer, the recurrent neural network can process data more efficiently and requires less data storage. A recurrent neural network having a compressed LSTM layer can be trained to achieve word error rates comparable to those of full-size, i.e., uncompressed, recurrent neural networks.
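The factored form of the gate parameter matrix described above can be illustrated with a minimal sketch. This is not the patented implementation; the matrix names (`W`, `Z`, `P`), sizes, and the rank `r` are hypothetical, chosen only to show how replacing a full gate parameter matrix with a compressed parameter matrix times a projection matrix reduces the number of stored values while producing a gate pre-activation of the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 512, 512, 64  # hypothetical gate dimensions and compression rank

W = rng.standard_normal((m, n))   # full (uncompressed) gate parameter matrix
Z = rng.standard_normal((m, r))   # compressed parameter matrix
P = rng.standard_normal((r, n))   # projection matrix

x = rng.standard_normal(n)        # gate input vector for one time step

full_out = W @ x                  # intermediate gate output, full matrix
compressed_out = Z @ (P @ x)      # same shape, computed from the factors

# Storage drops from m*n values to r*(m + n) values:
print(W.size, Z.size + P.size)    # 262144 vs 65536
```

Applying the factors in sequence, `Z @ (P @ x)`, avoids ever materializing the full `m x n` product, so both storage and per-step multiply-accumulate cost shrink when `r` is much smaller than `m` and `n`.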