Embodiments relate to a configurable convolution engine that receives configuration information to perform convolution and other deep machine learning operations on streaming input data of various formats. The convolution engine may include a convolution core circuit and a spatial pooling circuit. The convolution core circuit performs convolution operations on input data to generate a first stream in which first values of a first channel and second values of a second channel are interleaved. The convolution core circuit may further perform post-processing operations, including inter-channel processing operations. The spatial pooling circuit performs per-channel operations on the output of the convolution core circuit: it pools subsets of the values of the first channel and of the second channel separately, and combines the spatially pooled values into an output stream in which the multiple channels are interleaved.
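The per-channel pooling of an interleaved stream described above can be illustrated in software. The following is a minimal sketch only, not the circuit itself. It assumes two channels, a one-dimensional stream standing in for the spatial dimensions, and max pooling over non-overlapping windows; the pooling function is an assumption, as is every function name (`deinterleave`, `pool_1d`, `interleave`, `spatial_pool_stream`), none of which comes from the source.

```python
from typing import List


def deinterleave(stream: List[float], num_channels: int = 2) -> List[List[float]]:
    """Split an interleaved stream [a0, b0, a1, b1, ...] into per-channel lists."""
    return [stream[c::num_channels] for c in range(num_channels)]


def pool_1d(values: List[float], window: int = 2) -> List[float]:
    """Max-pool non-overlapping windows of a single channel.

    Max pooling is an assumed choice; a trailing partial window is dropped.
    """
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def interleave(channels: List[List[float]]) -> List[float]:
    """Recombine per-channel lists into one interleaved stream."""
    return [value for group in zip(*channels) for value in group]


def spatial_pool_stream(stream: List[float],
                        num_channels: int = 2,
                        window: int = 2) -> List[float]:
    """Pool each channel separately, then re-interleave the pooled values."""
    channels = deinterleave(stream, num_channels)
    pooled = [pool_1d(channel, window) for channel in channels]
    return interleave(pooled)


# First channel: 1, 3, 2, 5; second channel: 10, 30, 20, 50 (interleaved).
stream = [1, 10, 3, 30, 2, 20, 5, 50]
print(spatial_pool_stream(stream))  # prints [3, 30, 5, 50]
```

Each channel is pooled independently, so values of the first channel never influence pooled values of the second, and the output stream preserves the interleaved channel order of the input.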