A bit-depth optimization engine reduces the hardware cost of a neural network. During training routines, training data is applied to the neural network, and both an accuracy cost and a hardware cost are generated. A hardware complexity cost generator produces costs for weights near bit-depth steps, where the number of binary bits required to represent a weight decreases by one, such as when a weight drops from 2^N to 2^N − 1. A gradient is generated from the costs for each weight. Weights near bit-depth steps are easily selected because they have large gradients, while weights far from any bit-depth step have near-zero gradients. The selected weights are reduced during optimization. Over many optimization cycles, a low-bit-depth neural network is generated that uses fewer binary bits per weight, resulting in lower hardware cost when the low-bit-depth neural network is manufactured on an Application-Specific Integrated Circuit (ASIC).
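The idea that weights near a bit-depth step have large gradients, while other weights have near-zero gradients, can be illustrated with a minimal sketch. It assumes the weights are already quantized integers and ignores the accuracy cost entirely. The hardware cost is modeled as the exact bit count, which is a step function whose gradient is zero almost everywhere. Each step between 2^k − 1 and 2^k is therefore smoothed with a sigmoid centered at 2^k − 0.5. The sigmoid smoothing, the `sharpness` and `threshold` values, the one-unit reduction, and every function name below are hypothetical choices made for illustration. None of them come from the source.

```python
import math
from typing import List


def bits_required(w: int) -> int:
    """Binary bits needed for the magnitude of an integer weight (sign ignored)."""
    return abs(w).bit_length()


def hardware_cost(weights: List[int]) -> int:
    """Exact hardware cost in this sketch: total bits over all weights."""
    return sum(bits_required(w) for w in weights)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_cost_grad(w: float, max_bits: int = 16, sharpness: float = 4.0) -> float:
    """Derivative of a smoothed bit count, sum_k sigmoid(s * (|w| - (2^k - 0.5))).

    Hypothetical smoothing: each bit-depth step becomes a sigmoid, so the
    gradient peaks next to 2^k - 0.5 and decays quickly elsewhere.
    """
    if w == 0:
        return 0.0
    x = abs(w)
    grad = 0.0
    for k in range(max_bits):
        s = _sigmoid(sharpness * (x - (2 ** k - 0.5)))
        grad += sharpness * s * (1.0 - s)  # derivative of the sigmoid
    return math.copysign(grad, w)  # chain rule through |w|


def select_near_steps(weights: List[int], threshold: float = 0.2) -> List[int]:
    """Indices of weights whose hardware-cost gradient exceeds the threshold."""
    return [i for i, w in enumerate(weights) if abs(smooth_cost_grad(w)) > threshold]


def reduce_selected(weights: List[int], threshold: float = 0.2) -> List[int]:
    """Move each selected weight one unit toward zero (hypothetical update rule)."""
    chosen = set(select_near_steps(weights, threshold))
    reduced = []
    for i, w in enumerate(weights):
        if i in chosen and w != 0:
            w = w - 1 if w > 0 else w + 1
        reduced.append(w)
    return reduced
```

For example, with weights `[8, 5, 12]` only `8` sits next to a step (at 7.5), so its gradient is roughly 0.42, while `5` and `12` have gradients near 0.01 or smaller. `reduce_selected` lowers only `8` to `7`, which drops its width from 4 bits to 3. A full optimizer would weigh this saving against the accuracy cost before accepting the change.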