Motivated by the Minimum Description Length (MDL) principle, we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. We then formalize the problem of neural network compression as an entropy-constrained optimization problem. The resulting objective generalizes many of the compression techniques proposed in the literature, in that pruning and reducing the cardinality of the weight elements can both be seen as special cases of entropy reduction. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient-based optimization techniques. Finally, we show that our approach reaches compression results competitive with those obtained using state-of-the-art techniques on different network architectures and data sets, e.g., achieving ×71 compression gains on a VGG-like architecture.
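
To make the notion of entropy as a bit-size measure concrete, the following is a minimal sketch, not the derivation used in this work. It assumes the weights have already been quantized to a finite set of values and uses the first-order empirical Shannon entropy as a proxy for the bit-size; the function names `empirical_entropy_bits` and `estimated_size_bits` and the toy weight lists are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def empirical_entropy_bits(weights):
    """First-order Shannon entropy, in bits per weight, of discrete weight values.

    Assumption: each weight is treated as an independent draw from the
    empirical distribution of the values present in `weights`.
    """
    counts = Counter(weights)
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def estimated_size_bits(weights):
    """Approximate bit-size of the weights under an ideal entropy coder."""
    return len(weights) * empirical_entropy_bits(weights)


# Toy weight vectors (hypothetical values, eight weights each).
dense = [0.5, -0.5, 0.25, -0.25, 0.5, -0.5, 0.25, -0.25]   # four equiprobable values
pruned = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, -0.5]         # most weights set to zero
low_cardinality = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # two distinct values

for name, w in [("dense", dense), ("pruned", pruned), ("low_cardinality", low_cardinality)]:
    print(name, empirical_entropy_bits(w), estimated_size_bits(w))
```

In this toy setting, both pruning (concentrating probability mass on zero) and reducing the number of distinct weight values lower the entropy relative to the dense vector, which is the sense in which the two can be viewed as special cases of entropy reduction.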