Recent studies have demonstrated the applicability of deep learning methods to resting-state functional Magnetic Resonance Imaging (rs-fMRI) analysis and to brain disease diagnosis, e.g., early Mild Cognitive Impairment (eMCI) identification. However, to the best of our knowledge, many existing methods are limited in their performance on a target task, e.g., eMCI diagnosis, by the unintended information loss that occurs when an input is transformed into hierarchical or compressed representations. In this paper, we propose a novel network architecture that discovers enriched representations of the spatio-temporal patterns in rs-fMRI, such that the most compressed, or latent, representations retain the maximal amount of information needed to recover the original input while being decomposed into diagnosis-relevant and diagnosis-irrelevant features. To learn these favourable representations, we utilize a self-attention mechanism to explore spatially more informative patterns over time, and information-oriented techniques to maintain the enriched but decomposed representations. In our experiments on the ADNI dataset, we validated the effectiveness of the proposed network architecture by comparing its performance with that of its counterpart methods as well as competing methods in the literature.