In this paper, we aim to maximally utilize multimodality neuroimaging and genetic data to predict Alzheimer’s disease (AD) and its prodromal status, i.e., a multi-status dementia diagnosis problem. Multimodality neuroimaging data such as MRI and PET provide valuable insights into abnormalities, and genetic data such as single nucleotide polymorphisms (SNPs) provide information about a patient’s AD risk factors. Used in conjunction, these data may improve AD diagnosis. However, they are heterogeneous (e.g., they have different data distributions) and differ in the number of available samples (e.g., far fewer samples are available for PET than for MRI or SNP data). Learning an effective model from these data is therefore challenging. To this end, we present a novel three-stage deep feature learning and fusion framework, in which the deep neural network is trained stage-wise. Each stage of the network learns feature representations for a different combination of modalities and is trained with the maximum number of samples available for that combination. Specifically, in the first stage, we learn latent representations (i.e., high-level features) for each modality independently, so that the heterogeneity among modalities is better addressed before the modalities are combined in the next stage. In the second stage, we learn joint latent features for each pair of modalities using the high-level features learned in the first stage. In the third stage, we learn the diagnostic labels by fusing the joint latent features learned in the second stage. We have tested our framework on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset for multi-status AD diagnosis, and the experimental results show that the proposed framework outperforms other methods.
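The sample-availability idea behind the stage-wise training can be illustrated with a minimal sketch. It only shows which subjects could feed each stage when some modalities are missing; it does not train a network. The toy cohort, the subject IDs, and the function names are hypothetical, and the choice to use only subjects with all three modalities in the third stage is an assumption made for illustration, not a detail stated in this abstract.

```python
from itertools import combinations

MODALITIES = ("MRI", "PET", "SNP")

# Hypothetical toy cohort: subject ID -> modalities available for that subject.
# These are illustrative values, not ADNI data.
cohort = {
    "s01": {"MRI", "PET", "SNP"},
    "s02": {"MRI", "SNP"},
    "s03": {"MRI", "SNP"},
    "s04": {"MRI", "PET"},
    "s05": {"MRI"},
    "s06": {"MRI", "PET", "SNP"},
}


def subjects_with(cohort, required):
    """Return sorted IDs of subjects that have every modality in `required`."""
    required = set(required)
    return sorted(sid for sid, available in cohort.items() if required <= available)


def stage_training_sets(cohort, modalities=MODALITIES):
    """Select the largest usable subject set for each stage (illustrative only)."""
    # Stage 1: one set per modality, using every subject that has that modality.
    stage1 = {(m,): subjects_with(cohort, [m]) for m in modalities}
    # Stage 2: one set per modality pair, using subjects that have both.
    stage2 = {pair: subjects_with(cohort, pair) for pair in combinations(modalities, 2)}
    # Stage 3 (assumption): subjects that have all modalities.
    stage3 = {tuple(modalities): subjects_with(cohort, modalities)}
    return stage1, stage2, stage3


stage1, stage2, stage3 = stage_training_sets(cohort)
for name, stage in (("stage 1", stage1), ("stage 2", stage2), ("stage 3", stage3)):
    for combo, ids in stage.items():
        print(f"{name} {'+'.join(combo)}: {len(ids)} subjects")
```

In this toy cohort the MRI-only set is the largest and the sets shrink as more modalities are required, which is the imbalance that motivates training each stage on whatever subjects are available for its modality combination.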