Image classification is an essential and challenging task in computer vision. Although the combination of a deep convolutional neural network (DCNN) with Fisher vector (FV) encoding is widely used, its performance is limited because the class-irrelevant background included in traditional FV encoding can make the resulting image features less discriminative. In this paper, we propose the foreground FV (fgFV) encoding algorithm and its fast approximation for image classification. We implicitly separate the class-relevant foreground from the class-irrelevant background during encoding by tuning the weights of the partial gradients corresponding to each Gaussian component under the supervision of image labels, and then use only the local descriptors extracted from the class-relevant foreground to estimate FVs. We evaluated fgFV against the widely used FV and improved FV (iFV) under the combined DCNN-FV framework, and compared them with several state-of-the-art image classification approaches on ten benchmark image datasets covering the recognition of fine-grained natural species and man-made objects, the categorization of coarse objects, and the classification of scenes. Our results indicate that the proposed fgFV encoding algorithm constructs more discriminative image representations from local descriptors than FV and iFV, and that the combined DCNN-fgFV algorithm improves image classification performance.
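To make the idea of weighting the partial gradients of each Gaussian component concrete, the following is a minimal sketch, not the fgFV algorithm itself. It assumes a diagonal-covariance Gaussian mixture model, uses only the gradients with respect to the component means, and takes the per-component weights as a plain input vector, whereas the abstract describes tuning them under the supervision of image labels. The names `posteriors`, `weighted_fisher_vector`, and `component_weights` are hypothetical and chosen for illustration only.

```python
import numpy as np


def posteriors(X, pi, mu, sigma):
    """Soft-assign N descriptors to K diagonal Gaussians.

    X: (N, D) local descriptors; pi: (K,) mixture weights;
    mu: (K, D) means; sigma: (K, D) standard deviations.
    Returns the posteriors gamma (N, K) and the whitened
    differences (x_n - mu_k) / sigma_k with shape (N, K, D).
    """
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # Log-density up to a constant shared by all components.
    log_p = (
        -0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigma), axis=1)[None, :]
        + np.log(pi)[None, :]
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, diff


def weighted_fisher_vector(X, pi, mu, sigma, component_weights=None):
    """Mean-gradient Fisher vector with one weight per Gaussian component.

    component_weights is an assumed input here (all ones reproduces the
    plain mean-gradient FV); in the paper the weights are learned from
    image labels, which this sketch does not attempt.
    """
    n_desc = X.shape[0]
    n_comp = pi.shape[0]
    if component_weights is None:
        component_weights = np.ones(n_comp)
    gamma, diff = posteriors(X, pi, mu, sigma)
    # Partial gradient w.r.t. each mean, normalized by N * sqrt(pi_k).
    grad_mu = np.einsum("nk,nkd->kd", gamma, diff)
    grad_mu /= n_desc * np.sqrt(pi)[:, None]
    # Scale each component's partial gradient by its weight.
    grad_mu *= component_weights[:, None]
    return grad_mu.ravel()
```

Setting a component's weight to zero removes that component's block from the encoded vector, which is a crude stand-in for suppressing components that mostly model background; how the weights are actually chosen is the contribution of the paper and is not reproduced here.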
- convolutional neural networks
- feature encoding
- foreground Fisher vector
- Image classification
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design