TY - GEN
T1 - Parallel Feature Pyramid Network for Object Detection
AU - Kim, Seung Wook
AU - Kook, Hyong Keun
AU - Sun, Jee Young
AU - Kang, Mun Cheon
AU - Ko, Sung Jea
N1 - Funding Information:
This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (2014-0-00077, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis).
Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - Recently developed object detectors employ a convolutional neural network (CNN) that gradually increases the number of feature layers in a pyramidal shape instead of using a featurized image pyramid. However, the differing abstraction levels of the CNN feature layers often limit detection performance, especially on small objects. To overcome this limitation, we propose a CNN-based object detection architecture, referred to as the parallel feature pyramid (FP) network (PFPNet), in which the FP is constructed by widening the network instead of increasing its depth. First, we adopt spatial pyramid pooling and additional feature transformations to generate a pool of feature maps of different sizes. In PFPNet, these feature transformations are performed in parallel, yielding feature maps with similar levels of semantic abstraction across scales. We then resize the elements of the feature pool to a uniform size and aggregate their contextual information to generate each level of the final FP. The experimental results confirm that PFPNet improves the latest version of the single-shot multi-box detector (SSD) by 6.4% AP and, in particular, by 7.8% AP_small on the MS-COCO dataset.
KW - Feature pyramid
KW - Real-time object detection
UR - http://www.scopus.com/inward/record.url?scp=85055106425&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01228-1_15
DO - 10.1007/978-3-030-01228-1_15
M3 - Conference contribution
AN - SCOPUS:85055106425
SN - 9783030012274
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 239
EP - 256
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Hebert, Martial
A2 - Weiss, Yair
PB - Springer Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -
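The abstract above outlines the PFPNet construction: spatial pyramid pooling yields a pool of multi-scale feature maps, parallel feature transformations give them comparable semantic depth, and each final pyramid level is built by resizing and aggregating the whole pool. The following is a minimal PyTorch-style sketch of that pipeline based only on the abstract, not the authors' implementation; the module names, channel counts, pooling operator, and 1x1/3x3 convolution branches are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelFeaturePyramid(nn.Module):
    """Hedged sketch of the PFPNet idea: SPP -> parallel transforms -> resize & aggregate."""

    def __init__(self, in_ch=512, mid_ch=256, num_levels=3):
        super().__init__()
        self.num_levels = num_levels
        # One parallel transformation branch per pooled scale (assumed plain conv blocks).
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 1), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(num_levels))
        # Fuse the aggregated pool into one map per pyramid level.
        self.fuse = nn.ModuleList(
            nn.Conv2d(mid_ch * num_levels, mid_ch, 1) for _ in range(num_levels))

    def forward(self, x):
        # Spatial pyramid pooling: downsample the base map into a pool of sizes.
        pool = [x if i == 0 else F.avg_pool2d(x, kernel_size=2 ** i, stride=2 ** i)
                for i in range(self.num_levels)]
        # Parallel feature transformation of each pooled map.
        pool = [branch(p) for branch, p in zip(self.branches, pool)]
        # Build each final level by resizing every pool element to that level's size
        # and aggregating their contextual information.
        pyramid = []
        for i in range(self.num_levels):
            h, w = pool[i].shape[-2:]
            resized = [F.interpolate(p, size=(h, w), mode='bilinear', align_corners=False)
                       for p in pool]
            pyramid.append(self.fuse[i](torch.cat(resized, dim=1)))
        return pyramid

if __name__ == "__main__":
    pfp = ParallelFeaturePyramid()
    levels = pfp(torch.randn(1, 512, 64, 64))
    # e.g. [(1, 256, 64, 64), (1, 256, 32, 32), (1, 256, 16, 16)]
    print([tuple(l.shape) for l in levels])
```

Each returned level can then feed an SSD-style detection head; the per-level spatial sizes and channel width above are illustrative, not taken from the paper.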