Learning an object detector using zoomed object regions

Sung Jin Cho, Seung Wook Kim, Kwang Hyun Uhm, Hyong Keun Kook, Sung-Jea Ko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The single shot multi-box detector (SSD) is one of the first real-time detectors, which uses a convolutional neural network (CNN) and achieves state-of-the-art detection performance. However, owing to the semantic gap between the feature layers of the CNN, the SSD has room for improvement. In this paper, we propose a novel training scheme to enhance the performance of the SSD. In object detection, a ground truth (GT) box is a bounding box enclosing an object boundary. To improve the semantic level of the feature map, we generate additional GT boxes by zooming in to and out from the original GT boxes. Experimental results show that the SSD trained with our scheme outperforms the original one on a public dataset.
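The zooming step described in the abstract can be illustrated with a minimal sketch. The abstract does not give scale factors, a box format, or how the extra boxes enter the SSD loss, so everything below is an assumption made for illustration: boxes are `(x_min, y_min, x_max, y_max)` corners, zooming scales a box about its center, results are clipped to the image, and the names `zoom_box` and `zoomed_gt_boxes` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def zoom_box(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Scale a box about its center by `scale`, then clip it to the image.

    scale < 1 zooms in (smaller box), scale > 1 zooms out (larger box).
    Center-based scaling and clipping are assumptions, not taken from the paper.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(img_w, cx + half_w),
        min(img_h, cy + half_h),
    )


def zoomed_gt_boxes(
    box: Box, scales: Sequence[float], img_w: float, img_h: float
) -> List[Box]:
    """Return one additional GT box per scale factor (scale values are hypothetical)."""
    return [zoom_box(box, s, img_w, img_h) for s in scales]


# Example: one zoomed-in and one zoomed-out box for a 640x480 image.
extra = zoomed_gt_boxes((100.0, 100.0, 200.0, 300.0), (0.5, 2.0), 640.0, 480.0)
```

In this sketch the original GT box would be kept alongside the zoomed boxes. How the additional boxes are matched to default boxes during training is not stated in the abstract and is left out here.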

Original language: English
Title of host publication: ICEIC 2019 - International Conference on Electronics, Information, and Communication
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9788995004449
DOIs: https://doi.org/10.23919/ELINFOCOM.2019.8706381
Publication status: Published - 2019 May 3
Event: 18th International Conference on Electronics, Information, and Communication, ICEIC 2019 - Auckland, New Zealand
Duration: 2019 Jan 22 → 2019 Jan 25

Publication series

Name: ICEIC 2019 - International Conference on Electronics, Information, and Communication

Conference

Conference: 18th International Conference on Electronics, Information, and Communication, ICEIC 2019
Country: New Zealand
City: Auckland
Period: 19/1/22 → 19/1/25

Fingerprint

Semantics
Detectors
Neural networks
Object detection

Keywords

  • Computer vision
  • Neural network
  • Object detection

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Cho, S. J., Kim, S. W., Uhm, K. H., Kook, H. K., & Ko, S-J. (2019). Learning an object detector using zoomed object regions. In ICEIC 2019 - International Conference on Electronics, Information, and Communication [8706381] (ICEIC 2019 - International Conference on Electronics, Information, and Communication). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/ELINFOCOM.2019.8706381

Learning an object detector using zoomed object regions. / Cho, Sung Jin; Kim, Seung Wook; Uhm, Kwang Hyun; Kook, Hyong Keun; Ko, Sung-Jea.

ICEIC 2019 - International Conference on Electronics, Information, and Communication. Institute of Electrical and Electronics Engineers Inc., 2019. 8706381 (ICEIC 2019 - International Conference on Electronics, Information, and Communication).

Cho, SJ, Kim, SW, Uhm, KH, Kook, HK & Ko, S-J 2019, Learning an object detector using zoomed object regions. in ICEIC 2019 - International Conference on Electronics, Information, and Communication., 8706381, ICEIC 2019 - International Conference on Electronics, Information, and Communication, Institute of Electrical and Electronics Engineers Inc., 18th International Conference on Electronics, Information, and Communication, ICEIC 2019, Auckland, New Zealand, 19/1/22. https://doi.org/10.23919/ELINFOCOM.2019.8706381
Cho SJ, Kim SW, Uhm KH, Kook HK, Ko S-J. Learning an object detector using zoomed object regions. In ICEIC 2019 - International Conference on Electronics, Information, and Communication. Institute of Electrical and Electronics Engineers Inc. 2019. 8706381. (ICEIC 2019 - International Conference on Electronics, Information, and Communication). https://doi.org/10.23919/ELINFOCOM.2019.8706381
Cho, Sung Jin ; Kim, Seung Wook ; Uhm, Kwang Hyun ; Kook, Hyong Keun ; Ko, Sung-Jea. / Learning an object detector using zoomed object regions. ICEIC 2019 - International Conference on Electronics, Information, and Communication. Institute of Electrical and Electronics Engineers Inc., 2019. (ICEIC 2019 - International Conference on Electronics, Information, and Communication).
@inproceedings{1be0a06fdf4d4e6dab010b03a28c41c4,
title = "Learning an object detector using zoomed object regions",
abstract = "The single shot multi-box detector (SSD) is one of the first real-time detectors, which uses a convolutional neural network (CNN) and achieves state-of-the-art detection performance. However, owing to the semantic gap between the feature layers of the CNN, the SSD has room for improvement. In this paper, we propose a novel training scheme to enhance the performance of the SSD. In object detection, a ground truth (GT) box is a bounding box enclosing an object boundary. To improve the semantic level of the feature map, we generate additional GT boxes by zooming in to and out from the original GT boxes. Experimental results show that the SSD trained with our scheme outperforms the original one on a public dataset.",
keywords = "Computer vision, Neural network, Object detection",
author = "Cho, {Sung Jin} and Kim, {Seung Wook} and Uhm, {Kwang Hyun} and Kook, {Hyong Keun} and Sung-Jea Ko",
year = "2019",
month = "5",
day = "3",
doi = "10.23919/ELINFOCOM.2019.8706381",
language = "English",
series = "ICEIC 2019 - International Conference on Electronics, Information, and Communication",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "ICEIC 2019 - International Conference on Electronics, Information, and Communication",

}

TY - GEN

T1 - Learning an object detector using zoomed object regions

AU - Cho, Sung Jin

AU - Kim, Seung Wook

AU - Uhm, Kwang Hyun

AU - Kook, Hyong Keun

AU - Ko, Sung-Jea

PY - 2019/5/3

Y1 - 2019/5/3

N2 - The single shot multi-box detector (SSD) is one of the first real-time detectors, which uses a convolutional neural network (CNN) and achieves state-of-the-art detection performance. However, owing to the semantic gap between the feature layers of the CNN, the SSD has room for improvement. In this paper, we propose a novel training scheme to enhance the performance of the SSD. In object detection, a ground truth (GT) box is a bounding box enclosing an object boundary. To improve the semantic level of the feature map, we generate additional GT boxes by zooming in to and out from the original GT boxes. Experimental results show that the SSD trained with our scheme outperforms the original one on a public dataset.

AB - The single shot multi-box detector (SSD) is one of the first real-time detectors, which uses a convolutional neural network (CNN) and achieves state-of-the-art detection performance. However, owing to the semantic gap between the feature layers of the CNN, the SSD has room for improvement. In this paper, we propose a novel training scheme to enhance the performance of the SSD. In object detection, a ground truth (GT) box is a bounding box enclosing an object boundary. To improve the semantic level of the feature map, we generate additional GT boxes by zooming in to and out from the original GT boxes. Experimental results show that the SSD trained with our scheme outperforms the original one on a public dataset.

KW - Computer vision

KW - Neural network

KW - Object detection

UR - http://www.scopus.com/inward/record.url?scp=85065887181&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065887181&partnerID=8YFLogxK

U2 - 10.23919/ELINFOCOM.2019.8706381

DO - 10.23919/ELINFOCOM.2019.8706381

M3 - Conference contribution

AN - SCOPUS:85065887181

T3 - ICEIC 2019 - International Conference on Electronics, Information, and Communication

BT - ICEIC 2019 - International Conference on Electronics, Information, and Communication

PB - Institute of Electrical and Electronics Engineers Inc.

ER -