TY - GEN
T1 - UnionDet
T2 - 16th European Conference on Computer Vision, ECCV 2020
AU - Kim, Bumsoo
AU - Choi, Taeho
AU - Kang, Jaewoo
AU - Kim, Hyunwoo J.
N1 - Funding Information:
This work was supported by the National Research Council of Science & Technology (NST) grant by the Korea government (MSIT) (No. CAP-18-03-ETRI), the National Research Foundation of Korea (NRF-2017M3C4A7065887), and Samsung Electronics Co., Ltd.
PY - 2020
Y1 - 2020
N2 - Recent advances in deep neural networks have brought significant progress in detecting individual objects in an image. However, object detection alone is not sufficient to fully understand a visual scene. Towards a deeper visual understanding, the interactions between objects, especially between humans and objects, are essential. Most prior works have obtained this information with a bottom-up approach, where the objects are first detected and the interactions are then predicted sequentially by pairing the objects. This sequential pairing stage is a major bottleneck in HOI detection inference time. To tackle this problem, we propose UnionDet, a one-stage meta-architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time (4× to 14×) while outperforming state-of-the-art methods on two public datasets: V-COCO and HICO-DET.
KW - Human-object interaction detection
KW - Object detection
KW - Real-time detection
KW - Visual relationships
UR - http://www.scopus.com/inward/record.url?scp=85097420199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097420199&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-58555-6_30
DO - 10.1007/978-3-030-58555-6_30
M3 - Conference contribution
AN - SCOPUS:85097420199
SN - 9783030585549
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 498
EP - 514
BT - Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
A2 - Vedaldi, Andrea
A2 - Bischof, Horst
A2 - Brox, Thomas
A2 - Frahm, Jan-Michael
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 August 2020 through 28 August 2020
ER -