TY - JOUR
T1 - A novel online action detection framework from untrimmed video streams
AU - Yoon, Da Hye
AU - Cho, Nam Gyu
AU - Lee, Seong Whan
N1 - Funding Information:
This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2019-0-00079, Department of Artificial Intelligence, Korea University] and [No. 2014-0-00059, Development of Predictive Visual Intelligence Technology].
Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/10
Y1 - 2020/10
N2 - Online temporal action localization from an untrimmed video stream is a challenging problem in computer vision because i) an untrimmed video stream may contain more than one action instance as well as background scenes, and ii) in online settings, only past and current information is available. Therefore, temporal priors, such as the average action duration of the training data, which have been exploited by previous action detection methods, are unsuitable for this task because of the high intra-class variation in human actions. We propose a novel online action detection framework that models actions as sets of temporally ordered subclasses and leverages a future frame generation network to cope with the limited-information issue outlined above. Additionally, we augment our data by varying the lengths of videos so that the proposed method can learn the high intra-class variation in human actions. We evaluate our method on two benchmark datasets, THUMOS’14 and ActivityNet, in an online temporal action localization scenario, and demonstrate performance comparable to state-of-the-art methods proposed for offline settings.
AB - Online temporal action localization from an untrimmed video stream is a challenging problem in computer vision because i) an untrimmed video stream may contain more than one action instance as well as background scenes, and ii) in online settings, only past and current information is available. Therefore, temporal priors, such as the average action duration of the training data, which have been exploited by previous action detection methods, are unsuitable for this task because of the high intra-class variation in human actions. We propose a novel online action detection framework that models actions as sets of temporally ordered subclasses and leverages a future frame generation network to cope with the limited-information issue outlined above. Additionally, we augment our data by varying the lengths of videos so that the proposed method can learn the high intra-class variation in human actions. We evaluate our method on two benchmark datasets, THUMOS’14 and ActivityNet, in an online temporal action localization scenario, and demonstrate performance comparable to state-of-the-art methods proposed for offline settings.
KW - 3D convolutional neural network
KW - Future frame generation
KW - Long short-term memory
KW - Online action detection
KW - Untrimmed video stream
UR - http://www.scopus.com/inward/record.url?scp=85084532250&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2020.107396
DO - 10.1016/j.patcog.2020.107396
M3 - Article
AN - SCOPUS:85084532250
VL - 106
JO - Pattern Recognition
JF - Pattern Recognition
SN - 0031-3203
M1 - 107396
ER -