TY - GEN
T1 - Resmax
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
AU - Kwak, Il Youp
AU - Kwag, Sungsu
AU - Lee, Junhee
AU - Huh, Jun Ho
AU - Lee, Choong Hoon
AU - Jeon, Youngbae
AU - Hwang, Jeonghwan
AU - Yoon, Ji Won
N1 - Funding Information:
This work was conducted at Samsung Research. The authors would like to thank the Samsung Research Security Team for the helpful discussions. IK was supported by a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (2020R1C1C1A01013020).
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - The “2019 Automatic Speaker Verification Spoofing And Countermeasures Challenge” (ASVspoof) competition aimed to facilitate the design of highly accurate voice spoofing attack detection systems. However, the competition did not emphasize model complexity and latency requirements, although such constraints are strict and integral to real-world deployment. Hence, most of the top-performing solutions from the competition used an ensemble approach, combining multiple complex deep learning models to maximize detection accuracy; this kind of approach would sit uneasily with real-world deployment constraints. To design a lightweight system, we combined the notions of skip connection (from ResNet) and max feature map (from Light CNN), and evaluated the accuracy of the system using the ASVspoof 2019 dataset. With an optimized constant-Q transform (CQT) feature, our single model achieved a replay attack detection equal error rate (EER) of 0.37% on the evaluation set, surpassing the top ensemble system from the competition, which achieved an EER of 0.39%.
AB - The “2019 Automatic Speaker Verification Spoofing And Countermeasures Challenge” (ASVspoof) competition aimed to facilitate the design of highly accurate voice spoofing attack detection systems. However, the competition did not emphasize model complexity and latency requirements, although such constraints are strict and integral to real-world deployment. Hence, most of the top-performing solutions from the competition used an ensemble approach, combining multiple complex deep learning models to maximize detection accuracy; this kind of approach would sit uneasily with real-world deployment constraints. To design a lightweight system, we combined the notions of skip connection (from ResNet) and max feature map (from Light CNN), and evaluated the accuracy of the system using the ASVspoof 2019 dataset. With an optimized constant-Q transform (CQT) feature, our single model achieved a replay attack detection equal error rate (EER) of 0.37% on the evaluation set, surpassing the top ensemble system from the competition, which achieved an EER of 0.39%.
KW - Voice assistant security
KW - Voice presentation attack detection
KW - Voice spoofing attack
KW - Voice synthesis attack
UR - http://www.scopus.com/inward/record.url?scp=85110423250&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9412165
DO - 10.1109/ICPR48806.2021.9412165
M3 - Conference contribution
AN - SCOPUS:85110423250
T3 - Proceedings - International Conference on Pattern Recognition
SP - 4837
EP - 4844
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 January 2021 through 15 January 2021
ER -