TY - GEN
T1 - A perceptual evaluation of generative adversarial network real-time synthesized drum sounds in a virtual environment
AU - Chang, Minwook
AU - Kim, Youngwon Ryan
AU - Kim, Gerard Jounghyun
N1 - Funding Information:
This research was partially supported by an Inst. for Info. & Comm. Tech. Promotion (IITP) grant funded by the Korean government (MSIP No. 2017-0-00179), and by the Global Frontier R&D Program on Human-centered Interaction for Coexistence funded by the NRF of Korea (NRF-2015M3A6A3076490).
Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/15
Y1 - 2019/1/15
N2 - Conventional methods for real-time sound effects in 3D graphical and virtual environments have relied on preparing all the needed samples ahead of time and replaying them as needed, or on parametrically modifying a basic set of samples using physically based techniques such as spring-damper simulation and modal analysis/synthesis. In this work, we propose applying the generative adversarial network (GAN) approach to this problem, in which a single generator is trained to produce the needed sounds quickly with perceptually indistinguishable quality. With the conventional methods, by contrast, separate approximate models would be needed to handle different material properties and contact types and to maintain real-time performance. We demonstrate our claim by training a GAN (specifically, WaveGAN) on sounds of different drums and synthesizing the sounds on the fly for a virtual drum-playing environment. A perceptual test revealed that subjects could neither distinguish the synthesized sounds from the ground truth nor perceive any noticeable delay after the corresponding physical event.
KW - Generation of immersive environments and virtual worlds
KW - Machine learning for multimodal interaction
KW - Multimodal interaction and experiences in VR/AR
UR - http://www.scopus.com/inward/record.url?scp=85062185396&partnerID=8YFLogxK
U2 - 10.1109/AIVR.2018.00030
DO - 10.1109/AIVR.2018.00030
M3 - Conference contribution
AN - SCOPUS:85062185396
T3 - Proceedings - 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
SP - 144
EP - 148
BT - Proceedings - 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
Y2 - 10 December 2018 through 12 December 2018
ER -