TY - GEN
T1 - AMSS-Net
T2 - 29th ACM International Conference on Multimedia, MM 2021
AU - Choi, Woosung
AU - Kim, Minseok
AU - Martínez Ramírez, Marco A.
AU - Chung, Jaehwa
AU - Jung, Soonyoung
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1012624, 2021R1A2C2011452). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A3A13095526).
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is 'transparent'; it usually carries information from multiple sources, in contrast to a pixel in an image. To address this challenging problem, we propose AMSS-Net, which extracts latent sources and selectively manipulates them while preserving irrelevant sources. We also propose an evaluation benchmark for several AMSS tasks, and we show that AMSS-Net outperforms baselines on several AMSS tasks via objective metrics and empirical verification.
KW - audio manipulation
KW - neural networks
KW - text-guided
UR - http://www.scopus.com/inward/record.url?scp=85119378398&partnerID=8YFLogxK
U2 - 10.1145/3474085.3475323
DO - 10.1145/3474085.3475323
M3 - Conference contribution
AN - SCOPUS:85119378398
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 1775
EP - 1783
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 20 October 2021 through 24 October 2021
ER -