TY - GEN
T1 - EARSHOT
T2 - 41st Annual Meeting of the Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019
AU - Magnuson, James S.
AU - You, Heejo
AU - Rueckl, Jay
AU - Allopenna, Paul
AU - Li, Monica
AU - Luthra, Sahil
AU - Steiner, Rachael
AU - Nam, Hosung
AU - Escabí, Monty
AU - Brown, Kevin
AU - Theodore, Rachel
AU - Monto, Nicholas
N1 - Funding Information:
Supported by NSF 1754284, NSF IGERT 1144399, & NSF NRT 1747486 (PI: J.S.M.); NICHD P01 HD0001994 (PI: J.R.); and NSF 1827591 (PI: R.M.T.).
Publisher Copyright:
© Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), we experience phonetic constancy and typically perceive what a speaker intends. Models of human speech recognition have sidestepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, automatic speech recognition powered by deep learning networks has enabled robust, real-world speech recognition. However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support human speech recognition. We developed a simple network that borrows one element from automatic speech recognition (long short-term memory nodes, which provide dynamic memory for short and long spans). This allows the network to learn to map real speech from multiple talkers to semantic targets with high accuracy. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of human speech recognition.
AB - Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), we experience phonetic constancy and typically perceive what a speaker intends. Models of human speech recognition have sidestepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, automatic speech recognition powered by deep learning networks has enabled robust, real-world speech recognition. However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support human speech recognition. We developed a simple network that borrows one element from automatic speech recognition (long short-term memory nodes, which provide dynamic memory for short and long spans). This allows the network to learn to map real speech from multiple talkers to semantic targets with high accuracy. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of human speech recognition.
KW - computational models
KW - deep learning
KW - neural networks
KW - spoken word recognition
UR - http://www.scopus.com/inward/record.url?scp=85139408581&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85139408581
T3 - Proceedings of the 41st Annual Meeting of the Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019
SP - 2248
EP - 2253
BT - Proceedings of the 41st Annual Meeting of the Cognitive Science Society
PB - The Cognitive Science Society
Y2 - 24 July 2019 through 27 July 2019
ER -