K-SPAN: A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics

Jeffrey Holliday, Rory Turnbull, Julien Eychenne

Research output: Contribution to journal › Article

Abstract

This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an “orthographic” form, which is a quasi-phonological representation, a “conservative” form, which maintains all known contrasts, and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.
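
As a rough illustration of the two kinds of statistics the abstract mentions, the sketch below computes segment unigram frequencies and a segment-based neighborhood density over a toy lexicon. It assumes the common definition of a phonological neighbor (a word reachable by exactly one segment substitution, deletion, or addition); the abstract does not state which definition K-SPAN uses. The lexicon, the segment symbols, and the function names are all hypothetical and do not reproduce K-SPAN's actual ASCII scheme or the authors' code.

```python
from collections import Counter

# Hypothetical toy lexicon: each word is a tuple of ASCII segment symbols.
# These forms are invented for illustration; they are NOT K-SPAN entries.
LEXICON = {
    ("p", "a", "l"),
    ("m", "a", "l"),
    ("p", "a", "n"),
    ("p", "a"),
    ("p", "a", "l", "i"),
    ("s", "o", "n"),
}


def is_neighbor(a, b):
    """Return True if a and b differ by exactly one substitution,
    deletion, or addition of a unit (assumed neighbor definition)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Lengths differ by one: removing one unit from the longer form
    # must yield the shorter form (deletion / addition).
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(word, lexicon):
    """Count the entries in the lexicon that are neighbors of word."""
    return sum(is_neighbor(word, other) for other in lexicon)


def unigram_frequencies(lexicon):
    """Relative frequency of each segment across all lexicon entries."""
    counts = Counter(seg for word in lexicon for seg in word)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


if __name__ == "__main__":
    print(neighborhood_density(("p", "a", "l"), LEXICON))  # 4 neighbors in the toy lexicon
    print(unigram_frequencies(LEXICON)["a"])
```

Because the comparison works on tuples of arbitrary units, the same functions could be applied to tuples of syllables instead of segments to obtain a syllable-based density under the same assumed definition.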

Original language: English
Pages (from-to): 1939-1950
Number of pages: 12
Journal: Behavior Research Methods
Volume: 49
Issue number: 5
DOIs: 10.3758/s13428-016-0836-8
Publication status: Published - 2017 Oct 1

Keywords

  • Korean
  • Lexical database
  • Lexicon
  • Phonological neighborhood density

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • Psychology (all)

Cite this

K-SPAN: A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics. / Holliday, Jeffrey; Turnbull, Rory; Eychenne, Julien.

In: Behavior Research Methods, Vol. 49, No. 5, 01.10.2017, p. 1939-1950.

@article{f193e57c9365486f8ed7326685afd218,
title = "{K-SPAN}: A lexical database of {Korean} surface phonetic forms and phonological neighborhood density statistics",
abstract = "This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an “orthographic” form, which is a quasi-phonological representation, a “conservative” form, which maintains all known contrasts, and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.",
keywords = "Korean, Lexical database, Lexicon, Phonological neighborhood density",
author = "Jeffrey Holliday and Rory Turnbull and Julien Eychenne",
year = "2017",
month = "10",
day = "1",
doi = "10.3758/s13428-016-0836-8",
language = "English",
volume = "49",
pages = "1939--1950",
journal = "Behavior Research Methods",
issn = "1554-351X",
publisher = "Springer New York",
number = "5"
}

TY  - JOUR
T1  - K-SPAN
T2  - A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics
AU  - Holliday, Jeffrey
AU  - Turnbull, Rory
AU  - Eychenne, Julien
PY  - 2017/10/1
Y1  - 2017/10/1
N2  - This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an “orthographic” form, which is a quasi-phonological representation, a “conservative” form, which maintains all known contrasts, and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.
AB  - This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an “orthographic” form, which is a quasi-phonological representation, a “conservative” form, which maintains all known contrasts, and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.
KW  - Korean
KW  - Lexical database
KW  - Lexicon
KW  - Phonological neighborhood density
UR  - http://www.scopus.com/inward/record.url?scp=85011659636&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85011659636&partnerID=8YFLogxK
U2  - 10.3758/s13428-016-0836-8
DO  - 10.3758/s13428-016-0836-8
M3  - Article
C2  - 28155186
AN  - SCOPUS:85011659636
VL  - 49
SP  - 1939
EP  - 1950
JO  - Behavior Research Methods
JF  - Behavior Research Methods
SN  - 1554-351X
IS  - 5
ER  -