Mandarin–English code-switching speech corpus in South-East Asia: SEAME

Publication

Authors: Dau-Cheng Lyu, Tien-Ping Tan, Eng-Siong Chng, Haizhou Li

Year: 2015

Journal: Language Resources and Evaluation

Abstract

This paper introduces the South East Asia Mandarin–English corpus, a 63-h spontaneous Mandarin–English code-switching transcribed speech corpus suitable for LVCSR and language change detection/identification research. The corpus was recorded in unscripted interview and conversational settings from 157 Singaporean and Malaysian speakers who spoke a mixture of Mandarin and English within a single sentence. About 82% of the transcribed utterances are intra-sentential code-switching speech, and the corpus will be released by LDC in 2015. This paper presents an analysis of the code-switching statistics of the corpus, such as the duration of monolingual segments and the frequency of language turns in code-switch utterances. We also summarize the development effort, including the processing time for transcription, validation and language boundary labelling. Lastly, we present textual analyses of code-switch segments, examining the word length of monolingual segments in code-switch utterances and the most common single words and two-word phrases of such segments.

Language: Mandarin, English

Cited by

No citing papers are currently in WordNorms.

References

Language diarization for conversational code-switch speech with pronunciation dictionary adaptation · 2013

Language diarization for code-switch conversational speech · 2013

Spoken Language Recognition: From Fundamentals to Practice · Proceedings of the IEEE · 2013

Language policy changes 1979–1992: Politics and pedagogy (1994) · 2013

A first speech recognition system for Mandarin-English code-switch conversational speech · 2012

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups · IEEE Signal Processing Magazine · 2012

Deep Neural Networks for Acoustic Modeling in Speech Recognition · 2012

A Mandarin-English Code-Switching Corpus · 2012

CECOS: A Chinese-English code-switching speech database · 2011

The subspace Gaussian mixture model—A structured model for speech recognition · Computer Speech & Language · 2010

Language identification in code-switching speech using word-based lexical model · 2010

Tuning phone decoders for language identification · 2010

The Cambridge Handbook of Linguistic Code-switching · Cambridge University Press eBooks · 2009

Automatic Recognition of Cantonese-English Code-Mixing Speech · 2009

Language identification on code-switching utterances using multiple cues · 2008

Code Switching and Grammatical Theory · 2006

Language Identification by Using Syllable-Based Duration Classification on Code-Switching Speech · Lecture notes in computer science · 2006

Speech Recognition on Code-Switching Among the Chinese Dialects · 2006

Detection of language boundary in code-switching utterances by bi-phone probabilities · 2005

Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs · IEEE Transactions on Audio Speech and Language Processing · 2005

A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 · 2005

English in Singapore: Phonetic Research on a Corpus · 2005

Development of a Cantonese-English code-mixing speech corpus · 2005

Functions of Code Switching in Schoolchildren's Conversations · Bilingual Research Journal · 2004

Language boundary detection and identification of mixed-language speech based on MAP estimation · 2004

Code-switching in conversation: Language, interaction and identity. Ed. by Peter Auer. London & New York: Routledge, 1998. Pp. viii, 355. · Language · 2000

Code Switching in Conversation: Language, Interaction and Identity · The Modern Language Review · 2000

The "why" and "how" questions in the analysis of conversational code-switching · BIROn (Birkbeck, University of London) · 1999

One speaker, two languages. Ed. by Lesley Milroy and Pieter Muysken. Cambridge: Cambridge University Press, 1995. Pp. xii, 365. · Language · 1998

One Speaker, Two Languages: Cross-Disciplinary Perspectives on Code-Switching · Rocky Mountain Review of Language and Literature · 1996

A review of large-vocabulary continuous-speech · IEEE Signal Processing Magazine · 1996

Comparison of four approaches to automatic language identification of telephone speech · IEEE Transactions on Speech and Audio Processing · 1996

Duelling languages: Grammatical structure in codeswitching. By Carol Myers-Scotton. Oxford: Oxford University Press, 1993. Pp. xiv, 263. Cloth $45.00. · Language · 1995

One Speaker, Two Languages · Cambridge University Press eBooks · 1995

Social Motivations for Codeswitching: Evidence from Africa · African Studies Review · 1994

Socio-linguistics: A study of code-switching · Medical Entomology and Zoology · 1994

Code-Mixing in Hongkong Cantonese-English Bilinguals: Constraints and Processes · 1993

Duelling Languages · 1993

Social Motivations For Codeswitching · 1993

Code-switching and code-mixing: The case of a child learning English and Chinese simultaneously · Journal of Multilingual and Multicultural Development · 1992

Code-Switching and Code-Mixing: The Case of a Child Learning English and Chinese Simultaneously · Journal of Multilingual and Multicultural Development · 1992

Codeswitching with English: types of switching, types of communities · World Englishes · 1989

A tutorial on hidden Markov models and selected applications in speech recognition · Proceedings of the IEEE · 1989

A formal grammar for code-switching · Paper in Linguistics · 1981
