A list of Korean Acoustic Corpora

December 21, 2022

The Speech Corpus of Reading-Style Standard Korean (NIKL 2005; https://github.com/homink/speech.ko)

The Korean Corpus of Spontaneous Speech (http://koreascience.or.kr/article/JAKO201521159149292.page)
Pansori-TEDxKR (https://github.com/yc9701/pansori-tedxkr-corpus)
ClovaCall (https://github.com/ClovaAI/ClovaCall)


AIHub (https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
1. Around 1,000 hours
2. Spontaneous speech
3. 2,000 speakers
4. Conversation between two people about various topics (e.g., weather, economics)
5. ETRI transcription rules
6. Files: segmented at the utterance level (at long pauses); audio format: 16 kHz/16-bit, headerless (endian) linear PCM; transcription format: EUC-KR
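
Because the audio is headerless PCM and the transcripts are EUC-KR, most tools will not open these files without a small conversion step. Below is a minimal sketch of how that could look using only the Python standard library. The function names are hypothetical, and two settings are assumptions rather than facts from the corpus description: little-endian byte order and a single (mono) channel. Please check the corpus documentation before relying on them.

```python
import array
import sys
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # 16 kHz, from the description above
SAMPLE_WIDTH = 2      # 16 bits = 2 bytes per sample


def decode_pcm16(raw: bytes, byteorder: str = "little") -> array.array:
    """Decode headerless 16-bit linear PCM bytes into signed integers.

    byteorder="little" is an assumption; pass "big" if the corpus says so.
    """
    if len(raw) % SAMPLE_WIDTH:
        raise ValueError("byte length is not a multiple of the sample width")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)      # interpreted in the machine's native order
    if byteorder != sys.byteorder:
        samples.byteswap()      # convert to native order
    return samples


def duration_seconds(samples, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of an utterance in seconds."""
    return len(samples) / sample_rate


def decode_transcript(raw: bytes) -> str:
    """Decode an EUC-KR encoded transcript into a Python string."""
    return raw.decode("euc-kr")


def pcm_to_wav(pcm_path, wav_path, byteorder: str = "little") -> None:
    """Wrap a headerless PCM file in a WAV header so common tools can read it."""
    samples = decode_pcm16(Path(pcm_path).read_bytes(), byteorder)
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(1)  # mono is an assumption
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
```

For example, `pcm_to_wav("utt.pcm", "utt.wav")` (placeholder file names) would produce a file that ordinary audio players can open, and `decode_transcript(Path("utt.txt").read_bytes())` would return the transcript as a regular string that can be saved again as UTF-8.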

Zeroth (https://github.com/goodatlas/zeroth)
KsponSpeech (https://aihub.or.kr/aidata/105)

Categories: NLP, Data

Original post: https://cheonkamjeong.blogspot.com/2022/12/nlp-list-of-korean-acoustic-corpora.html