LDC releases speech activity detector
Fall 2023 LDC Data Scholarship Program
Student applications for the Fall 2023 LDC Data Scholarship program are being accepted now through September 15, 2023. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and a letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page.
New publications:
2019 OpenSAT Public Safety Communications Simulation contains 141 hours of English speech recordings and transcripts used in the automatic speech recognition, speech activity detection, and keyword search tasks of the NIST Open Speech Analytic Technologies (OpenSAT) 2019 evaluation. The data is part of the SAFE-T (Speech Analysis For Emergency Response Technology) corpus created by LDC, which consists of recordings of speakers engaged in a collaborative problem-solving activity representative of public safety communications in terms of speech content, noise types, and noise levels. US English speakers played the board game Flash Point Fire Rescue while background noise was played through their headsets. Each recording session consisted of two 30-minute games. The corpus is divided into training, development, and evaluation sets. 2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
Samrómur Queries Icelandic Speech 1.0 was developed by the Language and Voice Lab at Reykjavik University in cooperation with Almannarómur, Center for Language Technology. The corpus contains 20 hours of prompted Icelandic queries (17,475 utterances) from 3,809 speakers.
Speech data was collected between October 2019 and December 2021 using the Samrómur website, which displayed prompts to participants. Most prompts were drawn from The Icelandic Gigaword Corpus, which includes text from novels, news, and plays, and from a list of Icelandic place names. Additional prompts were taken from the Icelandic Web of Science, and others were constructed as a name followed by a question or a demand. Prompts and speaker metadata are included in the corpus. 2023 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.