LDC Membership Discounts for MY2020 Still Available
Spring 2020 Data Scholarship Program – deadline approaching
Introducing LanguageARC: A Citizen Linguist Portal
New Publications:
Magic Data Chinese Mandarin Conversational Speech
BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training
TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017
__________________________________________________________
LDC Membership Discounts for MY2020 Still Available
Join LDC while membership savings are still available. Now through March 2, 2020, current MY2019 members who renew their LDC membership receive a 10% discount off the membership fee. New or returning member organizations receive a 5% discount through March 2. Membership remains the most economical way to access LDC releases. Visit Join LDC for details on membership options and benefits.
Spring 2020 Data Scholarship Program – deadline approaching
Students can apply for the Spring 2020 Data Scholarship Program now through January 15, 2020. The LDC Data Scholarship program provides students with no-cost access to LDC data. For more information on application requirements and program rules, please visit LDC Data Scholarships.
Introducing LanguageARC: A Citizen Linguist Portal
LanguageARC is a citizen science website for languages, developed with a grant from the National Science Foundation (no. 170377). Contributors to this online community – “citizen linguists” – participate in a variety of tasks and activities that support linguistic research, such as identifying accents from audio clips, recording “tongue twisters,” and translating English sentences into other languages. Data collected from LanguageARC will be made freely available to the research community. New collection and annotation projects will be added on an ongoing basis, and researchers will soon be able to create their own LanguageARC projects with an easy-to-use Project Builder Toolkit. All are encouraged to explore the site and participate in the community. Comments, questions and suggestions are welcome via the site’s Contact page.
___________________________________________________________
New publications:
(1) Magic Data Chinese Mandarin Conversational Speech was developed by Beijing Magic Data Technology Co., Ltd. and consists of approximately 10 hours of Mandarin conversational speech from 60 speakers. Each conversation was recorded on multiple devices and is presented in multiple forms, resulting in a total of approximately 60 hours of audio with corresponding transcripts.
All participants were native speakers of Mandarin in Mainland China from accent regions across the country. Speakers were paired for conversations on a range of topics, including travel, fitness, games, sports and pets. Metadata such as topic, collection date, mobile device and speaker demographic information is available in the documentation accompanying this release.
Magic Data Chinese Mandarin Conversational Speech is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.
*
(2) BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was developed by LDC and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations.
This release contains Egyptian Arabic source text message and chat conversations collected using two methods: new collection via LDC's collection platform, and donation of SMS or chat archives from BOLT collection participants. The source data is released as BOLT Egyptian Arabic SMS/Chat and Transliteration (LDC2017T07).
The BOLT word alignment task was built on treebank annotation. Egyptian Arabic source tree tokens were automatically extracted from tree files in LDC’s BOLT Egyptian Arabic Treebank, which had been tagged for part-of-speech and syntactically annotated. That data was then aligned and annotated for the word alignment task.
BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.
*
Source data for the annotations consists of Chinese, English and Spanish newswire and discussion forum text collected by LDC and is available in TAC KBP Evaluation Source Corpora 2016-2017 (LDC2019T12).
TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.