Membership Discounts for MY2018 Still Available
New Publications:
___________________________________________________________________________
Membership Discounts for MY2018 Still Available
Join LDC while membership savings are still available. Now
through March 1, 2018, renewing MY2017 members will receive a 10% discount off
the membership fee. New or non-consecutive member organizations will receive a
5% discount. Membership remains the most economical way to access LDC
releases. This year’s planned publications include Multilanguage
Conversational Telephone Speech, IARPA Babel Language Packs (telephone speech
and transcripts), DIRHA (Distant-speech Interaction for Robust Home
Applications), TRAD (Chinese-French and Arabic-French parallel text), data from
BOLT, DEFT, LORELEI, RATS and TAC KBP, and more. Browse the Members pages for details
on membership options and benefits.
New publications:
(1) DEFT Spanish Treebank was developed by LDC and the Language
and Computation Center (CLiC), University of Barcelona. It contains
treebank annotation of international Spanish newswire text and Latin American
Spanish discussion forum data created for the DARPA Deep Exploration and
Filtering of Text (DEFT) program. DEFT Spanish Treebank supported the program's
goal of deep natural language understanding.
Newswire source files were selected from Spanish Gigaword
Third Edition (LDC2011T12) and were manually sentence-segmented for DEFT.
Discussion forum source files were selected from Spanish discussion forum
data collected by LDC and consist of continuous multi-posts of 100-1000 words.
This release contains 114 files (54,394 tokens) of newswire
data and 60 files (55,307 tokens) of discussion forum data, all of which were
annotated with constituents and syntactic functions.
DEFT Spanish Treebank is distributed via web download.
2018 Subscription Members will receive copies of this
corpus. 2018 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for a fee.
*
(2) DIRHA English WSJ Audio was developed as part of the Distant-speech
Interaction for Robust Home Applications (DIRHA) Project, which addressed
natural spontaneous speech interaction with distant microphones in a domestic
environment. It consists of approximately 85 hours of real and simulated read
speech by six native American English speakers. The target utterances were
taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically the 5,000-word
subset of read speech from Wall Street Journal news text.
Speech was collected in a real apartment setting with
typical domestic background noise and inter/intra-room reverberation effects.
Annotations, speaker metadata and images of the apartment setting are also
included.
DIRHA English WSJ Audio is distributed via web download.
2018 Subscription Members will receive copies of this corpus
provided they have submitted a completed copy of the special license agreement.
2018 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
*
(3) TRAD Chinese-French Parallel Text -- Blog was developed by ELDA as part of the
PEA-TRAD project. It contains French translations of a subset of approximately
10,000 Chinese words from GALE Phase 1 Chinese Blog Parallel Text (LDC2008T06).
The PEA-TRAD project (Translation as a Support for Document
Analysis) was supported by the French Ministry of Defense (DGA). Its purpose
was to develop speech-to-speech translation technology for multiple languages
(e.g., Arabic, Chinese, Pashto) from a variety of domains.
The source data for TRAD Chinese-French Parallel Text -- Blog is
Chinese blog text collected and translated into English by LDC for the DARPA
GALE (Global Autonomous Language Exploitation) program. Information about the
ELDA translation team, translation guidelines and validation results is
contained in the documentation accompanying this release.
TRAD Chinese-French Parallel Text -- Blog is distributed via
web download.
2018 Subscription Members will receive copies of this corpus
provided they have submitted a completed copy of the special license agreement.
2018 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.