New publications:
______________________________________________________
(1) Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by LDC, SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.
AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.
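As an illustration only, the sentence "The boy wants to go", a commonly cited AMR example, can be written out as a small graph. The sketch below is not taken from the corpus and does not use any LDC tooling; the names instances, edges, arguments and incoming_count are hypothetical helpers introduced here. It stores the graph as plain triples and shows how within-sentence coreference appears as a node with more than one incoming edge.

```python
# AMR for "The boy wants to go" in PENMAN notation (illustrative example,
# not drawn from the release itself):
#   (w / want-01
#      :ARG0 (b / boy)
#      :ARG1 (g / go-01
#               :ARG0 b))
# The same graph is stored below as (source, relation, target) triples.

# Maps each variable to its concept; want-01 and go-01 are PropBank frames.
instances = {"w": "want-01", "b": "boy", "g": "go-01"}

# Role edges between variables. "b" is reused, which encodes coreference.
edges = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
]


def arguments(variable):
    """Return {role: concept} for the outgoing edges of one node."""
    return {
        role: instances[target]
        for source, role, target in edges
        if source == variable
    }


def incoming_count(variable):
    """Count how many edges point at a node."""
    return sum(1 for _, _, target in edges if target == variable)


print(arguments("w"))       # who wants what
print(incoming_count("b"))  # the boy is an argument of both predicates
```

Because the variable b is the :ARG0 of both want-01 and go-01, the structure is a graph with a shared node rather than a strict tree, even though it is usually displayed in an indented tree-like layout.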
LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12).
Abstract Meaning Representation (AMR) Annotation Release 2.0 is distributed via web download.
2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.
*
(2) CHiME2 WSJ0 was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 166 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-world environments.
CHiME2 WSJ0 reflects the medium vocabulary track of the CHiME2 Challenge. The target utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically the 5,000-word subset of read speech from Wall Street Journal news text. Data is divided into training, development and test sets and includes baseline scoring, decoding and retraining tools.
CHiME2 WSJ0 is distributed via web download.
2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.
*
In the field of speech production theory, data such as those contained in this release may be used to study the relationship between vocal fold vibration and the resulting voice quality.
None of the subjects had a history of a voice disorder. There was no native language requirement for recruiting subjects; participants were native speakers of various languages, including English, Mandarin Chinese, Taiwanese Mandarin, Cantonese and German.
UCLA High-Speed Laryngeal Video and Audio is distributed via hard drive.
2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.