LDC at ICASSP 2018
LDC at the Philadelphia Science Carnival
New Publications:
_____________________________________________________________________
LDC at ICASSP 2018
LDC will be exhibiting at ICASSP 2018, held this year April 15-20
in Calgary, Canada. Stop by booth B2 to learn more about recent developments at
the Consortium and new publications.
Also, be on the lookout for the following presentations featuring
LDC work:
Enhancement and Analysis of Conversational Speech: JSALT 2017
Tuesday, April 17, 16:00 - 18:00
Session: Speech Analysis
Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification
A Novel LSTM-based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification
LDC will post conference updates via our Twitter feed and Facebook
page. We hope to see you there!
LDC at the Philadelphia Science Carnival
LDC will share the fun of language with the community on Saturday,
April 28, with a booth at the Philadelphia Science Carnival. Visitors
will enjoy three language-oriented educational activities, including a
language identification game and Chinese character recognition.
The Philadelphia Science Carnival is an annual event organized by Philadelphia’s
Franklin Institute to acquaint children and adults with the joys of science.
New publications:
(1) Concretely Annotated New York Times was developed by Johns Hopkins University's Human Language Technology Center of
Excellence. It adds multiple kinds and instances of automatically generated
syntactic, semantic, and coreference annotations to The New York Times
Annotated Corpus (LDC2008T19).
Concrete is
a schema for representing structured, hierarchical, and overlapping linguistic
annotations. This release provides the outputs of multiple tools for the same annotation
types, reflecting different annotation theories, under a shared tokenization. Concretely
Annotated New York Times contains all of the 1.8 million articles in The New
York Times Annotated Corpus.
Concretely Annotated New York Times is distributed via hard drive.
2018 Subscription Members will receive copies of this corpus provided they have
submitted a completed copy of the special license agreement. 2018 Standard
Members may request a copy as part of their 16 free membership corpora. Any organization that licensed The New York Times Annotated Corpus
(LDC2008T19) may request a copy of Concretely Annotated New York Times
(LDC2018T12) for a $250 media fee. Non-members
may license this data for a fee.
*
(2) H2, E2, ERK1 Children's Writing was
developed by the Cooperative State University
Baden-Württemberg, University of Education. It consists of approximately 2,000
texts written over four months by 173 German school children aged six through
eleven. The data in this corpus was collected by elementary schools in
Baden-Württemberg, Germany, and digitized at the Cooperative State University
during the 2016/2017 school year. Three second-, third-, and fourth-grade
classrooms participated in the collection. Texts were written during regular
class sessions. The students were presented with a picture and were asked to
write a story to describe the picture or, if unable to write a text, to list
what they saw in the picture.
Of the 173 participants, 100 were multilingual. Further metadata is available
for 166 of the children. The following is
included for each text in the database: school week of collection; school type;
age; gender; grade/classroom; language spoken at home; and school materials
used.
LDC has also released H1 Children's Writing (LDC2016T01).
H2, E2, ERK1 Children's Writing is distributed via web download.
2018 Subscription
Members will receive copies of this corpus provided they have submitted a
completed copy of the special license agreement. 2018 Standard Members may
request a copy as part of their 16 free membership corpora. Non-members may
license this data for a fee.
*
(3) TRAD Arabic-French Parallel Text -- Newsgroup was developed by ELDA as part of the PEA-TRAD project. It contains French translations
of a subset of approximately 10,000 Arabic words from GALE Phase 1 Arabic
Newsgroup Parallel Text - Part 1 (LDC2009T03). The PEA-TRAD project
(Translation as a Support for Document Analysis) was supported by the French
Ministry of Defense (DGA). Its purpose was to develop speech-to-speech
translation technology for multiple languages (e.g., Arabic, Chinese, Pashto)
across a variety of domains. This release consists of 398 segments (translation
units) from 17 documents. The source data is Arabic newsgroup text collected
and translated into English by LDC for the DARPA GALE (Global Autonomous
Language Exploitation) program.
LDC has also released TRAD Chinese-French Parallel Text -- Blog (LDC2018T02).
TRAD Arabic-French Parallel Text -- Newsgroup is distributed via web download.
2018 Subscription Members will receive
copies of this corpus provided they have submitted a completed copy of the
special license agreement. 2018 Standard Members may request a copy as part of
their 16 free membership corpora. Non-members may license this data for a fee.