Call for Papers - LTC 2019, LREC 2020
New Publications:
___________________________________________________________
Call for Papers
The 9th Language & Technology Conference (LTC 2019)
will take place on May 17-19, 2019 at the Adam Mickiewicz University in Poznań,
Poland. LTC addresses Human Language Technologies as a challenge for computer
science, linguistics and related fields. Conference papers are due next week on
Wednesday, March 20, 2019 (midnight, any time zone). For more information,
visit the conference webpage.
The 12th Conference on Language Resources and
Evaluation (LREC 2020) will take place on May 13-15, 2020 at the Palais du
Pharo in Marseille, France. LREC aims to provide an overview of the
state of the art, explore new R&D directions and emerging trends, and
exchange information regarding language resources and their applications,
evaluation methodologies and tools. Conference papers are due by November 25,
2019. For more information, including conference topics, visit the conference webpage.
New Publications:
(1) CALLFRIEND Egyptian Arabic Second Edition was developed by LDC and consists of approximately 25
hours of unscripted telephone conversations between native speakers of Egyptian
Arabic. This second edition updates the audio files to wav format, simplifies
the directory structure and adds documentation and metadata. The first edition
is available as CALLFRIEND Egyptian Arabic (LDC96S49).
All data was collected before July 1997. Participants could
speak with a person of their choice on any topic; most called family members
and friends. All calls originated in North America. The recorded conversations
last up to 30 minutes.
CALLFRIEND Egyptian Arabic Second Edition is distributed via
web download.
2019 Subscription Members will automatically receive copies
of this corpus. 2019 Standard Members may request a copy as part of their 16
free membership corpora. Non-members may license this data for a fee.
*
(2) Penn Discourse Treebank Version
3.0 is the third release in the Penn Discourse Treebank project, the
goal of which is to annotate the Wall Street Journal (WSJ) section of
Treebank-2 (LDC95T7) with discourse
relations. Penn Discourse Treebank Version 2 (LDC2008T05) contains
over 40,600 tokens of annotated relations. In Version 3, an additional 13,000
tokens were annotated, certain pairwise annotations were standardized, new
senses were included and the corpus was subjected to a series of consistency
checks.
This corpus contains two tools: (1) The Annotator, used for
annotation and adjudication, which can also be used for viewing the corpus;
and (2) The Conversion Tool, which converts Version 2 annotation files into the
Version 3 format.
The documentation directory contains a manual describing
what is new in Version 3 and how Version 3 differs from Version 2; the methods
and guidelines used in annotating PDTB Version 3; and a range of statistics on
the tokens, including the frequency of each connective, its sense labels and
its modifiers. More information about the corpus and research carried out by
the developers and others using the corpus can be found on the PDTB website.
Penn Discourse Treebank Version 3.0 is distributed via web
download.
2019 Subscription Members will automatically receive copies
of this corpus. 2019 Standard Members may request a copy as part of their 16
free membership corpora. Non-members may license this data for a fee.
*
(3) VAST Chinese Speech and Transcripts
was developed by LDC for the VAST (Video Annotation for Speech Technologies)
project and consists of approximately 29 hours of Mandarin Chinese audio
extracted from amateur video content harvested from the web, along with
corresponding time-aligned transcripts.
Audio files were transcribed using XTrans,
which supports manual transcription across multiple channels, languages and
platforms. Transcribers followed a Quick-Rich Transcription style;
transcription guidelines are included in this release.
The aim of the VAST
project was to collect and annotate data in several languages to support the
development of speech technologies such as speech activity detection, language
identification, speaker identification, and speech recognition.
VAST Chinese Speech and Transcripts is distributed via web
download.
2019 Subscription Members will automatically receive copies
of this corpus. 2019 Standard Members may request a copy as part of their 16
free membership corpora. Non-members may license this data for a fee.