Invitation to Join for Membership Year 2014
Spring 2014 LDC Data Scholarship Program
LDC to Close for Thanksgiving Break
New publications:
Chinese Treebank 8.0
CSC Deceptive Speech
Invitation to Join for Membership Year (MY) 2014
Membership Year (MY) 2014 is open for joining. We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the Consortium. For MY2014, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year.
The details of our early renewal discounts for MY2014 are as follows:
· Organizations who joined for MY2013 will receive a 5% discount when renewing. This discount will apply throughout 2014, regardless of time of renewal. MY2013 members renewing before Monday, March 3, 2014 will receive an additional 5% discount, for a total 10% discount off the membership fee.
· New members as well as organizations who did not join for MY2013, but who held membership in any of the previous MYs (1993-2012), will also be eligible for a 5% discount provided that they join/renew before March 3, 2014.
Membership type                Level          MY 2014 Fee   With 5% discount*   With 10% discount**
Not-for-Profit/US Government   Standard       US$2400       US$2280             US$2160
Not-for-Profit/US Government   Subscription   US$3850       US$3658             US$3465
For-Profit                     Standard       US$24000      US$22800            US$21600
For-Profit                     Subscription   US$27500      US$26125            US$24750
* For new members, MY2013 Members renewing for MY2014, and any previous year Member who renews before March 3, 2014
** For MY2013 Members renewing before March 3, 2014
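The discount rules and fees above can be expressed as a short calculation. The Python below is a minimal sketch, not an LDC tool: the function names and the category keys are hypothetical, "before March 3, 2014" is assumed to mean strictly earlier than that date, and rounding half up to whole dollars is an assumption inferred from the listed US$3658 figure (5% off US$3850 is US$3657.50).

```python
from datetime import date

# Early-renewal cutoff stated in the announcement.
EARLY_DEADLINE = date(2014, 3, 3)

# MY2014 base fees in US dollars, keyed by (hypothetical) category names.
BASE_FEES = {
    ("not-for-profit", "standard"): 2400,
    ("not-for-profit", "subscription"): 3850,
    ("for-profit", "standard"): 24000,
    ("for-profit", "subscription"): 27500,
}


def discount_percent(was_my2013_member: bool, join_date: date) -> int:
    """Return the discount percentage under the announced rules."""
    # Assumption: "before March 3" means strictly earlier than that day.
    early = join_date < EARLY_DEADLINE
    if was_my2013_member:
        # 5% applies all year; an extra 5% applies for early renewal.
        return 10 if early else 5
    # New members and pre-2013 members get 5% only when joining early.
    return 5 if early else 0


def discounted_fee(org_type: str, level: str,
                   was_my2013_member: bool, join_date: date) -> int:
    """Return the fee in whole dollars after any discount."""
    base = BASE_FEES[(org_type, level)]
    pct = discount_percent(was_my2013_member, join_date)
    # Integer arithmetic with round-half-up (an assumption, see lead-in).
    return (base * (100 - pct) + 50) // 100


if __name__ == "__main__":
    fee = discounted_fee("not-for-profit", "subscription", True, date(2014, 2, 1))
    print(f"US${fee}")
```

Working in integers avoids floating-point surprises at the half-dollar boundary, which matters for the one fee in the table that does not divide evenly.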
Publications for MY2014 are still being planned; here are the working titles of data sets we intend to provide:
2009 NIST Language Recognition Evaluation
Callfriend Farsi Speech and Transcripts
GALE data -- all phases and genres
Hispanic-English Speech
MADCAT Phase 4 Training
MALACH Czech ASR
NIST OpenMT Five Language Progress Set
In addition to receiving new publications, current year members of LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.
Spring 2014 LDC Data Scholarship Program
Applications are now being accepted through Wednesday, January 15, 2014, 11:59PM EST for the Spring 2014 LDC Data Scholarship program! The LDC Data Scholarship program provides university students with access to LDC data at no cost. During previous program cycles, LDC has awarded no-cost copies of LDC data to over 35 individual students and student research groups.
This program is open to students pursuing both undergraduate and graduate studies in an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay. The selection process is highly competitive.
The application consists of two parts:
(1) Data Use Proposal. Applicants must submit a proposal describing their intended use of the data. The proposal should state which data the student plans to use and how the data will benefit their research project as well as information on the proposed methodology or algorithm.
Applicants should consult the LDC Catalog for a complete list of data distributed by LDC. Due to certain restrictions, a handful of LDC corpora are available only to members of the Consortium. Applicants are advised to select a maximum of one to two datasets; students may apply for additional datasets during the following cycle once they have completed processing of the initial datasets and published or presented work in some juried venue.
(2) Letter of Support. Applicants must submit one letter of support from their thesis adviser or department chair. The letter must verify the student's need for data and confirm that the department or university lacks the funding to pay the full Non-member Fee for the data or to join the Consortium.
For further information on application materials and program rules, please visit the LDC Data Scholarship page.
Students can email their applications to the LDC Data Scholarship program. Decisions will be sent by email from the same address.
The deadline for the Spring 2014 program cycle is January 15, 2014, 11:59PM EST.
LDC to Close for Thanksgiving Break
LDC will be closed on Thursday, November 28, 2013 and Friday, November 29, 2013 in observance of the US Thanksgiving Holiday. Our offices will reopen on Monday, December 2, 2013.
New publications
Chinese Treebank 8.0 consists of approximately 1.5 million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs.
The Chinese Treebank project began at the University of Pennsylvania in 1998, continued at the University of Colorado and then moved to Brandeis University. The project’s goal is to provide a large, part-of-speech tagged and fully bracketed Chinese language corpus. The first delivery, Chinese Treebank 1.0, contained 100,000 syntactically annotated words from Xinhua News Agency newswire. It was later corrected and released in 2001 as Chinese Treebank 2.0 (LDC2001T11) and consisted of approximately 100,000 words. The LDC released Chinese Treebank 4.0 (LDC2004T05), an updated version containing roughly 400,000 words, in 2004. A year later, LDC published the 500,000 word Chinese Treebank 5.0 (LDC2005T01). Chinese Treebank 6.0 (LDC2007T36), released in 2007, consisted of 780,000 words. Chinese Treebank 7.0 (LDC2010T08), released in 2010, added new annotated newswire data, broadcast material and web text to the approximate total of one million words. Chinese Treebank 8.0 adds new annotated data from newswire, magazine articles and government documents.
There are 3,007 text files in this release, containing 71,369 sentences, 1,620,561 words and 2,589,848 characters (hanzi or foreign). The data is provided in UTF-8 encoding, and the annotation has Penn Treebank-style labeled brackets. Details of the annotation standard can be found in the segmentation, POS-tagging and bracketing guidelines included in the release. The data is provided in four different formats: raw text, word segmented, POS-tagged, and syntactically bracketed. All files were automatically verified and manually checked.
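To illustrate how the bracketed format relates to the POS-tagged and word-segmented ones, here is a minimal Python sketch that reads one Penn Treebank-style labeled bracketing and recovers the tagged words. It is an illustration under stated assumptions, not a reader for the actual release: the example sentence is made up, the function names are hypothetical, every bracket is assumed to carry a label, and empty elements are assumed to be tagged `-NONE-` as is conventional in Penn-style treebanks. The exact file layout in the release is described in its own guidelines.

```python
import re

# Made-up example in Penn Treebank-style labeled brackets (not from the corpus).
EXAMPLE = "(IP (NP (NR 中国)) (VP (VV 发展) (NP (NN 经济))))"


def tokenize(text):
    """Split a bracketed string into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Parse one tree into nested (label, children) tuples; leaves are strings."""
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]           # assumption: every bracket has a label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def tagged_leaves(tree):
    """Return (word, POS) pairs in sentence order, skipping empty elements."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Preterminal node: its label is the POS tag of the word beneath it.
        return [] if label == "-NONE-" else [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_leaves(child))
    return pairs


if __name__ == "__main__":
    pairs = tagged_leaves(parse(tokenize(EXAMPLE)))
    print(" ".join(word for word, _ in pairs))             # word-segmented view
    print(" ".join(f"{word}_{tag}" for word, tag in pairs))  # POS-tagged view
```

The point of the sketch is that the bracketed form carries the other views: dropping the structure leaves tagged words, dropping the tags leaves segmented text, and dropping the spaces leaves the raw sentence.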
Chinese Treebank 8.0 is distributed via web download. 2013 Subscription Members will automatically receive two copies of this data on disc. 2013 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.
*
CSC Deceptive Speech was developed by Columbia University, SRI International and University of Colorado Boulder. It consists of 32 hours of audio interviews from 32 native speakers of Standard American English (16 male, 16 female) recruited from the Columbia University student population and the community. The purpose of the study was to distinguish deceptive speech from non-deceptive speech using machine learning techniques on features extracted from the corpus.
The participants were told that they were participating in a communication experiment which sought to identify people who fit the profile of the top entrepreneurs in America. To this end, the participants performed tasks and answered questions in six areas. They were later told that they had received low scores in some of those areas and did not fit the profile. The subjects then participated in an interview where they were told to convince the interviewer that they had actually achieved high scores in all areas and that they did indeed fit the profile. The task of the interviewer was to determine how he thought the subjects had actually performed, and he was allowed to ask them any questions other than those that were part of the performed tasks. For each question from the interviewer, subjects were asked to indicate whether the reply was true or contained any false information by pressing one of two pedals hidden from the interviewer under a table.
Interviews were conducted in a double-walled sound booth and recorded to digital audio tape on two channels using Crown CM311A Differoid headworn close-talking microphones, then downsampled to 16 kHz before processing.
The interviews were orthographically transcribed by hand using the NIST EARS transcription guidelines. Labels for local lies were obtained automatically from the pedal-press data and hand-corrected for alignment, and labels for global lies were annotated during transcription based on the known scores of the subjects versus their reported scores. The orthographic transcription was force-aligned using the SRI telephone speech recognizer adapted for full-bandwidth recordings. There are several segmentations associated with the corpus: the implicit segmentation of the pedal presses, semi-automatically derived sentence-like units (EARS SLASH-UNITS or SUs) which were hand labeled, intonational phrase units, and the units corresponding to each topic of the interview.
CSC Deceptive Speech is distributed on 1 DVD-ROM. 2013 Subscription Members will automatically receive two copies of this data provided they have completed and returned the User License Agreement for CSC Deceptive Speech (LDC2013S09). 2013 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.