Monday, June 17, 2024

LDC June 2024 Newsletter

LDC data and commercial technology development

New publications:

Diaspora Tibetan Speech

AIDA Scenario 2 Practice Topic Annotation

_________________________________________________________________

LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.

New publications:
 
Diaspora Tibetan Speech was developed at Yale University. It contains 28 hours of Tibetan elicited speech by 73 speakers from the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials and speaker metadata.

Recordings were collected in 2016. All speakers were adults and varied both in age and in age of diaspora. A substantial number of speakers were born in Nepal. Each speaker contributed one recording comprising a series of elicitation tasks: some demographic information; a word list and numbers; some sentences in isolation; a scripted story; and free speech based on "frog story"-type illustrations. Annotation and metadata formats include PDF and Word (some transcripts), Excel (some transcripts, speaker metadata) and Praat TextGrids (word and number lists).

2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee. 

*

AIDA Scenario 2 Practice Topic Annotation was developed by LDC and consists of annotations for 29 English, Russian and Spanish documents (text, image and video) from AIDA Scenario 2 Practice Topic Source Data (LDC2024T04), specifically the set of practice documents designated for annotation in Phase 2.

Annotations are presented as tab-separated files in the following categories for each topic:

  • Mentions: single references in source data to a real-world entity or filler, event, or relation. 
  • Slots: pre-defined roles in an event or relation filled by an argument (entity mention).
  • Linking: entity mentions linked to entries in the knowledge base to indicate the real-world entity to which each mention refers.

2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

