Showing posts with label podcast. Show all posts

Tuesday, January 29, 2013

LDC 20th Anniversary Podcast: Christopher Cieri



LDC is moving towards the end of its Anniversary year, but that does not mean that we don’t have a few more treats for you. This month’s podcast features LDC’s Executive Director, Christopher Cieri.

Chris is involved with every aspect of the Consortium, including planning, development, operations, sponsored projects, external relations and financial performance. In this podcast, Chris reflects on the road that took him to LDC, some of his early responsibilities and recent consortium activities. 


Friday, January 11, 2013

LDC 20th Anniversary Podcast: Mohamed Maamouri



Happy New Year and welcome back to the LDC Blog. For our first post of the year, we present the fourth podcast in our anniversary series featuring LDC Senior Researcher, Mohamed Maamouri.

Mohamed directs the Arabic Treebank group and spearheads the development of Arabic resources and projects. The latter includes a leading role in LDC’s collaboration with Georgetown University Press to develop updated versions of three dialectal Arabic dictionaries (Iraqi, Moroccan, Syrian). Mohamed specializes in Arabic linguistics, reading, language development, corpus linguistics and sociolinguistics. In this podcast, he reflects on his personal and professional experiences and comments on Arabic resource development at LDC.

Friday, December 14, 2012

LDC 20th Anniversary Podcasts: Yiwola Awoyale and Moussa Bamba

The third podcast in the series shifts gears and introduces two LDC researchers, Yiwola Awoyale and Moussa Bamba, whose work focuses on West African languages. 

Yiwola has been teaching linguistics, Yoruba language studies and various aspects of African linguistics since 1975. At LDC, he developed the Global Yoruba Lexical Database, a set of related dictionaries based on Yoruba and its diaspora. Moussa’s work in the Manding languages of the Niger-Congo family has resulted in the release of the Mawukakan Lexicon, to be followed by similar resources for Maninkakan, Bambara, and Jula. 

In their podcast, Yiwola and Moussa discuss how they came to LDC, their current research and how it benefits multiple communities.

Click here for Yiwola and Moussa's podcast.

Wednesday, December 5, 2012

LDC 20th Anniversary Podcasts: Natalia Bragilevskaya, Ilya Ahtaridis and Marian Reed

LDC has put together a second podcast for your listening enjoyment. This issue builds upon the origin of LDC while focusing on LDC's interactions with the outside world. Natalia is the head of LDC's Business Office while Ilya and Marian are the twin engines of LDC's External Relations group (LDC's Membership Coordinator and Marketing Coordinator, respectively). As was the case with David Graff's October 2012 podcast, John Vogel conducted each of these interviews.

As a reminder, LDC podcasts are issued as part of our celebration of our 20th Anniversary year. Please stay tuned for later editions.

Click here for Natalia, Ilya and Marian's podcast.

Thursday, October 18, 2012

LDC October 2012 Newsletter


New publications:
LDC2012T20
LDC2012T18




Fall 2012 LDC Data Scholarship Recipients
LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no cost. Students were asked to complete an application consisting of a proposal describing their intended use of the data and a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:
Jaffar Atwan - National University of Malaysia (Malaysia), PhD candidate, Information Science and Technology. Jaffar has been awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in information retrieval.
Sarath Chandar - Indian Institute of Technology, Madras (India), MS candidate, Computer Science and Engineering. Sarath has been awarded a copy of Treebank-3 (LDC99T42) for his work in grammar induction.
Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD candidate, Electrical and Computer Engineering. Kuruvachan has been awarded a copy of Fisher English Part 2 (LDC2005S13/T19) and 2008 NIST Speaker Recognition Evaluation data (LDC2011S05/07/08/11) for his work in speaker recognition.
Eduardo Motta - Pontifícia Universidade Católica do Rio de Janeiro (Brazil), PhD candidate, Information Sciences. Eduardo has been awarded a copy of English Web Treebank (LDC2012T13) for his work in machine learning.
Genevieve Sapijaszko - University of Central Florida (USA), PhD candidate, Electrical and Computer Engineering. Genevieve has been awarded a copy of TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) and YOHO Speaker Verification (LDC94S16) for her work in digital signal processing.
John Steinberg - Temple University (USA), MS candidate, Electrical and Computer Engineering. John has been awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15) and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his work in speech recognition.
LDC Exhibiting at NWAV 41
LDC will be exhibiting at the 41st New Ways of Analyzing Variation Conference (NWAV 41) in late October. This marks the fifth time that LDC has been an NWAV exhibitor and we are proud to show our continued support of the sociolinguistic research community.
The conference runs from October 25 to 28, 2012, and the exhibition hall will be open from October 26 to 28. Please stop by to say hello!

LDC 20th Anniversary Workshop Wrap-up
In early September, LDC hosted a workshop entitled “The Future of Language Resources” in celebration of our 20th anniversary. Visit the Program page to browse speaker abstracts and to access PDFs of the presentations. Thanks to the speakers and attendees for making the workshop a success!

LDC 20th Anniversary Podcasts
To further celebrate our 20th Anniversary, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast.
Other podcasts will be published via the LDC blog, so stay tuned to that space.

Language Resource Wiki
The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages but especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following Sign Languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.

New publications
(1) GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire was developed by LDC and contains 169,080 tokens of word-aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.
Some approaches to statistical machine translation incorporate linguistic knowledge into word-aligned text as a means to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations by using minimum-match and attachment annotation approaches. The tagging scheme defines a set of word tags and alignment link tags to describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation.
The Chinese word alignment tasks consisted of the following components:
Identifying, aligning, and tagging 8 different types of links
Identifying, attaching, and tagging local-level unmatched words
Identifying and tagging sentence/discourse-level unmatched words
Identifying and tagging all instances of Chinese 的(DE) except when they were a part of a semantic link.
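As a rough illustration of the idea of tagged alignment links and unmatched words described above, here is a minimal Python sketch. It is not the corpus's file format or LDC's tooling: the `AlignmentLink` class, the `unaligned` helper, the tag name "SEM" and the three-word example sentence are all hypothetical, chosen only to show how links over token positions leave some words (here, 的) unmatched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentLink:
    """One link between Chinese and English token positions (hypothetical model)."""
    source_indices: tuple  # positions of the Chinese tokens in this link
    target_indices: tuple  # positions of the English tokens in this link
    link_tag: str          # placeholder label for the link type


def unaligned(tokens, links, side):
    """Return the tokens on one side ("source" or "target") that no link covers."""
    covered = set()
    for link in links:
        indices = link.source_indices if side == "source" else link.target_indices
        covered.update(indices)
    return [tok for i, tok in enumerate(tokens) if i not in covered]


# Invented example sentence pair, not taken from the corpus.
chinese = ["我", "的", "书"]
english = ["my", "book"]
links = [
    AlignmentLink((0,), (0,), "SEM"),  # 我 <-> my ("SEM" is a made-up tag)
    AlignmentLink((2,), (1,), "SEM"),  # 书 <-> book
]

unmatched_chinese = unaligned(chinese, links, "source")
unmatched_english = unaligned(english, links, "target")
```

In this toy example, 的 is the only Chinese token left uncovered, which is the kind of word an annotator would then attach or tag separately.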
GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire is distributed via web download. 2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. 

(2) GALE Phase 2 Arabic Broadcast News Parallel Text was developed by LDC. Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Modern Standard Arabic source text and corresponding English translations selected from broadcast news (BN) data collected by LDC between 2005 and 2007 and transcribed by LDC or under its direction.
GALE Phase 2 Arabic Broadcast News Parallel Text includes seven source-translation pairs, comprising 29,210 words of Arabic source text and its English translation. Data is drawn from six distinct Arabic programs broadcast between 2005 and 2007 from Abu Dhabi TV, based in Abu Dhabi, United Arab Emirates; Al Alam News Channel, based in Iran; Aljazeera, a regional broadcast programmer based in Doha, Qatar; Dubai TV, based in Dubai, United Arab Emirates; and Kuwait TV, a national television station based in Kuwait. The BN programming in this release focuses on current events topics. 
The files in this release were transcribed by LDC staff and/or transcription vendors under contract to LDC in accordance with the Quick Rich Transcription guidelines developed by LDC. Transcribers indicated sentence boundaries in addition to transcribing the text. Data was manually selected for translation according to several criteria, including linguistic features, transcription features and topic features. The transcribed and segmented files were then reformatted into a human-readable translation format and assigned to translation vendors. Translators followed LDC's Arabic to English translation guidelines. Bilingual LDC staff performed quality control procedures on the completed translations.
GALE Phase 2 Arabic Broadcast News Parallel Text is distributed via web download. 2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. 

Thursday, October 11, 2012

LDC 20th Anniversary Podcasts: David Graff

As part of our 20th Anniversary celebrations, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium's growth and evolution over the past two decades and for some insights into the future. We expect to make these interviews available as audio, video and text. The interviews are conducted by John Vogel, LDC part-time staffer, musician and video artist.

We begin with a series of podcasts. The first podcast features David Graff, LDC's Lead Programmer. Dave has been at LDC since its first days as a small organization occupying one of the many offices in University of Pennsylvania's Williams Hall. Dave has been involved in many aspects of LDC's work over the years; he currently designs tools that support corpus creation, annotation and quality assessment and has a direct role in the production of most LDC publications.

We hope you enjoy Dave's reflections on life at LDC.

Click here for Dave's podcast.