Linguistic Data Consortium: February 2022

Tuesday, February 15, 2022

LDC February 2022 Newsletter

LDC Membership Discounts Expire March 1

New Publications:

The Child Subglottal Resonances Database

Spoken Digits in Hindi and Indian English

LDC Membership Discounts Expire March 1

There is still time to save on 2022 membership fees. Renew your LDC membership, rejoin the Consortium, or become a new member by March 1 to receive a discount of up to 10%. For more information on membership benefits and options, visit Join LDC.

New Publications:

(1) The Child Subglottal Resonances Database was developed by Washington University and the University of California, Los Angeles. It consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings from 19 male and 9 female child speakers of American English aged 7-17.

The subglottal system is composed of the airways of the tracheobronchial tree and the surrounding tissues. It powers airflow through the larynx and vocal tract, allowing for the generation of most of the sound sources used in languages around the world. The subglottal resonances (SGRs) are the natural frequencies of the subglottal system. During speech, the subglottal system is acoustically coupled to the vocal tract via the larynx. SGRs can be measured from accelerometer recordings of the vibration of the skin of the neck during phonation, much as speech formants are measured from microphone recordings.
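As a rough illustration of what "measuring natural frequencies from a recording" can mean, the sketch below builds a synthetic signal from two damped cosines and recovers their frequencies by picking the largest peaks of its magnitude spectrum. Everything in it is an assumption made for illustration: the signal is synthetic rather than taken from the corpus, the frequencies (600 Hz and 1500 Hz), sampling rate, and decay constant are arbitrary, and simple spectral peak picking is a stand-in, not the method used by the corpus developers.

```python
import cmath
import math


def synth_signal(freqs_hz, amps, decay, fs, n):
    """Sum of exponentially damped cosines: a toy stand-in for a resonant response."""
    return [
        sum(a * math.exp(-decay * t / fs) * math.cos(2 * math.pi * f * t / fs)
            for f, a in zip(freqs_hz, amps))
        for t in range(n)
    ]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (adequate for short signals)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def top_peaks(mags, fs, n, count):
    """Frequencies in Hz of the `count` largest local maxima, in ascending order."""
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k - 1] < mags[k] > mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * fs / n for k in strongest)


FS = 8000  # sampling rate in Hz (assumed)
N = 800    # 0.1 s of signal, giving 10 Hz spacing between spectrum bins

# Hypothetical resonance frequencies and amplitudes, chosen only for the demo.
signal = synth_signal([600.0, 1500.0], [1.0, 0.8], decay=30.0, fs=FS, n=N)
estimated = top_peaks(magnitude_spectrum(signal), FS, N, count=2)
print(estimated)
```

Real accelerometer recordings would need windowing, noise handling, and a more careful estimator than this toy peak picker.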

The corpus consists of 34 monosyllables in a phonetically neutral carrier phrase (“I said a ____ again”), with a median of 6 repetitions of each word by each speaker, resulting in 5,247 individual microphone (and accelerometer) waveforms. Speaker metadata is included.
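The figures above can be cross-checked with a little arithmetic. The sketch below assumes, purely for illustration, that every speaker recorded every word (the description does not say so), and takes the speaker count from the 19 male and 9 female speakers mentioned earlier.

```python
speakers = 19 + 9   # male + female child speakers
words = 34          # monosyllables in the carrier phrase
waveforms = 5247    # individual microphone waveforms reported

# Number of speaker-word combinations, assuming a full speaker-by-word design.
cells = speakers * words

# Average repetitions per speaker-word combination under that assumption.
mean_reps = waveforms / cells

print(cells, round(mean_reps, 2))
```

Under that assumption the mean comes out somewhat below the reported median of 6 repetitions, which would be expected if a minority of speaker-word combinations have fewer repetitions.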

The Child Subglottal Resonances Database is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus. 2022 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

(2) Spoken Digits in Hindi and Indian English was developed by the Birla Institute of Technology and Science Pilani and contains two hours of speech from Hindi and English speakers with regional accents from across India saying the digits 1-10. The data was collected in person with a recorder app on a mobile handset, through one-to-one online communication over social apps, and from social media sites. Each audio file represents a single spoken digit in either Hindi or Indian English. Background noise was mostly retained. Some data was recorded in a noise-free environment or cleaned after recording so that it is free of abrupt noises such as car horns. Speaker metadata is included.

Spoken Digits in Hindi and Indian English is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus provided they have submitted a completed copy of the special license agreement. 2022 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.