LDC data and commercial technology development
New publications:
CALLFRIEND Russian Speech
CALLFRIEND Russian Text
CALLFRIEND Russian Speech was developed by LDC and consists of 48 hours of telephone conversations (100 recordings) between native speakers of Russian. The calls were recorded in 1999 as part of the CALLFRIEND collection, a project designed primarily to support research in automatic language identification. One hundred native Russian speakers living in the continental United States each made a single phone call, lasting up to 30 minutes, to a family member or friend living in the United States.
All recordings involved domestic calls routed through LDC’s automated telephone collection platform and stored as 2-channel (4-wire) 8-kHz mu-law samples taken directly from a public telephone network via a T-1 circuit. Each audio file is distributed in FLAC-compressed MS-WAV (RIFF) format and contains 2-channel, 8-kHz, 16-bit PCM sample data.
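To illustrate how 8-bit mu-law samples relate to 16-bit PCM values, here is a minimal sketch of the standard G.711 mu-law expansion. It is not confirmed to be the exact conversion LDC applied when preparing the files, and the function names `ulaw_to_pcm16` and `decode_ulaw` are hypothetical.

```python
def ulaw_to_pcm16(byte: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed linear PCM value."""
    u = ~byte & 0xFF              # mu-law codes are stored bit-inverted
    sign = u & 0x80               # top bit marks a negative sample
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a sequence of mu-law bytes into linear PCM samples."""
    return [ulaw_to_pcm16(b) for b in data]


# 0xFF encodes silence; 0x80 and 0x00 are the positive and negative extremes.
samples = decode_ulaw(bytes([0xFF, 0x80, 0x00]))
print(samples)
```

The decoded values fit in a signed 16-bit range, which is why mu-law telephone audio is commonly delivered as 16-bit PCM.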
This release includes call metadata such as speaker gender, the number of speakers on each channel, and call duration.
Corresponding transcripts and a lexicon are available in CALLFRIEND Russian Text (LDC2023T09).
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLFRIEND Russian Text contains the corresponding transcripts and a lexicon for CALLFRIEND Russian Speech, that is, 48 hours of telephone conversations (100 recordings) between native Russian speakers.
Each line of a transcript has four main fields (begin_offset, end_offset, speaker_label, transcript_text) separated by tabs. Each transcript file contains a list of time-stamped segments ordered by begin_offset, with no blank lines.
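As a rough illustration of the four tab-separated fields described above, the following sketch parses transcript lines into records and checks that they are ordered by begin_offset. It assumes the offsets are decimal numbers (the unit is not stated here) and that any extra fields after the first four can be ignored. The names `Segment`, `parse_transcript` and `is_ordered`, the speaker labels, and the sample lines are invented for illustration and are not taken from the corpus.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    begin_offset: float
    end_offset: float
    speaker_label: str
    transcript_text: str


def parse_transcript(lines: Iterable[str]) -> list[Segment]:
    """Parse tab-separated transcript lines into Segment records."""
    segments = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"line {number}: expected at least 4 fields")
        begin, end, speaker, text = fields[:4]
        # Assumption: offsets parse as decimal numbers.
        segments.append(Segment(float(begin), float(end), speaker, text))
    return segments


def is_ordered(segments: list[Segment]) -> bool:
    """Return True if segments appear in non-decreasing begin_offset order."""
    return all(a.begin_offset <= b.begin_offset
               for a, b in zip(segments, segments[1:]))


# Made-up sample lines, not real corpus content.
sample = [
    "0.50\t2.10\tA\tалло\n",
    "2.30\t4.75\tB\tпривет\n",
]
segments = parse_transcript(sample)
print(len(segments), is_ordered(segments))
```

For a real file, the same function could be given an open file handle, for example `parse_transcript(open(path, encoding="utf-8"))`, assuming the transcripts are UTF-8 encoded.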
The lexicon covers the word forms in the 97 transcript files. The main lexicon table contains three columns per row: Cyrillic orthography, phonetic transliteration and numeric representation of syllabic stress.
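The three-column layout of the main lexicon table could be loaded along these lines. This is a sketch under stated assumptions: the column separator is taken to be a tab, and the stress field is kept as an uninterpreted string because its exact encoding is not described here. The name `load_lexicon` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def load_lexicon(rows: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each Cyrillic word form to its (transliteration, stress) entries."""
    lexicon = defaultdict(list)
    for number, row in enumerate(rows, start=1):
        row = row.rstrip("\n")
        if not row:
            continue  # tolerate blank rows
        columns = row.split("\t")  # assumption: columns are tab-separated
        if len(columns) < 3:
            raise ValueError(f"row {number}: expected 3 columns")
        orthography, transliteration, stress = columns[:3]
        # A word form may have more than one entry, so keep a list.
        lexicon[orthography].append((transliteration, stress))
    return dict(lexicon)


# Made-up rows; the stress values are placeholders, not real lexicon data.
rows = [
    "привет\tprivet\t2\n",
    "алло\tallo\t2\n",
]
lexicon = load_lexicon(rows)
print(lexicon["привет"])
```

Keeping a list per word form avoids silently overwriting entries if the same orthography appears on more than one row.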
Corresponding speech data is available as CALLFRIEND Russian Speech (LDC2023S08).
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.