E-MELD School of Best Practice


Case Studies of Resource Digitization

There are many tools and technologies for entering Endangered Language documentation. These case studies show how some of these tools have been used to put documentation into best practice format, and they contextualize the information presented in the Classroom.

If you select a language below, you will be able to follow the step-by-step process by which the language documentation was converted from a legacy format.

Each case study is accompanied by sample language data, presented to illustrate best practices. The sample data is not intended to be a complete documentation of the language; it was prepared in accordance with best practices and is in various stages of completion.

Another purpose of the School is to introduce users to the languages and communities presented. You will find quick links to language information in the right sidebar of each case study.


From Notecards to the Web: Biao Min

For four months in 1982, while at the Central Nationalities Institute under a Graduate Program Fellowship, David Solnit collected all of the existing field data on Biao Min. He wrote the Biao Min data down on paper index cards, one word per card. These notecards were then put aside, in a closet, for over a decade. By following this case study, you can learn how to digitize language documentation collected on notecards.


Shoebox Legacy Data: Mocoví

Verónica Grondona has been collecting data on Mocoví since 1991 in the Chaco region of Argentina. She originally stored her data in the Shoebox program, proprietary software which was current at the time. To learn more about transferring Shoebox legacy data into more explicit XML formats, follow this case study.
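
As a rough illustration of what "more explicit" means here, the sketch below turns backslash-coded Shoebox fields into nested XML elements. It is a minimal sketch, not the procedure used in the case study: the sample records are placeholders rather than real Mocoví data, the markers \lx, \ps and \ge are assumed to follow the common Shoebox dictionary convention, and the XML element names (lexicon, entry, headword, pos, gloss) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder records in Shoebox standard format (not real Mocoví data).
SAMPLE = r"""
\lx form1
\ps n
\ge gloss1

\lx form2
\ps v
\ge gloss2 continues
here
"""

# Hypothetical mapping from Shoebox field markers to XML element names.
ELEMENT_NAMES = {"lx": "headword", "ps": "pos", "ge": "gloss"}


def parse_sfm(text, record_marker="lx"):
    """Split standard-format text into records, each a list of (marker, value)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            # A record marker (or the very first field) opens a new record.
            if marker == record_marker or not records:
                records.append([])
            records[-1].append((marker, value.strip()))
        elif records:
            # A line without a marker continues the previous field.
            marker, value = records[-1][-1]
            records[-1][-1] = (marker, (value + " " + line).strip())
    return records


def records_to_xml(records, names):
    """Build an XML tree with one <entry> per record."""
    root = ET.Element("lexicon")
    for record in records:
        entry = ET.SubElement(root, "entry")
        for marker, value in record:
            field = ET.SubElement(entry, names.get(marker, "field"))
            if marker not in names:
                # Keep unknown markers visible instead of dropping them.
                field.set("marker", marker)
            field.text = value
    return root


records = parse_sfm(SAMPLE)
root = records_to_xml(records, ELEMENT_NAMES)
print(ET.tostring(root, encoding="unicode"))
```

In the Shoebox file the boundaries of an entry are implied by the position of the record marker; in the XML output each entry and field is explicitly delimited and named.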


Filemaker Data: Potawatomi

Laura Buszard-Welcher, a 2003-2004 post-doctoral research fellow at Wayne State University, received Potawatomi legacy data which she converted to a text file. A flat-file data structure was then created in Filemaker Pro to house the imported data. Follow this case study to learn about transforming documentation in a Filemaker Pro working form into an XML file format with terminology linked to concepts in the GOLD ontology.

TASX: Monguor

Most of the Monguor documentation was collected by four teams of native speakers organized by Dr. Arienne Dwyer of the University of Kansas, aided by Dr. Kevin Stuart, Dr. Xianzhen Wang, and Dr. Dechun Li. After the audio was split off from the video, the local teams did a first-pass transcription in Transcriber, producing a Monguor orthographic transcription and a free translation into either Chinese or Tibetan. The Kansas team time-linked this transcription to the audio file and translated the Chinese or Tibetan into English. The team came to prefer TASX to Transcriber, in part because of its ability to time-align video. Follow this case study to learn how the E-MELD team put the video file and TASX XML output on the web for playing in multiple browsers.

Multiple Formats: Ega

The Ega legacy resources consisted of a lexicon, interlinear texts, annotated recordings, and linguistic descriptions. The problems they raised include legacy fonts, lexicon structure, phonetic and prosodic annotations, and terminology from highly specialized linguistic traditions. Follow this case study to see how the E-MELD team at the University of Melbourne performed character, markup, and file format conversion. The case study consists of a report from the team, accompanied by some of the scripts used in the conversion process.

Online Video: Tofa

David Harrison collected Tofa data during field expeditions in 2000 and 2001. Since the new documentation consists of annotated audio and video recordings, the E-MELD team will be able to use this language as an example for aligning video with the transcriptions and audio on the web-accessible database system.

From Cassette to the Web: Sáliba

The Sáliba data, collected in 1996 by linguist Nancy Morris, includes a standardized 15-page word list and a 40-minute cassette tape which now reside in the files of SIL International. By following this case study, you will learn how Paul Frank and Gary Simons of SIL made archive-quality digital versions of the word list and the audio recording, produced digital encodings of the transcribed word list, created an XML file to capture the details of the recording, and created a metadata description of the entire set of archival materials.

From Shoebox to the Web (IGT): Kayardild

The Kayardild data was collected by Dr. Nicholas Evans. The data set includes an interlinear glossed text, audio recordings, a lexicon and metadata. Follow this case study to learn how Dr. Baden Hughes and Ms. Cathy Bow digitized the interlinear glossed text as an XML file, following best practice recommendations.

From Cassette to Easy-Access Software: Dena'ina

The Dena'ina data was supplied by Dr. James Kari. Follow this case study to learn how to convert recordings of traditional stories on cassette tape to user-friendly HTML displays of aligned text and audio using ELAN and XSL.

From Older Field Methods to Best Practice: Sisaala

The Sisaala data was collected and supplied by Mr. Steven Moran. The data set includes a lexicon, audio recordings and video recordings. Follow this case study to learn how to digitize text and audio data according to best practice, and how to register metadata.

Ultrasound Analysis: Navajo

The Navajo data was provided by Dr. Doug Whalen. Follow this case study to learn how to use ultrasound analysis to aid in phonetic transcription, especially concerning timing of articulation events.


This site is continually under development. Your feedback is most welcome.
© 2003 The LINGUIST List
