Archivist Start Page

Why is it important to preserve endangered language documentation?

Around the world, languages are dying at an unprecedented rate. When the last speaker of a language dies, the language dies too; only documentary sources remain. Please visit the Endangered Languages page for more information.

How do linguistic materials differ from other primary sources?

Documentation of endangered languages may include audio and video recordings of narratives and conversations, lexicons and wordlists, and interlinear glossed text, along with a description of the grammar of the language.

How important is it to preserve the audio and video recordings?

Unlike an oral history project, in which a thorough transcription of the interview might sometimes be viewed as an adequate surrogate for the audio recording, the recordings of an endangered language are essential, both for community members maintaining or revitalizing their language, and for linguists doing research. Although linguists make careful phonetic transcriptions of sound recordings, these are a poor substitute for the actual sounds. Therefore, preservation of audio recordings is of paramount importance and should take priority over digitization of other materials. For more information on digitization of language recordings, visit our Audio page.

Video recordings are useful for linguistic research but, except in the documentation of sign languages, they are not essential. When digitizing video, the most important step is to digitize the audio portion of the recording separately, in an uncompressed format. Visit our Video page for more information.

Are there special considerations for the digitization of textual materials?

Endangered language documentation will often include a short wordlist (just words with brief definitions) or a lexicon (a more extensive and more carefully analyzed list, often including part of speech and information on word formation). To learn more, visit our Lexicons page.

Interlinear glossed text (IGT) is a format commonly used by linguists to display a sentence or phrase in several tiers. The first tier is often a phonetic or orthographic representation, the second a morpheme-by-morpheme gloss in which each segment of a word is labeled, and the third a free translation, though more tiers may appear. The following is a simple example from Spanish:

Los       gato-s  jueg-an     en  la        playa
DET.PL.M  cat-PL  play-3P.PL  on  DET.SG.F  beach
"The cats play on the beach."

To learn more, visit our IGT page.
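
The tier structure described above can be modeled in a few lines of code. The following Python sketch is an illustration only, not a standard IGT format: the class name InterlinearText and its method aligned are hypothetical, and the code assumes that words are separated by spaces and morphemes by hyphens, as in the Spanish example.

```python
from dataclasses import dataclass


@dataclass
class InterlinearText:
    """Three-tier IGT record (hypothetical structure, for illustration)."""
    text: str         # tier 1: phonetic or orthographic line
    gloss: str        # tier 2: morpheme-by-morpheme gloss
    translation: str  # tier 3: free translation

    def aligned(self):
        """Pair every morpheme with its gloss, word by word.

        Raises ValueError if the two tiers do not line up.
        """
        words = self.text.split()
        glosses = self.gloss.split()
        if len(words) != len(glosses):
            raise ValueError("text and gloss tiers have different word counts")
        pairs = []
        for word, gloss in zip(words, glosses):
            morphemes = word.split("-")
            labels = gloss.split("-")
            if len(morphemes) != len(labels):
                raise ValueError(f"morpheme mismatch in {word!r} / {gloss!r}")
            pairs.append(list(zip(morphemes, labels)))
        return pairs


example = InterlinearText(
    text="Los gato-s jueg-an en la playa",
    gloss="DET.PL.M cat-PL play-3P.PL on DET.SG.F beach",
    translation="The cats play on the beach.",
)

for word in example.aligned():
    print(word)
```

Checking the alignment of the tiers when a text is digitized can catch transcription slips, such as a missing hyphen, before the material is archived.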

Because endangered languages often have no written form, they are commonly transcribed in the International Phonetic Alphabet (IPA) or in a practical orthography. To ensure an unambiguous representation of whichever script is used, text should be encoded in Unicode. To learn more, visit our Unicode page.
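
As a rough illustration of what "encoded in Unicode" means in practice, the Python sketch below lists the code points behind a transcription and saves text as UTF-8. The helper names describe and to_nfc_utf8 are hypothetical. Normalizing to NFC before saving is an assumption of this sketch, not a recommendation made on this page; it is shown because the same accented character can otherwise be stored as two different code point sequences.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


def to_nfc_utf8(text):
    """Normalize to NFC (an assumption of this sketch) and encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# The IPA symbol esh has its own code point, distinct from any Latin letter.
esh = "\u0283"
print(describe(esh))

# A nasalized "a" can be typed as one code point or as "a" plus a combining tilde.
composed = "\u00e3"
decomposed = "a\u0303"
print(describe(composed))
print(describe(decomposed))

# After normalization, both spellings are stored as identical bytes.
print(to_nfc_utf8(composed) == to_nfc_utf8(decomposed))
```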

Where can I find examples of digitization projects?

To read about examples of digital language documentation, visit our Case Studies section.

What kind of metadata is important for linguistic materials?

To learn about the OLAC (Open Language Archives Community) extensions of Dublin Core, and about other metadata standards, visit our Metadata page.
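
For readers unfamiliar with Dublin Core, the Python sketch below builds a minimal record using plain Dublin Core elements only. It is a hedged illustration: the wrapper element record, the function name dublin_core_record, and all field values are invented for this example, and the OLAC extensions themselves are not shown. Consult the Metadata page for the actual OLAC record format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def dublin_core_record(fields):
    """Serialize (element name, value) pairs as Dublin Core XML.

    The enclosing <record> element is a placeholder for this sketch,
    not part of any metadata standard.
    """
    record = ET.Element("record")
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical description of a single audio recording.
xml_text = dublin_core_record([
    ("title", "Spanish narrative recording"),
    ("language", "spa"),
    ("type", "Sound"),
    ("format", "audio/x-wav"),
])
print(xml_text)
```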

Where can I find more information about endangered language archives?

To learn about linguistic archives, visit our Archives page.
