The model typically used in IGT incorporates three tiers: a text transcribed either phonetically or orthographically, a gloss, and a free translation. This three-tiered model is a minimum: when an L1 text is merely translated into L2, too much ambiguity remains for the data to be of much use to present or future generations of linguists.
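As a rough illustration of the three tiers, the sketch below stores a transcription, a word-by-word gloss and a free translation, and prints them with the gloss aligned under each word. The class name `ThreeTierIGT`, its field names and the German sample sentence are assumptions made for this example, not part of any published IGT format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreeTierIGT:
    """Hypothetical container for the minimal three-tier model."""
    transcription: List[str]  # L1 words, one entry per word
    gloss: List[str]          # one gloss per L1 word
    translation: str          # free translation into L2

    def render(self) -> str:
        """Return the three tiers as text, with glosses aligned under words."""
        if len(self.transcription) != len(self.gloss):
            raise ValueError("each word needs exactly one gloss")
        # Each column is as wide as the longer of the word and its gloss.
        widths = [max(len(w), len(g))
                  for w, g in zip(self.transcription, self.gloss)]
        words = "  ".join(w.ljust(n)
                          for w, n in zip(self.transcription, widths))
        glosses = "  ".join(g.ljust(n)
                            for g, n in zip(self.gloss, widths))
        return "\n".join([words.rstrip(), glosses.rstrip(),
                          f"'{self.translation}'"])


# Illustrative sample only (German as L1, English as L2).
example = ThreeTierIGT(
    transcription=["Ich", "sehe", "den", "Hund"],
    gloss=["I", "see.1SG", "the.ACC", "dog"],
    translation="I see the dog.",
)
print(example.render())
```

Keeping the gloss as a list parallel to the words makes the one-to-one alignment explicit, which a plain pair of strings would not guarantee.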
Bow, Bird and Hughes (BBH) recommend a four-level model that incorporates text, phrase, word and morpheme levels. They summarize the typical components of each level.
The text level should include the complete unit of data under study, as well as any metadata collected. It should also include any notes or comments on the text and a free translation into L2. Any unanalyzed elements of the text should be recorded here, where they remain open to future modification.
The phrase level breaks down the text into segments (usually sentences) for a broad analysis. The phrases may be freely translated into L2.
The word level breaks down the L1 data into words, which can pose a parsing challenge with languages that lack white-space separation between words.
The morpheme level breaks down the L1 data into morphemes, which can also pose a parsing challenge. At the morpheme level, two modes of representation are used: a plain-language translation of word roots and a terminological (category-label) identification of grammatical morphemes.
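The four levels described above nest naturally, so one way to picture them is as nested records. The following is a minimal sketch under that assumption: the class names (`Text`, `Phrase`, `Word`, `Morpheme`), the `is_root` flag and the Turkish sample word are all hypothetical choices for illustration and do not reproduce any schema proposed by BBH.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Morpheme:
    form: str      # L1 form of the morpheme
    gloss: str     # plain-language translation or grammatical label
    is_root: bool  # True for roots, False for grammatical morphemes


@dataclass
class Word:
    form: str
    morphemes: List[Morpheme] = field(default_factory=list)

    def gloss(self) -> str:
        """Join the morpheme glosses with hyphens."""
        return "-".join(m.gloss for m in self.morphemes)


@dataclass
class Phrase:
    words: List[Word]
    translation: str = ""  # optional free translation into L2


@dataclass
class Text:
    phrases: List[Phrase]
    metadata: Dict[str, str] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)
    translation: str = ""  # free translation of the whole text


# Illustrative sample: Turkish "evlerimde", 'in my houses'.
evlerimde = Word("evlerimde", [
    Morpheme("ev", "house", is_root=True),
    Morpheme("ler", "PL", is_root=False),
    Morpheme("im", "1SG.POSS", is_root=False),
    Morpheme("de", "LOC", is_root=False),
])
text = Text(
    phrases=[Phrase([evlerimde], translation="in my houses")],
    metadata={"language": "Turkish"},
)
print(evlerimde.gloss())  # house-PL-1SG.POSS-LOC
```

The `is_root` flag records which of the two modes of representation a gloss belongs to: a plain-language translation for roots, a category label for grammatical morphemes.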
While the 4-level model is recommended by BBH, other levels of representation may be desired for specific data. Other possible levels include:
- multiple metadata
- references to audio/video files (and alignment references)
- multilingual translations
- notes from different generations of linguists
- tonetic transcription (prosodic information)
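The additional levels listed above could be attached to a phrase as optional fields. The sketch below is one possible arrangement, not a standard: `AnnotatedPhrase`, `MediaRef`, `Note`, the file path, the time offsets and the note authors are all placeholder assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MediaRef:
    """Reference to a recording plus alignment offsets (in seconds)."""
    path: str
    start_s: float
    end_s: float


@dataclass
class Note:
    author: str
    year: int
    text: str


@dataclass
class AnnotatedPhrase:
    transcription: str
    # Translations keyed by language code, e.g. "en" or "de".
    translations: Dict[str, str] = field(default_factory=dict)
    media: Optional[MediaRef] = None
    notes: List[Note] = field(default_factory=list)
    tone: Optional[str] = None  # tonetic transcription, if any

    def notes_by_year(self) -> List[Note]:
        """Return notes ordered from oldest to newest."""
        return sorted(self.notes, key=lambda n: n.year)


# Placeholder values throughout; none of this is real field data.
phrase = AnnotatedPhrase(
    transcription="evlerimde",
    translations={"en": "in my houses", "de": "in meinen Häusern"},
    media=MediaRef("recordings/session01.wav", start_s=12.5, end_s=13.4),
    notes=[
        Note("Linguist B", 2003, "placeholder comment"),
        Note("Linguist A", 1975, "placeholder comment"),
    ],
)
print([n.author for n in phrase.notes_by_year()])
```

Making every extra level optional keeps the record usable for data that has only the basic tiers, while allowing later annotators to add material without changing the structure.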