The Descriptive Grammar as a (Meta)Database
|University of Pittsburgh and the Max Planck Institute for Evolutionary Anthropology|
This paper presents a general model for the structure of the traditional
descriptive grammar based on a survey of four printed grammars, each
of which was chosen as representative of a different "genre":
a "best-practice" grammar, Haspelmath's (1993) Lezgian grammar;
a grammar representing the traditions of a specific area/family,
Maganga and Schadeberg's (1992) grammar of Kinyamwezi, a Bantu
language; a grammar from the Routledge Descriptive Grammars series,
Huttar and Huttar's (1994) grammar of Ndyuka; and a "legacy"
grammar, Williamson's (1965) grammar of Ijaw, which remains an
important resource for the language despite making use of a
dated syntactic formalism.
This study is intended to be exploratory more than definitive. Its primary goal is to stimulate debate on the way information found in descriptive grammars is structured, with the ultimate goal of developing a workable model for the digital analog of printed descriptive grammars. Section 1 will present important features found in all of the grammars surveyed. Section 2 will discuss interesting features specific to individual grammars. Section 3 will present a general model for the traditional descriptive grammar, understanding it to be a series of annotations over a lexicon and set of texts. Section 4 will give a possible XML representation of that model. Section 5 will offer a brief conclusion and discuss possible future directions for research on modeling grammars.
|1. General features of descriptive grammars|
|1.0 Four basic features|
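Four main features were found to be common to all of the
grammars in the survey, and they appear to form the "core" of the traditional
descriptive grammar: the use of ontologies (section 1.1), nested sections (section 1.2), descriptive prose (section 1.3), and exemplar data (section 1.4).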
The pages in figures 1 and 2 further illustrate that the basic structure of a section is that it can (i) contain other sections, (ii) contain descriptive prose relating to the phenomenon being described, and (iii) contain examples illustrating that phenomenon. In this particular case, the examples are in the form of interlinear text. As will be discussed in section 1.4, I refer to the examples used in descriptive grammars as exemplars, to emphasize their status as examples specifically chosen to illustrate some phenomenon.
In the next four sections, I will discuss each of the four basic features of descriptive grammars listed above in more detail. In section 1.5, I will briefly cover another common feature of the grammars, less widely used, but important in some areas, structured description.
|1.1 Three types of ontologies|
The last section mentioned that the Ndyuka grammar made implicit reference to a general
ontology for linguistic description. The use of ontologies in
descriptive grammars, however, goes far beyond this.
At least two other types of linguistic ontologies are regularly employed which, here, will
be called subcommunity and local ontologies. Brief descriptions of
these three types of ontologies are given immediately below.
The use of local ontologies is particularly noticeable in cases like that seen in the excerpt from Williamson's (1965:28–29) Ijaw grammar given in figures 3 and 4 below, where the label for some grammatical class is completely arbitrary (see, specifically, the section labeled "1.7.2 Tone classes"). In this particular case, different tonal classes of words found in Ijaw are given the labels Class I, Class II, Class III, Class IV, and Class V. It might, of course, be the case that an analysis could be applied to these classes which would allow the words in each class to be given a label drawn from a general or subcommunity ontology. However, in this particular description, as it stands, the classes are given labels drawn from a local ontology.
Something that is important to note about the use of local ontologies—which can be seen in both the Ndyuka and the Ijaw excerpts—is that they tend to be used to subdivide phenomena which are classified using a term drawn from a more general ontology. For example, Williamson's Classes I–V for Ijaw are designated as being tonal classes—a concept which would be needed for languages other than Ijaw. It's also important to note that while the labels for the tone classes might be part of a local ontology, a prose description of those labels may draw on terms from a more general ontology. In the case of these Ijaw tone classes, for example, terms like low, rising, and isolation, which have general currency, are employed in characterizing the various classes. These aspects of the use of local ontologies have an important implication: In grammatical description, ontologies are not used in a self-contained way. Rather, terms drawn from different ontologies can be intermingled in the description of some phenomenon.
We have yet to see an example of the use of a subcommunity ontology. An example of this can be seen in figure 5, an excerpt from the Kinyamwezi grammar, which indicates the form of noun class prefixes in the language.
Bantu languages are famous for their rich noun class system, which has a number of grammatical reflexes, including a system of nominal prefixes. The noun classes of Bantu languages are consistent enough across the family that they are reconstructible for Proto-Bantu, and there is a generally agreed-upon terminology set for referring to them using the numbers one through twenty-one (only noun class numbers one through eighteen are seen in figure 5). The form of the particular noun classes will differ from language to language. However, by identifying some noun class prefix in a given Bantu language with a number between one and twenty-one, a descriptive claim is made that that prefix is etymologically related to a prefix given the same number in another Bantu language. So, unlike the numbered tone classes of Ijaw, the numeric designations for the noun classes seen for Kinyamwezi in figure 5 are not drawn from a local ontology, since the same numbering system is used for other languages. Rather, they are drawn from a subcommunity ontology, in particular that of the Bantu subcommunity.
As with local ontologies, the terms in subcommunity ontologies may be partially defined with terms from a general ontology. In the Bantu case just described, while the use of the numbers one through twenty-one to designate noun classes may be peculiar to the Bantu subcommunity, the term noun class has a broader use, as does the term prefix. We see, then, that local ontologies and subcommunity ontologies are not defined in a conceptual "vacuum". Rather, they tend to build on concepts which are part of general ontologies.
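The relationship among the three types of ontologies can be made concrete with a small sketch. The class name `Term`, its fields, and the function `general_anchor` are illustrative assumptions, not part of any existing ontology resource. Treating "tone class" as a general term follows the observation above that the concept is needed for languages other than Ijaw, and the choice of Bantu class 7 is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A term in a linguistic ontology (hypothetical structure)."""
    label: str
    scope: str                        # "general", "subcommunity", or "local"
    broader: Optional["Term"] = None  # the more general term this one builds on


# General terms with currency across languages.
NOUN_CLASS = Term("noun class", "general")
TONE_CLASS = Term("tone class", "general")

# A Bantu subcommunity term and an Ijaw local term, each tied to a general term.
BANTU_CLASS_7 = Term("class 7", "subcommunity", broader=NOUN_CLASS)
IJAW_CLASS_II = Term("Class II", "local", broader=TONE_CLASS)


def general_anchor(term: Term) -> Term:
    """Follow 'broader' links until a general-ontology term is reached."""
    while term.scope != "general" and term.broader is not None:
        term = term.broader
    return term
```

Under these assumptions, `general_anchor(IJAW_CLASS_II)` returns the general term "tone class", reflecting the point that local and subcommunity labels tend to subdivide phenomena classified by a more general ontology.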
|1.2 Nested sections|
Most book-length documents are divided into sections of one type
or another. So, it is not particularly surprising that grammars are similarly
divided. Importantly, the sections found in grammars tend to show a relatively
high degree of standardization. For example, all of the grammars in the
survey contain a section titled "Phonology".
A general pattern seems to be that the sectioning of grammars is sensitive to, but not dictated by, some sort of general linguistic ontology. This ontology is not explicitly employed, of course. Nevertheless, some notion of appropriate categorization of different phenomena clearly informs the way the sections are organized. Consider, for example, the subdivisions of the chapter entitled "Verbal inflection" of the Lezgian grammar given below in table 1.
It's important to point out that, even if an ontology drives some aspects of the organization of the sections in a grammar, the exigencies of producing a coherent linearly-organized document like a book will sometimes force deviations and compromises from a purely "ontological" organization. In table 1, we see, for example, the inclusion of a section called "Illustrative partial paradigms". While paradigms presumably have a place in an ontology of concepts relating to verbs, "illustrative" paradigms would not seem to belong to such an ontology. Rather, they are included for the convenience of the reader trying to understand the morphological patterns in the Lezgian verb. Similarly, the appropriateness of a section on "Periphrastic tense-aspect categories" is questionable in a chapter on verbal inflection. The inclusion of this section reflects a tension between organizing the chapter along the narrow concept of inflection and bringing together grammatical forms with similar semantic functions. There is a type of ontological "clash" in the language where similar function is not expressed by similar form. In the production of a physical grammar, the clash must be resolved somehow and, in this instance, we see that functional-based grouping was favored over formally-based grouping.
Ontologically-sensitive sectioning was a feature of all the grammars surveyed. This is not surprising, of course, given that ontologies are a reflection of how linguists categorize grammatical phenomena and a grammar is intended to be a description of a wide range of phenomena of a given language.
A final characteristic of the sectioning of grammars which should be mentioned is that each section is typically given a unique numeric label which is used for referring to that section. This feature of descriptive grammars is an apparent reaction to the fact that, while a physical descriptive grammar is a linearly arranged book, the structure of the information in a grammar is not linear. Rather, it is highly interconnected, and a given piece of information doesn't necessarily belong in only one "place". When the organization of a grammar forces the primary descriptions of related phenomena to appear separately from each other, section references can be used in the prose to connect them descriptively.
|1.3 Descriptive prose|
Some important features found in the descriptive prose in grammars,
in addition to free-form prose itself, are given below:
Importantly, while it can be the case that descriptive prose is explicitly associated with an exemplar via a reference, sometimes the association is merely implicit. (Exemplars are discussed in more detail in the immediately following section.) This can also be seen in the excerpt from the Lezgian grammar given in figure 6. In section "10.4.1.3", the first chunk of descriptive prose makes explicit reference to the exemplar data in "(396)". The second chunk of prose is only implicitly associated with the exemplar data in "(397)" by virtue of directly preceding that exemplar.
The linking of particular chunks of prose to particular exemplars can create something like subsections to a given section since a standard way of indicating this linking is to place a set of exemplars immediately after the relevant prose, with multiple distinct sets of exemplars contained within one section. However, this exemplar-based sectioning is not explicit, in contrast to the sort of sections described above in section 1.2.
|1.4 Exemplar data|
I use the term exemplar here to refer to language data used in
grammars to exemplify the phenomena under discussion. An exemplar is understood to
be different from an example of some grammatical phenomenon in that
it is specifically chosen by the author of a grammar to assist in descriptions of that phenomenon. It
can generally be assumed that the particular examples chosen to serve
as exemplars more clearly illustrate the phenomenon under discussion than
many of the other examples would.
Two major types of exemplars were found in the grammars of the survey: lexical exemplars and textual exemplars. Lexical exemplars take the form of either words or morphemes accompanied by glosses, typically arranged in a table. Textual exemplars typically take on the form of interlinear text. Exemplars may or may not be given unique labels.
Figure 7 is an excerpt from the Lezgian grammar showing both lexical and textual exemplars. The lexical exemplars appear with the label "(345)". The core of each lexical exemplar consists of a word, a gloss, and a grammatical label. In addition, each exemplar is accompanied by a lexically related form for comparative purposes (in parentheses). There are four textual exemplars in figure 7: the first two, labeled "(343)" and "(344)", are interlinear phrases, and the second two, labeled "(346)" and "(347)", are interlinear sentences.
Some of the exemplars in figure 7 have an important feature: They diverge slightly from a standard presentation format to allow them to better illustrate the phenomena they are exemplifying. This can be seen for the lexical exemplars in that they are accompanied by lexically related forms. One instance of a textual exemplar diverging from standard presentation format can be seen in the part of the exemplar labeled "(346)" that is highlighted in red. The word-by-word glossing has been further annotated for constituency—specifically marking an infinitive clause containing a participial phrase—in order to clarify which part of the sentence is exemplifying the phenomenon under discussion. A comparable device, bolding some words in a textual exemplar, was encountered in the Ndyuka grammar.
Some of the textual exemplars in figure 7 show an additional kind of annotation deviating from strict interlinear format. They also contain external references to the source of the exemplar. These external references are highlighted in blue.
An important aspect of the use of exemplars is that, in some cases, exemplars are grouped with other exemplars. This can be seen, for example, in the set of exemplar data labeled "(345)" in figure 7. The general use of such grouping is to indicate that each member of the set either illustrates the same phenomenon or plays a part in illustrating some phenomenon. However, it is not the case that an ungrouped set of exemplars should be assumed to illustrate different phenomena. The Ndyuka grammar, for example, did not make use of any explicit grouping convention despite numerous instances of exemplars which would have been reasonable to treat as members of a logical set.
|1.5 Structured description|
A final, less prominent feature found in grammars of the survey combines some features
of descriptive prose and exemplars. This is what I term structured
description. This is description, typically in tabular format and offset from the prose,
covering a particularly coherent domain of a language's grammar. The most frequently
occurring type of structured description is tabular presentations of a language's phoneme
inventory. Such an inventory is clearly description, but, unlike descriptive prose,
the description has a very particular format grouping segments by generally-accepted
phonetic and/or phonological categories.
However, structured description is not restricted to relatively standardized realms like phoneme inventories. It is also used for phenomena which apply to a sufficiently large class of constituents that a generalized schema can be given. The parts of the excerpt from the Kinyamwezi grammar highlighted in red in figure 8 give an example of a structured description summarizing the tone patterns for certain verbal forms using a type of morpheme-to-tone association template.
Some instances of structured description bear resemblance to theoretically-oriented formalizations of grammatical phenomena. The boundaries between structured description and formal "description" are not immediately clear, and, in fact, as will be discussed briefly in section 2.4, one of the grammars in the survey, the Ijaw grammar, made extensive use of formal rules in describing the language's grammar.
|2. Particular features of the four grammars|
|2.1 The Lezgian grammar|
Of the four grammars in the survey, I designated the Lezgian grammar as the
"best-practice" grammar because it is widely recognized as exceptionally well designed,
even including some innovative features which anticipate recent developments
in computer-assisted linguistics. Some interesting features of the grammar
are given below, and I'll discuss each of them in more detail in turn.
Figure 10 shows a page from the index of example sentences found in the Lezgian grammar. The keys to the index are the numbers for the various examples in the grammar, and the values associated with each key are other example numbers. Notably, the examples referred to are not limited to the exemplars in the grammar but also include examples of various phenomena found in texts provided with the grammar. This sort of index provides much of the functionality which creators of digital resources hope to make possible by providing online or offline search facilities.
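The key-to-values structure of such an index corresponds to a simple lookup table in a digital resource. The following is a minimal sketch. The example numbers are invented for illustration and are not taken from the actual index of the Lezgian grammar, and the names `related_pairs` and `build_index` are assumptions.

```python
from collections import defaultdict

# Hypothetical (example number, related example number) pairs.
related_pairs = [(343, 512), (343, 730), (345, 512)]


def build_index(pairs):
    """Map each example number to a sorted list of the example numbers it points to."""
    index = defaultdict(set)
    for key, value in pairs:
        index[key].add(value)
    return {key: sorted(values) for key, values in index.items()}


example_index = build_index(related_pairs)
```

Looking up `example_index[343]` then plays the role of consulting the printed index entry for example (343).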
The final feature particular to the Lezgian grammar I will mention here is that it makes a typographic distinction between what are classified as language-specific categories, like "Ergative" case or the "Involuntary Agent" construction, which are capitalized, and what are considered universal or semantic categories, like "complement clause" or "adverbial modifier", which are not.
Given the discussion in section 1.1, this is a particularly interesting feature since it shows an explicit recognition that grammars tend to employ terminology drawn from different ontologies. In this particular case, when a category is drawn from a general ontology, no capitalization is used. However, when a category is drawn from a local ontology, capitalization is employed.
In addition, by using capitalized versions of recognizable terms for "language-specific" categories, instead of, say, constructing new terminology entirely, the grammar implicitly recommends a mapping of each language-specific term to a general term. For example, the fact that the label "Ergative" is employed for a particular case form in Lezgian can be taken as a recommendation that that case be mapped to prototypical "ergative" case. The fact that "Ergative" is capitalized is an indication that arguments marked with this case in Lezgian may not have all the features typically associated with ergative case arguments. However, they can reasonably be expected to have the core properties of such arguments.
|2.2 The Ndyuka grammar|
The primary feature particular to the Ndyuka grammar which I will discuss
is its sectioning, which
is based on Comrie and Smith's (1977) Questionnaire for language
The Questionnaire was designed to set forth a range of questions to ask when working on the grammatical description of a language. The use of the Questionnaire for a given language is meant to ensure that the description of that language has adequate grammatical coverage as well as to facilitate cross-linguistic comparison of that language with other languages described using the Questionnaire, since grammars based on the Questionnaire, for the most part, would be expected to have similar sectioning. For illustrative purposes, the first page of the Questionnaire is given in figure 11 below.
Any grammar following the Questionnaire should, ideally, use the exact sectioning outlined in the Questionnaire itself. So, for example, in the Ndyuka grammar, there is a section with the identifier "18.104.22.168.1" on Yes-no questions.
A feature of the Ndyuka grammar imposed on it by Questionnaire-based sectioning is that, as with the Lezgian grammar, there are numerous places where it is explicitly indicated that the language lacks some grammatical phenomenon. For example, question "22.214.171.124" of the Questionnaire essentially asks if obviation is found in the language. Since obviation is not a grammatical phenomenon in Ndyuka, section "126.96.36.199" of the Ndyuka grammar simply states that nominals are not marked for obviation.
|2.3 The Kinyamwezi grammar|
The most noteworthy aspect of the Kinyamwezi grammar, with respect to the present survey,
is its extensive use of a subcommunity ontology. This grammar, of course, was chosen
to exemplify subcommunity grammars—so, this is not surprising.
One need only look at the table of contents of the grammar to see the use of terms which, while being well-known in the Bantuist community, could not be expected to be well-known outside of it. For example, a section of the chapter on consonants is entitled, "Dahl's Rule"—this refers to a particular historical dissimilation process which is important in Bantu historical phonology. The section on nouns has a subsection entitled "The Augment", again a Bantu-specific term. Similarly, the chapter on verbal derivation contains section titles using the word "extension"—a reference to a particular subclass of suffixes found on Bantu verbs.
The use of terms specific to Bantu linguistics, importantly, does not make this grammar unusable to people from outside that community because the descriptive prose, in general, either defines community-specific terms or describes the relevant phenomenon clearly enough that it is not necessary to know the precise definition of the term to interpret the grammatical facts of the language.
Given the large number of Bantu languages and the extensive similarities found among them, the use of a subcommunity ontology plays a similar role to the use of the Questionnaire format for the Ndyuka grammar. It facilitates cross-linguistic comparison. However, unlike the Ndyuka case it is not general cross-linguistic comparison which is made easier. Rather, comparison with other Bantu languages is facilitated. We can think of the Bantu term set as an ontology which is "optimized" for one particular language family. In some areas, like verbal suffixes, it makes finer-grained distinctions than a general ontology would, while it entirely ignores phenomena which are not well-represented in Bantu languages.
|2.4 The Ijaw grammar|
The Ijaw grammar was chosen to be part of the survey for its use of a "legacy"
formalism—specifically, notational devices of early transformational grammar.
While there are numerous instances of legacy formalisms to be found in linguistic
description, I chose this grammar, in particular, because
it is still cited fairly frequently as it represents the only grammar (to my knowledge)
of a typologically unusual Niger-Congo language.
Some of the effects of the use of a legacy formalism can be seen in the grammar's overall structure. For example, phenomena generally classified as syntactic are spread out over four chapters, "Phrase-structure rules", "Verb phrase transformations", "Noun phrase transformations", and "Sentence transformations". While the idea of having multiple chapters for syntactic phenomena is not necessarily tied to any particular formal theory of grammar, these particular divisions are clearly derived from early transformational grammar, and it is unlikely that they would be employed in any grammar produced today.
In addition, the use of this formalism also affects the nature of the description of particular phenomena. Figure 12 gives an excerpt from the chapter on phrase-structure rules wherein aspects of the basic syntax of sentences are discussed. While the discussion is still accessible to a present-day reader, the particular format of the description, using phrasal expansion rules to describe possible sentences, is not in common use today in descriptive grammars.
While formal representations of grammatical phenomena may have a place within language documentation (see, for example, Bender et al. (2004)), they have not been widely used by authors of descriptive grammars. This is presumably because grammatical formalisms have changed too rapidly to be seen as valuable tools for the creation of language documentation which is intended to last for decades, if not centuries.
The existence of legacy formalisms in grammars like the Ijaw grammar should probably not influence best-practice recommendations for the production of new grammars. However, they do need to be considered in the formulation of best-practice recommendations for the conversion of existing print grammars to digital formats.
|3. Towards a model of the structure of a descriptive grammar|
In this section, I will propose a basic model for the structure
of descriptive grammars based on the features found in the grammars
in the survey. This model will be incomplete insofar as it will not encompass
all of the features found in each grammar. Rather, it will aim to
isolate the features common to all of the grammars in order to establish
a foundation upon which more particular features can be added. This reflects
the fact that the primary aim of this paper is not to present a definitive
model for all descriptive grammars but rather to stimulate discussion
on standards for encoding the information found in descriptive grammars.
The basic model for a descriptive grammar that I propose here is given in figure 13.
The basic model for a descriptive grammar given in figure 13 envisions it as a series of annotations on a lexicon and a set of texts. The presentational analog to an annotation found in the surveyed grammars is the section, described above in section 1.2. While, in some cases, a descriptive grammar is accompanied by a published lexicon and a set of texts (sometimes in the same volume as the grammar itself), I do not mean to imply with figure 13 that such documents must physically (or electronically) accompany a grammar. Rather, the existence of a lexicon and body of texts is presupposed by a descriptive grammar, which contains generalizations over the lexicon and collected texts of a language. Furthermore, even if there are no particular resources corresponding to a lexicon or set of texts, a partial lexicon and a set of one-sentence "texts" could always be constructed on the basis of the exemplars in the grammar.
The model for descriptive grammars given in figure 13 treats the annotation, a type of relatively unstructured but highly expressive metadata, as the core of the grammar. I classify annotations as a kind of metadata since their content consists of generalizations over the more primary data found in the lexicon and texts. A proposed model for the annotations found in a descriptive grammar is given in figure 14. This structure is simplified somewhat, and other features of annotations are diagrammed in figure 15.
The model for a grammatical annotation given in figure 14 treats the annotation as having descriptive prose at its center with links from the descriptive prose to lexical and textual exemplars as well as to structured description. In addition, exemplars are linked to a (possibly abstract) lexicon or set of texts, and the descriptive prose might contain references to terms drawn from one of the three types of ontologies discussed above. The local and subcommunity ontologies are shown as being linked to the general ontology, indicating that it is common (and presumably best) practice to indicate how local or subcommunity terminology relates to generally understood terminology.
Figure 14 is a simplification of the structure of an annotation in a number of respects. First, it does not indicate whether the components of an annotation are obligatory or optional. In the grammars found in the survey, only the descriptive prose appeared to be obligatory. In addition, figure 14 treats exemplars themselves as part of the structure of the annotation, while it might be more accurate to consider them as external to the annotation and, instead, only place references to exemplars within the annotation itself. Certainly, from a presentational perspective, exemplars appear to be part of an annotation. However, it is not clear that the content of exemplars is part of the annotation's logical structure. Figure 14 also implies that links to the ontologies can only be made via descriptive prose while, in fact, links to the ontologies could be made from any part of the annotation. (Such links were not presented, to minimize visual clutter.) Similarly, figure 14 does not include the fact that structured descriptions can make reference to exemplars, just as descriptive prose can.
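The distinction between obligatory and optional components, and the option of storing only references to exemplars rather than the exemplars themselves, can be sketched as a record type. This is one possible reading of the model, not a definitive implementation. The class name `Annotation` and all field names are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A grammatical annotation: prose is obligatory, everything else optional."""
    prose: str
    # References to exemplars held in a (possibly abstract) lexicon or set of
    # texts, rather than the exemplar content itself.
    exemplar_refs: List[str] = field(default_factory=list)
    # Placeholder for structured description; no internal model is assumed.
    structured_descriptions: List[str] = field(default_factory=list)
    # Labels of terms drawn from general, subcommunity, or local ontologies.
    ontology_terms: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Only descriptive prose appeared to be obligatory in the survey.
        if not self.prose.strip():
            raise ValueError("an annotation needs descriptive prose")


tone_classes = Annotation(
    prose="Words fall into five tone classes.",
    ontology_terms=["tone class", "Class I"],
)
```

Keeping exemplars as references leaves open whether their content lives in the annotation or in an external lexicon or text collection.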
Perhaps the most crucial feature of an annotation which is omitted from figure 14 is the fact that an annotation can have two important kinds of relationships to other annotations. First, an annotation can contain other annotations—this is the analog to the nesting of sections in a printed grammar. Second, references to other annotations can be made within annotations. These two kinds of relationships are schematized in figure 15.
Figures 14 and 15 do not include two features of the sections found in descriptive grammars as part of the annotation: a label (used for referring to the section) and a title. While it is certainly important to represent these pieces of information, they would seem not to be part of the logical structure of an annotation. Rather, they are metadata for the annotation.
Before continuing on to section 4, where I will present a partial XML representation for the information found in descriptive grammars, it would be worthwhile to point out that the model of the descriptive grammar given in figure 13 as a system of annotations over a lexicon and set of texts can be understood as treating a grammar as a sort of metadatabase, that is, a database of generalizations over primary data. The only aspect of the model presented here which deviates from the most typical database structure is the fact that annotations can be contained within other annotations. Databases are commonly conceived of as consisting of a series of records with a fixed, non-recursive structure. However, even though it diverges from a prototypical database structure, this aspect of the model makes the structure of a database of grammatical annotations only slightly more complicated than, say, a database of lexical items. Representing the possible nesting of annotations simply requires the addition of metadata indicating that a given annotation can have another annotation as its "parent".
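The "parent" metadata just described can be illustrated with flat records. The sketch below assumes hypothetical identifiers (`a1`, `a2`, `a3`) and field names. The titles are those mentioned in the discussion of the Lezgian chapter on verbal inflection in section 1.2.

```python
# Flat, non-recursive records; nesting is expressed only through "parent".
records = [
    {"id": "a1", "parent": None, "title": "Verbal inflection"},
    {"id": "a2", "parent": "a1", "title": "Illustrative partial paradigms"},
    {"id": "a3", "parent": "a1", "title": "Periphrastic tense-aspect categories"},
]


def children_of(records, parent_id):
    """Return the ids of annotations directly contained in parent_id."""
    return [r["id"] for r in records if r["parent"] == parent_id]


def depth(records, record_id):
    """Count how many ancestors an annotation has by following parent links."""
    by_id = {r["id"]: r for r in records}
    level = 0
    while by_id[record_id]["parent"] is not None:
        record_id = by_id[record_id]["parent"]
        level += 1
    return level
```

Each record keeps a fixed structure, and the nesting seen in a printed grammar can be reconstructed from the parent links alone.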
|4. Towards an XML representation of the descriptive grammar|
It is not possible here to give a full XML document type definition (DTD)
or schema for a descriptive grammar following the model seen in figure 13
since this would require also having a markup schema for lexicons and
texts. There has been work on representing both kinds of resources in XML—so,
this is, fortunately, not an insurmountable problem. Chapter 12
of the TEI guidelines (Sperberg-McQueen and Burnard 2002), for example,
is devoted to markup standards for dictionaries and E-MELD's
input tool can produce richly marked up XML lexicons. For text markup,
Bow, Baden, and Bird (2003) provides a system of XML markup which can
be applied to interlinear text. These existing standards could easily
be adopted in a markup schema for grammars and would be largely sufficient
to encode the sorts of lexical and text examples found within them.
The only crucial feature lacking in existing markup systems for lexical entries and interlinear data which would be required to fully represent the lexical and text exemplars in grammars is a system allowing an example to receive special annotation to clearly indicate how it is an exemplar for some particular feature. In the discussion of figure 7 in section 1.4, for example, we saw some cases from the Lezgian grammar where such special annotation was employed. Devising a system of markup for such annotation is outside the scope of the present proposal. It would seem to be a fairly complex task because the nature of the special annotation on exemplars can be quite varied. It would require a separate survey in its own right, which should, presumably, include a survey of special exemplar annotation conventions found in theoretical work, where such annotation tends to be very widely used (a syntactic tree, for example, could be understood as an example of such annotation).
Another feature of the grammars in the survey which is outside of the scope of this paper is modeling possible types of structured description, discussed in section 1.5. This, too, would seem to require a separate survey in its own right, which should also probably include theoretical work, in addition to descriptive work. First, it would have to be determined how many different types of structured description are used in linguistic analysis and, then, a representation would have to be developed for each type. Below, I give a DTD for the descriptive grammar which allows for the existence of structured description in an annotation. However, I do not give any model for any particular instance of structured description.
Putting the issues of special annotations for exemplars and structured description aside, a possible system for XML markup for descriptive grammars is given below in figure 16, which contains an XML fragment of a markup system consistent with the model of descriptive grammars discussed in section 3.
A DTD consistent with the XML fragment in figure 16 can be downloaded by clicking here.
The XML fragment in figure 16 formalizes the model discussed in section 3 in the following ways:
|5. Conclusion and topics for further research|
This paper has presented a model, derived from a survey of four printed grammars,
of the information found in descriptive grammars wherein they
are understood as a series of annotations over a lexicon and texts (figure 13). In addition,
it has given a model for the structure of annotations themselves, taking them to consist
primarily of descriptive prose, structured description, exemplars, and sub-annotations. The model
given for annotations also allows them to contain references to parts of other annotations,
to elements in a lexicon or set of texts, and to terms drawn from ontologies. In addition, a possible
XML representation for this model was given in section 4.
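As a rough illustration of the annotation model just summarized, the following Python sketch treats an annotation as a record holding descriptive prose, an unmodeled structured description, exemplars, references, and sub-annotations, and serializes it to XML. All class, field, and element names here are hypothetical and chosen only for illustration; they are not taken from figure 16 or from the DTD accompanying this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Reference:
    # kind: "annotation", "lexicon", "text", or "ontology" (hypothetical labels)
    kind: str
    target: str  # identifier of the thing pointed at (hypothetical id scheme)


@dataclass
class Exemplar:
    # An exemplar points at a lexicon entry or stretch of text instead of copying it.
    source: Reference
    note: str = ""  # free-text stand-in for "special annotation", left unmodeled


@dataclass
class Annotation:
    id: str
    title: str
    prose: List[str] = field(default_factory=list)
    structured_description: Optional[str] = None  # opaque placeholder only
    exemplars: List[Exemplar] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)
    sub_annotations: List["Annotation"] = field(default_factory=list)


def to_xml(a: Annotation) -> ET.Element:
    """Serialize an annotation, and recursively its sub-annotations, to XML."""
    el = ET.Element("annotation", {"id": a.id})
    ET.SubElement(el, "title").text = a.title
    for paragraph in a.prose:
        ET.SubElement(el, "prose").text = paragraph
    if a.structured_description is not None:
        ET.SubElement(el, "structuredDescription").text = a.structured_description
    for ex in a.exemplars:
        ex_el = ET.SubElement(
            el, "exemplar", {"kind": ex.source.kind, "target": ex.source.target}
        )
        if ex.note:
            ex_el.text = ex.note
    for ref in a.references:
        ET.SubElement(el, "ref", {"kind": ref.kind, "target": ref.target})
    for sub in a.sub_annotations:
        el.append(to_xml(sub))
    return el


def count_annotations(a: Annotation) -> int:
    """Count an annotation together with all of its nested sub-annotations."""
    return 1 + sum(count_annotations(s) for s in a.sub_annotations)


if __name__ == "__main__":
    # Placeholder content only; the ids do not refer to any real lexicon or text.
    grammar = Annotation(
        id="a1",
        title="Placeholder chapter",
        prose=["Descriptive prose goes here."],
        references=[Reference("ontology", "placeholder-term")],
        sub_annotations=[
            Annotation(
                id="a1.1",
                title="Placeholder section",
                exemplars=[Exemplar(Reference("text", "text1.s3"))],
            )
        ],
    )
    print(ET.tostring(to_xml(grammar), encoding="unicode"))
```

Keeping exemplars as pointers into a separately modeled lexicon and set of texts, not as copies, is one way of reflecting the view that a grammar is a layer of annotations over those resources.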
Section 4 pointed out several ways in which the present model, and the XML representation in particular, is incomplete. First and foremost, it presupposes that the lexicons and texts on which grammatical annotations are based have already been properly modeled. In addition, research is required to determine models for, and representations of, structured description and special annotations on exemplars.
In addition to dealing with these issues, there are several other possible directions for future research on descriptive grammars. Bender et al. (2004) are researching the possibility of bridging the gap between traditional description and formal description so that a machine-readable grammar can be built alongside a human-readable one. The formalization of grammatical description also provides an excellent test of the utility of ontologies, especially considering that descriptive grammars typically make use of multiple ontologies in an interconnected fashion. Finally, an important area not covered at all here is the development of methods for transforming the basic representation of descriptive grammars provided here into other formats, in particular into human-readable documents.
|References|
Bender, Emily M., Dan Flickinger, Jeff Good and Ivan A. Sag. 2004.
Montage: Leveraging advances in grammar engineering, linguistic ontologies,
and markup for the documentation of underdescribed languages.
Proceedings of the Workshop on First Steps for Language Documentation of
Minority Languages: Computational Linguistic Tools for Morphology,
Lexicon and Corpus Compilation, LREC 2004, Lisbon, Portugal.
Bow, Cathy, Baden Hughes, and Steven Bird. 2003. Towards a general model of interlinear text. Proceedings of E-MELD Workshop 2003: Digitizing and Annotating Texts and Field Recordings. LSA Institute: Lansing MI, USA. July 11–13, 2003. Available at:
Comrie, Bernard and Norval Smith. 1977. Lingua descriptive studies: Questionnaire. Lingua
Farrar, Scott and Terry Langendoen. 2003. A linguistic ontology for the semantic web. GLOT International 7, 97–100.
Haspelmath, Martin. 1993. A grammar of Lezgian. Berlin: Mouton
Huttar, George L. and Mary L. Huttar. 1994. Ndyuka. London: Routledge.
Maganga, Clement and Thilo C. Schadeberg. 1992. Kinyamwezi: Grammar, texts, vocabulary. Köln: Rüdiger Köppe.
Sperberg-McQueen, C. M., and Lou Burnard (Eds.). 2002. TEI P4: Guidelines for Electronic Text Encoding and Interchange: XML-compatible edition. Available at: http://www.tei-c.org/P4X/.
Williamson, Kay. 1965. A grammar of the Kolokuma dialect of Ịjọ. Cambridge: Cambridge