The Online Database of INterlinear text (ODIN)
indexes over 31,000 instances of Interlinear Glossed Text (IGT) harvested
from approximately 2,000 scholarly documents found on the Web. Linguists can locate IGT for over 640 languages, and can easily pull up any of the documents
these instances were discovered in. ODIN is an Open Language Archives
Community (OLAC) data provider, so searches by language name and code can be
performed on either the LinguistList website (http://www.linguistlist.org) or the OLAC search page (http://www.language-archives.org/tools/search/). The ODIN website provides an additional search
facility beyond language search: Advanced Search. Advanced Search allows the linguist
to search IGT by Grammatical Concepts, by Language Family,
and by Linguistic Constructions or Features, or any combination of
the three. A description of each of the Advanced Search features follows:
- Grammatical Concept
search allows the linguist to search over the grammatical markup terms that are
used in IGT, terms
such as NOM, ACC, ERG, PST, FUT, etc. Rather than a simple string
search, however, Grammatical Concept search normalizes the markup terms
to a set of concepts, as defined in the General Ontology of Linguistic Description
(GOLD,
http://www.linguistics-ontology.org/).
For instance, the linguist can specify a search term PastTense, and ODIN
will find all instances of IGT that encode past tense as PAST, PST,
3SPAST, etc. (Note: Many of the term-to-concept mappings are hand-vetted, so there are
gaps.) The linguist can also search for morphemes of a
particular type (e.g., prefix, suffix, proclitic, enclitic). Thus,
a typical query might be "ErgativeCase and PastTense expressed as
suffixes," with the resulting output being a list of IGT that satisfies
the query, displayed by language.
- Language Family
search allows the linguist to reduce the search space for a given query to a
specific language family, where the families used are those defined in Ethnologue (http://www.ethnologue.com).
- Constructions/Features
search extends Advanced Search by allowing the linguist
to look for linguistic constructions and features that may not be
explicitly encoded in IGT. By enriching and aligning the gloss and
translation lines, ODIN can make guesses about constructions that may
exist in the source language data. The current list of
construction/feature queries follows (descriptions
for only a few are provided; full information about all the queries can
be found on the ODIN website):
- Conditional – The
conditional query relies on the English translation: if the English
contains clauses that begin with either "if" or "when", then a
conditional is likely. (Questions headed by "when" are ruled out.)
- Coordination –
Coordination looks for any coordinated structures by looking for the
typical coordinators "and", "or" or "but" in the English
translation.
- Counterfactual – A
small subset of Counterfactuals can be discovered in the English
translation by the presence of "if" followed by a verb phrase in the
subjunctive (marked with "were" or "would have"). Counterfactuals
are also sometimes marked up in IGT.
- Imperative – This
query takes a fairly conservative approach: it looks for sentences
in the English translation that begin with a verb (PTB tag "VB" or "VBP")
or the second person pronoun "you", and end with an exclamation
point ("!").
- Multiple Quantifier
- Multiple Wh
- Negation
- Passive – ODIN also
looks for passive structures in the English translation, which are
indicators of passive or passive-like structures in the source
language. The template searched for is simply a form of "to be"
followed by the past participle of the verb (tagged VBN).
- Possessive
- Question
- Raising – Raising
constructions are assumed if a raising verb, such as "seem" or "appear",
is discovered. (A more sophisticated search for raising and control
constructions is being worked on.)
- Reflexive Anaphor
- Relative Clause
- Sentential Negation
- Wh and Quantifier
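
The translation-based heuristics described above (Conditional, Coordination, Imperative, Passive, Raising) can be illustrated with a minimal Python sketch. This is not ODIN's implementation: the function names, the word lists, and the input format (a translation line already split into (token, PTB tag) pairs) are assumptions made for illustration. The conditional check is also simplified, since it looks for "if" or "when" anywhere in the sentence rather than at the start of a clause, and the passive check requires the participle to follow the form of "to be" directly.

```python
from typing import List, Tuple

# One English translation line as (token, Penn Treebank tag) pairs.
# The tagging step itself is assumed to have happened elsewhere.
Tagged = List[Tuple[str, str]]

# Illustrative word lists (assumptions, not taken from ODIN).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
COORDINATORS = {"and", "or", "but"}
RAISING_FORMS = {"seem", "seems", "seemed", "seeming",
                 "appear", "appears", "appeared", "appearing"}


def words(sent: Tagged) -> List[str]:
    """Lower-cased tokens of the sentence."""
    return [tok.lower() for tok, _ in sent]


def is_coordination(sent: Tagged) -> bool:
    """True if the translation contains 'and', 'or' or 'but'."""
    return any(w in COORDINATORS for w in words(sent))


def is_conditional(sent: Tagged) -> bool:
    """True if 'if' or 'when' occurs, except in questions headed by 'when'."""
    w = words(sent)
    if not w:
        return False
    if w[0] == "when" and w[-1] == "?":
        return False  # question headed by "when"
    return "if" in w or "when" in w


def is_imperative(sent: Tagged) -> bool:
    """True if the line starts with a VB/VBP verb or 'you' and ends with '!'."""
    if not sent:
        return False
    first_tok, first_tag = sent[0]
    starts_ok = first_tag in {"VB", "VBP"} or first_tok.lower() == "you"
    return starts_ok and sent[-1][0] == "!"


def is_passive(sent: Tagged) -> bool:
    """True if a form of 'to be' is directly followed by a VBN token."""
    for (tok, _), (_, next_tag) in zip(sent, sent[1:]):
        if tok.lower() in BE_FORMS and next_tag == "VBN":
            return True
    return False


def is_raising(sent: Tagged) -> bool:
    """True if a form of 'seem' or 'appear' occurs in the translation."""
    return any(w in RAISING_FORMS for w in words(sent))


example = [("The", "DT"), ("book", "NN"), ("was", "VBD"),
           ("read", "VBN"), ("by", "IN"), ("Mary", "NNP"), (".", ".")]
print(is_passive(example))     # True
print(is_imperative(example))  # False
```

Because each check inspects only the English translation, a match suggests a construction in the source language but does not confirm it.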
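
The normalization behind Grammatical Concept search, where gloss labels such as PAST, PST and 3SPAST all count as PastTense, can be sketched as follows. This is an illustrative sketch, not ODIN's code. The mapping table is a hypothetical stand-in for the hand-vetted term-to-concept mappings: only PastTense and ErgativeCase are concept names taken from the description above, and the others are assumed for the example. Matching a known label at the end of an upper-case gram is likewise an assumption, chosen so that fused labels like 3SPAST are recognized. Morpheme type (prefix, suffix, clitic) is not modeled.

```python
import re
from typing import Optional, Set

# Hypothetical term-to-concept table; the real mappings are hand-vetted in ODIN.
GLOSS_TO_CONCEPT = {
    "PST": "PastTense",
    "PAST": "PastTense",
    "FUT": "FutureTense",        # assumed concept name
    "NOM": "NominativeCase",     # assumed concept name
    "ACC": "AccusativeCase",     # assumed concept name
    "ERG": "ErgativeCase",
}

# Try longer labels first so that "PAST" is preferred over shorter labels.
_LABELS = sorted(GLOSS_TO_CONCEPT, key=len, reverse=True)


def concept_for(gram: str) -> Optional[str]:
    """Map one gloss element to a concept, or None if it is not recognized.

    Only upper-case elements are treated as grammatical markup, so lexical
    glosses such as "dog" are ignored. A label is matched at the end of the
    element, which lets fused labels like "3SPAST" resolve to PastTense.
    """
    if not gram.isupper():
        return None
    for label in _LABELS:
        if gram.endswith(label):
            return GLOSS_TO_CONCEPT[label]
    return None


def concepts_in_gloss(gloss_line: str) -> Set[str]:
    """Collect the concepts found in one gloss line of an IGT instance."""
    # Split on whitespace and on the usual morpheme separators (-, =, .).
    grams = re.split(r"[\s\-=.]+", gloss_line.strip())
    return {c for c in map(concept_for, grams) if c is not None}


def matches(gloss_line: str, required: Set[str]) -> bool:
    """True if the gloss line expresses every required concept."""
    return required <= concepts_in_gloss(gloss_line)


gloss = "dog-ERG man-ACC see-3SPAST"
print(matches(gloss, {"ErgativeCase", "PastTense"}))  # True
```

A query is expressed as a set of concept names, so the same query matches instances regardless of which abbreviation the original author used in the gloss line.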