Alexis Dimitriadis [i], Adam Saulwick [ii], & Menzo Windhouwer [ii]; [i] Utrecht University, [ii] University of Amsterdam

Semantic Relations in Ontology Mediated Linguistic Data Integration

The Typological Database System (TDS) is a web-based service, currently under development, that hosts an integrated ontology for unified querying of multiple independently developed typological databases. TDS currently contains information on circa 1,000 languages from five integrated databases. The component databases contain data on a range of linguistic features, including agreement, parts-of-speech, word order, stress placement and predication phenomena. Some future component databases will also contain primarily linguistic data in the form of lexicons or glossed sentences. The TDS project has developed an ontology of linguistic concepts to facilitate the integration of component databases. This paper discusses key features of the current TDS ontology: its function, the types of semantic concepts encoded in it, its structure, and its content.

The TDS ontology categorizes a "bias-neutral" vocabulary of linguistic concepts. This unifies the potentially idiosyncratic terminology of the component databases and can thereby widen access to their information. Further, the ontology is used to manage retrieval across the large number of concepts and fields in the component databases. The inventory of ontology concepts, classes and relationships forms a graph that can be navigated or searched by keyword. Ideally, every database field is linked to related ontology concepts.
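The graph described above can be pictured with a minimal sketch. It is not the TDS implementation: the `Concept` and `Ontology` classes, the relation between the two example concepts, and the field names (`db_a.basic_order` and so on) are hypothetical and chosen only to show concepts linked to each other and to database fields, with navigation and keyword search on top.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    related: set = field(default_factory=set)  # names of neighbouring concepts
    fields: set = field(default_factory=set)   # linked database fields (hypothetical names)


class Ontology:
    def __init__(self):
        self.concepts = {}

    def add(self, name, fields=()):
        """Create the concept if needed and link database fields to it."""
        concept = self.concepts.setdefault(name, Concept(name))
        concept.fields.update(fields)
        return concept

    def relate(self, a, b):
        """Record an undirected, untyped link between two concepts."""
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def search(self, keyword):
        """Keyword search over concept names (case-insensitive substring match)."""
        kw = keyword.lower()
        return sorted(n for n in self.concepts if kw in n.lower())

    def neighbours(self, name):
        """Navigation: concepts directly linked to the given one."""
        return sorted(self.concepts[name].related)


onto = Ontology()
onto.add("word order", fields=["db_a.basic_order"])
onto.add("stress placement", fields=["db_b.stress_pos"])
onto.add("syllable weight", fields=["db_b.weight_sensitive"])
onto.relate("stress placement", "syllable weight")  # illustrative link only
```

With this toy data, `onto.search("stress")` finds the concept "stress placement", and `onto.neighbours("stress placement")` leads on to "syllable weight" and its linked field.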

An important feature of our ontology is that different types of linguistic concepts and relations are modelled distinctly. It is crucial to be able to differentiate between relations. For instance, the relation between the categories of ‘case’ and ‘grammatical case’ is not the same as the relation between the categories of ‘vowel length’ and ‘syllable weight’. We argue that by modelling these differences with distinct ontology patterns, the ontology can support reasoning and thus provide the basis for "smart searching". We present a range of domain-specific semantic concepts and relations that we consider both distinct and basic, and discuss how they are expressed in the language of the ontology. These include thematic categorization, value attribution, hyponymy, meronymy, (loose) synonymy, paradigmatic relation, determinant, and form-function relation.
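Why typed relations matter for reasoning can be shown with a small sketch. The relation names come from the list above, but the labels assigned to the two example pairs (hyponymy for ‘grammatical case’ and ‘case’, determinant for ‘vowel length’ and ‘syllable weight’) are guesses made for illustration, not the encoding used in the TDS ontology. The `narrower` function is likewise hypothetical.

```python
from enum import Enum


class Relation(Enum):
    HYPONYMY = "hyponymy"
    MERONYMY = "meronymy"
    SYNONYMY = "synonymy"
    DETERMINANT = "determinant"


# (source, relation, target) triples; the relation labels are assumptions.
TRIPLES = [
    ("grammatical case", Relation.HYPONYMY, "case"),
    ("vowel length", Relation.DETERMINANT, "syllable weight"),
]


def narrower(concept, triples=TRIPLES):
    """Concepts reachable by following hyponymy links downward, transitively.

    Links of any other type are ignored, so the result depends on how
    each relation was typed, not merely on whether a link exists.
    """
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for src, rel, dst in triples:
            if rel is Relation.HYPONYMY and dst == current and src not in found:
                found.add(src)
                frontier.append(src)
    return found
```

Under these assumptions, `narrower("case")` includes "grammatical case", whereas `narrower("syllable weight")` is empty: the link from "vowel length" exists but is not a hyponymy link, so this particular inference does not follow it.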

A further issue concerns how queries are constructed. We describe a two-step process. The user first discovers fields relevant to the concepts of interest, either by traversing the ontology graph or by performing a "smart search" over its contents, where "smart" means extending the search with database fields linked to semantically related concepts. These fields are then accumulated into a pre-query, which the user may refine and execute. Finally, it is envisaged that the ontology may be used to reveal errors in the logic of query construction, such as impossible value conjunctions. The user may be warned about such a query, but it can still be executed, thereby permitting a search for possibly inconsistent data in distinct databases.
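The two-step process and the warn-but-execute behaviour can be sketched as follows. Everything here is an assumption made for illustration: the concept and field names, the `EXCLUSIVE` table of value pairs treated as mutually exclusive, and the representation of a pre-query as a list of (field, value) pairs. It is not the TDS query engine.

```python
import warnings

# Hypothetical links from concepts to database fields and to related concepts.
FIELDS = {
    "word order": ["db_a.basic_order"],
    "verb position": ["db_c.verb_final"],
}
RELATED = {"word order": ["verb position"]}

# Hypothetical value pairs that the ontology regards as mutually exclusive.
EXCLUSIVE = [
    (("db_a.basic_order", "SOV"), ("db_c.verb_final", "no")),
]


def smart_search(keyword):
    """Step 1: fields of matching concepts plus fields of their related concepts."""
    hits = [c for c in FIELDS if keyword.lower() in c.lower()]
    expanded = hits + [r for c in hits for r in RELATED.get(c, [])]
    return [f for c in expanded for f in FIELDS.get(c, [])]


def conflicts(pre_query):
    """Exclusive pairs whose two members are both requested by the pre-query."""
    chosen = set(pre_query)
    return [pair for pair in EXCLUSIVE if pair[0] in chosen and pair[1] in chosen]


def execute(pre_query, rows):
    """Step 2: warn about impossible conjunctions, then run the query anyway."""
    if conflicts(pre_query):
        warnings.warn("pre-query contains an impossible value conjunction")
    return [row for row in rows if all(row.get(f) == v for f, v in pre_query)]


fields = smart_search("order")
pre_query = [(fields[0], "SOV"), (fields[1], "no")]
rows = [
    {"language": "X", "db_a.basic_order": "SOV", "db_c.verb_final": "no"},
    {"language": "Y", "db_a.basic_order": "SOV", "db_c.verb_final": "yes"},
]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demonstration quiet
    inconsistent = execute(pre_query, rows)
```

Because the query is executed despite the warning, `inconsistent` holds the record for language "X", whose entries in the two hypothetical databases contradict each other. This is the kind of cross-database inconsistency that the text suggests such a search could surface.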