World Atlas of Language Structures:
An Interactive Cross-Linguistic Database for Typological Research
Hans-Jörg Bibiko (interactive electronic version), Martin Haspelmath,
Matthew S. Dryer, David Gil & Bernard Comrie

Max Planck Institute for Evolutionary Anthropology

2,641 languages arranged by families with topological data

     1.0 Abstract
     The WALS (World Atlas of Language Structures; Haspelmath et al. 2005) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of more than 40 authors (many of them the leading authorities on the subject). It will be published as a printed book in traditional atlas format, accompanied by a fully searchable electronic version that also allows various visualization effects.

The World Atlas of Language Structures consists of 142 maps with accompanying texts on diverse features (such as vowel inventory size, noun-genitive order, passive constructions, and 'hand'/'arm' polysemy), each of which is the responsibility of a single author (or team of authors). Each map shows between 120 and 1110 languages (the sign language features, with 35 languages, are the exception), each language being represented by a dot, and different dot colors showing different values of the feature. Altogether more than 2,600 languages are shown on the maps, and more than 55,000 dots give information on features in particular languages.

The World Atlas of Language Structures thus makes information on the structural diversity of the world's languages available to a large audience, including interested nonlinguists as well as linguists who would not normally read grammars of exotic languages or specialized works by comparative linguists. Although endangered languages are not particularly emphasized, they are automatically foregrounded because of the large sample of languages represented on each map, where each language (independently of its number of speakers) is shown by a single dot.

The interactive database (available first on CD-ROM, later probably on the web) will allow the atlas user to view the maps in a variety of different forms, as well as to combine features, i.e. to generate compound features and to display these as well. The interactive database will also contain additional information on languages (genealogical classification, alternative names) and on each language-feature pair (bibliographical reference, example sentence). The interactive maps can be zoomed and panned, dot colors and shapes can be customized, a few map properties (rivers, country names, etc.) are switchable, and languages can be searched by language name, family and genus name, country, and region within country. On mouse-over, the corresponding language name is shown immediately, and a click opens the language profile in a separate window. The generation of compound features will be very useful for typological research. For example, the user will be able to correlate the existence of a question-word-fronting rule with particular word order types, the existence of tone with the size of the consonant inventory, or the alignment type (accusative, ergative, active-inactive) with the head-dependent marking type. Furthermore, geographical and genealogical information can be included.
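The idea of a compound feature can be illustrated with a minimal Python sketch. It is not the atlas software's implementation: the function name `compound_feature`, the language codes, and the feature values are all hypothetical and chosen only for illustration. The sketch assumes that a feature is a mapping from language code to value and that a compound value is defined only for languages coded for both features.

```python
from collections import Counter

# Hypothetical toy data: language code -> feature value.
# Codes and values are illustrative, not taken from the WALS database.
tone = {"aaa": "tonal", "bbb": "no tone", "ccc": "tonal", "ddd": "no tone"}
consonants = {"aaa": "small", "bbb": "large", "ccc": "small", "eee": "average"}


def compound_feature(first, second):
    """Combine two features into one compound feature.

    Only languages coded for both features receive a compound value.
    """
    shared = sorted(first.keys() & second.keys())
    return {lang: (first[lang], second[lang]) for lang in shared}


compound = compound_feature(tone, consonants)

# Count how many languages show each combination of values.
counts = Counter(compound.values())
for value, n in sorted(counts.items()):
    print(value, n)
```

Languages coded for only one of the two features ("ddd" and "eee" here) drop out of the compound map, which is why compound maps usually show fewer languages than either source map.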

     2.0 Main Features of the Atlas
  • Printed book published by Oxford University Press in summer 2005 with an interactive version on CD-ROM

  • World maps showing the geographical distribution of structural linguistic features

  • Each chapter consists of a map and an accompanying text of ca. 2,200 words

  • Each map is the responsibility of an author or a team of authors

  • Data mostly come from published descriptions

     3.0 Some Statistics
  • 142 features
  • Topic
    Nominal Categories
    Nominal Syntax
    Verbal Categories
    Word Order
    Simple Clauses
    Complex Sentences

  • 44 authors or author teams
  • 2643 different languages
  • On average, 398 languages per map
    • Minimum, 35 languages (sign language features)
    • Maximum, 1110 languages (order of object and verb)
  • Altogether 56,456 data points
  • 6698 bibliographical references
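The figures above are consistent with one another, as a short Python check suggests. It uses only the numbers listed in this section; the variable names are hypothetical, and the coverage ratio is a derived figure that the source does not state.

```python
# Figures taken from the statistics listed above.
features = 142
languages = 2643
data_points = 56456

# Average number of languages shown per map (one map per feature).
average_per_map = data_points / features

# Average number of features coded per language.
average_per_language = data_points / languages

# Share of all possible language-feature pairs that carry a value.
coverage = data_points / (features * languages)

print(f"languages per map:     {average_per_map:.1f}")
print(f"features per language: {average_per_language:.1f}")
print(f"coverage:              {coverage:.1%}")
```

Rounded to a whole number, the first value matches the stated average of 398 languages per map. The coverage ratio shows that only a fraction of all possible language-feature pairs is filled in.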

     4.0 Main Features of the Interactive Programme
  • Standalone version for Mac OS X and Mac OS 9 and for Windows 98, 2000, and XP, with integrated database server
  • All maps are scalable vector graphics (best resolution for zooming)
  • Interactive maps can be zoomed and panned; symbols can be customized
  • Map properties are switchable (rivers, boundaries, country names, topological data, etc.)
  • Mouse-over effect showing the corresponding language name
  • Language profile with geographical and genealogical information, alternative names, and an overview of typological features
  • 142 predefined, customisable feature maps (symbol colour and shape; merging and/or hiding of feature values)
  • Each language-feature pair with data sources
  • Generation of compound features from typological, genealogical, and geographical data, manually or combinatorially
  • Import/export functions
  • Several tools for generating user-defined maps
  • Maps can be saved, printed and copied into the clipboard
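Merging and hiding feature values, as listed above, can be pictured with a small Python sketch. This is an assumption about how such a customisation might work, not a description of the programme's internals: the function `customise`, the language codes, and the value labels are hypothetical.

```python
def customise(feature, merge=None, hide=()):
    """Return a customised view of a feature map.

    merge: mapping from original value to the merged value that replaces it.
    hide:  values whose languages are left off the map (checked after merging).
    """
    merge = merge or {}
    view = {}
    for lang, value in feature.items():
        value = merge.get(value, value)
        if value in hide:
            continue
        view[lang] = value
    return view


# Hypothetical toy data: language code -> feature value.
genders = {"aaa": "none", "bbb": "two", "ccc": "three", "ddd": "five or more"}

# Collapse all gender systems into one value and hide languages without gender.
view = customise(
    genders,
    merge={"two": "some", "three": "some", "five or more": "some"},
    hide={"none"},
)
print(view)
```

The original feature map is left unchanged, so several customised views of the same data can be generated side by side.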

     5.0 Screenshots
Geographical distribution of the Sino-Tibetan subgroups

Geographical distribution of the feature “Number of Genders” (255 languages shown)

Zoomed area New Guinea, dots show the corresponding WALS code of the languages

Language profile of Hmong Njua

     6.0 References
     Haspelmath, Martin & Dryer, Matthew & Gil, David & Comrie, Bernard (eds.) 2005. World Atlas of Language Structures. (Book with interactive CD-ROM) Oxford: Oxford University Press.