FIELD (Field Input Environment for Linguistic Data)
EMELD / LINGUIST List
|Project / Software Title:||FIELD (Field Input Environment for Linguistic Data)|
|Project / Software URL:||http://emeld.org/tools/field/beta/|
|Access / Availability:||This software is under development. A test version is available to the public at the project URL listed above. Data can be entered in the test version, but will not be saved in the database. If you would like to be a beta tester with full access to all of the functionality of FIELD and provide input as to its future development, please contact Anthony Aristar at firstname.lastname@example.org.|
INTRODUCTION. As linguists strive to document endangered languages, there is a pressing need for tools that facilitate data collection and analysis while conforming to best practice in digital archiving. To address this need, the EMELD project (Electronic Metastructure for Endangered Languages Data) has produced FIELD, a new software tool that supports the development and sharing of lexical databases, while creating digital archives in accordance with the best practice recommendations of the EMELD community. Originally developed as an in-house tool to facilitate the conversion of legacy datasets for endangered languages, FIELD has grown in functionality and in the process has generated considerable excitement in the EMELD community as the only publicly available tool that conforms to best practice in the digital archiving of lexical data.
SUPPORT FOR BEST PRACTICE. FIELD's support of the EMELD best practice recommendations is briefly outlined below:
1) Irreplaceable data on endangered languages should be archived in an XML file with a schema that conforms to best practice. FIELD users can export their data at any time as an XML file. (An option to export data as a tab-delimited text file is also available, since this format is widely supported by commercially available database programs.) The XML file is validated against a schema so that it can be exchanged with third-party software (for example, software for text annotation and analysis). The XML file can then be rendered by stylesheets into viewer-friendly formats for online or print display. (The EMELD School of Best Practice has several examples of stylesheets, each designed for a different purpose.)
2) XML markup tags should be provided by a common linguistic ontology. When creating a new FIELD lexicon, the user must first set up a 'language profile', choosing the set of grammatical concepts found in the language (lexical and morphosyntactic categories) and mapping the user's own terms for them to those provided by a common linguistic ontology. This mapping allows users to work with and display their preferred grammatical terminology while, behind the scenes, that terminology is linked to a common terminology set, ensuring that the XML markup will be intelligible to future generations. (The ontology used by FIELD is GOLD, the General Ontology for Linguistic Description, which is being developed by EMELD's University of Arizona Research Team.) When users export their data, this mapping is incorporated into the XML markup of the archive file. Another benefit of terminology mapping is that it enables searches across electronic language resources. The FIELD 'search across languages' function makes use of the mapping, and we anticipate that other ontology-based search engines will be developed in the near future.
3) Character data should be encoded in Unicode. FIELD fully supports Unicode, using it for data input, display, and storage. In addition, we have developed several means of facilitating the entry of IPA and other commonly used international characters from within FIELD. The first is Charwrite, a program that opens an interactive IPA chart when users double-click in a text field. If a user enters a character and right-clicks (or control-clicks on a Mac), Charwrite opens a pop-up menu of similar characters, any of which can be selected to replace the entered character. FIELD also allows users to define a set of keyboard shortcuts to speed up the entry of international characters.
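The three recommendations above can be illustrated together in a few lines of Python. This is only a minimal sketch under stated assumptions, not FIELD's actual implementation or export format: the element names (`lexicon`, `entry`, `form`, `gloss`, `category`), the `concept` attribute, the function names, and the concept labels in the profile are all hypothetical, and the two sample entries are invented placeholders rather than data from any real language. The sketch shows a language profile that maps a user's preferred labels to shared concept names, an XML export that carries that mapping in its markup, Unicode normalization of the entered forms, and a search that works on the shared concepts rather than on the local labels.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical language profile: the user's preferred labels mapped to shared
# concept names. These names are illustrative placeholders, not verified
# identifiers from the GOLD ontology.
LANGUAGE_PROFILE = {
    "n": "Noun",
    "v": "Verb",
}


def normalize(text):
    """Return text in Unicode NFC so that equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def build_lexicon(entries, profile):
    """Build an XML tree in which each local label carries its mapped concept."""
    root = ET.Element("lexicon")
    for entry in entries:
        node = ET.SubElement(root, "entry")
        ET.SubElement(node, "form").text = normalize(entry["form"])
        ET.SubElement(node, "gloss").text = entry["gloss"]
        label = entry["category"]
        # The displayed text keeps the user's term; the attribute records the
        # shared concept it maps to.
        category = ET.SubElement(node, "category", {"concept": profile[label]})
        category.text = label
    return root


def find_by_concept(root, concept):
    """Return the forms of all entries whose category maps to `concept`."""
    return [
        entry.findtext("form")
        for entry in root.iter("entry")
        if entry.find("category").get("concept") == concept
    ]


# Invented sample entries (not from any real language). The first form uses
# a base letter plus a combining acute accent, which NFC composes.
entries = [
    {"form": "ta\u0301pa", "gloss": "stone", "category": "n"},
    {"form": "mi\u014bu", "gloss": "to walk", "category": "v"},
]

lexicon = build_lexicon(entries, LANGUAGE_PROFILE)

# Serialize as UTF-8 bytes, including an XML declaration.
xml_bytes = ET.tostring(lexicon, encoding="utf-8")

nouns = find_by_concept(lexicon, "Noun")
```

Because the search in this sketch compares the `concept` attribute rather than the displayed label, two lexicons that use different local abbreviations for the same category would still match the same query, which is the idea behind searching across languages.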
TO PREVIEW FIELD. The development version of FIELD is available in the EMELD School of Best Practice at http://emeld.org/school/workroom/lexicon/index.html. FIELD currently houses substantial lexical data for six typologically diverse languages: Biao Min (Hmong-Mien), Mocoví (Guaicuruan), Potawatomi and Ottawa (Algonquian), Monguor (Southeastern Mongolic), and Ega (Kwa), and we anticipate the addition of another three languages. These language databases are available for public searching at http://emeld.org/school/search/searchlang/.