Potawatomi Case Study: From Filemaker Data to the Web

The content of this page was developed from the research of Dr. Laura Buszard-Welcher.

Introduction

In the fall of 2003, several years after developing a Filemaker Pro database for her Potawatomi data, Dr. Laura Buszard-Welcher began working with E-MELD to showcase Potawatomi as one of the ten languages in the E-MELD School of Best Practices. A sample set of data was exported from Filemaker Pro as a tab-delimited file and uploaded to the FIELD database. This served as a test of the FIELD upload function and presented special challenges in data conversion. The Potawatomi documentation also required modification of the FIELD database and the GOLD linguistic ontology. The outcome was a lexicon in XML format that can be printed as a dictionary through the use of XSLT stylesheets.

Scanning and OCR

Dr. Buszard-Welcher received the initial data in the form of a printout of the database of Dr. John Nichols, an Algonquianist who worked on the Potawatomi language in the 1970s. She scanned the printout and ran an OCR application to convert the images into characters. The file was then manually formatted into tab-delimited text. Using Filemaker Pro, she created a flat file data structure to house the imported data and then used this program to edit and expand the lexicon while working with Potawatomi speakers.

More on OCR vs. keyboard entry

Filemaker Pro: The legacy database

Filemaker Pro can be a useful software tool, but it provides an interesting test case for best practice standards. Its drawbacks include the lack of Unicode support and the fact that it is proprietary software. Furthermore, Filemaker Pro does not restrict the linguistic terminology used, e.g. for concepts such as parts of speech and feature values. Although this flexibility is often prized by linguists, the use of idiosyncratic terminology jeopardizes the long-term intelligibility of the documentation. For that reason, recommended best practice is to define terminology with reference to a standard ontology of linguistic concepts, such as GOLD. The FIELD tool is linked to GOLD and therefore automates this function.

More on Filemaker Pro as a linguistic database

The FIELD database

FIELD was developed specifically for entry of lexical data in best practice format. It is Unicode-compliant and can output the data as an XML document. Furthermore, the FIELD tool is linked to GOLD (General Ontology for Linguistic Description).

More on the FIELD database

Data conversion

The structure of the legacy Filemaker Pro database created some challenges for uploading the data into FIELD. Because it was designed as a flat file, the underlying structure of each record was the same, with a total of six possible fields for inflected forms (see the Entry Display). In Filemaker Pro data entry, the researcher had to select a particular part of speech (see the Entry Forms); this determined the labels for the inflected forms (see the Animate Intransitive Verb entry). Each part of speech had different requirements for inflected forms: for example, the first inflected form for an animate noun would be a plural form, but the first inflected form for an animate intransitive verb would be a first person singular form. As a result, different kinds of data were found in the same database column. This is not recommended in database design, and it meant that each part of speech had to be uploaded into FIELD separately. In addition, columns that were created to house the labels of these fields in the flat file (such as 'ni' for inanimate noun) had to be ignored during the upload.
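The per-part-of-speech split described above can be sketched in a few lines of Python. This is only an illustration, not the procedure E-MELD actually used: the column names (`headword`, `pos`, `label1`, `form1`), the part-of-speech codes, and all values are placeholders rather than real Filemaker Pro field names or Potawatomi data. The sketch reads a tab-delimited export, gives the generic inflected-form column a name that depends on the part of speech, drops the label column, and groups the records into one batch per part of speech.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited export. Column names and values are placeholders,
# not the actual Filemaker Pro field names or real Potawatomi forms.
EXPORT = (
    "headword\tpos\tlabel1\tform1\n"
    "noun-a\tna\tna\tnoun-a-PL\n"
    "verb-a\tvai\tvai\tverb-a-1SG\n"
    "noun-b\tna\tna\tnoun-b-PL\n"
)

# What the generic "form1" slot means for each part of speech.
# The codes "na" and "vai" are assumed abbreviations for this sketch.
SLOT_MEANING = {
    "na": {"form1": "plural"},
    "vai": {"form1": "first_person_singular"},
}


def split_by_pos(text):
    """Group rows by part of speech and rename the generic form columns."""
    batches = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        record = {"headword": row["headword"]}
        for slot, name in SLOT_MEANING[row["pos"]].items():
            record[name] = row[slot]
        # Label columns such as "label1" are deliberately not copied.
        batches[row["pos"]].append(record)
    return dict(batches)


batches = split_by_pos(EXPORT)
for pos, records in batches.items():
    print(pos, records)
```

Each batch now has columns with a single, consistent meaning, which is the property the flat file lacked.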

Although it was possible to overcome the challenges presented by the Filemaker Pro database, the experience serves as a reminder that language documentation presents special challenges to database designers. Wherever possible, it is important to follow recognized principles of database design. (For more on the use of databases in linguistics, see the E-MELD 2004 Workshop on Linguistic Databases and Best Practice.)

Click on the image to see an example of the Filemaker Pro entry display:

Entry Display

Click on the image to see the number of different entry forms created to accommodate the flat file database:

Entry Forms

Click on the image to see an example of an entry for an animate intransitive verb:

Animate intransitive verb

More on data conversion

Terminology mapping

A major part of the uploading process involved mapping Potawatomi grammatical terms to the GOLD ontology being developed by E-MELD. A number of areas of the ontology were modified and expanded by the GOLD team based on the requirements of Potawatomi grammatical features, including gender, size, evaluation, polarity, proclitics, phrase units (main and subordinate clause forms) and participles.
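A mapping of this kind can be pictured as a lookup table from legacy labels to ontology concepts, with unmapped labels reported so that the ontology can be extended. The Python sketch below is an assumption-laden illustration: the legacy abbreviations and the concept names are placeholders chosen for readability, not actual GOLD identifiers, and the function `map_terms` is hypothetical.

```python
# Placeholder mapping from legacy labels to ontology concept names.
# These names are illustrative only and are NOT actual GOLD identifiers.
TERM_MAP = {
    "na": "AnimateNoun",
    "ni": "InanimateNoun",
    "vai": "AnimateIntransitiveVerb",
}


def map_terms(labels):
    """Return (mapped, unmapped): labels with a known concept, and the rest."""
    mapped = {}
    unmapped = []
    for label in labels:
        key = label.strip().lower()
        if key in TERM_MAP:
            mapped[label] = TERM_MAP[key]
        else:
            unmapped.append(label)
    return mapped, unmapped


# "xyz" stands in for a legacy label that has no concept in the ontology yet.
mapped, unmapped = map_terms(["NA", "vai", "xyz"])
print(mapped)
print(unmapped)
```

Labels that end up in `unmapped` correspond to the cases where an ontology has to be modified or expanded before the data can be described with it.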

More on terminology mapping

More on GOLD

Text storage

With the Potawatomi sample data housed in the FIELD database, it is now possible for Dr. Laura Buszard-Welcher to modify and edit the lexicon using the FIELD tool. The data can also be exported from FIELD as an XML file. XML stands for eXtensible Markup Language. It defines a standard way of encoding the structure of information in plain text. It is an open standard of the World Wide Web Consortium that is based on extensible tags (extensible meaning that they are not predefined, but can be defined by the creator). XML is currently considered best practice for the archival encoding of textual data, because it does not depend upon any particular software and can be formatted through an XSL stylesheet to be displayed in many formats. Furthermore, it is generally more self-descriptive than other electronic formats, which should make it more accessible to future generations.
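To make the idea of creator-defined tags concrete, the following Python sketch builds a tiny lexicon as XML using only the standard library. The element and attribute names (`lexicon`, `entry`, `headword`, `pos`, `form`, `type`) are assumptions made for this example and do not reproduce the schema that FIELD exports; the values are placeholders, not real Potawatomi data.

```python
import xml.etree.ElementTree as ET

# Placeholder records; neither the values nor the field names come from FIELD.
RECORDS = [
    {"headword": "noun-a", "pos": "AnimateNoun", "plural": "noun-a-PL"},
    {
        "headword": "verb-a",
        "pos": "AnimateIntransitiveVerb",
        "first_person_singular": "verb-a-1SG",
    },
]


def build_lexicon(records, language="Potawatomi"):
    """Build an XML tree with one <entry> per record (hypothetical schema)."""
    root = ET.Element("lexicon", {"language": language})
    for record in records:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "headword").text = record["headword"]
        ET.SubElement(entry, "pos").text = record["pos"]
        for name, value in record.items():
            if name in ("headword", "pos"):
                continue
            # Every remaining field is treated as an inflected form.
            form = ET.SubElement(entry, "form", {"type": name})
            form.text = value
    return root


lexicon = build_lexicon(RECORDS)
xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```

Because the result is plain text with descriptive tags, it can be read back by any XML parser without the software that produced it.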

More on the World Wide Web Consortium

More on XML

Text presentation

XSL stylesheets are used to create example displays of the documentation, and documents in other presentation formats can easily be created. Stylesheets can be used to transform XML documents into different file formats (for instance, HTML, text, or PDF), without changing the original XML document. A stylesheet could transform the same lexicon in XML into a learner's dictionary or an academic dictionary, in online or printed versions. Thus the project demonstrates the flexibility afforded by best practices. The first Potawatomi dictionary can be digitally created from the FIELD database, the exported XML file, and stylesheets. Below, you can see the same XML document transformed by XSL to provide print and online versions of the lexicon.
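The principle of one source document and several presentations can be illustrated without XSLT. The Python sketch below is a stand-in for a stylesheet, not the XSL transformation the project used: it renders the same small XML lexicon once as an HTML definition list and once as plain text, leaving the source string untouched. The XML schema, the function names, and all values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape

# Placeholder lexicon; the schema and values are assumptions for this sketch.
LEXICON_XML = """<lexicon>
  <entry><headword>noun-a</headword><pos>AnimateNoun</pos><gloss>gloss-a</gloss></entry>
  <entry><headword>verb-a</headword><pos>AnimateIntransitiveVerb</pos><gloss>gloss-b</gloss></entry>
</lexicon>"""


def entries(xml_text):
    """Yield each <entry> as a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("entry"):
        yield {child.tag: (child.text or "") for child in entry}


def to_html(xml_text):
    """Online presentation: an HTML definition list."""
    items = [
        f"<dt>{escape(e['headword'])}</dt>"
        f"<dd><i>{escape(e['pos'])}</i> {escape(e['gloss'])}</dd>"
        for e in entries(xml_text)
    ]
    return "<dl>\n" + "\n".join(items) + "\n</dl>"


def to_text(xml_text):
    """Print-style presentation: one plain-text line per entry."""
    return "\n".join(
        f"{e['headword']} ({e['pos']}): {e['gloss']}" for e in entries(xml_text)
    )


print(to_html(LEXICON_XML))
print(to_text(LEXICON_XML))
```

An XSL stylesheet plays the role of `to_html` or `to_text` here, with the advantage that it is itself a declarative, software-independent document.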

View the output from a sample stylesheet for an online version of the Potawatomi lexicon

View the output from a sample stylesheet for a print version of the Potawatomi lexicon

More on transforming XML through stylesheets

Follow the path of the Potawatomi data

  1. Get started: Summary of the Potawatomi conversion
  2. Scan and OCR: OCR or Keyboard page (Classroom)
  3. Linguistic Review of Filemaker Pro: Filemaker Pro (offsite)
  4. Add Text to Database: FIELD page (Workroom)
  5. Convert Data: Conversion page (Classroom)
  6. Map Terminology: GOLD Ontology (Workroom)
  7. Store Text: XML page (Classroom)
  8. Present Text: Stylesheets page (Classroom)
