Kayardild Case Study:
From Shoebox to the Web (IGT)


The content of this page was developed from the research of Dr. Nicholas Evans, Dr. Baden Hughes, Ms. Cathy Bow and Dr. Steven Bird.


Kayardild, also known as Gayardilt, is a non-Pama-Nyungan language of the Tangkic family, although it has some Pama-Nyungan features due to sustained contact. It is spoken on Bentinck Island, in the Gulf of Carpentaria in northwest Queensland, Australia. The language is critically endangered, with fewer than 10 fluent speakers remaining.

The Kayardild data was collected by Dr. Nicholas Evans. His data set includes audio recordings, an interlinear glossed text, a lexicon and metadata. All of the examples given on this page are drawn from the first ten sentences of "Darwin Moodoonuthi: The cave at Wamakurld" (Evans 1995: 571-572).

Record audio

The audio recordings collected by Dr. Evans are presented in both WAV and ZIP formats. They were later incorporated into an interlinear glossed text (IGT) with timing references.

Wav sample

More on audio recordings

Interlinearize text

The interlinear glossed text was compiled following the best-practice recommendations of Bow, Bird and Hughes (2003). It incorporates representations at the text, phrase, word and morpheme levels. Comments are included at the beginning of the document, and the audio recordings are time-referenced.

E-MELD presents two copies of this data. One is in an archival XML format, designed for portability and long-term intelligibility. The other is in a PDF presentation format, designed to be attractive and easy to read.

XML uses descriptive tags to mark up data, which makes it more portable than mark-up languages whose tags only format data. Using these tags, linguists can describe the data in a hierarchical manner. For example, the tags <word> and </word> can surround a word in an IGT, and the tags <phrase> and </phrase> can surround a phrase. This way the data is human-readable and, if the tags used are consistent with generally accepted standards (such as GOLD, the General Ontology for Linguistic Description), the data can be widely understood.
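The hierarchy described above can be sketched in a few lines of Python. This is only an illustration, not the schema of the actual Kayardild file: the <phrase> and <word> tags come from the paragraph above, while <text>, <morpheme>, <translation>, the gloss attribute, the start and end timing attributes, and the build_igt function are assumed names. The forms and glosses are placeholders, not Kayardild data.

```python
import xml.etree.ElementTree as ET


def build_igt(title, phrases):
    """Build a four-level tree: text > phrase > word > morpheme."""
    text = ET.Element("text", {"title": title})
    for p in phrases:
        # start/end are hypothetical attributes holding audio offsets in seconds
        phrase = ET.SubElement(
            text, "phrase", {"start": str(p["start"]), "end": str(p["end"])}
        )
        for word_morphemes in p["words"]:
            word = ET.SubElement(phrase, "word")
            for form, gloss in word_morphemes:
                morpheme = ET.SubElement(word, "morpheme", {"gloss": gloss})
                morpheme.text = form
        ET.SubElement(phrase, "translation").text = p["translation"]
    return text


# Placeholder content only; no real Kayardild forms are used here.
sample_phrases = [
    {
        "start": 0.0,
        "end": 2.5,
        "translation": "PLACEHOLDER free translation",
        "words": [
            [("STEM1", "gloss1"), ("-SUF1", "GLOSS2")],
            [("STEM2", "gloss3")],
        ],
    }
]

igt = build_igt("Sample text", sample_phrases)
xml_string = ET.tostring(igt, encoding="unicode")
print(xml_string)

# Because the tags describe the data, any XML tool can query it by level.
morphemes = [(m.text, m.get("gloss")) for m in igt.iter("morpheme")]
print(morphemes)
```

Because the mark-up names what each unit is rather than how it should look, the same file can be searched by level (every morpheme, every phrase) or restyled for display without touching the data.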

Kayardild XML IGT
(may not display correctly on non-XML-compatible browsers)

More on XML

More on GOLD

PDF is a presentation format that preserves a document's original formatting. Viewing a PDF file requires reader software, and the format is not appropriate for archival purposes. It does, however, produce an attractive and easily read document for presentation purposes.

Kayardild PDF IGT

More on IGT

Build a lexicon

The Kayardild lexicon, which contains many unanalyzed elements, was compiled in Shoebox. Shoebox is supported by SIL and is much favored by linguists because of its flexibility. However, Shoebox formats become obsolete quickly and are therefore not in accordance with the recommendations of best practice. Furthermore, it can be difficult to convert Shoebox data into recommended formats that have greater potential for long-term intelligibility.

SIL Shoebox page

Fortunately, the lexicon was also compiled using the FIELD tool, which creates an XML document of the data. XML tags are extensible: they are not predefined, but can be defined by the creator of the document. Because XML does not depend upon any particular software and can be transformed through an XSL stylesheet for display in almost any format, it accords with the recommendations of best practice for the archival encoding of textual data.
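To give a sense of what moving from backslash-coded records to descriptive XML involves, here is a minimal sketch. It is not how FIELD works and does not reflect the actual Kayardild lexicon. It assumes records that use the common \lx (lexeme), \ps (part of speech) and \ge (English gloss) field markers; the element names, the MARKER_TO_TAG mapping and both functions are hypothetical, and the entries are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from backslash markers to descriptive element names.
MARKER_TO_TAG = {"lx": "headword", "ps": "partOfSpeech", "ge": "gloss"}


def parse_records(raw):
    """Split backslash-coded text into records; a new record starts at each lx marker."""
    records, current = [], None
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips blank lines and continuation text
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            current = []
            records.append(current)
        if current is not None:
            current.append((marker, value.strip()))
    return records


def records_to_xml(records):
    """Turn parsed records into a <lexicon> tree with one <entry> per record."""
    lexicon = ET.Element("lexicon")
    for fields in records:
        entry = ET.SubElement(lexicon, "entry")
        for marker, value in fields:
            tag = MARKER_TO_TAG.get(marker)
            if tag is None:
                # Unmapped markers are kept in a generic element so no data is lost.
                field = ET.SubElement(entry, "field", {"marker": marker})
            else:
                field = ET.SubElement(entry, tag)
            field.text = value
    return lexicon


# Placeholder records only; these are not entries from the Kayardild lexicon.
raw = r"""
\lx PLACEHOLDER-1
\ps n
\ge placeholder gloss one

\lx PLACEHOLDER-2
\ps v
\ge placeholder gloss two
\xx unmapped field
"""

lexicon = records_to_xml(parse_records(raw))
print(ET.tostring(lexicon, encoding="unicode"))
```

Real conversions are harder than this suggests, since field markers are often project-specific and records may contain unanalyzed or inconsistently coded material; the generic fallback element above is one way to avoid silently discarding such fields.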

More on FIELD

More on lexicon development

Collect metadata

Metadata was collected and entered into ORE (the OLAC Repository Editor). OLAC (Open Language Archives Community) metadata focuses on 15 main elements, which would ideally be present in any metadata file; however, it is entirely up to the metadata creator which elements to use and how many times each is repeated. ORE is a series of online forms used to submit information about language data or documentation to OLAC. It converts the submitted data into the XML format accessible to the OLAC search engine.
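The "optional and repeatable" design of the 15 elements can be illustrated with a short sketch. It assumes the 15 elements are those of the Dublin Core element set, on which OLAC metadata is based. The <record> wrapper and the build_record function are hypothetical, the output is not the exact XML that ORE produces, and the values are taken from this page for illustration rather than from the deposited Kayardild record.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_record(fields):
    """Build a metadata record from (element, value) pairs.

    Every element is optional and may be repeated any number of times.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not one of the 15 elements: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Illustrative values only, drawn from the description on this page.
fields = [
    ("title", "Darwin Moodoonuthi: The cave at Wamakurld"),
    ("creator", "Evans, Nicholas"),
    ("subject", "Kayardild"),
    ("format", "text/xml"),         # an element may be repeated...
    ("format", "application/pdf"),  # ...once per applicable value
]

record = build_record(fields)
print(ET.tostring(record, encoding="unicode"))
```

Only four of the 15 elements are used here and one of them appears twice, which is exactly the freedom the paragraph above describes.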

Kayardild metadata
(may not display correctly on non-XML-compatible browsers)

More on ORE

More on metadata

Follow the path of the Kayardild data

  1. Get Started: Summary of Kayardild conversion
  2. Digitize Audio: Audio page (Classroom)
  3. Interlinearize Text: IGT page (Classroom)
  4. Build Lexicon: Lexicon page (Classroom)
  5. Collect Metadata: Metadata page (Classroom)
