Monguor (蒙古儿, 土) Case Study:
From TASX to the Web  

The content of this page was developed from the research of Dr. Arienne Dwyer,
Dr. Wang Xianzhen, Dr. Limusishiden (Li Dechun) and Ms. Lu Wanfang.


There are thousands of minority languages that are partially or completely undescribed, and most of them are spoken by dwindling populations. There are too few field workers to collect data on them, and most are unlikely to be documented before they disappear. The Monguor data presented here represents a successful attempt to make up this shortfall in an uncommon way: by having native speakers of the language collect the material.

Project Details

The Monguor data as originally collected consisted of video and audio recordings, which the local collection teams gathered from the field workers approximately every six weeks; the Field Coordinator was Dr. Wang Xianzhen. The recordings were then annotated using Transcriber to produce a transcription in the Monguor orthography (based in large part on Pinyin) and a free translation in either Chinese or Tibetan. The transcription was then sent to a team at the University of Kansas for post-processing, where it was aligned, standardized, and translated from Chinese or Tibetan into English using TASX. From these original documents and other material, Dr. Wang produced a lexicon, which consists of data entered directly into TASX.

More on Transcriber

More on TASX

More on annotation

Convert data

For display on the E-MELD site, the XML documents generated by TASX were converted into the XML format used by the FIELD database. Since TASX uses XML as its exchange format, this transformation was straightforwardly accomplished using an XSL stylesheet. Once the documents were in an XML format that conformed to the FIELD schema, they were easily uploaded into the FIELD database for quick searching.
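The kind of restructuring involved can be illustrated with a minimal sketch. It is written in Python with the standard library rather than XSL, and it is not the project's stylesheet. The element and attribute names (`session`, `layer`, `event`, `records`, `record`) are assumptions made for illustration and are not taken from the real TASX or FIELD schemas. The sketch regroups events stored layer by layer into one record per time span.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation file. The element and attribute names
# are illustrative only and are not taken from the real TASX schema.
SOURCE_XML = """
<session>
  <layer name="transcription">
    <event start="0.0" end="1.5">example one</event>
    <event start="1.5" end="3.2">example two</event>
  </layer>
  <layer name="translation">
    <event start="0.0" end="1.5">gloss one</event>
    <event start="1.5" end="3.2">gloss two</event>
  </layer>
</session>
"""


def convert(source_xml):
    """Regroup layer-oriented events into one record per time span."""
    root = ET.fromstring(source_xml)

    # Collect the text of every layer under its (start, end) time span.
    records = {}
    for layer in root.findall("layer"):
        for event in layer.findall("event"):
            key = (event.get("start"), event.get("end"))
            text = (event.text or "").strip()
            records.setdefault(key, {})[layer.get("name")] = text

    # Build the target document; "records" and "record" are hypothetical
    # names standing in for whatever the target schema requires.
    target = ET.Element("records")
    ordered = sorted(records.items(), key=lambda item: float(item[0][0]))
    for (start, end), fields in ordered:
        record = ET.SubElement(target, "record", start=start, end=end)
        for name, text in fields.items():
            ET.SubElement(record, name).text = text
    return target


if __name__ == "__main__":
    print(ET.tostring(convert(SOURCE_XML), encoding="unicode"))
```

An XSL stylesheet would express the same mapping declaratively as templates; the point here is only that an XML exchange format makes this sort of regrouping mechanical.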

More on FIELD

More on stylesheets

Create a lexicon

Once in the FIELD database, the converted documents were also ready for transformation through additional stylesheets written to render lexical information in a variety of presentation formats.

More on lexicons

Present data

The TASX XML files made available to E-MELD included time-aligned transcriptions, morpheme-by-morpheme analyses, literal and free English translations, and Chinese translations. It was thus easy to convert these into a presentation format that could be displayed on the Web. Dr. Edward Garrett adapted a Java applet to read the XML file and play an accompanying video file through a standard browser; it displays the time-aligned transcription one line at a time, in synchrony with the playback of the video. You must have Java and QuickTime installed on your machine, but no special browser is required.
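Showing one transcription line at a time in step with playback comes down to looking up which time-aligned interval contains the current playback time. The following minimal Python sketch shows that lookup. It is not the applet's code, and the sample lines and the function name `line_at` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical time-aligned lines: (start seconds, end seconds, text),
# sorted by start time and non-overlapping.
LINES = [
    (0.0, 1.5, "line one"),
    (1.5, 3.2, "line two"),
    (4.0, 6.0, "line three"),
]
STARTS = [start for start, _, _ in LINES]


def line_at(t):
    """Return the text whose interval contains time t, or None in a gap."""
    # Index of the last line that starts at or before t.
    i = bisect_right(STARTS, t) - 1
    if i < 0:
        return None
    start, end, text = LINES[i]
    # Intervals are treated as half-open: start <= t < end.
    return text if t < end else None
```

A player would call `line_at` with the current playback position on each update and redraw the caption only when the returned line changes.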

Java installation

Quicktime installation

View Monguor texts

More on presentation video formats

Follow the path of the Monguor data

  1. Get Started: Summary of the Monguor conversion
  2. Convert Data: Conversion page (Classroom)
  3. Create a Lexicon: FIELD tool (Workroom)
  4. Present Data: Stylesheets page (Classroom)

E-MELD School of Best Practices: From TASX to the Web: Monguor