Tofa Case Study: Online Video

The content of this page was developed from the data of Dr. K. David Harrison.


The Tofa stories available here were recorded by Dr. K. David Harrison in 2000 and 2001 for a project funded by a grant from the Volkswagen-Stiftung. The recordings were made in the field on a Sony ZR-10 camera in mini-DV format, then transferred to MPEG video format and saved on DVD. Dr. Harrison used the ELAN annotation tool (developed by the Max Planck Institute in Nijmegen) to time-align transcriptions and free English translations with the video. He then contributed the video files, together with the EAF (EUDICO Annotation Format) files that ELAN writes, to the E-MELD project. Further modifications have made it possible to present these files online.

Video digitization

The Tofa stories were videotaped in the field, using a Sony ZR-10 camera in mini-DV format. Because the quality of built-in camera microphones is inadequate for recording endangered languages, the sound was recorded in stereo with an external Sony Electret Condenser Stereo microphone. Later, the sound was extracted and saved in a separate WAV file, and the video files were converted to MPEG format.
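The page does not say which software performed the extraction and conversion. As a rough sketch only, the two steps could be scripted today with the ffmpeg command-line tool; the use of ffmpeg, the choice of MPEG-2 as the target, and the file names below are all assumptions for illustration, not a record of the original workflow.

```python
from pathlib import Path
from typing import List


def extract_audio_cmd(source: Path, wav_out: Path) -> List[str]:
    """Build an ffmpeg command that saves the sound track as a stereo WAV file."""
    return [
        "ffmpeg", "-i", str(source),
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM, the usual WAV payload
        "-ac", "2",              # keep two channels (stereo)
        str(wav_out),
    ]


def to_mpeg_cmd(source: Path, mpg_out: Path) -> List[str]:
    """Build an ffmpeg command that converts the video to MPEG-2 (assumed target)."""
    return [
        "ffmpeg", "-i", str(source),
        "-c:v", "mpeg2video",    # MPEG-2 video encoder
        "-q:v", "2",             # high-quality variable bitrate
        str(mpg_out),
    ]


# Hypothetical file names, for illustration only.
source = Path("story01.dv")
audio_cmd = extract_audio_cmd(source, Path("story01.wav"))
video_cmd = to_mpeg_cmd(source, Path("story01.mpg"))
print(" ".join(audio_cmd))
print(" ".join(video_cmd))
```

The functions only build the commands; passing each list to `subprocess.run(cmd, check=True)` would execute it on a machine where ffmpeg is installed.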

More on digitizing video

Video annotation

The digital video was then annotated using ELAN (EUDICO Linguistic Annotator), which creates multitiered files aligned with the video. The first tier was phonetic transcription, in IPA with modifications commonly used in Turkological studies. The second tier was a morphosyntactic analysis, and the third tier was a free English translation.
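The idea of time-aligned tiers can be illustrated with a small data model. This is a minimal sketch, not ELAN's internal representation: the class names, tier names, time values, and bracketed placeholder texts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    start_ms: int  # start of the interval, in milliseconds of video time
    end_ms: int    # end of the interval (exclusive)
    text: str


@dataclass
class Tier:
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def at(self, time_ms: int) -> Optional[Annotation]:
        """Return the annotation whose interval contains time_ms, if any."""
        for ann in self.annotations:
            if ann.start_ms <= time_ms < ann.end_ms:
                return ann
        return None


def tiers_at(tiers: List[Tier], time_ms: int) -> Dict[str, Optional[str]]:
    """Collect the text of every tier at one moment of the video."""
    result: Dict[str, Optional[str]] = {}
    for tier in tiers:
        ann = tier.at(time_ms)
        result[tier.name] = ann.text if ann else None
    return result


# Placeholder content only; no real Tofa data is reproduced here.
story = [
    Tier("transcription", [Annotation(0, 1500, "[IPA line 1]"),
                           Annotation(1500, 3200, "[IPA line 2]")]),
    Tier("morphosyntax", [Annotation(0, 1500, "[analysis line 1]"),
                          Annotation(1500, 3200, "[analysis line 2]")]),
    Tier("translation", [Annotation(0, 1500, "[English line 1]"),
                         Annotation(1500, 3200, "[English line 2]")]),
]
print(tiers_at(story, 2000))
```

Because every tier shares the same timeline, a single time offset into the video is enough to retrieve the matching transcription, analysis, and translation together.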

More on ELAN

Text storage

ELAN saves its annotations in files with the .eaf extension (EUDICO Annotation Format), whose structure is based on XML. The E-MELD project converted these EAF files into the XML format used at E-MELD. One advantage of storing text as XML is that it can easily be restructured, which is especially helpful when researchers exchange data.

XML (eXtensible Markup Language) is an open standard of the World Wide Web Consortium that defines a standard way of encoding the structure of information in plain text. Its tags are extensible: they are not predefined, but can be defined by the creator of a document. XML is currently considered best practice for the archival encoding of textual data because it does not depend on any particular software. It is also generally more self-descriptive than other electronic formats, which should make it more accessible to future generations.
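Because an EAF file is plain XML, it can be read with any general-purpose XML parser and without ELAN itself. The following is a minimal sketch using Python's standard library. The embedded sample is a heavily trimmed, hypothetical EAF fragment (tier names and placeholder values are invented), and the sketch assumes that every time slot carries a time value and ignores dependent reference annotations.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical EAF fragment; real files carry more metadata.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>[transcription placeholder]</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>[translation placeholder]</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier id: [(start_ms, end_ms, text), ...]} from EAF text."""
    root = ET.fromstring(eaf_text)
    # Time slots map symbolic ids to offsets in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                slots[ann.get("TIME_SLOT_REF1")],
                slots[ann.get("TIME_SLOT_REF2")],
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


tiers = read_tiers(SAMPLE_EAF)
for name, rows in tiers.items():
    print(name, rows)
```

The element and attribute names describe the data they hold, which is the self-descriptive quality mentioned above: the file remains interpretable even by a reader who has never used the software that wrote it.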

More on XML

Video presentation

An XSLT stylesheet transforms these XML files into versions suitable for display on the web. Stylesheets can transform archival XML documents into different file formats (for instance, HTML, text, or PDF). In this case, the resulting files can be viewed with a standard web browser; Java applets synchronize the Tofa video with the Tofa text display, so that the annotation tiers are seen along with the video.
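The kind of transformation a stylesheet performs can be sketched in a few lines. Note that this sketch is not XSLT: it uses Python's standard library as a stand-in to show the same idea, archival XML in and display HTML out. The input structure (`text`, `line`, `tier`) is a hypothetical simplification and not the actual E-MELD format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified archival XML; not the real E-MELD schema.
ARCHIVAL_XML = """<text>
  <line start="0" end="1500">
    <tier name="transcription">[transcription placeholder]</tier>
    <tier name="translation">[translation placeholder]</tier>
  </line>
</text>"""


def to_html(xml_text):
    """Render each tier of each line as one row of an HTML table."""
    source = ET.fromstring(xml_text)
    table = ET.Element("table")
    for line in source.iter("line"):
        for tier in line.iter("tier"):
            # Keep the time offsets as data attributes so that a player
            # script could highlight the row matching the video position.
            row = ET.SubElement(table, "tr", {
                "data-start": line.get("start"),
                "data-end": line.get("end"),
                "class": tier.get("name"),
            })
            ET.SubElement(row, "th").text = tier.get("name")
            ET.SubElement(row, "td").text = tier.text
    return ET.tostring(table, encoding="unicode", method="html")


html = to_html(ARCHIVAL_XML)
print(html)
```

The archival XML is never modified. A different output format only requires a different transformation, which is why the stored data and its presentation can evolve independently.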

More on stylesheets

Follow the path of the Tofa data

  1. Get Started: Summary of the Tofa conversion
  2. Digitize Video: Video page (Classroom)
  3. Annotate Video: Annotation page (Classroom)
  4. Store Text: XML page (Classroom)
  5. Present Video: Stylesheets page (Classroom)
