Navajo Case Study: Ultrasound Analysis

The content of this page was contributed by Dr. Douglas Whalen.


In 2003, at Haskins Laboratories, Navajo native speaker Elise Whitehead was recorded using a wordlist emphasizing sh and s, laterals, and prefix-noun utterances. An audio recording was made, accompanied by ultrasound imaging, using the Haskins Optically Corrected Ultrasound System, or HOCUS (Whalen et al., 2005). This system relates the ultrasound image of the tongue surface to the rest of the vocal tract (the palate and rear pharyngeal wall) by compensating for movement of both the head and the ultrasound transceiver. The result is a good view of what is going on throughout the vocal tract, including the upper pharynx, which is typically hard to image. The ultrasound recording was later transferred to AVI archival video format. JD Ross Leahy used Transcriber and ELAN to time-align the audio and video files with transcriptions contributed by Joyce McDonough. The silent movie file, the audio file, and the EAF (EUDICO Annotation Format) file were then contributed by Doug Whalen to the E-MELD project. Further modifications have made it possible to present these files online.

Ultrasound for phonetic fieldwork

Ultrasound is useful in phonetic fieldwork because it enables a linguist to record accurate images of dynamic tongue shape during speech. By analyzing these images, a linguist can study the actions of the tongue root and sagittal groove, the interaction between vowels and lingual consonants, timing of articulatory events and overall tongue shape.

Ultrasound images can be recorded to a standard video recorder, and sound should be simultaneously recorded to a video recorder and/or audio recorder. All ultrasound units are safe for linguistic work. There are portable units that are relatively inexpensive and easily taken into the field.

Video digitization

JD Ross Leahy converted the original MOV file to AVI format with a freeware program called RAD Video Tools, from RAD Game Tools, the makers of Bink and Smacker. Because attempts at conversion with no compression produced corrupted files, the Cinepak Codec by Radius was chosen as the AVI compression algorithm that seemed to produce the best quality (at 100%).

Editing of the original movie

To protect the rights of participants who had not given permission for remnants of their speech to be used on the Internet, the audio portion of the movie had to be edited.

First, the audio was extracted from the original MOV file. The MOV file was opened in iTunes and burned to an audio CD as one of the tracks; the CD was then ejected and re-inserted, and the audio track was imported back onto the hard drive as a WAV file.

The WAV file was edited by pasting white noise over the voices of speakers who declined to give permission. Care was taken to maintain the timing of the file so that it would still match up with the ultrasound imaging.
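
The masking step can be sketched in a few lines. This is only an illustration of the idea, not the procedure actually used on the Navajo recording: the function names, the sample rate, and the silent placeholder recording are all assumptions made for the example. The point it shows is that the noise replaces exactly as many samples as it covers, so everything after the masked span keeps its original time offset.

```python
import random


def seconds_to_index(seconds, sample_rate):
    """Convert a time in seconds to a sample index (hypothetical helper)."""
    return int(round(seconds * sample_rate))


def mask_with_noise(samples, start, end, amplitude=1000, seed=0):
    """Return a copy of `samples` with samples[start:end] replaced by white noise.

    The result has the same length as the input, so the audio stays
    aligned with any video that was synchronized to the original.
    """
    if not 0 <= start <= end <= len(samples):
        raise ValueError("the masked span must lie inside the recording")
    rng = random.Random(seed)
    noise = [rng.randint(-amplitude, amplitude) for _ in range(end - start)]
    return samples[:start] + noise + samples[end:]


# Placeholder data: three seconds of silence at an assumed 8000 Hz.
SAMPLE_RATE = 8000
recording = [0] * (3 * SAMPLE_RATE)

# Mask the second from 1.0 s to 2.0 s.
masked = mask_with_noise(
    recording,
    seconds_to_index(1.0, SAMPLE_RATE),
    seconds_to_index(2.0, SAMPLE_RATE),
)
```

Overwriting the span, rather than cutting it out, is what keeps the file's timing intact.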


To divide the movie into segments, the file was segmented in Transcriber. This created the time breaks in the file and aligned the Navajo orthographic transcriptions (previously recoded for a Unicode-based font, Gentium) with the respective speakers' audio and video recordings.

First, ELAN (the EUDICO Linguistic Annotator, which creates multi-tiered annotation files aligned with video) was used to associate the silenced video with the edited WAV file; then the Transcriber file was imported into ELAN.

Text storage

The annotation produced in ELAN is saved as an EAF (EUDICO Annotation Format) file. EAF is ELAN's native annotation format, and it is itself a form of XML. The E-MELD project then converted this EAF file into the XML format used at E-MELD.
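
Because EAF is XML, its time-aligned annotations can be read with any XML library. The sketch below uses Python's standard library on a tiny hand-written sample, not the Navajo file. The element and attribute names follow the EAF layout as commonly documented; the tier name and annotation text are placeholders, and real EAF files carry more (a header, linguistic types, and sometimes time slots without a time value) than this sketch handles.

```python
import xml.etree.ElementTree as ET

# A hand-written, heavily reduced EAF-style document (placeholder content).
EAF_SAMPLE = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2600"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first word</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second word</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_text):
    """Return (tier, start_ms, end_ms, text) tuples from EAF-style XML."""
    root = ET.fromstring(eaf_text.strip())
    # Time slots map an ID to a time in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                slots[ann.get("TIME_SLOT_REF1")],
                slots[ann.get("TIME_SLOT_REF2")],
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows


annotations = read_annotations(EAF_SAMPLE)
```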

One of the advantages of storing text as XML is that it can be easily reformatted in a different structure; thus, it is especially helpful when data is being exchanged between researchers. XML stands for eXtensible Markup Language. It defines a standard way of encoding the structure of information in plain text format. It is an open standard of the World Wide Web Consortium that is based on extensible tags (extensible meaning that they are not pre-programmed, but can be defined by the creator). XML is currently considered best practice for the archival encoding of textual data, because it does not depend upon any particular software. It is generally more self-descriptive than other electronic formats, which should make it more accessible to future generations.

Metadata creation

JD Ross Leahy produced a tab-delimited metadata file (.txt) that incorporates elements of the Dublin Core metadata set and of OLAC (the Open Language Archives Community). This has been converted to an XML (Extensible Markup Language) file.

Metadata is information about resources. In this case, it is information about language resources: lexicons, audiotapes, transcribed texts, language descriptions, video recordings, etc. It is similar to card catalog information about library resources -- it enables discovery and retrieval of resources through standardized information.

The basic requirement for metadata is that it should have a structured, unified and standard format so that it can be easily retrieved by mechanical, internet-based search engines like the OLAC harvester. Currently, Open Language Archives Community (OLAC) metadata is based on the fifteen elements of the Dublin Core metadata set and is encoded in XML. Initiatives that attempt to standardize linguistic metadata, like OLAC, are important to the preservation of data.
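
A conversion from tab-delimited metadata to XML might look like the following sketch. The layout of the original .txt file is not described here, so the example assumes one element per line, with the element name and its value separated by a tab. The field values are placeholders, and the `record` wrapper element is hypothetical: a real OLAC record uses its own container element and schema. Only the Dublin Core namespace URI is taken from the published standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Placeholder metadata, assumed layout: element name <TAB> value.
METADATA_TSV = (
    "title\tPlaceholder title\n"
    "language\tnav\n"
    "format\tvideo/x-msvideo\n"
)


def tsv_to_dublin_core(tsv_text):
    """Turn name/value lines into a small XML record of dc: elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, value = row
        element = ET.SubElement(record, "{%s}%s" % (DC_NS, name.strip()))
        element.text = value.strip()
    return ET.tostring(record, encoding="unicode")


metadata_xml = tsv_to_dublin_core(METADATA_TSV)
```

Keeping the element names in a standard namespace is what lets a harvester recognize the fields without knowing anything about the file that produced them.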

Video presentation

Using an XSLT stylesheet, the XML file is transformed into versions suitable for display on the web. Stylesheets can be used to transform archival XML documents into different file formats (for instance, HTML, text, or PDF). In this case, the file can be viewed with a standard web browser; Java applets allow the video to be synchronized with the text display, so that annotation is seen along with the video.
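
The idea behind such a transformation can be illustrated without XSLT. The sketch below is a stand-in written with Python's standard library, not a stylesheet and not the one used for this page: it walks a small XML document and emits an HTML table. The `text` and `segment` element names and their attributes are hypothetical and do not represent the E-MELD XML format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical archival XML: one time-aligned segment with placeholder text.
ARCHIVE_XML = """
<text>
  <segment start="0.00" end="1.25">first &amp; second</segment>
</text>
"""


def annotations_to_html(xml_text):
    """Render each <segment> as an HTML table row (start, end, text)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for seg in root.iter("segment"):
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                escape(seg.get("start", "")),
                escape(seg.get("end", "")),
                escape(seg.text or ""),
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


html_table = annotations_to_html(ARCHIVE_XML)
```

The archival XML is left untouched; the HTML is a derived view that can be regenerated, or replaced by a different view, at any time.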

Additional analysis and challenges to web presentation

When the tongue shapes are analyzed in the lab, MATLAB procedures are used to fit a spline to the data; this spline is then the best measure of the tongue surface. These extracted tongue shapes provide an easier curve to measure than the raw video; if the extracted curve is then corrected for head movement (via the HOCUS system), it provides even more information. However, at present, the display of this curve depends on the presence of MATLAB on the user's machine; MATLAB is far from universally available and is not inexpensive. Thus there is an issue of how much of the derivative data to present. A second movie could be made with images of the (corrected) MATLAB curve, but this would still be difficult to measure and interpret; keeping the curve as a mathematical object within MATLAB makes it much more amenable to further measurement. Thus it is not clear how much of this data should be included in the publicly accessible database.
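
To make concrete what it means to keep the curve as a mathematical object, here is a minimal sketch of a natural cubic spline through a handful of contour points, written in plain Python rather than MATLAB. It is not the lab's procedure: the kind of spline actually used is not specified here, the coordinates are invented for the example, and the sketch assumes the points are ordered by strictly increasing x, which a real tongue contour need not satisfy.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function S(x) interpolating (xs, ys) with a natural cubic spline.

    xs must be strictly increasing and at least two points are required.
    """
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points, with one y per x")
    if any(xs[i + 1] <= xs[i] for i in range(n)):
        raise ValueError("xs must be strictly increasing")

    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 * (ys[i + 1] - ys[i]) / h[i]
                    - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1])

    # Solve the tridiagonal system for the second-derivative coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    b = [0.0] * n
    c = [0.0] * (n + 1)  # c[0] and c[n] stay 0: the "natural" end condition
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x, clamping to the first or last one.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Invented contour points (arbitrary units), back of the tongue to the front.
contour_x = [0.0, 1.0, 2.0, 3.0, 4.0]
contour_y = [1.0, 2.5, 3.0, 2.2, 1.1]
tongue_curve = natural_cubic_spline(contour_x, contour_y)
```

Once the contour is a function, it can be evaluated at any position, which is what makes it easier to measure than a picture of the same curve.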

There may be other annotation layers that would be of use as well. From the combination of the palate image and the tongue image, it is possible to estimate such articulatory parameters as the time of contact for a consonant, or the time of minimal movement during a vowel. It would be possible to time-code such events into a new markup layer, but it is again not clear whether these would be of general enough use to warrant their inclusion. One can imagine that researchers might want to find instances of utterances in which the minimal velocity for the tongue was closer to an initial rather than a final stop, but the number of such questions is unbounded and rather unpredictable. It seems more likely that it would be useful to allow researchers to generate such markup layers from the data according to their own criteria, but this would imply that a common representation of the data exists. MATLAB is currently close to being a de facto standard, but, again, it is a program that has to be purchased and installed for procedures calling it to be useful. This, and other issues, will have to await further developments before best practice can be recommended.
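
As an illustration of how a researcher might generate such a layer, the sketch below estimates the time of minimal tongue movement from a sequence of extracted contours. Everything specific in it is an assumption made for the example: each frame is represented as a list of tongue heights at fixed positions, movement is measured as the mean absolute change between consecutive frames, the frame rate of 30 frames per second is invented, and the contour values are placeholders.

```python
def frame_displacements(contours):
    """Mean absolute change in tongue height between consecutive frames."""
    moves = []
    for previous, current in zip(contours, contours[1:]):
        if len(previous) != len(current):
            raise ValueError("all frames must have the same number of points")
        total = sum(abs(c - p) for p, c in zip(previous, current))
        moves.append(total / len(current))
    return moves


def time_of_minimal_movement(contours, frame_rate):
    """Return the time (in seconds) at which the tongue moves least.

    Displacement k is measured between frame k and frame k + 1, so the
    reported time is the midpoint between those two frames.
    """
    moves = frame_displacements(contours)
    if not moves:
        raise ValueError("need at least two frames")
    k = min(range(len(moves)), key=moves.__getitem__)
    return (k + 0.5) / frame_rate


# Placeholder contours: five frames, three height samples per frame.
frames = [
    [0.0, 0.0, 0.0],
    [3.0, 3.0, 3.0],
    [4.0, 4.0, 4.0],
    [4.5, 4.5, 4.5],
    [7.0, 7.0, 7.0],
]
FRAME_RATE = 30  # assumed frames per second

steadiest_time = time_of_minimal_movement(frames, FRAME_RATE)
```

The resulting time could be written into a new annotation tier; a different researcher could substitute a different movement measure without touching the underlying contour data.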

Follow the path of the Navajo Data

  1. Get started: Summary of the Navajo conversion
  2. Digitize audio: Audio pages (Classroom)
  3. Digitize video: Video page (Classroom)
  4. Convert characters to Unicode: Conversion page (Classroom)
  5. Align text: Interlinearized glossed text pages (Classroom)
  6. Annotate video: Annotation page (Classroom)
  7. Store text: XML page (Classroom)
  8. Present video: Stylesheets page (Classroom)
