Western Sisaala Case Study:
From Older Field Methods to BP
- Digitize text
- Create a database
- Digitize audio
- Digitize video
- Register metadata
- Follow the path of the Sisaala Data
The Western Sisaala documentation project began with data collected from January 11 to April 12, 2002, during a Field Methods course taught by Martha Ratliff at Wayne State University. Along with the rest of the class, Steven Moran recorded data in Microsoft Word from a language consultant who visited the class. Because the classroom speaker turned out to be less fluent than originally thought, he decided to pursue fieldwork in Ghana. There he started from scratch and adhered to the Best Practices that were then being formulated by E-MELD.
During Mr. Moran's three months of fieldwork, he collected a lexicon of 3500 entries in paper notebooks. After the field trip, the notebooks had to be digitized and archived. When keying in the text from his field notes, he used his own set of character substitutes for speed, then used global search-and-replace to swap them for the correct Unicode characters when he was finished. To guard against software and hardware obsolescence, he created an archival copy in plain text (.txt) format. He marked up semantic and structural content in XML, a machine-readable, tag-based markup language, and used Unicode character encoding because it is unambiguous.
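The substitute-then-replace step described above can be sketched in a few lines of Python. This is only an illustration: the substitution table below is hypothetical (the text does not list the stand-in characters Mr. Moran actually used), and the function names are invented for the example.

```python
# Hypothetical substitution table: ASCII stand-ins typed during data entry,
# mapped to the Unicode characters they represent. The actual scheme used
# for the Sisaala notebooks is not given in the text.
SUBSTITUTES = {
    "E": "\u025B",  # LATIN SMALL LETTER OPEN E
    "O": "\u0254",  # LATIN SMALL LETTER OPEN O
    "N": "\u014B",  # LATIN SMALL LETTER ENG
}


def to_unicode(text: str) -> str:
    """Replace every stand-in character with its Unicode counterpart."""
    return text.translate(str.maketrans(SUBSTITUTES))


def save_plain_text(text: str, path: str) -> None:
    """Write the converted text as a UTF-8 encoded plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

A scheme like this only works if the stand-in characters never occur as ordinary letters in the data, which is why capital letters are assumed here as stand-ins for lowercase phonetic symbols.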
In the field, Mr. Moran used notebooks to transcribe and collect linguistic data. Out of the field, he entered the data into an MS Excel spreadsheet because current versions of Excel support Unicode character encoding. Excel also allowed him to digitize his field notebooks quickly, and because it exports data in multiple formats, he could create best practice archival copies of the data in plain text XML. Excel alone, however, is not an ideal way to archive linguistic data. Archival formats, for all media, should be transparent and non-proprietary, since such formats are the least vulnerable to obsolescence. The data was then exported into tab-delimited format and loaded into the field database.
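As a rough illustration of moving from a tab-delimited export to plain text XML, the following Python sketch wraps each spreadsheet row in an element. The column names, the `lexicon` and `entry` element names, and the function name are assumptions made for the example; they are not the markup used in the Sisaala project. The sketch also assumes that every column header is a valid XML element name and that no row has more cells than the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_xml(tsv_text: str) -> str:
    """Convert tab-delimited text (first row = column names) to an XML string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    root = ET.Element("lexicon")  # hypothetical root element name
    for row in reader:
        entry = ET.SubElement(root, "entry")  # one element per spreadsheet row
        for column, value in row.items():
            # Each column header becomes a child element holding the cell text.
            ET.SubElement(entry, column).text = value
    return ET.tostring(root, encoding="unicode")
```

Passing `encoding="unicode"` returns a Python string, so the caller decides how to write it to disk; writing it with UTF-8 encoding keeps the archival copy in a transparent, non-proprietary form.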
Mr. Moran made audio recordings of all language consultant elicitation sessions, and captured video/audio recordings of narratives, using battery-operated recording devices. The consultant sessions were recorded with the NOMAD JukeboxMP3 player and an Audio-Technica AT831B Lavalier Microphone. These 'born-digital' recordings adhere to the current best practice recommendations for digital archiving. To make the files accessible over the web, they had to be converted into smaller, more portable file formats. For this, Mr. Moran took copies of the audio recordings (in uncompressed WAV format) and converted them to MP3 with free software that is widely available on the internet. The recordings could then be shared easily via a website about Western Sisaala. The following are archival .wav files of Sisaala data:
Generally, MP3 players are not recommended for recording, since most record in a compressed MP3 format. However, the NOMAD has an Analog/Optical Line-in for direct recording from external audio devices, and it supports recording in an uncompressed WAV format at 44.1kHz. The NOMAD is small, about the size of a Discman, and has a rechargeable battery life of 22 hours. In 2003, the 40GB NOMAD cost around 350 dollars. To make his recordings, Mr. Moran hooked up an Audio-Technica AT831B Cardioid Lavalier Condenser Microphone to the NOMAD. The AT831B is a good, sturdy lavalier mic that runs on batteries. It has an on/off switch and clips, and comes in a convenient microphone case. In 2005, an AT831B could be purchased for around 150 dollars. Together with the NOMAD and a mic-to-line-in cable, this recording setup allowed Mr. Moran to easily record and store all of his elicitation sessions and narrative recordings on a portable hard disk.
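One way to confirm that a recording really is uncompressed WAV at the expected sample rate is to read its header. The sketch below uses Python's standard `wave` module, which reads only uncompressed PCM files. The function names and the 16-bit minimum are assumptions for illustration; the text itself specifies only the 44.1kHz sample rate.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic technical metadata from an uncompressed (PCM) WAV file."""
    with wave.open(path, "rb") as recording:
        rate = recording.getframerate()
        return {
            "channels": recording.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": recording.getsampwidth() * 8,
            "duration_s": recording.getnframes() / rate,
            "compression": recording.getcomptype(),  # "NONE" for PCM
        }


def is_archival_candidate(info: dict, min_rate: int = 44100, min_bits: int = 16) -> bool:
    """Check the header values against assumed minimum thresholds."""
    return (
        info["compression"] == "NONE"
        and info["sample_rate_hz"] >= min_rate
        and info["bits_per_sample"] >= min_bits
    )
```

The dictionary returned by `describe_wav` is also the kind of technical metadata worth keeping alongside each recording.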
The following is a presentation format .mp3 audio streaming file of a Sisaala song. Note that it is NOT intended as an archival copy.
Following best practices for video is more costly and most likely beyond the financial range of students. The same general methodology applies to video as to audio: capture the data in uncompressed formats such as .AVI, record high-quality audio, and provide technical metadata about the recording. Since Mr. Moran was on a student budget but did not want to miss an opportunity to collect video recordings of narratives, he chose Sony's Cyber-shot digital camera. This allowed him to record video/audio in MPEG VX Fine at 640x480 resolution and 30fps. This is not ideal best practice for video; however, it allowed him to capture the only video recordings that exist of Western Sisaala speakers. At the time, the Sony Cyber-shot was one of the few inexpensive digital cameras that could capture both video and audio. This feature has since become more standard in digital cameras.
Mr. Moran also chose the Sony Cyber-shot because it runs on AA batteries, which he could easily carry with him in the field. Note that many, if not most, digital cameras have their own rechargeable power supply that must be plugged into an electrical outlet. In 2005, a Sony Cyber-shot cost around 300 US dollars. You may also wish to purchase a larger flash memory card than the standard one. A 1GB Sony Memory Stick cost 100 dollars in 2005. Mr. Moran used a 256MB memory stick and was able to record 12 minutes of video/audio before having to transfer the contents to a laptop.
Data that is lost is as extinct as a language without speakers or records. Therefore, one important but often overlooked aspect of creating linguistic resources is making their existence known. Though you may wish to do this by putting your materials online, or by depositing them in your university's library, you should also create metadata that is accessible in the OLAC catalog. OLAC (Open Language Archives Community) is a partnership of archives and institutions that has created a virtual library of language resources, searchable through the Linguistic Data Consortium and The Linguist List. Mr. Moran registered his fieldwork resources with the OLAC catalog; by providing metadata for its search engine, he made the existence of his fieldwork data known to other researchers who may be interested in Western Sisaala or other languages in this area. To make your resources known to the OLAC catalog (also known as 'becoming a data provider', though you do not necessarily need to provide the actual data), you can use a simple interface provided at The Linguist List called the OLAC Repository Editor.
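To give a sense of what descriptive metadata for a resource looks like, here is a minimal Python sketch that builds a record from Dublin Core elements. It is a simplified illustration, not the full OLAC record format and not output of the OLAC Repository Editor: the `record` wrapper element, the function name, and the sample values in the usage note are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"


def make_record(fields: list) -> str:
    """Build a simple XML record from (element name, value) pairs."""
    ET.register_namespace("dc", DC)  # serialize elements with the dc: prefix
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(record, "{%s}%s" % (DC, name)).text = value
    return ET.tostring(record, encoding="unicode")
```

A call such as `make_record([("title", "Western Sisaala lexicon"), ("creator", "Moran, Steven"), ("format", "text/xml")])` would produce one `dc:` element per pair; the title and format values here are illustrative placeholders rather than the entries actually registered with OLAC.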
- Get started: Summary of the Sisaala conversion
- Digitize Text: Text section (Classroom)
- Create a Database: XML section (Classroom)
- Digitize Audio: Audio section (Classroom)
- Digitize Video: Video section (Classroom)
- Register Metadata: Metadata section (Classroom)
E-MELD School of Best Practices: From Older Field Methods to Best Practice