Linguist Start Page


What ethical considerations need to be made?

While E-MELD is not intended to be a resource about ethics, the issue is of great importance and should be a primary consideration. Please visit the ethics page for more information. Also, please browse the ethics section of the Reading Room for further resources.

What kinds of tools should I use?

Before collecting data, it is a good idea to be aware of the various methods of recording and digitizing it. For advice on what factors should be considered in selecting software, as well as information about existing software, please visit the Classroom's Software page.

Which tools are available?

For information on software and hardware tools available, visit the Toolroom.

What software should I use for specific kinds of data?

What should I collect when I collect data?

Do I need to collect anything besides language data?

Along with data, it is imperative to collect metadata. To learn about metadata and how to compile it, visit the Classroom's Metadata page. To learn about and use the OLAC Repository Editor (ORE), a tool for metadata creation, visit the Workroom's Metadata page.

How do I build a corpus?

Of course, a linguistic corpus is also necessary. To learn how to collect material and build a corpus, visit the Classroom's Archives page. To learn how to record this corpus in an enduring format, visit the Classroom's XML pages. To learn how to create a lexicon, visit the Workroom's Lexical Analysis and Output page.

Should I add anything to the data?

Annotating a corpus assigns meaning to the data and enables future researchers to access your insights. To learn how to annotate a corpus, visit the Classroom's Annotation page. When annotating a corpus, attention should be paid to the terminology used. To learn about terminological mapping, visit the Workroom's Terminology page. To view the General Ontology for Linguistic Description (GOLD), visit the Ontology Tree.

How have other linguists done it?

To read examples of converting data from legacy formats to best-practice formats, visit the Case Studies. To learn how other documentation projects have applied best practices, visit the Documentation Projects section of the Case Studies and the UNESCO Register of Good Practices in Language Preservation.


User Contributed Notes
Paul Kilpatrick, pwk@geneva.edu
02-Feb-2006
I've done some language salvage work and video recording in Mexico and Peru.

In the mid-'80s, somewhat by accident, I discovered a way to get more natural, spontaneous speech.

I had set up both video and audio recorders to record three elderly Shipibo women in San Francisco, north of Pucallpa. I turned on both machines and left, hoping I would get natural speech.

Viewing the tape later was a disappointment. Other anthropologists and linguists had been there before, and these women had a well-rehearsed narrative. The village 'chief' assured me these were the women I needed for my data, but in fact he had simply chosen women old enough to remember some of the more violent inter-tribal history of the area. They had recited that history to so many anthropologists that it had become very canned. Certainly there was NO simultaneous speech of the kind found in their normal conversation.

After 30 minutes, however, my audio recorder clicked off at the end of the cassette. My video was still running and caught some wonderful spontaneous (simultaneous) speech as they began to problem solve.

Introducing a problem into the taping situation seems to elicit more natural speech, because it supplies an unexpected topic (the problem) and an unscripted discourse (the solution).
