Digitization of Existing Images
A digital image is a picture represented electronically as bits or bytes. It is an electronic snapshot taken of a scene or scanned from existing documents such as photographs, manuscripts, printed texts, and artwork.
The following pages offer guidance on preserving and presenting digital images. They discuss digitization quality, archival and presentation formats, and metadata for images. It may help to familiarize yourself first with some of the terminology used.
The first decision in image capture concerns the purpose of the images being created. Keep the end users, the required equipment, and the storage facilities in mind. Are the images simply for web delivery? If so, see presentation format. Or are there preservation issues to consider? If so, see archival format. Best practice for language documentation is to scan a high-quality archival image and then create smaller presentation and thumbnail versions from that image. The higher the image quality required, the higher the scanning settings (such as resolution and bit depth) need to be.
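The trade-off between scanning settings and storage can be made concrete with a little arithmetic. The sketch below is illustrative only: the 4 x 6 inch photograph, the three resolutions, the 24 bits per pixel (8 bits each for red, green, and blue), and the 1024- and 150-pixel derivative sizes are all assumptions chosen for the example, not recommendations from this guide.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scan of an original of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Size of the raw pixel data, ignoring file headers and any compression."""
    return width_px * height_px * bits_per_pixel // 8


def fit_within(width_px, height_px, max_side):
    """Scale dimensions so the longer side is max_side, keeping the aspect ratio."""
    scale = max_side / max(width_px, height_px)
    if scale >= 1:
        return width_px, height_px  # never enlarge a derivative
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


if __name__ == "__main__":
    # Assumed example: a 4 x 6 inch photograph at three scanning resolutions.
    for dpi in (72, 300, 600):
        w, h = scan_pixels(4, 6, dpi)
        mb = uncompressed_bytes(w, h) / 1_000_000
        print(f"{dpi:>3} dpi: {w} x {h} px, about {mb:.1f} MB uncompressed")

    # Derivative sizes computed from the archival master, not rescanned.
    master = scan_pixels(4, 6, 600)
    print("presentation:", fit_within(*master, 1024))
    print("thumbnail:", fit_within(*master, 150))
```

Because pixel count grows with the square of the resolution, doubling the scanning resolution roughly quadruples the uncompressed size, which is why the archival master is scanned once and the smaller forms are derived from it.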
Modern digital cameras make it comparatively easy to create digital images. If you wish to archive these images, be aware of the format the camera uses. Some cameras store images only in a compressed format, while others can store both compressed and uncompressed images; the latter are considerably better for the archivist.
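One quick way to see what a camera has actually written is to look at the first few bytes of the file, since common image formats begin with a fixed signature. The following is a minimal sketch covering only JPEG, PNG, and TIFF; the function names are hypothetical, and a TIFF signature alone does not show whether the pixel data inside is compressed, because TIFF can hold either and some camera raw formats are built on TIFF containers.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG (lossy compression)"),
    (b"\x89PNG\r\n\x1a\n", "PNG (lossless compression)"),
    (b"II*\x00", "TIFF (little-endian byte order)"),
    (b"MM\x00*", "TIFF (big-endian byte order)"),
]


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file at `path` to check its signature."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Checking the signature rather than the file extension guards against files that were renamed or exported with a misleading suffix.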
If you wish to store your existing photographs, film, or paper materials on a computer, these resources need to be digitized. The process of digitization involves scanning, adjusting the image, and uploading.
Digitization of language documentation often involves conversion of paper records, such as lexicons, grammars, or narratives, to an electronic format. Scanning a document creates an image of the page, but in order to create a textual file that can be edited, OCR (Optical Character Recognition) software is needed.
Metadata is information about resources. In this context, it is information about language resources: lexicons, audiotapes, transcribed texts, language descriptions, etc. It is analogous to card catalog information about library resources, in that it enables the discovery and retrieval of resources through standardized, machine-readable information. Metadata is becoming very important to the linguistics community, for it gives us the ability to find language resources in the vast and rapidly expanding realm of the Internet.
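To make "standardized, machine-readable information" concrete, here is a small sketch of a metadata record for one scanned page. Everything in it is an assumption for illustration: the field names are loosely modeled on Dublin Core element names, the values are invented, and the list of required fields is a hypothetical local policy rather than a rule from any metadata standard.

```python
import json

# Hypothetical record describing one scanned page of a field notebook.
record = {
    "title": "Field notebook, page 12",
    "creator": "A. Researcher",
    "date": "2004-06-15",
    "format": "image/tiff",
    "language": "und",  # placeholder code meaning "undetermined"
    "description": "Scanned page of handwritten lexical entries.",
}

# Assumed local policy: fields a record must have before it is deposited.
REQUIRED = ("title", "creator", "date", "format", "language")


def missing_fields(rec: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [key for key in REQUIRED if not rec.get(key)]


def to_json(rec: dict) -> str:
    """Serialize the record in a stable, human-readable form."""
    return json.dumps(rec, indent=2, sort_keys=True, ensure_ascii=False)


if __name__ == "__main__":
    print(missing_fields(record))
    print(to_json(record))
```

Storing the record as structured data next to the image file lets a catalog or search service index the resource without opening the image itself.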
More information on image digitization is available in the school's reading room.
OCR or Keyboard?