OCR or Keyboard?

Digitization of language documentation often involves conversion of paper records, such as lexicons, grammars, or narratives, to an electronic format. Scanning a document creates only an image of the page; to create a text file that can be edited, OCR (Optical Character Recognition) software is needed. OCR software reads the page image and interprets the shapes on it as characters, saving them as encoded text (for example, in Unicode or ASCII) that the computer can search or index.

However, OCR is only useful for certain kinds of documents; for others, it is better to type in the data with the keyboard. Here are the questions to ask before deciding whether to OCR or to keyboard your documents.

In summary, OCR can be a valuable tool for the digitization of cleanly printed documents using standard Roman orthographies, but it requires careful proofreading. When in doubt, it may be better to enter text through the keyboard.
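As a rough illustration of how proofreading OCR output might be supported, the sketch below flags every character that falls outside an expected character inventory. It is only a sketch under stated assumptions: the `EXPECTED_CHARS` inventory, the `flag_unexpected` function, and the sample text are all hypothetical and would need to be adapted to the orthography actually being digitized.

```python
import unicodedata

# Hypothetical inventory: the characters we expect to find in the
# orthography being digitized. Adjust this for the actual language.
EXPECTED_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 .,;:!?'\"()-"
)


def flag_unexpected(text, expected=EXPECTED_CHARS):
    """Return (line, column, character, Unicode name) for each character
    in `text` that is not in the expected inventory."""
    flags = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in expected:
                name = unicodedata.name(ch, "UNKNOWN")
                flags.append((line_no, col, ch, name))
    return flags


# Hypothetical OCR output containing two misreadings:
# "1azy" for "lazy" and "d\u00f8g" for "dog".
sample = "The quick brown fox\njumps over the 1azy d\u00f8g."

for line_no, col, ch, name in flag_unexpected(sample):
    print(f"line {line_no}, col {col}: {ch!r} ({name})")
```

Note that a check like this catches only characters foreign to the inventory. The "1" in "1azy" passes unnoticed because digits are legitimate characters, so a filter of this kind can narrow the proofreader's search but cannot replace reading the text against the original page.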

The content of this page was developed following the recommendations of the E-MELD working groups and the Library of Congress.
