How to Establish an Archive
- Develop a Mission Statement:
- Define Procedures and Policies for:
- Develop a Triage Strategy For the Following:
- Organize Data
Establishing an archive involves several decisions. Creators must develop a mission statement for the archive, choose standards for acceptance of materials, define procedures and policies, develop a triage strategy for materials, and create an effective organizational scheme. This page provides information on each step of this process.
Publish a mission statement that clearly defines:
- The scope and scale of the collection (e.g. Algonquian languages);
- Where the resources will come from (e.g. legacy materials from researchers);
- Who will be the archive's primary users (e.g. speaker communities).
Standards for accepting materials may be tiered, or scaled, to reflect the realities of field recordings and legacy materials. For example, MiniDisc recorders produce compressed digital objects, which are not of the best possible quality but may be acceptable because these devices are relatively cheap and easy to use in the field.
Strive for the highest quality, but do not let the best become the enemy of the good: if a 'less than standard' digital object is all there is, archive it anyway. Do not reject resources from speakers.
Define procedures and policies for:
- Acquisition of materials, including a triage strategy for assessing collections and prioritizing the digitization schedule;
- Dissemination of materials, including access restrictions, interface languages, etc.;
- Quality assurance (verifying the validity of digital files over time);
- Deciding on digitization standards, changing standards to reflect changes in technology, and forward migration to new digital formats;
- Disaster recovery - backups, mirror sites, etc.
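The quality assurance item above (verifying the validity of digital files over time) is commonly handled by recording a checksum for each file and re-checking it later. The following is a minimal Python sketch of that idea, not a description of any particular archive's system. The function names `sha256_of`, `build_manifest`, and `verify` are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded in the manifest that no longer exist.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose content no longer matches the recorded checksum.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files present now that were never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In use, the manifest would be built once at ingest, stored separately from the files (for example, alongside the backups), and `verify` would be run on a schedule. Any entry under "missing" or "changed" signals a file to restore from a backup or mirror.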
Develop a triage strategy for the following:
- Materials deposited by speakers.
- Recordings made on obsolete or fragile media.
- Current storage location (if it is too hot, cold, wet, or dry, materials are at risk).
- Data from extinct or moribund languages.
- Cultural treasures.
- Support for the goals of speaker communities.
- Breadth of coverage for some region.
- Depth of coverage for languages represented in the archive.
- Quality of supporting materials: metadata, transcriptions, and translations.
Arbitrating among these criteria must be done case by case and must reflect the priorities of the archive as defined by the mission statement. The order given above does not imply any ranking of priorities.
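One way to make an archive's priorities explicit is to record which triage criteria apply to each collection and weight them according to the mission statement. The Python sketch below is only a first-pass aid for ordering a digitization queue; it does not replace case-by-case judgment. The `Collection` class, the criterion labels, and the weights are all hypothetical and would be set by each archive.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A candidate collection and the triage criteria that apply to it."""
    name: str
    criteria: set[str] = field(default_factory=set)


def triage_score(collection: Collection, weights: dict[str, int]) -> int:
    """Sum the archive-defined weights of the criteria a collection meets."""
    return sum(weights.get(criterion, 0) for criterion in collection.criteria)


def rank(collections: list[Collection], weights: dict[str, int]) -> list[Collection]:
    """Order collections from highest to lowest score.

    sorted() is stable, so collections with equal scores keep their
    input order and still need a human decision.
    """
    return sorted(collections, key=lambda c: triage_score(c, weights), reverse=True)


# Hypothetical weights; a different mission statement would yield different values.
example_weights = {
    "deposited_by_speakers": 2,
    "obsolete_or_fragile_media": 3,
    "at_risk_storage": 3,
    "moribund_language": 2,
}
```

Because the weights live in one dictionary, changing the archive's priorities means editing that dictionary rather than the ranking logic.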
Recommended practice is to build or acquire a software infrastructure for maintaining metadata, administrative information, and user information. This recommendation is neutral with respect to implementation: it may be a file-based system that uses XML-tagged headers read by Perl scripts, like the MPI Corpus Browser, or it may be a database system. AILLA and ANLC use custom-built MySQL databases; PARADISEC is using a FileMaker Pro system, though it intends to replace this with a MySQL database.
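To illustrate the database option, here is a minimal Python sketch of a metadata catalog. It uses the standard-library `sqlite3` module as a stand-in for a MySQL server so that it runs without any setup. The table layout, column names, and function names are assumptions made for this example and do not reflect the schema of any archive named above.

```python
import sqlite3

# Hypothetical minimal schema: one row of descriptive metadata per resource.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resource (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    language     TEXT NOT NULL,
    depositor    TEXT,
    media_format TEXT,
    access_level TEXT NOT NULL DEFAULT 'public'
);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_resource(conn, title, language, depositor=None,
                 media_format=None, access_level="public") -> int:
    """Insert one metadata record and return its row id."""
    cursor = conn.execute(
        "INSERT INTO resource (title, language, depositor, media_format, access_level) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, language, depositor, media_format, access_level),
    )
    conn.commit()
    return cursor.lastrowid


def find_by_language(conn, language: str) -> list[str]:
    """Return the titles of all resources in a language, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM resource WHERE language = ? ORDER BY title",
        (language,),
    )
    return [row[0] for row in rows]
```

A file-based system would keep the same fields in an XML header beside each recording instead of in a table. The trade-offs between the two approaches are the considerations that follow.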
Crucial considerations are:
- Cost: Custom software requires a skilled programmer and ongoing skilled maintenance; free systems may have sketchy support and may cease to exist with little warning; proprietary systems like FileMaker Pro have license restrictions, charge for upgrades, and cannot be modified at the source level.
- Scalability: Databases scale better over the long run, meaning they can handle large numbers of records (hundreds of thousands) more efficiently than file-based systems, but file-based systems may be easier to implement and maintain.
- Metadata: How hard it is to encode, enter, and modify metadata with the system.
- Interfaces: How well the system serves the archive's interfaces (web-based, paper, on-site catalog terminals, etc.).
- Open Language Archives Community (OLAC)
- EU-US Working Group on Spoken-Word Audio Collections
- The Digital Library Federation
- The Pacific Manuscripts Bureau
- "Counting the Costs of Digital Preservation: Is Repository Storage Affordable?" (http://jodi.ecs.soton.ac.uk/Articles/v04/i02/Chapman/)