The power of the computer and the network as tools for enhancing our communication with the world around us is nothing short of majestic. In the history of culture, only the introduction of writing itself and of printing technology can be compared with this moment (O'Donnell 1995, 1). If this view strikes some as bordering on hyperbole, consider the statement that seems to mark the birth of the information age: "The world has arrived at an age of cheap complex devices of great reliability, and something is bound to come of it" (Bush 1945, 3).
Any agent of social and cultural change this pervasive is bound to affect our institutions in like degree. The library is arguably the institution that stands to undergo the most radical change, because of the role the library has played in society throughout history. Our libraries have historically functioned as the repository for the collective information of our culture. As the form of that information changes, the library must change with it. The responsibility does not end there, however; our libraries must also preserve this information and provide access to it in egalitarian fashion. In other words, the collection must be maintained in a manner that ensures its survival, and the user must be able to find what is needed through the bibliographical controls that the library provides. Amid the seemingly infinite issues that accompany the cultural and social revolution taking place today in global communications, the library needs to stay focused on its two elemental reasons for being: preservation and access. "A record, if it is to be useful to science, must be continuously extended [preserved], it must be stored, and above all it must be consulted [accessed]" (Bush 1945, 3). A more contemporary author (Graham 1995, 332) contends that:
The ability of the scholarly community to give serious weight to electronic information depends upon their trust in such information being dependably available, with authenticity and integrity maintained... Users will expect information to be available that was placed in the library's care a long time ago.

Here we see the importance of preservation from the point of view of the scholarly community. It is of utmost importance to scholars that the information they have referenced and built upon remain available indefinitely into the future. This is how our culture builds its knowledge: cumulatively. The dependence of cultural advancement on the preservation of knowledge is often alluded to:
Culture-any culture-depends on the quality of its record of knowledge. If that record is defective...the quality of the culture is at risk. The pursuit of knowledge is a process in which the emergence of new knowledge builds on and reconstructs the old. Knowledge cannot advance without consistent and reliable access to information sources, past and present (Waters 1995, 2).
...develop strategies for the entire range of knowledge media: stones, papyri, vellum, paper, video, audio, and digital. For analog information, we must develop triage strategies for the past; for digital, triage strategies at the point of acquisition or creation (Battin 1993).

The mission in all cases will have to remain focused on preservation of and access to all materials in the collection.
At present, we are further along on some of these fronts than on others. I would like to present an overview of the current state of development of each of these essential priorities.
The newest technological advance under construction, dubbed "Internet II" (Deloughry 1996), will increase bandwidth, the speed of data transmission, severalfold.
One estimate from Cornell University sets the goal of delivering substantially greater bandwidth within the next five years (Hirtle 1996).
According to Miller (1996), the prevalence of existing Internet structures ensures that the National Digital Library of the future will incorporate them for transmission of its data. Digital information in this system can be transmitted instantly via linked communications networks that transcend limitations of time and space (Kenney and Personius 1992). In fact, immediate access to pertinent information directly affects the quality of scholarly production and advancement (Waters 1991). Without a doubt, the real strength of digital technology is its capability of searching and retrieving documents quickly from remote locations (Conway 1994). The combination of digital technology and the broad National Information Infrastructure provides instant access to stored information that would have been unthinkable just a few years ago. This technology also offers the added benefit of limiting use and handling of the artifact that has been made available digitally.
One program that has greatly advanced the development of digital technology for preservation and access is the Brittle Books Program. This program was established in the mid-1980s in response to the critical condition of many books printed on acidic paper stock between 1870 and 1988. The National Endowment for the Humanities responded by funding a massive program intended to conserve these unique records of America's heritage. This ongoing effort involves inter-library cooperation in the planning and implementation of a program whose goal is to reformat to microfilm three million crumbling books selected from high-quality research collections (Farr 1992). Recently, several digital library pilot studies have been awarded NEH grants as well, and they can now be regarded as essential components of this vital preservation effort. The Brittle Books Program serves as an invaluable model for the digital archives concept because it involves considerable cooperation and planning to avoid duplication of effort. Bibliographic databases such as OCLC and RLIN make it possible for different libraries to queue materials and collections to be digitized in a cooperative fashion. Through these databases, libraries declare their intention to film items and are awarded NEH funds to complete the job at their own pace. With these electronic markers in place, other libraries can avoid duplicating the work.
The selection principles used to target collections for microfilming should translate equally well to digital reformatting. The targeted collections are "recognized by scholars and by the library community as having extraordinary past, present, and future research value" (Gwinn and Mosher 1983). Generally, the information content of materials is of primary importance in the selection process for preservation (Trader 1993); that is, the information contained within the item is deemed more valuable than its artifactual or intrinsic value as an object for research. This yes/no gate is only the first step, however. Collections in poor physical shape, whether from brittleness, poor storage conditions, or heavy use, are then designated high-priority collections for preservation. Some items are so brittle or otherwise damaged that conservation is not an option; the only choice is to reformat. Generally, the collections to be filmed and/or digitized must demonstrate both recognized research value and critical physical condition.
Many of these microfilmed books have subsequently been digitized in pilot projects such as Project Open Book at Yale University (Conway 1996). Two schools of thought exist, however, concerning which should be the initial step: film first and digitize from the film master, or digitize first and create the microfilm from the electronic images. From the standpoint of image resolution, a primary determinant in this question, the "rule of 25.4" (Jones 1993) is a very strong argument in favor of the former procedure. In short, for a computer image to match the resolution of high-resolution microfilm, the item would need to be scanned at over 5,000 dots per inch. Such a scan rate is impractical for several reasons: scan time and the storage space required would be enormous, and present-day scanners are not designed to operate at that rate. Generally, the logical procedure is to film first, then scan from the film, because preservation microfilm is the higher-resolution format. There are other sound reasons for adopting this procedure. The preservation microfilm process has been rigorously tested and standardized, and the life expectancy of microfilm is in the 500-plus-year range (Jones 1993). The microfilm master, if properly stored, is quite simply the most stable reformatting medium available today.
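The arithmetic behind the rule can be sketched roughly as follows; the film resolving power used here (on the order of 100 to 120 line pairs per millimetre for high-resolution preservation microfilm) is an illustrative assumption rather than a figure taken from the studies cited above.

    # A minimal sketch of the "rule of 25.4" as described above. The film
    # resolving power (line pairs per millimetre) is an assumed, illustrative figure.
    MM_PER_INCH = 25.4        # the constant that gives the rule its name
    DOTS_PER_LINE_PAIR = 2    # one dark and one light dot resolve a line pair

    def dpi_to_match_film(line_pairs_per_mm):
        """Approximate scan rate (dots per inch) needed to capture the
        detail held on film of the given resolving power."""
        return line_pairs_per_mm * DOTS_PER_LINE_PAIR * MM_PER_INCH

    for lp_mm in (100, 120):
        print(lp_mm, "lp/mm requires about", round(dpi_to_match_film(lp_mm)), "dpi")
    # 100 lp/mm -> about 5080 dpi; 120 lp/mm -> about 6096 dpi, i.e. well over 5,000 dpi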
However, it is also well documented that users will "go to any length to avoid having to use microforms" (Trader 1993). By contrast, the digital library concept seems a veritable revolution in information accessibility. Digitizing rare books, manuscripts, maps, special collections, and other one-of-a-kind primary source material can make them accessible to large numbers of users simultaneously, as opposed to the traditional one-user/one-item model. This suggests a best-of-both-worlds solution: the high-resolution microfilm master can be safely archived and retrieved when needed to generate new high-use, highly accessible digital versions. Yale University's Project Open Book has hypothesized that "research libraries will choose...to maintain information on microfilm for long-term preservation and in digital image form for ease of access" (Conway 1994). This process could also mitigate one of the primary problems with digital technology, software migration: new digital files could be created from the microfilm master as needed for each successive software generation.
So it would seem that in terms of its access characteristics alone, the age of digitized information and transmission is an unqualified success. But can we regard digitization as a stand-alone preservation technique, capable of preserving our materials apart from the hybrid approach suggested above? How stable are the data and their accompanying media? To put it another way, is digitization an effective tool for long-term preservation?
The question of digital longevity has generated considerable discourse and research, and is generally recognized as the strongest argument against a total commitment to the digital medium by the world's libraries. There are numerous pilot studies ongoing at many of the world's most prestigious libraries: Case Western Reserve University, Columbia University, Cornell University, Pennsylvania State University, Princeton University, University of Michigan, University of Southern California, University of Tennessee, Yale University, and the Bodleian Library, Oxford University. Although it is difficult at this point to categorically state that digital information will ever match the present status of microfilm as the recognized medium for preservation of materials (Waters 1994), the ease, speed and flexibility of access to digital data make it an attractive candidate for further research as a preservation tool as well.
Several pilot tests, while using conspicuously conservative language in their reports, announce positive results. "The Yale University Library envisions a future in which digital image technology comprises a critical tool in the process of preserving access to the deteriorating materials in its rich and valuable collections" (Waters 1991, 38). Cornell University reports:
...this study has convinced Cornell of the value of digital technology to preserve and make available research materials. The greatest promise of digital technology as a preservation option [in the short-term] is to improve access to materials. (Joint Study in Digital Preservation 1996, 1)

Both of these carefully worded statements neatly sidestep the question of long-term preservation in the digital mode. The literature takes a more tentative stance, speaking of the need to periodically refresh digital information to extend its life while questioning the permanence of the media that are the only means of access (Trader 1993). The distinction to be made here is that the stored digital information and the media required to access it are two entirely separate entities, requiring two separate preservation strategies. According to Rothenberg (1995), information stored in digital form can theoretically last indefinitely, but the physical storage media are temporal and short-lived. The truth, which none of the projects directly addresses, is simply this: at this point it is not possible to predict the long-term preservation properties of the digital medium. The technology is too new, too volatile, too changeable. Digital preservation research itself is only roughly six years old.
However, what can be learned from the past is quite sobering. There are countless examples of electronic dinosaurs: warehouses filled with bulky, outdated computer equipment, all of it state-of-the-art in its day. In an effort to avoid repeating these well-documented errors, several scenarios have been suggested. One plan of action addresses the need for long-term institutional commitments to achieve long-term preservation of the media (Graham 1995). While this position cannot really be disputed, in concrete terms it remains vague. The challenge for the future of digital technology is precisely this:
Since obsolescence is inherent in technological development, it is necessary for systems planners to ensure that new generations of information technology are backward-compatible, i.e. that they can read and convert information stored by older technologies (Conway 1991, 34).

This rational plan for proceeding with the development of the online digital library may be as concrete as we can be at this point. What the author does not address, however, is just how far backward-compatible each new software generation will be. One generation? Three? Traditionally, backward compatibility extends only two or perhaps three software generations. At some point, materials stored digitally will have to be translated entirely to a new software generation, and it would be prudent to perform this migration with each and every software evolution.
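As a rough illustration of the bookkeeping such a policy implies, the following toy sketch flags stored files before they fall outside an assumed two-to-three-generation compatibility window; the window size, file names, and generation numbers are entirely hypothetical.

    # A toy sketch of the migration policy discussed above. The compatibility
    # window, file names, and generation numbers are hypothetical assumptions.
    BACKWARD_COMPAT_WINDOW = 3   # generations a new software release can still read

    def needs_migration(file_generation, current_generation):
        """True when a stored file is about to drop out of the compatibility window."""
        return current_generation - file_generation >= BACKWARD_COMPAT_WINDOW

    archive = {"volume_001.img": 1, "volume_002.img": 4}   # file -> generation in which it was written
    current_generation = 5
    for name, written in archive.items():
        if needs_migration(written, current_generation):
            print("migrate", name, "- written in generation", written)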
One possible alternative to this procedure might be a databank of software programs, or "data warehouse," available online by subscription, that would make it unnecessary to translate digital data to new software generations. In this scenario, the digital data could be stored in its original form and the software necessary to read it accessed on an as-needed basis (Rothenberg 1995).
But do relevant models exist for the concept of perpetual migration? Consider, for example, early recordings of Enrico Caruso, or of Claude Debussy performing his own piano works. These recordings were originally made on wax cylinder, yet they sit on my record shelf, wonderfully accessible, on audio CD. The odyssey taken by the audio information stored on these CDs is both interesting and illuminating. Presumably, these recordings migrated first from wax cylinders to laminated discs, to 78 rpm records, to 33 rpm records, possibly to cassette or reel-to-reel tape, and finally to digital storage and reproduction in the CD format. These successive formats also represent several quantum advances in the art of sound technology. As with digital technology, as the medium evolves, so too does its corresponding medium of access. One would hope that successive platforms demonstrate some improvement over previous generations, as they generally have in the art of audio storage and reproduction.
Photocopying is the reformatting method of choice when a paper copy must be generated and placed on the bookshelf (Kenney and Personius 1992), since neither microfilm nor digitization produces paper copies directly. Estimates put the cost of this single photocopy in the range of $65 to $75. If a paper document is not a requirement, microfilm is still the preferred method, but competition from digital imagery makes this an increasingly difficult choice. Even when factoring in ten-year storage, refreshing, and overhead for digital technology, and the creation of archival and print masters for microfilm, the two are closely comparable in cost. Digital technology does demonstrate a distinct economy when compared with one-up microfilming, at $46 vs. $70 per item; compared with two-up microfilming, the digital process is more expensive, at $46 vs. $36 per item. Digitization does represent a distinct economy when subsequent printed photocopies are needed, provided the original copy was generated from digitally scanned images. This approach avoids the labor-intensive step of preparing the document for scanning, since the new printed copy can simply be generated from the stored digital files. The actual costs in one study indicated that a subsequent printed copy using photocopy technology still costs around $65, while a new copy generated from digital files costs only about $15. Obviously, any additional copies needed in the future would bring down the overall average cost per paper copy, as the rough calculation below illustrates. For those libraries that prefer a paper version, this print-on-demand process will also enhance access. "These findings indicate that when the need is to replace paper with paper, the use of digital technology is economically preferable" (Kenney and Personius 1992). It is also probable that all three reformatting techniques will continue to be incorporated into library preservation plans for the foreseeable future, "depending on the type, value, and use of the materials involved" (Trader 1993).
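The following back-of-the-envelope sketch uses the per-copy figures quoted above and assumes, for simplicity, that the first copy costs roughly the same under either approach; that assumption, and the round numbers, are illustrative only.

    # Average cost per paper copy, using the figures cited above (Kenney and
    # Personius 1992): roughly $65 for every photocopy, versus an assumed ~$65
    # for the first digitally produced copy and about $15 for each later copy
    # printed from the stored image files.
    def avg_cost_photocopy(copies, per_copy=65.0):
        return per_copy                      # each photocopy repeats the full preparation effort

    def avg_cost_digital(copies, first=65.0, reprint=15.0):
        return (first + (copies - 1) * reprint) / copies

    for n in (1, 2, 5, 10):
        print(n, "copies:", avg_cost_photocopy(n), "vs", round(avg_cost_digital(n), 2))
    # at 10 copies: $65 per copy by photocopy vs. $20 per copy from digital files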
Standards for digital image resolution and tonal reproduction, the two elements that determine digital image quality, have recently been well established. For a cooperative system of seamless digital collections to function effectively as a whole, resolution standards are essential so that all elements display similar detail and readability. Several useful benchmarks have evolved. The very precise resolution standards developed for microforms, embodied in the Quality Index, can be used to evaluate digital image resolution as well. Stated as simply as possible, under the Quality Index the dots-per-inch setting used in scanning is based on the smallest significant character (usually the lowercase "e") in the source document, and the dpi rate is adjusted to achieve the desired resolution in the reformatted version. There is an inverse relationship between text size and scan rate: the smaller the text in the source document, the higher the scan rate employed. One recent study concluded that a scan rate of 600 dpi with 8 bits of grayscale is adequate to capture the specified level of detail in a variety of brittle documents from the nineteenth and twentieth centuries (Kenney and Chapman 1995). This scan rate has also proven effective for reproducing illustrated texts containing line art or halftones. The setting does produce large files, depending on the dimensions of the document page, so where the smallest lowercase "e" is larger, the scan rate can be decreased to conserve storage. Color scanning requires a higher bit depth, either 24-bit or 32-bit, for acceptable color reproduction. A 24-bit color file will be exactly three times as large as its 8-bit grayscale counterpart, so ample storage for these large data files must be planned in advance.
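The storage implications can be estimated with simple arithmetic; the letter-size page used below is an assumed example rather than a figure from the study cited, while the bit depths follow the text above.

    # Uncompressed size of a scanned page: (width x dpi) x (height x dpi) pixels,
    # times the number of bytes per pixel. The 8.5 x 11 inch page is an assumed example.
    def raw_size_mb(width_in, height_in, dpi, bits_per_pixel):
        pixels = (width_in * dpi) * (height_in * dpi)
        return pixels * bits_per_pixel / 8 / 1_000_000

    gray = raw_size_mb(8.5, 11, 600, 8)     # 8-bit grayscale at 600 dpi -> ~33.7 MB
    color = raw_size_mb(8.5, 11, 600, 24)   # 24-bit color -> exactly three times larger
    print(round(gray, 1), "MB grayscale vs", round(color, 1), "MB color")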
Both black and white and color files can be archived in their original form (usually a TIFF file), and then compressed for Internet/digital library access in a JPEG file format. This JPEG file often represents as much as a 90% reduction in file size, while retaining an acceptable image quality. For example, a 5 megabyte TIFF image can be compressed to a 500 kilobyte JPEG image for Internet access.
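A minimal sketch of this archive/access split, assuming the Pillow imaging library and hypothetical file names, keeps the TIFF master untouched and derives a compressed JPEG for network delivery.

    # Keep the uncompressed TIFF as the archival master; derive a lossy JPEG
    # access copy for delivery over the network. File names are hypothetical.
    from PIL import Image

    master = Image.open("page_0001.tif")              # archival master, e.g. ~5 MB
    access = master.convert("RGB")                    # JPEG stores RGB or grayscale, not every TIFF mode
    access.save("page_0001.jpg", "JPEG", quality=75)  # lossy copy, often around a tenth the size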
Digital technology is being rigorously tested in many of our most prestigious library and archival institutions. While it may be unrealistic to expect digital technology to ever completely replace preservation microfilm and photocopying as a tool for preservationists, it does offer many distinct advantages for both access and preservation of materials.
Battin, Patricia. 1993. From preservation to access: Paradigm for the nineties. IFLA Journal 19, no. 4 : 367-373.
Bush, Vannevar. 1945. As we may think. The Atlantic Monthly (July 1945) : 101-108. Available online: http://www.isg.sfu.ca/~duchier/misc/vbush/vbush-all.shtml.
Chowdhury, Jayadev. 1995. Taking off into the world of the Internet. Chemical Engineering 102 (March 1995) : 30-1.
Commission on Preservation and Access, The Digital Preservation Consortium. 1994. Mission and goals. (March) : title page.
Conway, Paul. 1996. Selecting microfilm for digital preservation: A case study from Project Open Book. Library Resources and Technical Services. 40, no. 1 : 67-77.
__________. 1994. Digitizing preservation. Library Journal 119, no. 2 (February 1994) : 42-45.
__________. 1991. Things to think about when purchasing your EIM system. Inform 5, no. 10 (Nov/Dec 1991) : 34.
Cornell University Library. 1994. Cornell announces "making of America" digital library. Advanced Technology Libraries 23, no. 2 (February 1994) : 1-3.
Deloughry, Thomas J. 1996. Computing officials at 34 universities seek to create a network for higher education. The Chronicle of Higher Education October 11, 1996, A29-A31.
Farr, George F. 1992. NEH's program for the preservation of brittle books. In Advances in preservation and access, vol. 1, eds. Barbara Buckner Higginbotham and Mary E. Jackson. Westport, Conn.: Meckler, 49-60.
Graham, Peter S. 1995. Requirements for the digital research library. College and Research Libraries 56, no. 4 (July 1995) : 331-339.
Gwinn, Nancy E., and Paul H. Mosher. 1983. Coordinating collection development: The RLG conspectus. College and Research Libraries 44: 128-140.
Hirtle, Peter B. 1996. Internet II and archives. Posting on the Archives and Archivists listserv, November 7, 1996.
Joint Study in Digital Preservation, Cornell University and the Xerox Corp. 1996. Executive summary. (February 1996) : Hypertext document : http://palimpsest.stanford.edu/cpa/reports/joint/execsumm.html
Jones, C. Lee. 1993. Preservation film: Platform for digital access systems. Commission on Preservation and Access. Washington DC : 1-3.
Kenney, Anne R., and Stephen Chapman. 1995. Digital resolution requirements for replacing text-based material: Methods for benchmarking image quality. (April 1995) Washington DC: The Commission on Preservation and Access.
_____________, and Lynne K. Personius. 1992. Joint study in digital preservation, Phase 1: A report to the Commission on Preservation and Access. Washington DC : The Commission on Preservation and Access.
Lesk, Michael. 1990. Special section: Digital imagery, preservation, and access. Information Technology and Libraries (December 1990) : 300-307.
Miller, Stephen Douglas. 1996. Implications of digital imaging on access to archives. Manuscripts and Archives in the Digital Age.
O'Donnell, James J. T. 1995. The new liberal arts. ARL Newsletter 183 (December 1995) : 1-4.
Research Libraries Group and The Commission on Preservation and Access, Task Force on Archiving of Digital Information. 1996. Preserving digital information: Report of the Task Force on Archiving of Digital Information (May 1996).
Rothenberg, Jeff. 1995. Ensuring the longevity of digital documents. Scientific American 272 (January 1995) : 42-7.
Trader, Margaret P. 1993. Preservation technologies: Photocopies, microforms, and digital imaging - pros and cons. Microform Review 22, no. 3 (Summer 1993) : 127-134.
Waters, Donald J. 1996. In over our heads. Realizing Benefits from Inter-Institutional Agreements: The Implication of the Draft Report of the Task Force on Archiving of Digital Information. Washington DC : The Commission on Preservation and Access (January 1996) : 1-4.
___________, and Anne Kenney. 1994. The digital preservation consortium mission and goals. Washington DC : Commission on Preservation and Access (March) hypertext document: http://palimpsest.stanford.edu/cpa/reports/dpcmiss.html.
____________. 1991. From microfilm to digital imagery: On the feasibility of a project to study the means, costs, and benefits of converting large quantities of preserved library materials from microfilm to digital images. Washington DC : The Commission on Preservation and Access (June 1991).