Chapter 1
Background

An introduction to digital libraries

This is a fascinating period in the history of libraries and publishing. For the first time, it is possible to build large-scale services where collections of information are stored in digital formats and retrieved over networks. The materials are stored on computers. A network connects the computers to personal computers on the users' desks. In a completely digital library, nothing need ever reach paper.

This book provides an overview of this new field. Partly it is about technology, but equally it is about people and organizations. Digital libraries bring together facets of many disciplines, and experts with different backgrounds and different approaches. The book describes the contributions of these various disciplines and how they interact. It discusses the people who create information and the people who use it, their needs, motives, and economic incentives. It analyzes the profound changes that are occurring in publishing and libraries. It describes research into new technology, much of it based on the Internet and the World Wide Web. The topics range from technical aspects of computers and networks, through librarianship and publishing, to economics and law. The constant theme is change, with its social, organizational, and legal implications.

One book cannot cover all these topics in depth, and much has been left out or described at an introductory level. Most of the examples come from the United States, with prominence given to universities and the academic community, but the development of digital libraries is world-wide with contributions from many sources. Specialists in big American universities are not the only developers of digital libraries, though they are major contributors. There is a wealth and diversity of innovation in almost every discipline, in countries around the world.

People

An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A key part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection. Most people would not consider a database containing financial records of one company to be a digital library, but would accept a collection of such information from many companies as part of a library. Digital libraries contain diverse collections of information for use by many different users. Digital libraries range in size from tiny to huge. They can use any type of computing equipment and any suitable software. The unifying theme is that information is organized on computers and available over a network, with procedures to select the material in the collections, to organize it, to make it available to users, and to archive it.

In some ways, digital libraries are very different from traditional libraries, yet in others they are remarkably similar. People do not change because new technology is invented. They still create information that has to be organized, stored, and distributed. They still need to find information that others have created, and use it for study, reference, or entertainment. However, the form in which the information is expressed and the methods that are used to manage it are greatly influenced by technology and this creates change. Every year, the quantity and variety of collections available in digital form grows, while the supporting technology continues to improve steadily. Cumulatively, these changes are stimulating fundamental alterations in how people create information and how they use it.

To understand these forces requires an understanding of the people who are developing the libraries. Technology has dictated the pace at which digital libraries have been able to develop, but the manner in which the technology is used depends upon people. Two important communities are the source of much of this innovation. One group is the information professionals. They include librarians, publishers, and a wide range of information providers, such as indexing and abstracting services. The other community contains the computer science researchers and their offspring, the Internet developers. Until recently, these two communities had disappointingly little interaction; even now it is commonplace to find a computer scientist who knows nothing of the basic tools of librarianship, or a librarian whose concepts of information retrieval are years out of date. Over the past few years, however, there has been much more collaboration and understanding.

Partly this is a consequence of digital libraries becoming a recognized field for research, but an even more important factor is greater involvement from the users themselves. Low-cost equipment and simple software have made electronic information directly available to everybody. Authors no longer need the services of a publisher to distribute their works. Readers can have direct access to information without going through an intermediary. Many exciting developments come from academic or professional groups who develop digital libraries for their own needs. Medicine has a long tradition of creative developments; the pioneering legal information systems were developed by lawyers for lawyers; the web was initially developed by physicists, for their own use.

Economics

Technology influences the economic and social aspects of information, and vice versa. The technology of digital libraries is developing fast and so are the financial, organizational, and social frameworks. The various groups that are developing digital libraries bring different social conventions and different attitudes to money. Publishers and libraries have a long tradition of managing physical objects, notably books, but also maps, photographs, sound recordings and other artifacts. They evolved economic and legal frameworks that are based on buying and selling these objects. Their natural instinct is to transfer to digital libraries the concepts that have served them well for physical artifacts. Computer scientists and scientific users, such as physicists, have a different tradition. Their interest in digital information began in the days when computers were very expensive. Only a few well-funded researchers had computers on the first networks. They exchanged information informally and openly with colleagues, without payment. The networks have grown, but the tradition of open information remains.

The economic framework that is developing for digital libraries shows a mixture of these two approaches. Some digital libraries mimic traditional publishing by requiring a form of payment before users may access the collections and use the services. Other digital libraries use a different economic model. Their material is provided with open access to everybody. The costs of creating and distributing the information are borne by the producer, not the user of the information. This book describes many examples of both models and attempts to analyze the balance between them. Almost certainly, both have a long-term future, but the final balance is impossible to forecast.

Why digital libraries?

The fundamental reason for building digital libraries is a belief that they will provide better delivery of information than was possible in the past. Traditional libraries are a fundamental part of society, but they are not perfect. Can we do better?

Enthusiasts for digital libraries point out that computers and networks have already changed the ways in which people communicate with each other. In some disciplines, they argue, a professional or scholar is better served by sitting at a personal computer connected to a communications network than by making a visit to a library. Information that was previously available only to the professional is now directly available to all. From a personal computer, the user is able to consult materials that are stored on computers around the world. Conversely, all but the most diehard enthusiasts recognize that printed documents are so much part of civilization that their dominant role cannot change except gradually. While some important uses of printing may be replaced by electronic information, not everybody considers a large-scale movement to electronic information desirable, even if it is technically, economically, and legally feasible.

Here are some of the potential benefits of digital libraries.

Each of the benefits described above can be seen in existing digital libraries. There is another group of potential benefits, which have not yet been demonstrated, but hold tantalizing prospects. The hope is that digital libraries will develop from static repositories of immutable objects to provide a wide range of services that allow collaboration and exchange of ideas. The technology of digital libraries is closely related to the technology used in fields such as electronic mail and teleconferencing, which have historically had little relationship to libraries. The potential for convergence between these fields is exciting.

The cost of digital libraries

The final potential benefit of digital libraries is cost. This is a topic about which there has been a notable lack of hard data, but some of the underlying facts are clear.

Conventional libraries are expensive. They occupy expensive buildings on prime sites. Big libraries employ hundreds of people - well-educated, though poorly paid. Libraries never have enough money to acquire and process all the materials they desire. Publishing is also expensive. Converting to electronic publishing adds new expenses. In order to recover the costs of developing new products, publishers sometimes even charge more for a digital version than the printed equivalent.

Today's digital libraries are also expensive, initially more expensive. However, digital libraries are made from components that are declining rapidly in price. As the cost of the underlying technology continues to fall, digital libraries become steadily less expensive. In particular, the costs of distribution and storage of digital information decline. The reduction in cost will not be uniform. Some things are already cheaper by computer than by traditional methods. Other costs will not decline at the same rate or may even increase. Overall, however, there is a great opportunity to lower the costs of publishing and libraries.

Lower long-term costs are not necessarily good news for existing libraries and publishers. In the short term, the pressure to support traditional media alongside new digital collections is a heavy burden on budgets. Because people and organizations appreciate the benefits of online access and online publishing, they are prepared to spend an increasing amount of their money on computing, networks, and digital information. Most of this money, however, is going not to traditional libraries, but to new areas: computers and networks, web sites and webmasters.

Publishers face difficulties because the normal pricing model of selling individual items does not fit the cost structure of electronic publishing. Much of the cost of conventional publishing is in the production and distribution of individual copies of books, photographs, video tapes, or other artifacts. Digital information is different. The fixed cost of creating the information and mounting it on a computer may be substantial, but the cost of using it is almost zero. Because the marginal cost is negligible, much of the information on the networks has been made openly available, with no access restrictions. Not everything on the world's networks is freely available, but a great deal is open to everybody, undermining revenue for the publishers.
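The cost structure described above can be illustrated with a little arithmetic. The following is a minimal sketch in Python, not a model taken from any real publisher: the function name and all of the dollar figures are hypothetical, chosen only to show how a large fixed cost combined with a negligible marginal cost behaves as the number of uses grows.

```python
def average_cost_per_use(fixed_cost: float, marginal_cost: float, uses: int) -> float:
    """Average cost per use when a one-time fixed cost is spread over many uses."""
    if uses <= 0:
        raise ValueError("uses must be positive")
    return fixed_cost / uses + marginal_cost


# Hypothetical figures, chosen only for illustration.
# Print: lower fixed cost, but every copy must be produced and distributed.
PRINT = {"fixed_cost": 10_000.0, "marginal_cost": 5.0}
# Digital: higher fixed cost to create and mount the material, almost nothing per use.
DIGITAL = {"fixed_cost": 20_000.0, "marginal_cost": 0.01}

for uses in (100, 1_000, 10_000, 100_000):
    print_cost = average_cost_per_use(uses=uses, **PRINT)
    digital_cost = average_cost_per_use(uses=uses, **DIGITAL)
    print(f"{uses:>7} uses: print {print_cost:8.2f}  digital {digital_cost:8.2f}")
```

With these assumed figures, the digital version costs more per use when there are few uses and far less when there are many, because its average cost falls toward the marginal cost while the printed version can never cost less than the price of producing one more copy. This is one way to see why per-item pricing fits print better than it fits electronic publishing.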

These pressures are inevitably changing the economic decisions that are made by authors, users, publishers, and libraries. Chapter 6 explores some of these financial considerations; the economics of digital information is a theme that recurs throughout the book.

Panel 1.1
Two Pioneers of Digital Libraries

The vision of the digital library is not new. This is a field in which progress has been achieved by the incremental efforts of numerous people over a long period of time. However, a few authors stand out because their writings have inspired future generations. Two of them are Vannevar Bush and J. C. R. Licklider.

As We May Think

In July 1945, Vannevar Bush, who was then director of the U. S. Office of Scientific Research and Development, published an article in The Atlantic Monthly, entitled "As We May Think". This article is an elegantly written exposition of the potential that technology offers the scientist to gather, store, find, and retrieve information. Much of his analysis rings as true today as it did fifty years ago.

Bush commented that, "our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose." He discussed recent technological advances and how they might conceivably be applied at some distant time in the future. He provided an outline of one possible technical approach, which he called Memex. An interesting historical footnote is that the Memex design used photography to store information. For many years, microfilm was the technology perceived as the most suitable for storing information cheaply.

Bush is often cited as the first person to articulate the new vision of a library, but that is incorrect. His article built on earlier work, much of it carried out in Germany before World War II. The importance of his article lies in its wonderful exposition of the inter-relationship between information and scientific research, and in the latent potential of technology.

The original article was presumably read only by those few people who happened to see that month's edition of the magazine. Now The Atlantic Monthly has placed a copy of the paper on its web site for the world to see. Everybody interested in libraries or scientific information should read it.

Libraries of the Future

In the 1960s, J. C. R. Licklider was one of several people at the Massachusetts Institute of Technology who studied how digital computing could transform libraries. As with Bush, Licklider's principal interest was the literature of science, but with the emergence of modern computing, he could see many of the trends that have subsequently occurred.

In his book, Libraries of the Future, Licklider described the research and development needed to build a truly usable digital library. When he wrote, time-shared computing was still in the research laboratory, and computer memory cost a dollar a byte, but he made a bold attempt to predict what a digital library might be like thirty years later, in 1994. His predictions proved remarkably accurate in their overall vision, though naturally he did not foretell every change that has happened in thirty years. In general, he under-estimated how much would be achieved by brute force methods, using huge amounts of cheap computer power, and over-estimated how much progress could be made from artificial intelligence and improvements in computer methods of natural language processing.

Licklider's book is hard to find and less well-known than it should be. It is one of the few important documents about digital libraries that is not available on the Internet.

Technical developments

The first serious attempts to store library information on computers date from the late 1960s. These early attempts faced serious technical barriers, including the high cost of computers, terse user interfaces, and the lack of networks. Because storage was expensive, the first applications were in areas where financial benefits could be gained from storing comparatively small volumes of data online. An early success was the work of the Library of Congress in developing a format for Machine-Readable Cataloguing (MARC) in the late 1960s. The MARC format was used by the Online Computer Library Center (OCLC) to share catalog records among many libraries. This resulted in large savings in costs for libraries.

Early information services, such as shared cataloguing, legal information systems, and the National Library of Medicine's Medline service, used the technology that existed when they were developed. Small quantities of information were mounted on a large central computer. Users sat at a dedicated terminal, connected by a low-speed communications link, which was either a telephone line or a special purpose network. These systems required a trained user who would accept a cryptic user interface in return for faster searching than could be carried out manually and access to information that was not available locally.

Such systems were no threat to the printed document. All that could be displayed was unformatted text, usually in a fixed spaced font, without diagrams, mathematics, or the graphic quality that is essential for easy reading. When these weaknesses were added to the inherent defects of early computer screens - poor contrast and low resolution - it is hardly surprising that most people were convinced that users would never willingly read from a screen.

The past thirty years have steadily eroded these technical barriers. During the early 1990s, a series of technical developments took place that removed the last fundamental barriers to building digital libraries. Some of this technology is still rough and ready, but low-cost computing has stimulated an explosion of online information services. Four technical areas stand out as being particularly important to digital libraries.

Access to digital libraries

Traditional libraries usually require that the user be a member of an organization that maintains expensive physical collections. In the United States, universities and some other organizations have excellent libraries, but most people do not belong to such an organization. In theory, much of the Library of Congress is open to anybody over the age of eighteen, and a few cities have excellent public libraries, but in practice, most people are restricted to the small collections held by their local public library. Even scientists often have poor library facilities. Doctors in large medical centers have excellent libraries, but those in remote locations typically have nothing. One of the motives that led the Institute of Electrical and Electronics Engineers (IEEE) to its early interest in electronic publishing was the fact that most engineers do not have access to an engineering library.

Users of digital libraries need a computer attached to the Internet. In the United States, many organizations provide every member of staff with a computer. Some have done so for many years. Across the nation, there are programs to bring computers to schools and to install them in public libraries. For individuals who must provide their own computing, adequate access to the Internet requires less than $2,000 worth of equipment, perhaps $20 per month for a dial-up connection, and a modicum of skill. Increase the costs a little and very attractive services can be obtained, with a powerful computer and a dedicated, higher speed connection. These are small investments for a prosperous professional, but can be a barrier for others. In 1998 it was estimated that 95 percent of people in the United States live in areas where there is reasonable access to the Internet. This percentage is growing rapidly.

Outside the United States, the situation varies. In most countries of the world, library services are worse than in the United States. For example, universities in Mexico report that reliable delivery of scholarly journals is impossible, even when funds are available. Some nations are well-supplied with computers and networks, but in most places equipment costs are higher than in the United States, people are less wealthy, monopolies keep communications costs high, and the support infrastructure is lacking. Digital libraries do bring information to many people who lack traditional libraries, but the Internet is far from being conveniently accessible world-wide.

A factor that must be considered in planning digital libraries is that the quality of the technology available to users varies greatly. A favored few have the latest personal computers on their desks, high-speed connections to the Internet, and the most recent release of software; they are supported by skilled staff who can configure and tune the equipment, solve problems, and keep the software up to date. Most people, however, have to make do with less. Their equipment may be old, their software out of date, their Internet connection troublesome, and their technical support from staff who are under-trained and over-worked. One of the great challenges in developing digital libraries is to build systems that take advantage of modern technology, yet perform adequately in less perfect situations.

Basic concepts and terminology

Terminology often proves to be a barrier in discussing digital libraries. The people who build digital libraries come from many disciplines and bring the terminology of those disciplines with them. Some words have such strong social, professional, legal, or technical connotations that they obstruct discussion between people of varying backgrounds. Simple words mean different things to different people. For example, the words "copy" and "publish" have different meanings to computing professionals, publishers, and lawyers. Common English usage is not the same as professional usage, the versions of English around the world have subtle variations of meaning, and discussions of digital libraries are not restricted to the English language.

Some words cause such misunderstandings that it is tempting to ban them from any discussion of digital libraries. In addition to "copy" and "publish", the list includes "document", "object", and "work". At the very least, such words must be used carefully and their exact meaning made clear whenever they are used. This book attempts to be precise when precision is needed. For example, in certain contexts the distinction must be made between "photograph" (an image on paper), and "digitized photograph" (a set of bits in a computer). Most of the time, however, such precision is mere pedantry. Where the context is clear, the book uses terms informally. Where the majority of the practitioners in the field use a word in a certain way, their usage is followed.

Collections

Digital libraries hold any information that can be encoded as sequences of bits. Sometimes these are digitized versions of conventional media, such as text, images, music, sound recordings, specifications and designs, and many, many more. As digital libraries expand, the contents are less often the digital equivalents of physical items and more often items that have no equivalent, such as data from scientific instruments, computer programs, video games, and databases.

People

A variety of words are used to describe the people who are associated with digital libraries. One group of people are the creators of information in the library. Creators include authors, composers, photographers, map makers, designers, and anybody else who creates intellectual works. Some are professionals; some are amateurs. Some work individually, others in teams. They have many different reasons for creating information.

Another group are the users of the digital library. Depending on the context, users may be described by different terms. In libraries, they are often called "readers" or "patrons"; at other times they may be called the "audience", or the "customers". A characteristic of digital libraries is that creators and users are sometimes the same people. In academia, scholars and researchers use libraries as resources for their research, and publish their findings in forms that become part of digital library collections.

The final group of people is a broad one that includes everybody whose role is to support the creators and the users. They can be called information managers. The group includes computer specialists, librarians, publishers, editors, and many others. The World Wide Web has created a new profession of webmaster. Frequently a publisher will represent a creator, or a library will act on behalf of users, but publishers should not be confused with creators, or librarians with users. A single individual may be creator, user, and information manager.

Computers and networks

Digital libraries consist of many computers united by a communications network. The dominant network is the Internet, which is discussed in Chapter 2. The emergence of the Internet as a flexible, low-cost, world-wide network has been one of the key factors that has led to the growth of digital libraries.


Figure 1.1. Computers in digital libraries

Figure 1.1 shows some of the computers that are used in digital libraries. The computers have three main functions: to help users interact with the library, to store collections of materials, and to provide services.

The generic term server is used to describe any computer other than the user's personal computer. A single server may provide several of the functions listed above, perhaps acting as a repository, search system, and location system. Conversely, individual functions can be distributed across many servers. For example, the domain name system, which is a locator system for computers on the Internet, is a single, integrated service that runs on thousands of separate servers.
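The idea of a single service spread across many servers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the domain name system itself: the class LocatorServer, the server names, and the example identifiers and addresses are all hypothetical. Each server holds only part of the locator table and hands off requests it cannot answer to the server responsible for that part of the name space.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocatorServer:
    """One server holding part of a locator table (hypothetical toy model)."""
    name: str
    # Identifiers this server can answer directly: identifier -> location.
    records: Dict[str, str] = field(default_factory=dict)
    # Parts of the name space handled elsewhere: suffix -> responsible server.
    delegations: Dict[str, "LocatorServer"] = field(default_factory=dict)

    def resolve(self, identifier: str) -> Optional[str]:
        """Return the location for an identifier, or None if nobody knows it."""
        if identifier in self.records:
            return self.records[identifier]
        # Hand off to the server responsible for the longest matching suffix.
        for suffix in sorted(self.delegations, key=len, reverse=True):
            if identifier.endswith(suffix):
                return self.delegations[suffix].resolve(identifier)
        return None


# Three separate servers that together behave as one locator service.
edu = LocatorServer("edu-server", records={"library.example.edu": "192.0.2.10"})
org = LocatorServer("org-server", records={"archive.example.org": "192.0.2.20"})
root = LocatorServer("root-server", delegations={".edu": edu, ".org": org})

print(root.resolve("library.example.edu"))
print(root.resolve("archive.example.org"))
```

The user asks one question of one server and does not need to know how many servers took part in the answer. The real domain name system adds caching, replication, and a defined message protocol, none of which is modeled here.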

In computing terminology, a distributed system is a group of computers that work as a team to provide services to users. Digital libraries are some of the most complex and ambitious distributed systems ever built. The personal computers that users have on their desks have to exchange messages with the server computers; these computers are of every known type, managed by thousands of different organizations, running software that ranges from state-of-the-art to antiquated. The term interoperability refers to the task of building coherent services for users, when the individual components are technically different and managed by different organizations. Some people argue that all technical problems in digital libraries are aspects of this one problem, interoperability. This is probably an overstatement, but it is certainly true that interoperability is a fundamental challenge in all aspects of digital libraries.

The challenge of change

If digital technology is so splendid, what is stopping every library from immediately becoming entirely digital? Part of the answer is that the technology of digital libraries is still immature, but the challenge is much more than technology. An equal challenge is the ability of individuals and organizations to devise ways that use technology effectively, to absorb the inevitable changes, and to create the required social frameworks. The world of information is like a huge machine with many participants each contributing their experience, expertise, and resources. To make fundamental changes in the system requires inter-related shifts in the economic, social and legal relationships amongst these parties. These topics are studied in Chapters 5 and 6, but the underlying theme of social change runs throughout the book.

Digital libraries depend on people and cannot be introduced faster than people and organizations can adapt. This applies equally to the creators, users, and the professionals who support them. The relationships amongst these groups are changing. With digital libraries, readers are more likely to go directly to information, without visiting a library building or having any contact with a professional intermediary. Authors carry out more of the preparation of a manuscript. Professionals need new skills and new training to support these new relationships. Some of these skills are absorbed through experience, while others can be taught. Since librarians have a career path based around schools of librarianship, these schools are adapting their curriculum, but it will be many years before the changes work through the system. The traditions of hundreds of years go deep.

The general wisdom is that, except in a few specialized areas, digital libraries and conventional collections are going to coexist for the foreseeable future. Institutional libraries will maintain large collections of traditional materials in parallel with their digital services, while publishers will continue to have large markets for their existing products. This does not imply that the organizations need not change, as new services extend the old. The full deployment of digital libraries will require extensive reallocation of money, with funds moving from the areas where savings are made to the areas that incur increased cost. Within an institution, such reallocations are painful to achieve, though they will eventually take place, but some of the changes are on a larger scale.

When a new and old technology compete, the new technology is never an exact match. Typically, the new has some features that are not in the old, but lacks some basic characteristics of the old. Therefore the old and new usually exist side by side. However, the spectacular and continuing decline in the cost of computing with the corresponding increase in capabilities sometimes leads to complete substitution. Word processors were such an improvement that they supplanted typewriters in barely ten years. Card catalogs in libraries are on the same track. In 1980, only a handful of libraries could afford an online catalog. Twenty years later, a card catalog is becoming a historic curiosity in American libraries. In some specialized areas, digital libraries may completely replace conventional library materials.

Since established organizations have difficulties changing rapidly, many exciting developments in digital libraries have been introduced by new organizations. New organizations can begin afresh, but older organizations are faced with the problems of maintaining old services while introducing the new. The likely effect of digital libraries will be a massive transfer of money from traditional suppliers of information to new information entrepreneurs and to the computing industry. Naturally, existing organizations will try hard to discourage any change in which their importance diminishes, but the economic relationships between the various parties are already changing. Some important organizations will undoubtedly shrink in size or even go out of business. Predicting these changes is made particularly difficult by uncertainties about the finances of digital libraries and electronic publishing, and by the need for the legal system to adapt. Eventually, the pressures of the marketplace will establish a new order. At some stage, the market will have settled down sufficiently for the legal rules to be clarified. Until then, economic and legal uncertainties are annoying, though they have not proved to be serious barriers to progress.

Overall, there appear to be no barriers to digital libraries and electronic publishing. Technical, economic, social, and legal challenges abound, but they are being overcome steadily. We cannot be sure exactly what form digital libraries will take, but it is clear that they are here to stay.



Last revision of content: January 1999
Formatted for the Web: December 2002
(c) Copyright The MIT Press 2000