Digital Libraries: Chapter 6 (1999)

Chapter 6

Economic and legal issues

Introduction

Digital libraries pose challenges in the fields of economics, public policy, and the law. Publishing and libraries exist in a social and economic context where the operating rules and conventions have evolved over many years. As electronic publication and digital libraries expand, the business practices and legal framework are changing quickly.

Because digital libraries are based on technology, some people hope to solve all challenges with technical solutions. Other people believe that everything can be achieved by passing new laws. Both approaches are flawed. Technology can contribute to the solutions, but it cannot resolve economic or social issues. Changes in laws may be helpful, but bad laws are worse than no laws. Laws are effective only when they codify a framework that people understand and are willing to accept. In the same manner, business models for electronic information will fail unless they serve the interests of all the parties involved. The underlying challenge is to establish social customs for using information that are widely understood and generally followed. If they allow reasonable people to carry out their work, then reasonable people will observe them. The economic and legal frameworks will follow.

Economic forces

Libraries and publishing are big businesses. Huge industries create information or entertainment for financial gain. They include feature films, newspapers, commercial photographs, novels and textbooks, computer software, and musical recordings. Some estimates suggest that these industries comprise five percent of the economy of the United States. In 1997, Harvard University Libraries had a budget well over fifty million dollars and employed a thousand people. The Library of Congress employs about 4,500 people. Companies such as Time Warner are major forces on the world's stock markets. In 1996, the Thomson Corporation paid more than three billion dollars for West Publishing, the legal publisher. The stock market valuation of Yahoo, the Internet search firm, is even higher though its profits are tiny.

The publishers' concerns about the business and legal framework for online information derive from the two usual economic forces of greed and fear. The greed comes from a belief that publishers can make huge sums of money from electronic information, if only they knew how. The fear is that the changing economic picture will destroy traditional sources of revenue, organizations will wither away, and people will lose their jobs. To computing professionals, this fear is completely reasonable. Most of the companies that succeeded in one generation of computing have failed in the next. Mainframe companies such as Univac, CDC, Burroughs, and Honeywell were supplanted by minicomputers. Minicomputer companies such as Prime, Wang, and Data General did not survive the transition to personal computers. Early personal computer companies have died, as have most of the software pioneers. Even IBM has lost its dominance. Will the same pattern happen with electronic information? The publishers and information services that dominated traditional markets are not necessarily those that will lead in the new ones.

Whichever organizations thrive, large or small, commercial or not-for-profit, they have to cover their costs. Every stage in the creation, distribution, and use of digital libraries is expensive. Authors, photographers, composers, designers, and editors need incentives for their efforts in creating information. In many circumstances, but not all, the incentives are financial. Publishers, librarians, archivists, booksellers, subscription agents, and computing specialists - all these people require payment. As yet, there is no consensus on how best to pay for information on the Internet. Almost every conceivable method is being tried. For discussion of the economics of digital libraries, the various approaches can be divided into open access, in which the funds come from the creator or producer of the information, and models in which the user or the user's library pays for access to the collections.

Chapter 5 noted that the various people who are involved in digital libraries and electronic publishing have many motives. In particular, it noted that creators and publishers often have different financial objectives. When the creators are principally motivated by financial reward, their interests are quite well aligned with those of the publisher. The aim is to generate revenue. The only question is how much should go to the creator and how much to the publisher. If, however, the creator's objective is non-financial while the publisher is concentrating on covering costs or generating profits for shareholders, then they may have conflicting objectives.

Open access digital library collections

A remarkable aspect of the web is the huge amounts of excellent material that are openly available, with no requirement for payment by the user. First predictions were that open access would restrict the web to inferior quality material, but a remarkable amount of high-quality information is to be found on the networks, paid for and maintained by the producers. In retrospect this is not surprising. Many creators and suppliers of information are keen that their materials should be seen and are prepared to meet the costs from their own budgets. Creators who invest their own resources to make their materials openly available include researchers seeking professional recognition, government agencies informing the public, all types of marketing, hobbyists, and other recreational groups.

Whenever creators are principally motivated by the wish to have their work widely used, they will prefer open access, but the money to maintain these collections must come from somewhere. Grants are one important source of funds. The Perseus project has relied heavily on grants from foundations. At the Library of Congress, a combination of grants and internal funds pays for American Memory and the National Digital Library Program. Grants are usually short-term, but they can be renewed. The Los Alamos E-Print Archives receive an annual grant from the National Science Foundation, and Netlib has received funding from DARPA since its inception. In essence, grant funding for these digital libraries has become institutionalized.

The web search firms, such as Yahoo, Infoseek, and Lycos, provide open access paid for by advertising. They have rediscovered the financial model that is used by broadcast television in the United States. The television networks pay to create television programs and broadcast them openly. The viewer, sitting at home in front of a television, does not pay directly. The revenues come from advertisers. A crucial point in the evolution of digital libraries occurred when these web search programs first became available. Some of the services attempted to charge a monthly fee, but the creator of Lycos was determined to offer open access to everybody. He set out to find alternative sources of revenue. Since Lycos was available with no charge, competing services could not charge access fees for comparable products. The web search services remain open to everybody. After a rocky few years, the companies are now profitable, using advertising and revenue from licensing to support open access to their services.

Research teams have a particular interest in having their work widely read. CNRI is a typical research organization that uses its own resources to maintain a high-quality web site. Most of the research is first reported on the web. In addition, the corporation has a grant to publish D-Lib Magazine. The Internet Draft series is also maintained at CNRI; it is paid for by a different method. The conference fees for meetings of the Internet Engineering Task Force pay the salaries of the people who manage the publications. All this information is open access.

Government departments are another important source of open access collections. They provide much information that is short-lived, such as the hurricane tracking service provided by the United States, but many of their collections are of long-term value. For example, the Trade Compliance Center of the U.S. Department of Commerce maintains a library collection of international treaties. The U.S. Department of State provides a grant to librarians at the University of Illinois at Chicago, who operate the web site "www.state.gov" for the department.

Private individuals maintain many open access library collections. Some excellent examples include collections devoted to sports and hobbies, fan clubs, and privately published poetry, with an increasingly large number of online novels. Payment by the producer is not new. In book publishing it is given the disparaging name of "vanity press", but on the Internet it is often the source of fine collections.

Payment for access to digital library collections

When the publisher of a digital library collection wishes to collect revenue from the user, access to the collections is almost always restricted. Users have access to the materials only after payment has been received. The techniques used to manage such access are a theme of the next chapter. This section looks at the business practices.

Book and journal publishing have traditionally relied on payment by the user. When a copy of a book is sold to a library or to an individual, the proceeds are divided amongst the bookseller, the publisher, the author, and the other contributors. Feature films follow the same model. The costs of creating and distributing the film are recovered from users through sales at cinemas and video rentals. Extending this model to online information leads to fees based on usage. Most users of the legal information services, Lexis and Westlaw, pay a rate that is based on the number of hours that they use the services. Alternative methods of charging for online information, which are sometimes tried, set a fee that is based on peak usage, such as the number of computers that could connect to the information, or the maximum number of simultaneous users. With the Internet and web protocols, these charging methods that are based on computer or network usage are all rather artificial.

An alternative is to charge for the content transmitted to the user. Several publishers provide access to the text of an article if the user pays a fee, perhaps $10. This could be charged to a credit card, but credit card transactions are awkward for the user and expensive to process. Therefore, there is research interest in automatic payment systems for information delivered over the networks. The aim of these systems is to build an Internet billing service with secure, low-cost transactions. The hope is that this would allow small organizations to set up network services, without the complexity of developing private billing services. If such systems became established, they would support high volumes of very small transactions. Whereas physical items, such as books, come in fixed units, a market might be established for small units of electronic information.

At present, the concept of automatic payment systems is mainly conjecture. The dominant form of payment for digital library materials is by subscription, consisting of scheduled payments for access to a set of materials. Unlimited use is allowed so long as reasonable conditions are observed. The Wall Street Journal has developed a good business selling individual subscriptions to its online editions. Many large scientific publishers now offer electronic journal subscriptions to libraries; society publishers, such as the Association for Computing Machinery (ACM), sell subscriptions both to libraries and to individual members. Some of the digital libraries described in earlier chapters began with grants and have steadily moved towards self-sufficiency through subscriptions. JSTOR and the Inter-university Consortium for Political and Social Research (ICPSR) have followed this path.

Television again provides an interesting parallel. In the United States, two alternatives to advertising revenue have been tried by the television industry. The first, pay-per-view, requires viewers to make a separate payment for each program that they watch. This has developed a niche market, but has not become widespread. The second, which is used by the cable companies, is to ask viewers to pay a monthly subscription for a package of programs. This second business model has been extremely successful.

It appears that the users of digital libraries, like television viewers, welcome regular, predictable charges. Payment by subscription has advantages for both the publisher and the user. Libraries and other subscribers know the costs in advance and are able to budget accurately. Publishers know the revenue to expect. For the publisher, subscriptions overcome one of the problems of use-based pricing, that popular items make great profits while specialist items with limited demand lose money.

Libraries are also attracted by the subscription form of payment because it encourages widespread use. Libraries wish to see their collections used as heavily as possible, with the minimum of obstacles. Digital libraries have removed many of the barriers inherent in a traditional library. It would be sad to introduce new barriers through use-based pricing.

Economic considerations

An economic factor that differentiates electronic information from traditional forms of publishing is that the costs are essentially the same whether or not anybody uses the materials. Many of the tasks in creating a publication - including soliciting, reviewing, editing, designing and formatting material - are much the same whether a print product is being created or an electronic one. With digital information, however, once the first copy has been mounted online, the distribution costs are tiny. In economic terms, the cost is almost entirely fixed cost. The marginal cost is near zero. As a result, once sales of a digital product reach the level that covers the costs of creation, all additional sales are pure profit. Unless this level is reached, the product is condemned to make a loss.
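The break-even argument above can be made concrete with a small calculation. The following is only a sketch: the function names and all of the figures (a first-copy cost of 100,000, a price of 50, and a per-copy cost of 30 for the printed edition) are hypothetical values chosen for illustration and do not come from any real publication.

```python
import math


def break_even_units(fixed_cost: float, price: float, marginal_cost: float = 0.0) -> int:
    """Smallest number of sales at which revenue covers all costs."""
    margin = price - marginal_cost  # contribution of each sale toward the fixed cost
    if margin <= 0:
        raise ValueError("price must exceed marginal cost")
    return math.ceil(fixed_cost / margin)


def profit(units: int, fixed_cost: float, price: float, marginal_cost: float = 0.0) -> float:
    """Profit (negative means a loss) after selling the given number of units."""
    return units * (price - marginal_cost) - fixed_cost


# Hypothetical figures, for illustration only.
FIXED_COST = 100_000.0   # cost of creating the first copy
PRICE = 50.0             # price charged per sale
PRINT_COPY_COST = 30.0   # assumed cost of printing and shipping one copy

# Digital product: marginal cost taken as zero.
print(break_even_units(FIXED_COST, PRICE))
# Print product: each copy carries its own production cost.
print(break_even_units(FIXED_COST, PRICE, PRINT_COPY_COST))
# Beyond break-even, every digital sale adds its full price to profit.
print(profit(3_000, FIXED_COST, PRICE))
```

Under these assumed figures the digital product breaks even at fewer sales than the printed one, and each sale beyond that point contributes its whole price to profit. Below the break-even level the same formula gives a loss, however low the marginal cost.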

With physical materials, the standard method of payment is to charge for each physical copy of the item, which may be a book, a music CD, a photograph, or similar artifact. The production of each of these copies costs money and the customer feels that in some way this is reflected in the price. With digital information, the customer receives no artifact, which is one reason why use-based pricing is unappealing. Subscriptions match a fixed cost for using the materials against the fixed cost of creating them.

A further way in which the economics of electronic publications differ from traditional publishing is that the pricing needs to recognize the costs of change. Electronic publishing and digital libraries will be cheaper than print in the long term, but today they are expensive. Organizations that wish to enter this field have a dilemma. They have to continue with their traditional operations, while investing in the new technology at a time when the new systems are expensive and in many cases the volume of use is still comparatively low.

Many electronic publications are versions of material that is also published in print. When publishers put a price on the electronic publication, they want to know if sales of electronic information will decrease sales of corresponding print products. If a dictionary is online, will sales in book stores decline (or go up)? If a journal is online, will individual subscriptions change? After a decade of experience with materials online, firm evidence of such substitution is beginning to emerge, but is still hard to find for any specific publication. At the macroeconomic level, the impact is clear. Electronic information is becoming a significant item in library acquisition budgets. Electronic and print products often compete directly for a single pool of money. If one publisher generates extra revenue from electronic information, it is probably at the expense of another's print products. This overall transfer of money from print to electronic products is one of the driving forces pressing every publisher towards electronic publication.

Some dramatic examples come from secondary information services. During the past decade, products such as Chemical Abstracts and Current Contents have changed their content little, but whereas they were predominantly printed products, now the various digital versions predominate. The companies have cleverly offered users and their libraries many choices: CD-ROMs, magnetic tape distributions, online services, and the original printed volumes. Any publisher that stays out of the electronic market must anticipate steadily declining revenues as the electronic market diverts funds from the purchase of paper products.

Fortunately, the librarian and the publisher do not have to pay for one of the most expensive parts of digital libraries. Electronic libraries are being built around general purpose networks of personal computers that are being installed by organizations everywhere. Long distance communications use the international Internet. These investments, which are the foundation that makes digital libraries possible, are being made from other budgets.

Alternative sources of revenue

Use-based pricing and subscriptions are not the only ways to recover revenue from users. Cable television redistributes programs created by the network television companies, initially against strenuous opposition from the broadcasters. Political lobbying, notably by Ted Turner, led to the present situation whereby the cable companies can redistribute any program but must pay a percentage of their revenues as royalty. When a radio program broadcasts recorded music, the process by which revenue accrues to the composers, performers, recording studios, and other contributors is based on a complex system of sampling. The British football league gains revenue from gambling by exercising copyright on its fixture list.

There is nothing inevitable about these various approaches. They are pragmatic resolutions of the complex question of how the people who create and distribute various kinds of information can be compensated by the people who benefit from them.

A case study: scientific journals in electronic format

Scientific journals in electronic format provide an interesting case study, since they are one of the pioneering areas where electronic publications are recovering revenue from libraries and users. They highlight the tension between commercial publishers, whose objective lies in maximizing profits, and authors whose interests are in widespread distribution of their work. Academic publishers and university libraries are natural partners, but the movement to electronic publication has aggravated some long-standing friction. As described in Panel 6.1, the libraries can make a strong case that the publishers charge too much for their journals, though, in many ways, the universities have brought the problem on themselves. Many researchers and librarians hope that digital libraries will lead to new ways of scientific publication, which will provide wider access to research at lower costs to libraries. Meanwhile, the commercial publishers are under pressure from their shareholders to make higher profits every year.

Panel 6.1
The economics of scientific journals

The highly profitable business of publishing scientific journals has come under increasing scrutiny in recent years. In the United States, the federal government uses money received from taxes to fund research in universities, government laboratories, medical centers, and other research organizations. The conventional way to report the results of such research is for the researcher to write a paper and submit it to the publisher of a scientific journal.

The first stage in the publication process is that an editor, who may be a volunteer or work for the publisher, sends out the paper for review by other scientists. The reviewers are unpaid volunteers, who read the paper critically for quality, checking for mistakes, and recommending whether the paper is of high enough quality to publish. This process is known as peer review. The editor selects the papers to publish and gives the author an opportunity to make changes based on the reviewers' comments.

Before publication, most publishers place some requirements on the authors. Usually they demand that copyright in the paper is transferred from the author to the publisher. In addition, many publishers prohibit the author from releasing the results of the research publicly before the journal article is published. As a result, without making any payment, the publisher acquires a monopoly position in the work.

Although a few journals, typically those published by societies, have individual subscribers, the main market for published journals is academic libraries. They are required to pay whatever price the publisher chooses to place on the journal. Many of these prices are high; more than a thousand dollars per year is common. Moreover, the annual increase in subscriptions over the past decade has averaged ten to fifteen percent. Yet the libraries have felt compelled to subscribe to the journals because their faculty and students need access to the research that is reported in them.
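The effect of a sustained annual increase is easy to underestimate, because the increases compound. As a rough illustration, the sketch below applies the ten and fifteen percent rates quoted above to a hypothetical journal that starts at 1,000 dollars per year; the starting price and the function name are assumptions made for the example.

```python
def price_after(start_price: float, annual_rate: float, years: int) -> float:
    """Price after compounding a fixed annual increase for a number of years."""
    return start_price * (1 + annual_rate) ** years


# Hypothetical starting price; the two rates are the range quoted in the text.
START_PRICE = 1_000.0
for rate in (0.10, 0.15):
    final = price_after(START_PRICE, rate, 10)
    print(f"{rate:.0%} per year for 10 years: {final:,.0f} dollars")
```

Over ten years, the lower rate multiplies the price by about 2.6 and the higher rate by about 4, which helps to explain why library budgets have been unable to keep pace.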

The economic system surrounding scientific journals is strange. The taxpayer pays the researcher, most of the reviewers, and many of the costs of the libraries, but the copyright is given away to the publisher. However, the universities must take much of the blame for allowing this situation. Since their faculty carry out the research and their libraries buy the journals, simple policy changes could save millions of dollars, while still providing reasonable compensation for publishers. Recently universities have begun to work together to remedy the situation.

The underlying reason for the reluctance of universities to act comes from a peculiarity of the academic system. The function of scientific articles is not only to communicate research; they also enhance the professional standing of the authors. Academic reputations are made by publication of journal articles and academic monographs. Professional recognition, which is built on publication, translates into appointments, promotion, and research grants. The most prominent hurdle in an academic career is the award of tenure, which is based primarily on peer-reviewed publications. Some people also write text books, which are occasionally very lucrative, but all successful academics write research papers (in the sciences) or monographs (in the humanities). The standing joke is, "Our dean can't read, but he sure can count." The dean counts papers in peer-reviewed journals. Publishers are delighted to make profits by providing things for the dean to count. All faculty are expected to publish several papers every year, whether or not they have important research to report. Because prestige comes from writing many papers, they have an incentive to write several papers that report slightly different aspects of a single piece of research. Studies have shown that most scientific papers are never cited by any other researchers. Many papers are completely unnecessary.

There are conflicting signs as to whether this situation is changing. Digital libraries provide researchers with alternative ways to tell the world about their research. The process of peer review has considerable value in weeding out bad papers and identifying obvious errors, but it is time consuming; the traditional process of review, editing, printing, and distribution often takes more than a year. An article in a journal that is stored on the shelves of a library is available only to those people who have access to the library and are prepared to make the effort to retrieve the article; a research report on the Internet is available to everybody. In some disciplines, an unstable pattern is emerging of communicating research by rapid, open access publishing of online reports or pre-prints, with traditional journals being used as an archive and for career purposes.

Subscriptions to online journals

The publishers of research journals in science, technology, and medicine include commercial companies, such as Elsevier, John Wiley, and Springer-Verlag, and learned societies, such as the American Association for the Advancement of Science, the publisher of Science. These publishers have been energetic in moving into electronic publication and are some of the first organizations to face the economic challenges. Initially, their general approach has been to retain the standard journal format and to publish electronic versions in parallel to the print.

Since 1996, many scientific publishers have provided electronic versions of their printed journals over the Internet. The online versions are similar to but not always identical to the printed versions. They may leave out some material, such as letters to the editor, or add supplementary data that was too long to include in print. To get started, the publishers have chosen variants on a familiar economic model, selling annual subscriptions to libraries, or library consortia. Publishers that have followed this approach include Academic Press, the American Chemical Society, the Association for Computing Machinery, Elsevier, the American Physical Society, Johns Hopkins University Press, Springer-Verlag, John Wiley, and others. HighWire Press offers a similar service for smaller publishers who publish high-quality journals but do not wish to invest in expensive computer systems. A common model is for the publisher to provide open access to a searchable index and abstracts, but to require payment for access to the full articles. The payment can be by subscription or by a fee per article.

Subscriptions that provide access to an online collection require an agreement between the publisher and the subscriber. If the subscriber is a library, the agreement is usually written as a contract. Although the details of these agreements vary, a number of topics occur in every agreement. Some of the issues are listed below.

Material covered. When a library subscribes to a print journal for a year, it receives copies of the journal issues for that year, to be stored and used by the library for ever. When a library subscribes to an electronic journal for a year, it typically receives a license to access the publisher's online collection, containing the current year's journal and those earlier issues that have been converted to digital form. The library is relieved of the burden of storing back issues, but it loses access to everything if it does not renew a subscription the following year, or if the publisher goes out of business.
The user community. If the subscription is to a library, the users of that library must be delineated. Some libraries have a well-defined community of users. With corporate libraries and residential universities with full-time students, it is reasonably easy to identify who is authorized to use the materials, but many universities and community colleges have large populations of students who take part-time courses, and staff who are affiliated through hospitals or other local organizations. Public libraries, by definition, are open to the public. Fortunately, most libraries have traditionally had a procedure for issuing library cards. One simple approach is for the subscription agreement to cover everybody who is eligible for a library card or physically in the library buildings.
Price for different sized organizations. One problem with subscription-based pricing is how to set the price. Should a university with 5,000 students pay the same subscription as a big state system with a population of 100,000, or a small college with 500? What should be the price for a small research laboratory within a large corporation? Should a liberal arts university pay the same for its occasional use of a medical journal as a medical school where it is a core publication? There is no simple answer to these questions.
Pricing relative to print subscriptions. When material is available in both print and online versions, how should the prices compare? In the long term, electronic publications are cheaper to produce, because of the savings in printing, paper, and distribution. In the short term, electronic publications represent a considerable investment in new systems. Initially, a few publishers attempted to charge higher prices for the electronic versions. Now a consensus is emerging for slightly lower prices. Publishers are experimenting with a variety of pricing options that encourage libraries to subscribe to large groups of journals.
Use of the online journals. One of the more contentious issues is what use can subscribers make of online journals. A basic subscription clearly should allow readers to view journal articles on a computer screen and it would be poor service for a publisher not to expect readers to print individual copies for their private use. Conversely, it would be unreasonable for a subscriber to an online journal to make copies and sell them on the open market. Reasonable agreement lies somewhere between these two extremes, but consensus has not yet been reached.

The model of institutional subscriptions has moved the delivery of scientific journals from print to the Internet without addressing any of the underlying tensions between publishers and their customers. It will be interesting to see how well it stands the test of time.

Scientific journals and their authors

The difference in goals between authors and publishers is manifest in the restrictions that publishers place on authors. Many publishers make demands on the authors that restrict the distribution of research. Publishers, quite naturally, will not publish papers that have appeared in other journals, but many refuse to publish research results that have been announced anywhere else, such as at a conference or on a web site. Others permit pre-publication, for example by placing a version of the paper on a server such as the Los Alamos archives, but require the open access version to be removed once the edited journal article is published. This sort of assertiveness antagonizes authors, while there is no evidence that it has any effect on revenues. The antagonism is increased by the publishers insisting that authors transfer copyright to them, leaving the authors few rights in the works that they created, and the taxpayer whose money fuels the process with nothing.

At the Association for Computing Machinery (ACM), we attempted to find a balance between the authors' interest in widespread distribution and the need for revenue. The resulting policy is clearly seen as interim, but hopefully the balance will be acceptable for the next few years. The current version of the policy retains the traditional copyright transfer from author to publisher and affirms that ACM can use material in any format or way that it wishes. At the same time, however, it allows authors great flexibility. In particular, the policy encourages authors to continue to mount their materials on private servers, both before and after publication. In fast moving fields, such as those covered by ACM journals, preprints have always been important and recently these preprints have matured into online collections of technical reports and preprints, freely available over the network. These are maintained privately or by research departments. Since they are important to ACM members, ACM does not want to remove them, yet the association does not want private online collections to destroy the journal market.

The legal framework

This is not a legal text book (though two lawyers have checked this section), but some legal issues are so important that this book would be incomplete without discussing them. Digital libraries reach across almost all areas of human activity. It is unsurprising that many aspects of the law are relevant to digital libraries, since the legal system provides a framework that permits the orderly development of online services. Relevant areas of law include contracts, copyright and other intellectual property, defamation, obscenity, communications law, privacy, tax, and international law.

The legal situation in the United States is complicated by the number of jurisdictions - the Constitution, international treaties, federal and state statutes, and the precedents set by courts at each level. Many laws, such as those controlling obscenity, are at the state level. People who mounted sexual material on a server in California, where it is legal, were prosecuted in Louisiana, where it was deemed to be obscene. Some legal areas, such as copyright, are covered by federal statutes, but important topics have never been interpreted by the Supreme Court. When the only legal rulings have been made by lower courts, other courts are not bound by precedent and may interpret the law differently.

The two communities that are building digital libraries on the Internet - the computer scientists and the information professionals - both have a tradition of responsible use of shared resources. The traditions are different and merging these traditions poses some problems, but a greater difficulty is the influx of people with no traditions to build on. Inevitably, a few of these people are malicious or deliberately exploit the networks for personal gain. Others are thoughtless and unaware of what constitutes reasonable behavior.

Until the late 1980s, the Internet was an academic and research network. There was a policy on who could use it and what was appropriate use, but more importantly the users policed themselves. Somebody who violated the norms was immediately barraged with complaints and other, less subtle forms of peer pressure. The social conventions were somewhat unconventional, but they worked as long as most people were members of a small community and learned the conventions from their colleagues. The conventions began to fail when the networks grew larger. At Carnegie Mellon University, we noticed a change when students who had learned their networked computing outside universities became undergraduates.

Many of the legal issues are general Internet questions and not specific to digital libraries. Currently, the Internet community is working on technical methods to control junk electronic mail, which is a burden on the network suppliers and an annoyance to users. Pornography and gambling are two areas which pit commercial interests against diverse social norms, and advocates of civil liberties against religious groups.

International questions

The Internet is worldwide. Any digital library is potentially accessible from anywhere in the world. Behavior that is considered perfectly normal in one country is often illegal in another. For example, the United States permits the possession of guns, but limits the use of encryption software. Most countries in Europe have the opposite rules.

Attitudes to free speech vary greatly around the world. Every country has laws that limit freedom of expression and access to information; they cover slander, libel, obscenity, privacy, hate, racism, or government secrets. The United States believes fervently that access to information is a fundamental democratic principle. It is enshrined in the First Amendment to the Constitution and the courts have consistently interpreted the concept of free speech broadly, but there are limits, even in the United States. Germany has strong laws on Nazism; Arabic countries are strict about blasphemy. Every jurisdiction expects to be able to control such activities, yet the Internet is hard to control. For years, there was a server in Finland that would act as a relay and post messages on the Internet anonymously. After great pressure from outside, the courts forced this server to disclose the names of people who posted particularly disreputable materials anonymously.

Internet commerce, including electronic information, is a broad area where the international nature of the Internet creates difficulty. In the United States, consumers and suppliers are already protected by laws that cover inter-state commerce, such as financial payments and sales by mail order across state boundaries. The situation is much more complex over a world-wide network, where the trade is in digital materials which can easily be duplicated or modified. On the Internet, the parties to a transaction do not even need to declare from what country they are operating.

Liability

Responsibility for the content of library materials is a social and legal issue of particular importance to digital libraries. Society expects the creators of works to be responsible for their content, and expects those who make decisions about content to behave responsibly. However, digital libraries will not thrive if legal liability for content is placed upon parties whose only function is to store and transmit information.

Because of the high esteem in which libraries are held, in most democratic countries they have a privileged legal position. It is almost impossible to hold a library liable for libelous statements or subversive opinions expressed in the books that it holds. In a similar way, telecommunications law protects common carriers, so that a telephone company does not have to monitor the conversations that take place on its lines. In fact, the telephone companies are prohibited from deliberately listening to such conversations.

Traditionally, the responsibility for the content has fallen on the creators and publishers, who are aware of the content, rather than on the libraries. This allows libraries to collect material from all cultures and all periods without having to scrutinize every item individually for possible invasions of privacy, libel, copyright violations, and so on. Most people would accept that this is good policy which should be extended to digital libraries. Organizations have a responsibility to account for the information resources they create and distribute, but it is unreasonable for libraries or Internet service companies to monitor everything that they transmit.

Liability of service providers is one of the central topics of the Digital Millennium Copyright Act of 1998, described in Panel 6.2. This act removes the liability for copyright violations from online service providers, including libraries and educational establishments. As usual with emerging law, this was not a simple process, and it does not end with legislation. Digital libraries naturally welcome the freedom of action, but it can be argued that the act goes too far in the protection that it gives to service providers. It will be interesting to see how the courts interpret the act.

Panel 6.2
The Digital Millennium Copyright Act

In 1998, the United States Congress passed an act that made significant changes in copyright law. Apart from a section dealing with the design of boat hulls, almost all the act is about digital works on networks. On the surface, the act appears a reasonable balance between commercial interests that wish to sell digital works, and the openness of information that is central to libraries and education. Some of the more important provisions are listed below.

Online service providers, libraries, and educational establishments

Under the act, online service providers in the United States, including libraries and educational establishments, are largely protected from legal claims of copyright infringement that take place without their knowledge. To qualify for this exception, specific rules must be followed: the organization must provide users with information about copyright, have a policy for termination of repeat offenders, comply with requirements to take down infringing materials, support industry-standard technical measures, and advise the Copyright Office of an agent to receive statutory notices under the act.

This section explicitly permits many of the activities that are fundamental to the operation of digital libraries. It allows services for users to store materials such as web sites, to follow hyperlinks, and to use search engines. It recognizes that service providers make copies of materials for technical reasons, notably system caching and to transmit or route material to other sites. These activities are permitted by the act, and service providers are not liable for violations by their users if they follow the rules and correct problems when notified.

For universities and other institutes of higher education, the act makes an important exception to the rule that organizations are responsible for the actions of their employees. It recognizes that administrators do not oversee the actions of faculty and graduate students, and are not necessarily liable for their acts.

The act prohibits the circumvention of technical methods used by copyright owners to restrict access to works. It also prohibits the manufacture or distribution of methods to defeat such technology. However, the act recognizes several exceptions, all of which are complex and need careful interpretation: software developers can reverse engineer protection systems to permit interoperability, researchers can study encryption and system security, law enforcement agencies can be authorized to circumvent security technology, and libraries can examine materials to decide whether to acquire them. Finally, users are permitted to identify and disable techniques that collect private information about users and usage.

The act provides rules on tampering with copyright management information about a work, such as the title, author, performer, and the copyright owner. This information must not be intentionally altered or removed.

Some of the most heated legal discussions concern the interaction between economic issues and copyright law. Such arguments seem to emerge every time that a new technology is developed. In the early days of printing there was no copyright. Shakespeare's plays were freely pirated. In the nineteenth century, the United States had copyright protection for American authors but none for foreigners; the books of European authors were not protected and were shamelessly copied, despite the helpless pleas of authors, such as Dickens and Trollope.

In United States law, copyright applies to almost all literary works, including textual materials, photographs, computer programs, musical scores, videos and audio tapes. A major exception is materials created by government employees. Initially, the creator of a work or the employer of the creator owns the copyright. In general, this is considered to be intellectual property that can be bought and sold like any other property. Other countries have different approaches; in some countries, notably France, the creator has personal rights (known as "moral rights") which can not be transferred. Historically, copyright has had a finite life, expiring a certain number of years after the creator's death, but Congress has regularly extended that period, most recently when copyright on Mickey Mouse was about to expire - a sad example of the public good being secondary to the financial interests of a few corporations.

The owner of the copyright has an exclusive right to make copies, to prepare derivative works, and to distribute the copies by selling them or in other ways. This is important to authors, helping them to ensure that their works do not get corrupted, either accidentally or maliciously. It also allows publishers to develop products without fear that their market will be destroyed by copies from other sources.

Copyright law is not absolute, however. Although the rights holder has considerable control over how material may be used, the control has boundaries. Two important concepts in United States law are the first sale doctrine and fair use. First sale applies to a physical object, such as a book. The copyright owner can control the sale of a new book, and set the price, but once a customer buys a copy of the book, the customer has full ownership of that copy and can sell the copy or dispose of it in any way without needing permission.

Fair use is a legal right in United States law that allows certain uses of copyrighted information without permission of the copyright owner. Under fair use, reviewers or scholars have the right to quote short passages, and photocopies can be made of an article or part of a book for private study. The boundaries of fair use are deliberately vague, but there are four basic factors that are considered:

the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
the nature of the copyrighted work;
the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
the effect of the use upon the potential market for or value of the copyrighted work.

Because these factors are imprecise, judges have discretion in how they are interpreted. In general, fair use allows reproduction of parts of a work rather than the whole, single copies rather than many, and private use rather than commercial use. The exact distinctions can be clarified only by legal precedent. Although there have been a few well-publicized court cases, such cases are so expensive that, even for traditional print materials, many important issues have never been tested in court or only in the lower courts.

The first sale doctrine and the concept of fair use do not transfer easily to digital libraries. While the first sale doctrine can be applied to physical media that store electronic materials, such as CD-ROMs, there is no parallel for information that is delivered over networks. The guidelines for fair use are equally hard to translate from physical media to the online world.

This uncertainty was one of the reasons that led to a series of attempts to rewrite copyright law, both in the United States and internationally. While most people accepted that copyright law should provide a balance between the public's right of access to information and economic incentives to creators and publishers of information, there was no consensus on what that balance should be, with extremists on both sides. This led to an unsavory situation of vested interests attempting to push through one-sided legislation, motivated by the usual forces of fear and greed. For several years, legislation was introduced in the United States Congress to change or clarify the copyright law relating to information on networks, usually to favor some group of interested parties. At the worst, some commercial organizations lobbied for draconian rights to control information and for criminal sanctions on every activity that is not explicitly authorized. In response, public interest groups argued that there is no evidence that fair use hurts any profits and that unnecessary restrictions on information flow are harmful.

Until 1998, the result was a stalemate, which was probably good. Existing legislation was adequate to permit the first phase of electronic publishing and digital libraries. The fundamental difficulty was to understand the underlying issues. Legal clarification was needed eventually, but it was better to observe the patterns that emerged than to rush into premature legislation. The 1998 legislation described in Panel 6.2, above, is probably good enough to allow digital libraries to thrive.

Panel 6.3
Events in the history of copyright

Outside the U.S. Copyright Office there is a sequence of display panels that summarize some major decisions about copyright law made by U.S. federal courts, including the Supreme Court. They illustrate how, over the years, legal precedents shape and clarify the law, and allow it to evolve into areas, such as photography, broadcasting, and computing, that were not thought of when the Constitution was written and the laws enacted.

Even these major decisions can not be considered irrevocable. Many were never tested by the Supreme Court and could be reversed. Recently, a federal court made a ruling that explicitly disagreed with the King vs. Mr. Maestro, Inc. case listed below.

Wheaton vs. Peters, 1834. This landmark case established the principle that copyright is not a kind of natural right but rather is the creation of the copyright statute and subject to the conditions it imposes.

Baker vs. Selden, 1880. This case established that copyright law protects what an author writes and the way ideas are expressed, but the law does not protect the ideas themselves.

Burrow-Giles Lithographic Co. vs. Sarony, 1884. This decision expanded the scope of copyright to cover media other than text, in this case a photograph of Oscar Wilde.

Bleistein vs. Donaldson Lithographic Co., 1903. This case concerned three circus posters. The court decided that they were copyrightable, whether or not they had artistic value or were aesthetically pleasing.

Fred Fisher, Inc. vs. Dillingham, 1924. This dispute concerned the similarity in two musical passages. The court ruled that unconscious copying could result in an infringement of copyright.

Nichols vs. Universal Pictures Corp., 1931. The court ruled that it was not an infringement of copyright for a film to copy abstract ideas of plot and characters from a successful Broadway play.

Sheldon vs. Metro-Goldwyn Pictures Corp., 1936. The court ruled that "no plagiarist can excuse the wrong by showing how much of his work he did not pirate."

G. Ricordi & Co. vs. Paramount Pictures, Inc., 1951. This was a case about how renewal rights and rights in derivative works should be interpreted, in this instance the novel Madame Butterfly by John Luther Long, Belasco's play based on the novel, and Puccini's opera based on the play. The court ruled that copyright protection in derivative works applies only to the new material added.

Warner Bros. Pictures, Inc. vs. Columbia Broadcasting System, Inc., 1955. This case decided that the character Sam Spade in the story The Maltese Falcon was a vehicle for the story, not a copyrightable element of the work.

Mazer vs. Stein, 1954. The court decided that copyright does not protect utilitarian or useful objects, in this case a sculptural lamp. It is possible to register the separable pictorial, graphic, or sculptural features of a utilitarian piece.

King vs. Mr. Maestro, Inc., 1963. This was a case about the speech "I have a dream" by Martin Luther King, Jr. Although he had delivered the speech to a huge crowd with simultaneous broadcast by radio and television, the court decided that this public performance did not constitute publication and the speech could be registered for copyright as an unpublished work.

Letter Edged in Black Press, Inc., vs. Public Building Commission of Chicago, 1970. This case, about the public display of a Picasso sculpture, has been superseded by later legislation.

Williams Electronics, Inc. vs. Artic International, Inc., 1982. This case involved copying a video game. The court ruled that video game components were copyrightable and that computer read-only memory can be considered a copy.

Norris Industries, Inc. vs. International Telephone and Telegraph Corp., 1983. The court ruled that, even if the Copyright Office rejects a work because it is not copyrightable, the owner is still entitled to file suit and to ask for a court ruling.

Privacy

Libraries, at least in the United States, feel strongly that users have a right to privacy. Nobody should know that a user is consulting books on sensitive issues, such as unpleasant diseases. Libraries have gone to court, rather than divulge to the police whether a patron was reading books about communism. Many states have laws that prohibit libraries from gathering data that violates the privacy of their users. The Internet community has a similar tradition. Although corporations have the legal right to inspect the activities of their employees, most technical people expect their electronic mail and their computer files to be treated as private under most normal circumstances.

Problems arise because much of the technology of digital libraries is also used for electronic commerce. Advertisers and merchants strive to gather the maximum amount of information about their customers, often without the knowledge of the customer. They sell such information to each other. The web has the concept of "cookies", which are useful for such purposes as recording when a user has been authenticated. Unfortunately, the same technology can also be used as a tool for tracking users' behavior without their knowledge.

As discussed in Panel 6.4, digital libraries must gather data on usage. Good data is needed to tune computer systems, anticipate problems, and plan for growth. With care, usage statistics can be gathered without identifying any specific individuals, but not everybody takes care. When a computer system fails, system administrators have the ability to look at any file on a server computer or inspect every message passing over a network. Occasionally they stumble across highly personal information or criminal activities. What is the correct behavior in these circumstances? What should the law say?

Panel 6.4
Digital library statistics and privacy

Anybody who runs a service needs to have data about how it is used. To manage a computer system, such as a digital library, requires data about performance and reliability, about utilization of capacity, about trends and peaks. Libraries and publishers need statistical data about how the collections are used and which services are popular. Designers of user interfaces depend upon knowledge of how people interact with the library. Security requires careful tracking of use patterns and analysis of anomalies.

This information is not easy to gather. Web sites gather data about how often each file is accessed. A common, though doubtful, practice is for sites to boast about how many "hits" they have. Since every graphic image is usually a separate file, a single user who reads one page may generate many hits. This statistic is important for configuring the computer system, but for nothing else. The manager of an online journal would like to know how many people read the journal, with data about the use of individual articles. The number of times that the contents page is accessed is probably useful information. So is the frequency with which the first page of each article is accessed.
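The gap between hits and page accesses can be seen in a short sketch. The log entries, file names, and the rule that a page is any file ending in ".html" or ".htm" are all assumptions made for illustration; a real server log would need more careful parsing.

```python
from collections import Counter

# Hypothetical access log: one entry per file requested from the server.
# A reader who opens one page also fetches each image embedded in it.
ACCESS_LOG = [
    "/journal/contents.html",
    "/images/logo.gif",
    "/images/banner.gif",
    "/journal/article1.html",
    "/images/logo.gif",
    "/images/figure1.gif",
    "/journal/article1.html",
]

# Assumption: pages are recognized by their file suffix.
PAGE_SUFFIXES = (".html", ".htm")


def count_hits(log):
    """Every requested file counts as a hit, including embedded images."""
    return len(log)


def count_page_accesses(log):
    """Count only requests for pages, grouped by page."""
    return Counter(path for path in log if path.endswith(PAGE_SUFFIXES))


hits = count_hits(ACCESS_LOG)
pages = count_page_accesses(ACCESS_LOG)
print("hits:", hits)
for path, count in sorted(pages.items()):
    print(path, count)
```

In this invented log, three page accesses produce seven hits. Even the page counts remain an indirect measure, since they say nothing about how many different people made the requests.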

These are indirect measures that can not distinguish between one person who reads a story several times and several different people who each read it once. They measure how often the computer system is accessed, but they have no way of estimating how many readers are accessing copies of the materials through caches or mirror sites. If the users are identified it becomes much easier to track usage and gather precise statistics, but then privacy becomes a problem.

In our work at Carnegie Mellon we handled this problem as follows. Every time that a user accessed an item a temporary record was created that included a hashed version of the user's ID. The temporary records were kept in a special part of the file system, which was never copied onto back-up tapes. Once a week, the records were analyzed, creating a report which could not be used to identify individual users. It contained information such as the number of different people who had searched each collection and the number of total searches. The temporary records with all trace of the users' identity were then discarded.
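A minimal sketch of this kind of scheme follows. It is not the Carnegie Mellon implementation, whose details are not given here. The class name, the in-memory list standing in for the special part of the file system, and the use of a keyed hash (HMAC-SHA256 with a key that is thrown away after each report) are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


class UsageRecorder:
    """Hypothetical recorder that keeps only hashed user IDs, and only briefly."""

    def __init__(self):
        # Assumption: a random key for the current reporting period, so that
        # hashed IDs cannot be matched against a list of known user names.
        self._key = secrets.token_bytes(32)
        # Temporary records of (hashed user ID, collection) pairs.
        self._records = []

    def _hash_user(self, user_id):
        digest = hmac.new(self._key, user_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def record_search(self, user_id, collection):
        """Store a temporary record; the plain user ID is never kept."""
        self._records.append((self._hash_user(user_id), collection))

    def weekly_report(self):
        """Summarize the period, then discard the records and the key."""
        users = defaultdict(set)
        searches = defaultdict(int)
        for hashed_user, collection in self._records:
            users[collection].add(hashed_user)
            searches[collection] += 1
        report = {
            name: {
                "distinct_users": len(users[name]),
                "total_searches": searches[name],
            }
            for name in searches
        }
        # The report holds only counts; nothing that identifies a user remains.
        self._records.clear()
        self._key = secrets.token_bytes(32)
        return report


recorder = UsageRecorder()
recorder.record_search("alice", "technical-reports")
recorder.record_search("alice", "technical-reports")
recorder.record_search("bob", "technical-reports")
recorder.record_search("bob", "journals")
report = recorder.weekly_report()
print(report)
```

Because the same user always hashes to the same value within one period, the report can distinguish one person searching twice from two people searching once. Replacing the key after each report means that hashed values from different weeks cannot be linked to each other.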

Software patents

Although the legal system has its problems, overall it has dealt well with the rapid growth of computing and the Internet. The worst exception is software patents. Few areas of the law are so far removed from the reality they are applied to. In too many cases the Patent Office approves patents that the entire computer industry knows to be foolish. Until recently, the Patent Office did not even employ trained computer scientists to evaluate patent applications. The examiners still award patents that are overly broad, that cover concepts that have been widely known for years, or that are simple applications of standard practice.

Part of the problem is that patent law is based on a concept of invention that does not fit computer science: Archimedes leaps from his bath crying, "Eureka." New ideas in software are created incrementally. The computer science community is quite homogeneous. People are trained at the same universities and use the same computers and software. There is an open exchange of ideas through many channels. As a result, parallel groups work on the same problems and adapt the same standard techniques in the same incremental ways.

In one of our digital library projects, we did the initial design work as a small team working by itself. A few months later, we met two other groups who had worked on some of the same issues. The three groups had independently found solutions which were remarkably similar. One of the three groups kept their work private and filed a patent application. The other two followed the usual academic tradition of publishing their ideas to the world. Some of these concepts are now widely deployed in digital libraries. In this instance the patent application was turned down, but had it been approved, one group - the smallest contributor in our opinion - would have been in a position to dictate the development of this particular area.

The success of the Internet and the rapid expansion of digital libraries have been fueled by the open exchange of ideas. Patent law, with its emphasis on secrecy, litigation, and confrontation, can only harm such processes.

Footnote

This chapter, more than any other in the book, is a quick review of an extremely complex area. No attempt has been made to describe all the issues. The discussion reflects the author's viewpoint, which will probably need revision in time. Hopefully, however, the basic ideas will stand. Digital libraries are being developed in a world in which issues of users, content, and technology are interwoven with the economic, social, and legal context. These topics have to be studied together and can not be understood in isolation.

Last revision of content: January 1999
Formatted for the Web: December 2002
(c) Copyright The MIT Press 2000