Glossary
Digital libraries have absorbed terminology from many fields, including computing,
libraries, publishing, law, and more. This glossary gives brief explanations
of how some common terms are used in digital libraries today, which may not
be the usage in other contexts. Often the use in digital libraries has diverged
from or extended the original sense of a term.
- AACR2 (Anglo-American Cataloguing Rules)
- A set of rules that describe the content that is contained in library catalog
records.
- abstracting and indexing services
- Secondary information services that provide searching of scholarly and scientific
information, in particular of individual journal articles.
- access management
- Control of access to material in digital libraries. Sometimes called terms
and conditions or rights management.
- ACM Digital Library
- A digital library of the journals and conference proceedings published by
the Association for Computing Machinery.
- Alexandria Digital Library
- A digital library of geospatial information, based at the University of
California, Santa Barbara.
- American Memory and the National Digital Library Program
- The Library of Congress's digital library of materials converted from its
primary source materials related to American history.
- applet
- A small computer program that can be transmitted from a server to a client
computer and executed on the client.
- archives
- Collections with related systems and services, organized to emphasize the
long-term preservation of information.
- Art and Architecture Thesaurus
- A controlled vocabulary for fine art, architecture, decorative art, and
material culture, a project of the J. Paul Getty Trust.
- artifact
- A physical object in a library, archive, or museum.
- ASCII (American Standard Code for Information Interchange)
- A coding scheme that represents individual characters as 7 bits, usually stored
in 8-bit bytes; printable ASCII is the subset of ASCII that excludes control characters.
- authentication
- Validation of a user, a computer, or some digital object to ensure that
it is what it claims to be.
- authorization
- Giving permission to a user or client computer to access specific information
and carry out approved actions.
- automatic indexing
- Creation of catalog or indexing records using computer programs, not human
cataloguers.
- Boolean searching
- Methods of information retrieval where a query consists of a sequence of
search terms, combined with operators, such as "and", "or",
and "not".
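The Boolean operators in this entry correspond directly to set operations: "and" is intersection, "or" is union, and "not" is difference. The following is a minimal sketch, assuming a tiny hypothetical collection in which each document is reduced to the set of words it contains; the names `docs`, `having`, and `all_ids` are illustrative only.

```python
# Hypothetical collection: document identifier -> text.
docs = {
    "d1": "digital library catalog records",
    "d2": "library of digital video",
    "d3": "catalog of printed maps",
}

# Reduce each document to the set of words it contains.
words = {doc_id: set(text.split()) for doc_id, text in docs.items()}
all_ids = set(docs)

def having(term):
    """Return the identifiers of documents that contain the search term."""
    return {doc_id for doc_id, ws in words.items() if term in ws}

# "and" -> intersection (&), "or" -> union (|), "not" -> difference (-).
# Query: digital AND library AND NOT video
result = (having("digital") & having("library")) - having("video")
print(sorted(result))                                 # ['d1']

# Query: maps OR video
print(sorted(having("maps") | having("video")))       # ['d2', 'd3']

# Query: NOT catalog
print(sorted(all_ids - having("catalog")))            # ['d2']
```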
- browser
- A general-purpose user interface, used with the web and other online information
services. Also known as a web browser.
- browsing
- Exploration of a body of information, based on the organization of the collections
or on scanning lists, rather than on direct searching.
- cache
- A temporary store that is used to keep a readily available copy of recently
used data or any data that is expected to be used frequently.
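A cache needs a rule for deciding what to discard when it is full. The sketch below assumes one common policy, least-recently-used (LRU) eviction; the class name `LRUCache` and the page keys are hypothetical, and real caches in browsers and proxies use a variety of policies.

```python
from collections import OrderedDict

class LRUCache:
    """A small cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()   # keeps entries in order of last use

    def get(self, key):
        if key not in self._store:
            return None                # cache miss: caller fetches from the source
        self._store.move_to_end(key)   # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("page1", "<html>one</html>")
cache.put("page2", "<html>two</html>")
cache.get("page1")                          # page1 is now the most recently used
cache.put("page3", "<html>three</html>")    # cache is over capacity: evicts page2
print(cache.get("page2"))                   # None
```

For caching the results of a Python function, the standard library's `functools.lru_cache` decorator applies the same policy.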
- California Digital Library
- A digital library that serves the nine campuses of the University of California.
- catalog
- A collection of bibliographic records created according to an established
set of rules.
- classification
- An organization of library materials by a hierarchy of subject categories.
- client
- A computer that acts on behalf of a user, including a user's personal computer,
or another computer that appears to a server to have that function.
- CGI (Common Gateway Interface)
- A programming interface that enables a web browser to be an interface to
information services other than web sites.
- Chemical Abstracts
- A secondary information service for chemistry.
- CNI (Coalition for Networked Information)
- A partnership of the Association of Research Libraries and Educause to
collaborate on academic networked information.
- complex object
- Library object that is made up of many inter-related elements or digital
objects.
- compression
- Reduction in the size of digital materials by removing redundancy or by
approximation. Lossless compression can be reversed exactly; lossy compression
cannot, since information is lost by approximation.
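The difference between the two kinds of compression can be seen in a few lines. This sketch uses Python's standard `zlib` module for the lossless case; the lossy case is only an illustration, assuming simple quantization of hypothetical 8-bit sample values, not a real image or audio codec.

```python
import zlib

text = b"digital library " * 100    # highly redundant input (1600 bytes)

# Lossless: zlib removes redundancy, and decompression restores every byte.
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), "->", len(packed))
print(restored == text)              # True

# Lossy (illustrative): round 8-bit sample values down to multiples of 16.
# Fewer distinct values compress better, but the original values are lost.
samples = [3, 17, 130, 131, 250]
quantized = [(s // 16) * 16 for s in samples]
print(quantized)                     # [0, 16, 128, 128, 240]
```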
- computational linguistics
- The branch of natural language processing that deals with grammar and linguistics.
- controlled vocabulary
- A set of subject terms, and rules for their use in assigning terms to materials
for indexing and retrieval.
- conversion
- Transformation of information from one medium to another, including from
paper to digital form.
- CORBA
- A standard for distributed computing where an object on one computer invokes
an Object Request Broker (ORB) to interact with an object on another computer.
- CORE
- A project from 1991 to 1995 by Bellcore, Cornell University, OCLC, and the
American Chemical Society to convert chemistry journals to digital form.
- Cryptolope
- Secure container used to buy and sell content over the Internet,
developed by IBM.
- CSS (Cascading Style Sheets)
- System of style sheets for use with HTML, the basis of XSL.
- CSTR (Computer Science Technical Reports project)
- A DARPA-funded research project with CNRI and five universities, from 1992
to 1996.
- DARPA (Defense Advanced Research Projects Agency)
- A major sponsor of computer science research in the U.S., including digital
libraries. Formerly ARPA.
- data type
- Structural metadata associated with digital data that indicates the digital
format or the application used to process the data.
- DES (Data Encryption Standard)
- A method for private key encryption.
- Dewey Decimal Classification
- A classification scheme for library materials which uses a numeric code
to indicate subject areas.
- desktop metaphor
- User interface concept on personal computers that represents information
as files and folders on a desktop.
- Dienst
- An architecture for digital library services and an open protocol that provides
those services, developed at Cornell University, used in NCSTRL.
- digital archeology
- The process of retrieving information from damaged, fragmentary, and archaic
data sources.
- Digital Libraries Initiative
- A digital libraries research program. In Phase 1, from 1994 to 1998, NSF/DARPA/NASA
funded six university projects. Phase 2 began in 1998/9.
- digital object
- An item as stored in a digital library, consisting of data, metadata, and
an identifier.
- digital signature
- A cryptographic code consisting of a hash of the data, encrypted with the private
key of the creator of the signature; anyone can decrypt it with the creator's public
key and compare it with a fresh hash to confirm that the data has not changed.
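The sign-then-verify cycle in this entry can be sketched with textbook RSA. This is a toy, assuming tiny hypothetical primes and no padding scheme, so it is insecure and for illustration only; SHA-256 is used as the hash, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import hashlib

# Toy RSA key pair from tiny textbook primes (hypothetical; real keys use
# primes hundreds of digits long plus a padding scheme).
p, q = 61, 53
n = p * q                  # 3233, part of both the public and private key
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: 2753

def digest(data: bytes) -> int:
    """Hash the data and reduce it so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    # Encrypt the hash with the creator's private key.
    return pow(digest(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    # Decrypt with the public key and compare with a freshly computed hash.
    return pow(signature, e, n) == digest(data)

doc = b"Digital object, version 1"
sig = sign(doc)
print(verify(doc, sig))                            # True
print(verify(b"Digital object, version 2", sig))   # False unless the toy hashes collide
```

With a modulus this small, two different documents have a roughly 1 in 3,233 chance of sharing a reduced hash; real key sizes make such collisions impractical to find.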
- dissemination
- The transfer from the stored form of a digital object in a repository to
a client.
- distributed computing
- Computing systems in which services to users are provided by teams of computers
collaborating over a network.
- D-Lib Magazine
- A monthly, online publication about digital libraries research and innovation.
- DLITE
- An experimental user interface used with the Stanford University InfoBus.
- document
- Digital object that is the analog of a physical document, especially textual
materials; a document model is an object model for documents.
- domain name
- The name of a computer on the Internet; the domain name service (DNS) converts
domain names to IP addresses.
- DOI (Digital Object Identifier)
- An identifier used by publishers to identify materials published electronically,
a form of handle.
- DSSSL (Document Style Semantics and Specification Language)
- A general purpose system of style sheets for SGML.
- DTD (Document Type Definition)
- A mark-up specification for a class of documents, defined within the SGML
framework.
- Dublin Core
- A simple set of metadata elements used in digital libraries, primarily to
describe digital objects, to support collections management, and to exchange
metadata.
- dynamic object
- Digital object where the dissemination presented to the user depends upon
the execution of a computer program, or other external activity.
- EAD (Encoded Archival Description)
- A DTD used to encode electronic versions of finding aids for archival materials.
- electronic journal
- An online publication that is organized like a traditional printed journal,
either an online version of a printed journal or a journal that has only an
online existence.
- eLib
- A British program of innovation, around the theme of electronic publication.
- emulation
- Replication of a computing system to process programs and data from an earlier
system that is no longer available.
- encryption
- Techniques for encoding information for privacy or security, so that it
appears to be random data; the reverse process, decryption, requires knowledge
of a digital key.
- entities and elements
- In a mark-up language, entities are the basic unit of information, including
character entities; elements are strings of entities that form a structural
unit.
- expression
- The realization of a work, by expressing the abstract concept as actual
words, sounds, images, etc.
- fair use
- A concept in copyright law that allows limited use of copyright material
without requiring permission from the rights holders, e.g., for scholarship
or review.
- federated digital library
- A group of digital libraries that support common standards and services,
thus providing interoperability and a coherent service to users.
- field, subfield
- An individual item of information in a structured record, such as a catalog
or database record.
- fielded searching
- Methods for searching textual materials, including catalogs, where search
terms are matched against the content of specified fields.
- finding aid
- A textual document that describes holdings of an archive, library, or museum.
- firewall
- A computer system that screens data passing between network segments, used
to provide security for a private network at the point of connection to the
Internet.
- first sale
- A concept in copyright law that permits the purchaser of a book or other
object to transfer it to somebody else, without requiring permission from
the rights holders.
- FTP (File Transfer Protocol)
- A protocol used to transmit files between computers on the Internet.
- full text searching
- Methods for searching textual materials where the entire text is matched
against a query.
- gatherer
- A program that automatically assembles indexing information from digital
library collections.
- gazetteer
- A database used to translate between different representations of geospatial
references, such as place names and geographic coordinates.
- genre
- The class or category of an object when considered as an intellectual work.
- geospatial information
- Information that is referenced by a geographic location.
- gif
- A format for storing compressed images.
- Google
- A web search program that ranks web pages in a list of hits by giving weight
to the links that reference a specific page.
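The link-weighting idea in this entry can be sketched as a simplified, PageRank-style iteration: each page passes its score along its outgoing links, so a page scores highly when highly scored pages link to it. This is not Google's actual ranking, which combines many signals; the link graph, the damping factor of 0.85, and the function name `link_rank` are assumptions for illustration, and the sketch assumes every page has at least one outgoing link.

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores; assumes every page has outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}              # start with equal scores
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for t in targets:
                new[t] += share                   # pass score along each link
        rank = new
    return rank

ranks = link_rank(links)
hits = sorted(ranks, key=ranks.get, reverse=True)
print(hits)   # C first: three pages link to it
```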
- gopher
- A pre-web protocol used for building digital libraries, now largely obsolete.
- handle
- A system of globally-unique names for Internet resources and a computer
system for managing them, developed by CNRI; a form of URN.
- Harvest
- A research project that developed an architecture for distributed searching,
including protocols and formats.
- hash
- A short value calculated from digital data that serves to distinguish it
from other data.
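To make the idea concrete, the sketch below uses SHA-256 from Python's standard `hashlib` module; the glossary does not name a particular hash function, so this choice is an assumption. Input of any length yields a fixed-length value, and even a one-letter change produces a completely different hash.

```python
import hashlib

original = b"The quick brown fox"
altered = b"The quick brown fax"     # one letter changed

# SHA-256 yields a 256-bit value (64 hex digits) for input of any length.
h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

print(len(h1))     # 64
print(h1 == h2)    # False
```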
- HighWire Press
- A publishing venture, from Stanford University Libraries, that provides
electronic versions of journals, on behalf of learned and professional societies.
- hit
- 1. An incoming request to a web server or other computer system.
2. In information retrieval, a document that is discovered in response to
a query.
- home page
- The introductory page to a collection of information on the web.
- HTML (Hyper-Text Mark-up Language)
- A simple mark-up and formatting language for text, with links to other objects,
used with the web.
- HTTP (Hyper-Text Transfer Protocol)
- The basic protocol of the web, used for communication between browsers and
web sites.
- hyperlink
- A network link from one item in a digital library or web site to another.
- ICPSR (Inter-university Consortium for Political and Social Research)
- An archive of social science datasets, based at the University of Michigan.
- identifier
- A string of characters that identifies a specific resource in a digital
library or on a network.
- IETF (Internet Engineering Task Force)
- The body that coordinates the technological development of the Internet,
including standards.
- InfoBus
- An approach to interoperability that uses proxies as interfaces between
existing systems, developed at Stanford University.
- information discovery
- General term covering all strategies and methods of finding information
in a digital library.
- information retrieval
- Searching a body of information for objects that match a search query.
- Informedia
- A research program and digital library of segments of video, based at Carnegie
Mellon University.
- Inspec
- An indexing service for physics, engineering, computer science, and related
fields.
- Internet
- An international network, consisting of independently managed networks using
the TCP/IP protocols and a shared naming system. A successor to the ARPAnet.
- Internet RFC series
- The technical documentation of the Internet, provided by the Internet Engineering
Task Force. Internet Drafts are preliminary versions of RFCs.
- interoperability
- The task of building coherent services for users from components that are
technically different and independently managed.
- inverted file
- A list of the words in a set of documents and their locations within those
documents; an inverted list is the list of locations for a given word.
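An inverted file can be built in one pass over the documents. This minimal sketch assumes two hypothetical documents, splits text on whitespace, and records each location as a (document, word position) pair; production systems also normalize case, remove stop words, and compress the lists.

```python
from collections import defaultdict

# Hypothetical documents.
docs = {
    "d1": "digital libraries store digital objects",
    "d2": "libraries lend printed books",
}

# Inverted file: word -> list of (document, word position) locations.
inverted = defaultdict(list)
for doc_id, text in docs.items():
    for position, word in enumerate(text.split()):
        inverted[word].append((doc_id, position))

# The inverted list for a word gives every location where it occurs.
print(inverted["digital"])      # [('d1', 0), ('d1', 3)]
print(inverted["libraries"])    # [('d1', 1), ('d2', 0)]
```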
- item
- A specific piece of material in a digital library; a single instance or
copy of a manifestation.
- Java
- A programming language used for writing mobile code, especially for user
interfaces, developed by Sun Microsystems.
- JavaScript
- A scripting language used to embed executable instructions in a web page.
- JPEG
- A format for storing compressed images.
- JSTOR
- A subscription service, initiated by the Andrew W. Mellon Foundation, to
convert back runs of important journals and make them available to academic
libraries.
- key
- A digital code used to encrypt or decrypt messages. Private key encryption
uses a single, secret key. Dual key (public key) encryption uses two keys
of which one is secret and one is public.
- legacy system
- An existing system, usually a computer system, that must be accommodated
in building new systems.
- lexicon
- A linguistic tool with information about the morphological variations and
grammatical usage of words.
- Lexis
- A legal information service, a pioneer of full-text information online.
- Los Alamos E-Print Archives
- An open-access site for rapid distribution of research papers in physics
and related disciplines.
- manifestation
- Form given to an expression of a work, e.g., by representing it in digital
form.
- MARC (Machine-Readable Cataloging)
- A format used by libraries to store and exchange catalog records.
- mark-up language
- Codes embedded in a document that describe its structure and/or its format.
- Medline
- An indexing service for research in medicine and related fields, provided
by the National Library of Medicine.
- MELVYL
- A shared digital library system for academic institutions in California;
part of the California Digital Library.
- Memex
- A concept of an online library suggested by Vannevar Bush in 1945.
- Mercury
- An experimental digital library project to mount scientific journals online
at Carnegie Mellon University from 1987 to 1993.
- MeSH (Medical Subject Headings)
- A set of subject terms and an associated thesaurus used to describe medical
research, maintained by the National Library of Medicine.
- metadata
- Data about other data, commonly divided into descriptive metadata such as
bibliographic information, structural metadata about formats and structures,
and administrative metadata, which is used to manage information.
- migration
- Preservation of digital content, where the underlying information is retained
but older formats and internal structures are replaced by newer ones.
- MIME (Multipurpose Internet Mail Extensions)
- A scheme for specifying the data type of digital material; MIME types are
also called Internet media types.
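A MIME type names a data type as a type/subtype pair, such as `text/html`; web servers send it in the HTTP `Content-Type` header so that the browser knows how to render the data. As a sketch, Python's standard `mimetypes` module guesses a type from a file name extension; the file names below are hypothetical.

```python
import mimetypes

for name in ["report.pdf", "page.html", "scan.jpg", "notes.txt"]:
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
# report.pdf -> application/pdf
# page.html -> text/html
# scan.jpg -> image/jpeg
# notes.txt -> text/plain
```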
- mirror
- A computer system that contains a duplicate copy of information stored in
another system.
- mobile code
- Computer programs or parts of programs that are transmitted across a network
and executed by a remote computer.
- morphology
- Grammatical and other variants of words that are derived from the same root
or stem.
- Mosaic
- The first widely-used web browser, developed at the University of Illinois.
- MPEG
- A family of formats for compressing and storing digitized video and sound.
- multimedia
- A combination of several media types in a single digital object or collection,
e.g., images, audio, video.
- natural language processing
- Use of computers to interpret and manipulate words as part of a language.
- NCSTRL (Networked Computer Science Technical Reports Library)
- An international distributed library of computer science materials and services,
based at Cornell University.
- Netlib
- A digital library of mathematical software and related collections.
- NSF (National Science Foundation)
- U.S. government agency that supports science and engineering, including
digital libraries research.
- object
- A technical computing term for an independent piece of computer code with
its data. Hence, object-oriented programming, and distributed objects, where
objects are connected over a network.
- object model
- A description of the structural relationships among components of a library
object including its metadata.
- OCLC (Online Computer Library Center)
- An organization that provides, among other services, a bibliographic utility
for libraries to share catalog records.
- OPAC (online public access catalog)
- An online library catalog used by library patrons.
- open access
- Resources that are openly available to users with no requirements for authentication
or payment.
- optical character recognition
- Automatic conversion of text from a digitized image to computer text.
- Pad++
- An experimental user interface for access to large collections of information,
based on semantic zooming.
- page description language
- A system for encoding documents that precisely describes their appearance
when rendered for printing or display.
- PDF (Portable Document Format)
- A page description language developed by Adobe Systems to store and
render images of pages.
- peer review
- The procedure by which academic journal articles are reviewed by other researchers
before being accepted for publication.
- Perseus
- A digital library of hyperlinked sources in classics and related disciplines,
based at Tufts University.
- policy
- A rule established by the manager of a digital library that specifies which
users should be authorized to have what access to which materials.
- port
- A method used by TCP to specify which program running on a computer should
process a message arriving over the Internet.
- PostScript
- A programming language to create graphical output for printing, used as
a page description language.
- precision
- In information retrieval, the percentage of hits found by a search that
satisfy the request that generated the query.
- presentation profile
- Guidelines associated with a digital object that suggest how it might be
presented to a user.
- protocol
- A set of rules that describe the sequence of messages sent across a network,
specifying both syntax and semantics.
- proxy
- A computer that acts as a bridge between two computer systems that use different
standards, formats, or protocols.
- publish
- To make information available and distribute it to the public.
- PURL (Persistent URL)
- A method of providing persistent identifiers using standard web protocols,
developed by OCLC.
- query
- A textual string, possibly structured, that is used in information retrieval,
the task being to find objects that match the words in the query.
- ranked searching
- Methods of information retrieval that return a list of documents, ranked
in order of how well each matches the query.
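How well a document "matches" must be defined by some scoring formula. The sketch below assumes one common choice, tf-idf weighting (term frequency times inverse document frequency), on a hypothetical three-document collection; the glossary itself does not prescribe a formula, and `ranked_search` is an illustrative name.

```python
import math

# Hypothetical collection.
docs = {
    "d1": "digital library digital archive",
    "d2": "public library services",
    "d3": "digital video",
}
tokenized = {d: text.split() for d, text in docs.items()}
N = len(docs)

def idf(term):
    """Inverse document frequency: rarer terms get more weight."""
    df = sum(1 for words in tokenized.values() if term in words)
    return math.log(N / df) if df else 0.0

def score(query, words):
    """Sum over query terms of (occurrences in document) * idf."""
    return sum(words.count(t) * idf(t) for t in query.split())

def ranked_search(query):
    scores = {d: score(query, words) for d, words in tokenized.items()}
    matches = [d for d in scores if scores[d] > 0]
    return sorted(matches, key=scores.get, reverse=True)

print(ranked_search("digital library"))   # d1 first: it contains both terms
```

Note that a term occurring in every document gets an idf of zero under this formula, so it contributes nothing to the ranking, which is one reason very common words are often treated as stop words.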
- RDF (Resource Description Framework)
- A method for specifying the syntax of metadata, used to exchange metadata.
- RealAudio
- A format and protocol for compressing and storing digitized sound, and transmitting
it over a network to be played in real time.
- recall
- In information retrieval, the percentage of the items in a body of material
which would satisfy a request that are actually found by a search.
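Precision and recall are both ratios over the overlap between what a search found and what it should have found. This is a minimal worked example, assuming a hypothetical search that returns 4 hits, 3 of which are relevant, from a collection containing 6 relevant items.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    found = retrieved & relevant
    precision = len(found) / len(retrieved) if retrieved else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search results and relevance judgments.
retrieved = {"d1", "d2", "d3", "d7"}
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}

p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.0%}, recall = {r:.0%}")   # precision = 75%, recall = 50%
```

Returning more hits tends to raise recall at the cost of precision, which is why the two measures are usually reported together.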
- refresh
- To make an exact copy of data from older media to newer for long-term preservation.
- render
- To transform digital information in the form received from a repository
into a display on a computer screen, or for other presentation to the user.
- replication
- Making copies of digital material for backup, performance, reliability, or
preservation.
- repository
- A computer system used to store digital library collections and disseminate
them to users.
- RSA encryption
- A method of dual key (public key) encryption.
- scanning
- Method of conversion in which a physical object, e.g., a printed page, is
represented by a digital grid of pixels.
- search term
- A single term within a query, usually a single word or short phrase.
- secondary information
- Information sources that describe other (primary) information, e.g., catalogs,
indexes, and abstracts; used to find information and manage collections.
- security
- Techniques and practices that preserve the integrity of computer systems,
and digital library services and collections.
- server
- Any computer on a network, other than a client, that stores collections
or provides services.
- SGML (Standard Generalized Markup Language)
- A system for creating mark-up languages that represent the structure of
a document.
- SICI (Serial Item and Contribution Identifier)
- An identifier for an issue of a serial or an article contained within a
serial.
- speech recognition
- Automatic conversion of spoken words to computer text.
- STARTS
- An experimental protocol for use in distributed searching, which enables
a client to combine results from several search engines.
- stemming
- In information retrieval, reduction of morphological variants of a word
to a common stem.
- stop word
- A word that is so common that it is ignored in information retrieval. A
set of such words is called a stop list.
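Stop lists and stemming are typically applied together when text is prepared for indexing. The sketch below assumes a tiny hypothetical stop list and a deliberately crude suffix-stripping stemmer that removes one suffix and keeps at least three letters; real systems use larger stop lists and more careful rules, such as the Porter stemmer.

```python
# Hypothetical stop list and suffix rules (longest suffixes checked first).
STOP_LIST = {"a", "an", "and", "of", "the", "to", "in"}
SUFFIXES = ("ing", "ies", "es", "s", "ed")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, drop stop words, and reduce the remaining words to stems."""
    return [stem(w) for w in text.lower().split() if w not in STOP_LIST]

print(index_terms("The indexing of catalogs and indexes"))
# ['index', 'catalog', 'index']
```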
- structural type
- Metadata that indicates the structural category of a digital object.
- style sheet
- A set of rules that specify how mark-up in a document translates into the
appearance of the document when rendered.
- subscription
- In a digital library, a payment made by a person or an organization for
access to specific collections and services, usually for a fixed period, e.g.,
one year.
- subsequent use
- Use made of digital materials after they leave the control of a digital
library.
- tag
- A special string of characters embedded in marked-up text to indicate the
structure or format.
- TCP/IP
- The base protocols of the Internet. IP uses numeric IP addresses to join
network segments; TCP provides reliable delivery of messages between networked
computers.
- TEI (Text Encoding Initiative)
- A project to represent texts in digital form, emphasizing the needs of humanities
scholars. Also the DTD used by the project.
- TeX
- A method of encoding text that precisely describes its appearance when printed,
especially good for mathematical notation. LaTeX is a set of macros built on TeX.
- thesaurus
- A linguistic tool that relates words by meaning.
- Ticer Summer School
- A program at Tilburg University to educate experienced librarians about
digital libraries.
- Tipster
- A DARPA program of research to improve the quality of text processing methods,
including information retrieval.
- transliteration
- A systematic way to convert characters in one alphabet or phonetic sounds
into another alphabet.
- TREC (Text Retrieval Conferences)
- Annual conferences in which methods of text processing are evaluated against
standard collections and tasks.
- truncation
- Use of the first few letters of a word as a search term in information retrieval.
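Truncation amounts to prefix matching against the words in an index. This minimal sketch assumes a small hypothetical vocabulary and a linear scan; a large index would keep its words sorted so that all matches for a prefix sit together.

```python
# Hypothetical words drawn from an index.
vocabulary = ["catalog", "cataloging", "catalogue", "cataract", "category"]

def truncated_search(prefix, words):
    """Return every word that begins with the truncated search term."""
    return [w for w in words if w.startswith(prefix)]

print(truncated_search("catalog", vocabulary))
# ['catalog', 'cataloging', 'catalogue']
```

A short truncated term such as "cat" would match all five words, so truncation trades precision for recall.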
- Tulip
- An experiment in which Elsevier Science scanned materials science journals
and a group of universities mounted them on local computers.
- UDP
- An Internet protocol that transmits individual data packets without guaranteeing
delivery or order.
- Unicode
- A code, originally designed with 16 bits per character, to represent the characters
used in most of the world's scripts. UTF-8 is an alternative encoding in which one
or more 8-bit bytes represent each Unicode character.
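The variable-length nature of UTF-8 is easy to see by encoding a few characters. In this sketch, ASCII characters keep their single-byte values, while other characters take two, three, or four bytes; the last example is a code point beyond the original 16-bit range.

```python
# Each character is a Unicode code point; UTF-8 encodes it in 1 to 4 bytes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex()}")
# U+0041: 1 byte(s), 41
# U+00E9: 2 byte(s), c3a9
# U+20AC: 3 byte(s), e282ac
# U+1F600: 4 byte(s), f09f9880
```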
- union catalog
- A single catalog that contains records about materials in several collections
or libraries.
- URL (Uniform Resource Locator)
- A reference to a resource on the Internet, specifying a protocol, a computer,
a file on that computer, and parameters. An absolute URL specifies a location
as a domain name or IP address; a relative URL specifies a location relative
to the current file.
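Resolving a relative URL against the location of the current file, and splitting an absolute URL into its parts, can both be done with Python's standard `urllib.parse` module. The `example.org` addresses and paths below are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Split an absolute URL into protocol, computer, file path, and parameters.
parts = urlparse("http://www.example.org/search?q=maps")
print(parts.scheme, parts.netloc, parts.path, parts.query)
# http www.example.org /search q=maps

# Resolve relative URLs against the current file's location.
base = "http://www.example.org/collections/maps/index.html"
print(urljoin(base, "item42.html"))   # http://www.example.org/collections/maps/item42.html
print(urljoin(base, "../photos/"))    # http://www.example.org/collections/photos/
print(urljoin(base, "/about.html"))   # http://www.example.org/about.html
```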
- URN (Uniform Resource Name)
- Location-independent names for Internet resources.
- WAIS
- An early version of Z39.50, used in digital libraries before the web, now
largely obsolete.
- Warwick Framework
- A general model that describes the various parts of a complex object, including
the various categories of metadata.
- watermark
- A code embedded into digital material that can be used to establish ownership;
it may be visible or invisible to the user.
- web crawler
- A web indexing program that builds an index by following hyperlinks continuously
from web page to web page.
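The core of a web crawler is a traversal that follows hyperlinks while remembering which pages it has already visited. This is a minimal breadth-first sketch, assuming a hypothetical in-memory "web" in place of real HTTP fetching and link extraction; `WEB`, `crawl`, and `fetch_links` are illustrative names, and the `limit` parameter stands in for the politeness and scope rules a real crawler needs.

```python
from collections import deque

# Hypothetical in-memory web: page -> pages it links to.
# A real crawler would fetch each page over HTTP and parse its links.
WEB = {
    "/home": ["/about", "/collections"],
    "/about": ["/home"],
    "/collections": ["/maps", "/photos"],
    "/maps": ["/collections"],
    "/photos": [],
}

def crawl(start, fetch_links, limit=100):
    """Breadth-first crawl: follow hyperlinks, visiting each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)              # a real crawler would index the page here
        for link in fetch_links(page):
            if link not in seen:        # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/home", lambda url: WEB.get(url, [])))
# ['/home', '/about', '/collections', '/maps', '/photos']
```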
- webmaster
- A person who manages web sites.
- web search services
- Commercial services that provide searching of the web, including Yahoo,
AltaVista, Excite, Lycos, Infoseek, etc.
- web site
- A collection of information on the web, usually stored on a web server.
- Westlaw
- A legal information service provided by West Publishing.
- World Wide Web (web)
- An interlinked set of information sources on the Internet, and the technology
they use, including HTML, HTTP, URLs, and MIME.
- World Wide Web Consortium (W3C)
- An international consortium based at M.I.T. that coordinates technical developments
of the web.
- work
- The underlying intellectual abstraction behind some material in a digital
library.
- Xerox Digital Property Rights Language
- Syntax and rules for expressing rights, conditions, and fees for digital
works.
- XML (eXtensible Mark-up Language)
- A simplified version of SGML intended for use with online information.
- XSL (eXtensible Stylesheet Language)
- System of style sheets for use with XML, drawing on CSS and DSSSL.
- Z39.50
- A protocol that allows a computer to search collections of information on
a remote system, create sets of results for further manipulation, and retrieve
information; mainly used for bibliographic information.
Last revision of content: January 1999
Formatted for the Web: December 2002
(c) Copyright The MIT Press 2000