Glossary
Digital libraries have absorbed terminology from many fields, including computing, 
  libraries, publishing, law, and more. This glossary gives brief explanations 
  of how some common terms are used in digital libraries today, which may not 
  be the usage in other contexts. Often the use in digital libraries has diverged 
  from or extended the original sense of a term.
 
  - AACR2 (Anglo-American Cataloguing Rules)
 
  - A set of rules that describe the content that is contained in library catalog 
    records.
 
  - abstracting and indexing services
 
  - Secondary information services that provide searching of scholarly and scientific 
    information, in particular of individual journal articles.
 
  - access management
 
  - Control of access to material in digital libraries. Sometimes called terms 
    and conditions or rights management.
 
  - ACM Digital Library
 
  - A digital library of the journals and conference proceedings published by 
    the Association for Computing Machinery.
 
  - Alexandria Digital Library
 
  - A digital library of geospatial information, based at the University of 
    California, Santa Barbara.
 
  - American Memory and the National Digital Library Program
 
  - The Library of Congress's digital library of primary source materials related 
    to American history, converted from its own collections.
 
  - applet
 
  - A small computer program that can be transmitted from a server to a client 
    computer and executed on the client.
 
  - archives
 
  - Collections with related systems and services, organized to emphasize the 
    long-term preservation of information.
 
  - Art and Architecture Thesaurus
 
  - A controlled vocabulary for fine art, architecture, decorative art, and 
    material culture, a project of the J. Paul Getty Trust.
 
  - artifact
 
  - A physical object in a library, archive, or museum.
 
  - ASCII (American Standard Code for Information Interchange)
 
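  - A coding scheme that represents individual characters as 7 bits (8-bit extended 
    versions also exist); printable ASCII is the subset of characters that can be 
    displayed or printed, excluding the control codes.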
 
  - authentication
 
  - Validation of a user, a computer, or some digital object to ensure that 
    it is what it claims to be.
 
  - authorization
 
  - Giving permission to a user or client computer to access specific information 
    and carry out approved actions.
 
  - automatic indexing
 
  - Creation of catalog or indexing records using computer programs, not human 
    cataloguers.
 
  - Boolean searching
 
  - Methods of information retrieval where a query consists of a sequence of 
    search terms, combined with operators, such as "and", "or", 
    and "not".
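
A minimal Python sketch of Boolean searching, assuming each document is treated as a set of words: "and", "or", and "not" then correspond to set intersection, union, and difference over the documents that contain each search term. The collection and the helper `docs_containing` are hypothetical.

```python
# Hypothetical collection: document id -> text
docs = {
    1: "digital library catalog records",
    2: "digital preservation of archives",
    3: "library catalog and archives",
}

def docs_containing(term: str) -> set:
    """Return the ids of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.lower().split()}

# "and" -> intersection (&), "or" -> union (|), "not" -> difference (-)
either = docs_containing("catalog") | docs_containing("preservation")
result = (docs_containing("library") & docs_containing("catalog")) - docs_containing("archives")
print(sorted(either))  # [1, 2, 3]
print(sorted(result))  # [1]
```

The second query reads as (library AND catalog) NOT archives.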
 
  - browser
 
  - A general-purpose user interface, used with the web and other online information 
    services. Also known as a web browser.
 
  - browsing
 
  - Exploration of a body of information, based on the organization of the collections 
    or scanning lists, rather than by direct searching.
 
  - cache
 
  - A temporary store that is used to keep a readily available copy of recently 
    used data or any data that is expected to be used frequently.
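
The definition leaves open which entries to discard when a cache fills up. One common policy, assumed here purely for illustration, is least recently used (LRU) eviction; this sketch uses Python's `collections.OrderedDict` to track which entries were used most recently.

```python
from collections import OrderedDict

class LRUCache:
    """A fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key not in self._store:
            return default
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used entry
cache.put("c", 3)      # the cache is over capacity, so "b" is evicted
print(cache.get("b"))  # None
```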
 
  - California Digital Library
 
  - A digital library that serves the nine campuses of the University of California.
 
  - catalog
 
  - A collection of bibliographic records created according to an established 
    set of rules.
 
  - classification
 
  - An organization of library materials by a hierarchy of subject categories.
 
  - client
 
  - A computer that acts on behalf of a user, including a user's personal computer, 
    or another computer that appears to a server to have that function.
 
  - CGI (Common Gateway Interface)
 
  - A programming interface that enables a web server to pass requests to other 
    programs, so that a web browser can act as an interface to information services 
    other than web sites.
 
  - Chemical Abstracts
 
  - A secondary information service for chemistry.
 
  - CNI (Coalition for Networked Information)
 
  - A partnership of the Association of Research Libraries and Educause to 
    collaborate on academic networked information.
 
  - complex object
 
  - A library object that is made up of many interrelated elements or digital 
    objects.
 
  - compression
 
  - Reduction in the size of digital materials by removing redundancy or by 
    approximation; lossless compression can be reversed exactly; lossy compression 
    cannot be reversed, since information is lost in the approximation.
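
Run-length encoding is one of the simplest ways to remove redundancy, shown here as an illustration rather than as the method used by any particular image or audio format: each run of a repeated character is stored once with its count, and decoding restores the original exactly. The rounding step at the end is a toy example of lossy approximation.

```python
def rle_encode(data: str) -> list:
    """Lossless run-length encoding: store each run of a repeated character as (char, count)."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list) -> str:
    """Rebuild the original string exactly; nothing was discarded."""
    return "".join(ch * count for ch, count in runs)

original = "WWWWWWBBBWWWW"
encoded = rle_encode(original)
print(encoded)                          # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)  # True

# Lossy, by contrast: rounding samples to the nearest 10 discards detail for good.
samples = [12, 17, 23, 31]
approx = [round(s, -1) for s in samples]
print(approx)                           # [10, 20, 20, 30]
```

Both 17 and 23 become 20, so no decoder can tell which value was originally stored.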
 
  - computational linguistics
 
  - The branch of natural language processing that deals with grammar and linguistics.
 
  - controlled vocabulary
 
  - A set of subject terms, and rules for their use in assigning terms to materials 
    for indexing and retrieval.
 
  - conversion
 
  - Transformation of information from one medium to another, including from 
    paper to digital form.
 
  - CORBA
 
  - A standard for distributed computing where an object on one computer invokes 
    an Object Request Broker (ORB) to interact with an object on another computer.
 
  - CORE
 
  - A project from 1991 to 1995 by Bellcore, Cornell University, OCLC, and the 
    American Chemical Society to convert chemistry journals to digital form.
 
  - Cryptolope
 
  - A secure container, developed by IBM, used to buy and sell content over the 
    Internet.
 
  - CSS (Cascading Style Sheets)
 
  - System of style sheets for use with HTML, a basis for XSL.
 
  - CSTR (Computer Science Technical Reports project)
 
  - A DARPA-funded research project with CNRI and five universities, from 1992 
    to 1996.
 
  - DARPA (Defense Advanced Research Projects Agency)
 
  - A major sponsor of computer science research in the U.S., including digital 
    libraries. Formerly ARPA.
 
  - data type
 
  - Structural metadata associated with digital data that indicates the digital 
    format or the application used to process the data.
 
  - DES (Data Encryption Standard)
 
  - A method for private key encryption.
 
  - Dewey Decimal Classification
 
  - A classification scheme for library materials which uses a numeric code 
    to indicate subject areas.
 
  - desktop metaphor
 
  - User interface concept on personal computers that represents information 
    as files and folders on a desktop.
 
  - Dienst
 
  - An architecture for digital library services and an open protocol that provides 
    those services, developed at Cornell University, used in NCSTRL.
 
  - digital archeology
 
  - The process of retrieving information from damaged, fragmentary, and archaic 
    data sources.
 
  - Digital Libraries Initiative
 
  - A digital libraries research program. In Phase 1, from 1994 to 1998, NSF/DARPA/NASA 
    funded six university projects; Phase 2 began in 1998/9.

  - digital object
 
  - An item as stored in a digital library, consisting of data, metadata, and 
    an identifier.
 
  - digital signature
 
  - A cryptographic code, consisting of a hash of the data encrypted with the 
    creator's private key, that indicates that the data has not changed; anyone 
    can decrypt it with the creator's public key and check it against a fresh hash.
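
A toy sketch of the idea, assuming textbook RSA with the tiny primes 61 and 53 so the arithmetic stays visible; these keys offer no security, and real signature schemes use far larger keys plus padding. The creator hashes the data with SHA-256 (an assumed choice), reduces the hash into the key's range, and encrypts it with the private exponent; anyone holding the public exponent can reverse that step and compare.

```python
import hashlib

# Toy RSA key pair from the primes 61 and 53 (hypothetical and insecure).
N = 61 * 53   # public modulus: 3233
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1

def digest(data: bytes) -> int:
    """Hash the data and reduce the result into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """Encrypt the hash with the private key to produce the signature."""
    return pow(digest(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """Decrypt the signature with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == digest(data)

message = b"Last revision of content: January 1999"
sig = sign(message)
print(verify(message, sig))            # True
print(verify(message, (sig + 1) % N))  # False: an altered signature no longer matches
```

With a modulus this small, different messages share a reduced hash roughly once in every few thousand pairs, one more reason the sketch is only illustrative.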
 
  - dissemination
 
  - The transfer from the stored form of a digital object in a repository to 
    a client.
 
  - distributed computing
 
  - Computing systems in which services to users are provided by teams of computers 
    collaborating over a network.
 
  - D-Lib Magazine
 
  - A monthly, online publication about digital libraries research and innovation.
 
  - DLITE
 
  - An experimental user interface used with the Stanford University InfoBus.
 
  - document
 
  - Digital object that is the analog of a physical document, especially textual 
    materials; a document model is an object model for documents.
 
  - domain name
 
  - The name of a computer on the Internet; the Domain Name System (DNS) converts 
    domain names to IP addresses.
 
  - DOI (Digital Object Identifier)
 
  - An identifier used by publishers to identify materials published electronically, 
    a form of handle.
 
  - DSSSL (Document Style Semantics and Specification Language)
 
  - A general-purpose system of style sheets for SGML.
 
  - DTD (Document Type Definition)
 
  - A mark-up specification for a class of documents, defined within the SGML 
    framework.
 
  - Dublin Core
 
  - A simple set of metadata elements used in digital libraries, primarily to 
    describe digital objects, for collections management, and for the exchange 
    of metadata.
 
  - dynamic object
 
  - Digital object where the dissemination presented to the user depends upon 
    the execution of a computer program, or other external activity.
 
  - EAD (Encoded Archival Description)
 
  - A DTD used to encode electronic versions of finding aids for archival materials.
 
  - electronic journal
 
  - An online publication that is organized like a traditional printed journal, 
    either an online version of a printed journal or a journal that has only an 
    online existence.
 
  - eLib
 
  - A British program of innovation, around the theme of electronic publication.
 
  - emulation
 
  - Replication of the behavior of an earlier computing system that is no longer 
    available, so that its programs and data can still be processed.
 
  - encryption
 
  - Techniques for encoding information for privacy or security, so that it 
    appears to be random data; the reverse process, decryption, requires knowledge 
    of a digital key.
 
  - entities and elements
 
  - In a mark-up language, entities are the basic unit of information, including 
    character entities; elements are strings of entities that form a structural 
    unit.
 
  - expression
 
  - The realization of a work, by expressing the abstract concept as actual 
    words, sounds, images, etc.
 
  - fair use
 
  - A concept in copyright law that allows limited use of copyrighted material 
    without requiring permission from the rights holders, e.g., for scholarship 
    or review.
 
  - federated digital library
 
  - A group of digital libraries that support common standards and services, 
    thus providing interoperability and a coherent service to users.
 
  - field, subfield
 
  - An individual item of information in a structured record, such as a catalog 
    or database record.
 
  - fielded searching
 
  - Methods for searching textual materials, including catalogs, where search 
    terms are matched against the content of specified fields.
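
A minimal sketch of fielded searching over hypothetical catalog records stored as Python dictionaries; matching is a case-insensitive substring test, an assumption chosen for brevity. Each term is checked only against the field it is paired with, so a word that appears in a title does not satisfy a subject search.

```python
# Hypothetical catalog records, each made up of named fields.
records = [
    {"title": "Digital Archives", "author": "Jones", "subject": "preservation"},
    {"title": "Catalog Design", "author": "Smith", "subject": "libraries"},
    {"title": "Library Networks", "author": "Smith", "subject": "networks"},
]

def fielded_search(records, **criteria):
    """Return records in which every named field contains its search term (case-insensitive)."""
    def matches(record):
        return all(term.lower() in str(record.get(field, "")).lower()
                   for field, term in criteria.items())
    return [r for r in records if matches(r)]

hits = fielded_search(records, author="smith", subject="librar")
print([r["title"] for r in hits])  # ['Catalog Design']
```

"Library Networks" has the right author and contains "librar" in its title, but not in its subject field, so it is not returned.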
 
  - finding aid
 
  - A textual document that describes holdings of an archive, library, or museum.
 
  - firewall
 
  - A computer system that screens data passing between network segments, used 
    to provide security for a private network at the point of connection to the 
    Internet.
 
  - first sale
 
  - A concept in copyright law that permits the purchaser of a book or other 
    object to transfer it to somebody else, without requiring permission from 
    the rights holders.
 
  - FTP (File Transfer Protocol)
 
  - A protocol used to transmit files between computers on the Internet.
 
  - full text searching
 
  - Methods for searching textual materials where the entire text is matched 
    against a query.
 
  - gatherer
 
  - A program that automatically assembles indexing information from digital 
    library collections.
 
  - gazetteer
 
  - A database used to translate between different representations of geospatial 
    references, such as place names and geographic coordinates.
 
  - genre
 
  - The class or category of an object when considered as an intellectual work.
 
  - geospatial information
 
  - Information that is referenced by a geographic location.
 
  - gif
 
  - A format for storing compressed images.
 
  - Google
 
  - A web search program that ranks web pages in a list of hits by giving weight 
    to the links that reference a specific page.
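
The published form of this idea is PageRank. The sketch below computes it by power iteration over a hypothetical four-page link graph; the damping factor of 0.85 follows the value suggested in the original PageRank paper, and the sketch illustrates only the link-weighting idea, not Google's full ranking system.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Repeatedly share each page's score among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base score, as if a reader jumped there at random.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to
web = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home", "about"],
    "orphan": ["home"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {score:.3f}")
```

No page links to "orphan", so it keeps only the base score, while "home", which every other page links to, ranks highest.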
 
  - gopher
 
  - A pre-web protocol used for building digital libraries, now largely obsolete.
 
  - handle
 
  - A system of globally-unique names for Internet resources and a computer 
    system for managing them, developed by CNRI; a form of URN.
 
  - Harvest
 
  - A research project that developed an architecture for distributed searching, 
    including protocols and formats.
 
  - hash
 
  - A short value calculated from digital data that serves to distinguish it 
    from other data.
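
A short illustration using SHA-256 from Python's standard `hashlib` module; the glossary does not name an algorithm, so that choice is an assumption. The hash has a fixed length whatever the size of the input, and even a one-character change produces an unrelated value.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of the data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()

original = fingerprint(b"Digital libraries glossary")
altered = fingerprint(b"Digital libraries glossary.")  # one extra character
print(len(original))                                   # 64, whatever the input size
print(original == altered)                             # False
print(original == fingerprint(b"Digital libraries glossary"))  # True: same data, same hash
```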
 
  - HighWire Press
 
  - A publishing venture, from Stanford University Libraries, that provides 
    electronic versions of journals, on behalf of learned and professional societies.
 
  - hit
 
  - 1. An incoming request to a web server or other computer system.
    2. In information retrieval, a document that is discovered in response to 
    a query.

  - home page
 
  - The introductory page to a collection of information on the web.
 
  - HTML (Hyper-Text Mark-up Language)
 
  - A simple mark-up and formatting language for text, with links to other objects, 
    used with the web.
 
  - HTTP (Hyper-Text Transfer Protocol)
 
  - The basic protocol of the web, used for communication between browsers and 
    web sites.
 
  - hyperlink
 
  - A network link from one item in a digital library or web site to another.
 
  - ICPSR (Inter-university Consortium for Political and Social Research)
 
  - An archive of social science datasets, based at the University of Michigan.
 
  - identifier
 
  - A string of characters that identifies a specific resource in a digital 
    library or on a network.
 
  - IETF (Internet Engineering Task Force)
 
  - The body that coordinates the technological development of the Internet, 
    including standards.
 
  - InfoBus
 
  - An approach to interoperability that uses proxies as interfaces between 
    existing systems, developed at Stanford University.
 
  - information discovery
 
  - General term covering all strategies and methods of finding information 
    in a digital library.
 
  - information retrieval
 
  - Searching a body of information for objects that match a search query.
 
  - Informedia
 
  - A research program and digital library of segments of video, based at Carnegie 
    Mellon University.
 
  - Inspec
 
  - An indexing service for physics, engineering, computer science, and related 
    fields.
 
  - Internet
 
  - An international network, consisting of independently managed networks using 
    the TCP/IP protocols and a shared naming system. A successor to the ARPAnet.
 
  - Internet RFC series
 
  - The technical documentation of the Internet, provided by the Internet Engineering 
    Task Force. Internet Drafts are preliminary versions of RFCs.
 
  - interoperability
 
  - The task of building coherent services for users from components that are 
    technically different and independently managed.
 
  - inverted file
 
  - A list of the words in a set of documents and their locations within those 
    documents; an inverted list is the list of locations for a given word.
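
A minimal sketch that builds an inverted file in memory from two hypothetical documents, recording word positions so that the inverted list for each word says both which documents contain it and where. Real systems store these lists on disk and often compress them.

```python
from collections import defaultdict

def build_inverted_file(docs: dict) -> dict:
    """Map each word to its inverted list of (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return dict(index)

docs = {
    1: "digital libraries store digital objects",
    2: "libraries lend books",
}
index = build_inverted_file(docs)
print(index["digital"])    # [(1, 0), (1, 3)]
print(index["libraries"])  # [(1, 1), (2, 0)]
```

Keeping positions, not just document ids, is what allows phrase searches such as "digital libraries" to be answered from the index alone.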
 
  - item
 
  - A specific piece of material in a digital library; a single instance or 
    copy of a manifestation.
 
  - Java
 
  - A programming language used for writing mobile code, especially for user 
    interfaces, developed by Sun Microsystems.
 
  - JavaScript
 
  - A scripting language used to embed executable instructions in a web page.
 
  - JPEG
 
  - A format for storing compressed images.
 
  - JSTOR
 
  - A subscription service, initiated by the Andrew W. Mellon Foundation, to 
    convert back runs of important journals and make them available to academic 
    libraries.
 
  - key
 
  - A digital code used to encrypt or decrypt messages. Private key encryption 
    uses a single, secret key. Dual key (public key) encryption uses two keys 
    of which one is secret and one is public.
 
  - legacy system
 
  - An existing system, usually a computer system, that must be accommodated 
    in building new systems.
 
  - lexicon
 
  - A linguistic tool with information about the morphological variations and 
    grammatical usage of words.
 
  - Lexis
 
  - A legal information service, a pioneer of full-text information online.
 
  - Los Alamos E-Print Archives
 
  - An open-access site for rapid distribution of research papers in physics 
    and related disciplines.
 
  - manifestation
 
  - Form given to an expression of a work, e.g., by representing it in digital 
    form.
 
  - MARC (Machine-Readable Cataloging)
 
  - A format used by libraries to store and exchange catalog records.
 
  - mark-up language
 
  - Codes embedded in a document that describe its structure and/or its format.
 
  - Medline
 
  - An indexing service for research in medicine and related fields, provided 
    by the National Library of Medicine.
 
  - MELVYL
 
  - A shared digital library system for academic institutions in California; 
    part of the California Digital Library.
 
  - Memex
 
  - A concept of an online library suggested by Vannevar Bush in 1945.
 
  - Mercury
 
  - An experimental digital library project to mount scientific journals online 
    at Carnegie Mellon University from 1987 to 1993.
 
  - MeSH (Medical Subject Headings)
 
  - A set of subject terms and an associated thesaurus used to describe medical 
    research, maintained by the National Library of Medicine.
 
  - metadata
 
  - Data about other data, commonly divided into descriptive metadata such as 
    bibliographic information, structural metadata about formats and structures, 
    and administrative metadata, which is used to manage information.
 
  - migration
 
  - Preservation of digital content, where the underlying information is retained 
    but older formats and internal structures are replaced by newer ones.
 
  - MIME (Multipurpose Internet Mail Extensions)

  - A scheme for specifying the data type of digital material; the data types 
    themselves are known as Internet media types.
 
  - mirror
 
  - A computer system that contains a duplicate copy of information stored in 
    another system.
 
  - mobile code
 
  - Computer programs or parts of programs that are transmitted across a network 
    and executed by a remote computer.
 
  - morphology
 
  - Grammatical and other variants of words that are derived from the same root 
    or stem.
 
  - Mosaic
 
  - The first widely-used web browser, developed at the University of Illinois.
 
  - MPEG
 
  - A family of formats for compressing and storing digitized video and sound.
 
  - multimedia
 
  - A combination of several media types in a single digital object or collection, 
    e.g., images, audio, video.
 
  - natural language processing
 
  - Use of computers to interpret and manipulate words as part of a language.
 
  - NCSTRL (Networked Computer Science Technical Reports Library)
 
  - An international distributed library of computer science materials and services, 
    based at Cornell University.
 
  - Netlib
 
  - A digital library of mathematical software and related collections.
 
  - NSF (National Science Foundation)
 
  - U.S. government agency that supports science and engineering, including 
    digital libraries research.
 
  - object
 
  - A technical computing term for an independent piece of computer code with 
    its data. Hence, object-oriented programming, and distributed objects, where 
    objects are connected over a network.
 
  - object model
 
  - A description of the structural relationships among components of a library 
    object including its metadata.
 
  - OCLC (Online Computer Library Center)
 
  - An organization that provides, among other services, a bibliographic utility 
    for libraries to share catalog records.
 
  - OPAC (online public access catalog)
 
  - An online library catalog used by library patrons.
 
  - open access
 
  - Resources that are openly available to users with no requirements for authentication 
    or payment.
 
  - optical character recognition
 
  - Automatic conversion of text from a digitized image to computer text.
 
  - Pad++
 
  - An experimental user interface for access to large collections of information, 
    based on semantic zooming.
 
  - page description language
 
  - A system for encoding documents that precisely describes their appearance 
    when rendered for printing or display.
 
  - PDF (Portable Document Format)
 
  - A page description language developed by Adobe Systems to store and 
    render images of pages.
 
  - peer review
 
  - The procedure by which academic journal articles are reviewed by other researchers 
    before being accepted for publication.
 
  - Perseus
 
  - A digital library of hyperlinked sources in classics and related disciplines, 
    based at Tufts University.
 
  - policy
 
  - A rule established by the manager of a digital library that specifies which 
    users should be authorized to have what access to which materials.
 
  - port
 
  - A number used by TCP (and UDP) to specify which program running on a computer 
    should process a message arriving over the Internet.
 
  - PostScript
 
  - A programming language to create graphical output for printing, used as 
    a page description language.
 
  - precision
 
  - In information retrieval, the percentage of hits found by a search that 
    satisfy the request that generated the query.
 
  - presentation profile
 
  - Guidelines associated with a digital object that suggest how it might be 
    presented to a user.
 
  - protocol
 
  - A set of rules that describe the sequence of messages sent across a network, 
    specifying both syntax and semantics.
 
  - proxy
 
  - A computer that acts as a bridge between two computer systems that use different 
    standards, formats, or protocols.
 
  - publish
 
  - To make information available and distribute it to the public.
 
  - PURL (Persistent URL)
 
  - A method of providing persistent identifiers using standard web protocols, 
    developed by OCLC.
 
  - query
 
  - A textual string, possibly structured, that is used in information retrieval, 
    the task being to find objects that match the words in the query.
 
  - ranked searching
 
  - Methods of information retrieval that return a list of documents, ranked 
    in order of how well each matches the query.
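
The glossary does not fix a scoring method; one widely used family weights each query term by tf-idf (term frequency times inverse document frequency), assumed here for illustration. The sketch scores three hypothetical documents and returns them best first.

```python
import math
from collections import Counter

# Hypothetical collection: document id -> text
docs = {
    "d1": "digital library of digital maps",
    "d2": "library catalog records",
    "d3": "maps and atlases",
}

def rank(query: str, docs: dict) -> list:
    """Score each document by the summed tf-idf weights of the query terms, highest first."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    scores = {}
    for d, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for w in tokenized.values() if term in w)  # documents containing the term
            if df:
                idf = math.log(n / df)       # rarer terms carry more weight
                score += counts[term] * idf  # term frequency times inverse document frequency
        scores[d] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for doc_id, score in rank("digital maps", docs):
    print(doc_id, round(score, 3))
```

"digital" occurs only in d1, and twice, so d1 ranks first; d3 matches only the more common term "maps", and d2 matches neither.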
 
  - RDF (Resource Description Framework)
 
  - A method for specifying the syntax of metadata, used to exchange metadata.
 
  - RealAudio
 
  - A format and protocol for compressing and storing digitized sound, and transmitting 
    it over a network to be played in real time.
 
  - recall
 
  - In information retrieval, the percentage of the items in a body of material 
    which would satisfy a request that are actually found by a search.
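
Precision, defined earlier in this glossary, and recall are usually reported together. This sketch computes both from hypothetical relevance judgments, treating the search results and the set of relevant items as sets of document identifiers.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision: share of retrieved hits that are relevant.
    Recall: share of all relevant items that were retrieved."""
    found = retrieved & relevant
    precision = len(found) / len(retrieved) if retrieved else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: the search returned 4 documents, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = {"a", "b", "c", "x"}
relevant = {"a", "b", "c", "d", "e", "f"}
p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.0%}, recall = {r:.0%}")  # precision = 75%, recall = 50%
```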
 
  - refresh
 
  - To make an exact copy of data from older media to newer for long-term preservation.
 
  - render
 
  - To transform digital information in the form received from a repository 
    into a display on a computer screen, or for other presentation to the user.
 
  - replication
 
  - Making copies of digital material for backup, performance, reliability, or 
    preservation.
 
  - repository
 
  - A computer system used to store digital library collections and disseminate 
    them to users.
 
  - RSA encryption
 
  - A method of dual key (public key) encryption.
 
  - scanning
 
  - Method of conversion in which a physical object, e.g., a printed page, is 
    represented by a digital grid of pixels.
 
  - search term
 
  - A single term within a query, usually a single word or short phrase.
 
  - secondary information
 
  - Information sources that describe other (primary) information, e.g., catalogs, 
    indexes, and abstracts; used to find information and manage collections.
 
  - security
 
  - Techniques and practices that preserve the integrity of computer systems 
    and of digital library services and collections.
 
  - server
 
  - Any computer on a network, other than a client, that stores collections 
    or provides services.
 
  - SGML (Standard Generalized Markup Language)
 
  - A system for creating mark-up languages that represent the structure of 
    a document.
 
  - SICI (Serial Item and Contribution Identifier)
 
  - An identifier for an issue of a serial or an article contained within a 
    serial.
 
  - speech recognition
 
  - Automatic conversion of spoken words to computer text.
 
  - STARTS
 
  - An experimental protocol for use in distributed searching, which enables 
    a client to combine results from several search engines.
 
  - stemming
 
  - In information retrieval, reduction of morphological variants of a word 
    to a common stem.
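
A deliberately crude suffix-stripping stemmer; the suffix list is a hypothetical assumption, and practical systems use more careful algorithms such as the Porter stemmer. It still shows the principle: several morphological variants are reduced to one stem, so a search on any of them can match the others.

```python
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ations", "ation", "ing", "ies", "es", "ed", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["indexing", "indexed", "indexes", "index"]
print([stem(w) for w in words])  # ['index', 'index', 'index', 'index']
```

The crudeness shows quickly: "libraries" becomes "librar" while "library" is left unchanged, so the two would not match.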
 
  - stop word
 
  - A word that is so common that it is ignored in information retrieval. A 
    set of such words is called a stop list.
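
A minimal sketch of applying a stop list before indexing; the seven-word list is a hypothetical example, and real stop lists are longer and often tuned to the collection.

```python
# Hypothetical stop list.
STOP_LIST = {"a", "an", "and", "of", "the", "to", "in"}

def index_terms(text: str) -> list:
    """Return the words of the text that are not on the stop list."""
    return [w for w in text.lower().split() if w not in STOP_LIST]

print(index_terms("The history of the Library of Congress"))
# ['history', 'library', 'congress']
```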
 
  - structural type
 
  - Metadata that indicates the structural category of a digital object.
 
  - style sheet
 
  - A set of rules that specify how mark-up in a document translates into the 
    appearance of the document when rendered.
 
  - subscription
 
  - In a digital library, a payment made by a person or an organization for 
    access to specific collections and services, usually for a fixed period, e.g., 
    one year.
 
  - subsequent use
 
  - Use made of digital materials after they leave the control of a digital 
    library.
 
  - tag
 
  - A special string of characters embedded in marked-up text to indicate the 
    structure or format.
 
  - TCP/IP
 
  - The base protocols of the Internet. IP uses numeric IP addresses to join 
    network segments; TCP provides reliable delivery of messages between networked 
    computers.
 
  - TEI (Text Encoding Initiative)
 
  - A project to represent texts in digital form, emphasizing the needs of humanities 
    scholars. Also the DTD used by the project.
 
  - TeX
 
  - A method of encoding text that precisely describes its appearance when printed, 
    especially good for mathematical notation. LaTeX is a widely used macro package 
    built on TeX.
 
  - thesaurus
 
  - A linguistic tool that relates words by meaning.
 
  - Ticer Summer School
 
  - A program at Tilburg University to educate experienced librarians about 
    digital libraries.
 
  - Tipster
 
  - A DARPA program of research to improve the quality of text processing methods, 
    including information retrieval.
 
  - transliteration
 
  - A systematic way to convert characters in one alphabet or phonetic sounds 
    into another alphabet.
 
  - TREC (Text Retrieval Conferences)
 
  - Annual conferences in which methods of text processing are evaluated against 
    standard collections and tasks.
 
  - truncation
 
  - Use of the first few letters of a word as a search term in information retrieval.
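
A minimal sketch of how a truncated term might be expanded against the words in an index; the vocabulary is hypothetical. Every indexed word that begins with the given letters becomes part of the search.

```python
# Hypothetical vocabulary drawn from an index.
vocabulary = ["catalog", "cataloged", "cataloging", "catalogue", "category", "cat"]

def truncated_search(prefix: str, vocabulary: list) -> list:
    """Expand a truncated search term into every indexed word that starts with it."""
    return [w for w in vocabulary if w.startswith(prefix.lower())]

print(truncated_search("catalog", vocabulary))
# ['catalog', 'cataloged', 'cataloging', 'catalogue']
```

A prefix that is too short over-matches: truncating to "cat" would also retrieve "category" and "cat".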
 
  - Tulip
 
  - An experiment in which Elsevier Science scanned material science journals 
    and a group of universities mounted them on local computers.
 
  - UDP (User Datagram Protocol)

  - An Internet protocol that transmits individual data packets without guaranteeing 
    their delivery or order, and without retransmitting lost packets.
 
  - Unicode
 
  - A code to represent the characters used in most of the world's scripts, 
    originally designed as a 16-bit code. UTF-8 is an alternative encoding in 
    which each Unicode character is represented by one to four 8-bit bytes.
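
A short Python illustration of UTF-8's variable length, using code points from several scripts chosen as examples: ASCII characters keep their single-byte form, while other characters take two, three, or four bytes. The last code point, U+1D11E (a musical symbol), lies beyond the 16-bit range, which is why Unicode is no longer strictly a 16-bit code.

```python
# Example code points: Latin A, e-acute, Cyrillic Zhe, a CJK ideograph, a musical G clef.
code_points = [0x41, 0xE9, 0x416, 0x4E2D, 0x1D11E]
lengths = []
for cp in code_points:
    encoded = chr(cp).encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{cp:04X}: {len(encoded)} byte(s) {encoded.hex()}")
# U+0041: 1 byte(s) 41
# U+00E9: 2 byte(s) c3a9
# U+0416: 2 byte(s) d096
# U+4E2D: 3 byte(s) e4b8ad
# U+1D11E: 4 byte(s) f09d849e
```

In UTF-16, the same U+1D11E needs two 16-bit units (a surrogate pair).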
 
  - union catalog
 
  - A single catalog that contains records about materials in several collections 
    or libraries.
 
  - URL (Uniform Resource Locator)
 
  - A reference to a resource on the Internet, specifying a protocol, a computer, 
    a file on that computer, and parameters. An absolute URL specifies a location 
    as a domain name or IP address; a relative URL specifies a location relative 
    to the current file.
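
Python's standard `urllib.parse` module can split a URL into the parts named above and resolve relative URLs against the current file's location. The addresses use the reserved documentation domain `example.org` and are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# A hypothetical absolute URL: protocol, computer (domain name), and file.
base = "http://www.example.org/collections/maps/index.html"
parts = urlparse(base)
print(parts.scheme, parts.netloc, parts.path)  # http www.example.org /collections/maps/index.html

# Parameters follow the "?" in the URL.
print(urlparse("http://www.example.org/search?q=maps").query)  # q=maps

# Relative URLs are resolved against the location of the current file.
print(urljoin(base, "atlas.html"))       # http://www.example.org/collections/maps/atlas.html
print(urljoin(base, "../photos/1.jpg"))  # http://www.example.org/collections/photos/1.jpg
print(urljoin(base, "/help.html"))       # http://www.example.org/help.html
```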
 
  - URN (Uniform Resource Name)
 
  - Location-independent names for Internet resources.
 
  - WAIS
 
  - A search system based on an early version of Z39.50, used in digital libraries 
    before the web, now largely obsolete.
 
  - Warwick Framework
 
  - A general model that describes the various parts of a complex object, including 
    the various categories of metadata.
 
  - watermark
 
  - A code embedded into digital material that can be used to establish ownership; 
    it may be visible or invisible to the user.
 
  - web crawler
 
  - A web indexing program that builds an index by following hyperlinks continuously 
    from web page to web page.
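
A breadth-first sketch of a crawler's core loop. To keep it self-contained, the web is a hypothetical in-memory dictionary and `fetch_links` stands in for downloading a page and extracting its hyperlinks; a real crawler would also need to respect site policies, limit its request rate, and pass each page to an indexer.

```python
from collections import deque

# Hypothetical web: each URL maps to the hyperlinks found on that page.
WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/about"],
    "http://c.example/": [],
}

def fetch_links(url: str) -> list:
    """Stand-in for downloading a page and extracting its links."""
    return WEB.get(url, [])

def crawl(seed: str, max_pages: int = 100) -> list:
    """Breadth-first crawl from a seed URL, visiting each page at most once."""
    frontier = deque([seed])
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # an indexer would process the page text here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl("http://a.example/"))
```

The `seen` set keeps the crawler from looping forever on pages that link back to each other, as the home and "about" pages do here.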
 
  - webmaster
 
  - A person who manages web sites.
 
  - web search services
 
  - Commercial services that provide searching of the web, such as Yahoo, 
    AltaVista, Excite, Lycos, and Infoseek.
 
  - web site
 
  - A collection of information on the web; usually stored on a web server.
 
  - Westlaw
 
  - A legal information service provided by West Publishing.
 
  - World Wide Web (web)
 
  - An interlinked set of information sources on the Internet, and the technology 
    they use, including HTML, HTTP, URLs, and MIME.
 
  - World Wide Web Consortium (W3C)
 
  - An international consortium based at M.I.T. that coordinates technical developments 
    of the web.
 
  - work
 
  - The underlying intellectual abstraction behind some material in a digital 
    library.
 
  - Xerox Digital Property Rights Language
 
  - Syntax and rules for expressing rights, conditions, and fees for digital 
    works.
 
  - XSL (eXtensible Stylesheet Language)
 
  - System of style sheets for use with XML, derived from CSS.
 
  - XML (eXtensible Mark-up Language)
 
  - A simplified version of SGML intended for use with online information.
 
  - Z39.50
 
  - A protocol that allows a computer to search collections of information on 
    a remote system, create sets of results for further manipulation, and retrieve 
    information; mainly used for bibliographic information.
 
Last revision of content: January 1999
  Formatted for the Web: December 2002
  (c) Copyright The MIT Press 2000