The Apache HTTP Server is the most widely used web server on the Internet. The Apache server was developed from an early version of the original NCSA server with the intent of providing further improvement while maintaining compatibility. Since then, all development efforts on the NCSA server have ceased. Apache has since earned the title of reigning king among web servers, and it isn't hard to see why: the base distribution is fast, free, and full-featured. It runs on many different platforms and has a multitude of third-party modules available to expand its functionality.
You can pick up a copy of the Apache server and its documentation from the Apache home page: http://www.apache.org. This chapter covers Version 2.0 of the Apache server. Most of its configuration and module functionality is similar to that of the previous major release, 1.3, which is still in wide use. Major differences between the versions will be noted.
The Apache distribution consists of the source for the core binary, httpd, the standard set of modules, and numerous additional header and configuration files. You can compile the server for your particular architecture and preferences using the configure, make, make install routine common to building open source software. The latest version of gcc or another up-to-date ANSI C compiler is required to compile and build Apache.
However, you may not have to compile Apache from source. Most Linux distributions and Mac OS X ship with Apache already installed. Furthermore, binaries are available for most popular platforms. Refer to www.apache.org for details.
By itself, httpd doesn't do more than listen for requests and deliver files as is. Apache is designed to load special modules to implement additional functionality. These modules define much of the behavior of the Apache server. A set of standard modules is distributed with the server, including a set of core modules that is automatically compiled into the server binary. Apache will call on modules as needed to perform a dedicated task, such as user authentication or database queries.
Modules must be compiled first to be used by the server, and can be loaded in two ways: statically or dynamically. Modules can be statically built directly into the server binary at compile time:
./configure --enable-module
./configure --disable-base_module
./configure --enable-modules=module_list
Alternatively, you can compile modules as DSOs (Dynamic Shared Objects) and load them as needed at runtime (when the server is started or restarted) by identifying them with the LoadModule directive in the configuration file.
To build a module as a shared object when you compile the server, use:
./configure --enable-MODULE=shared
DSO modules may also be compiled with apxs (Apache Extension Tool) at any time outside of the Apache source tree. See the Apache documentation for full details on apxs.
At startup, Apache reads the main server configuration file httpd.conf. You can control the behavior of the server and its modules by inserting or modifying the directives within this file. Additional configuration can occur on a directory-specific level using .htaccess files. These are configuration files like httpd.conf, but the directives they contain apply only to the directory where they reside. This allows for delegation of control over separate content areas of a single server, and may simplify server management.
The Apache server uses one other configuration file, mime.types, to determine what MIME types should be associated with what file suffixes (see Chapter 17).
The configuration files contain directives, which are one-line commands that tell the server what to do. In addition to the directives themselves, the configuration files may contain any number of blank lines or comment lines beginning with a hash mark (#). Although directive names are not case-sensitive, we use the case conventions in the default files. Example copies of each of these files are included with the server software distribution, which you can refer to for more information.
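The syntax described above is simple enough to sketch in a few lines of Python. This is an illustration only, not how Apache itself reads its configuration: the function name parse_directives is hypothetical, and the sketch ignores continuation lines, quoted arguments, and bracketed sections.

```python
def parse_directives(text):
    """Return (name, arguments) pairs for each one-line directive in text.

    Simplified sketch: skips blank lines and comment lines beginning
    with '#'. Continuation lines, quoting, and <Section> containers
    are not handled.
    """
    directives = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment line
        parts = stripped.split(None, 1)
        name = parts[0]
        args = parts[1] if len(parts) > 1 else ""
        # Directive names are not case-sensitive, so normalize them.
        directives.append((name.lower(), args))
    return directives


sample = """
# Basic server settings
ServerAdmin webmaster@oreilly.com

servername webnuts.oreilly.com
"""
print(parse_directives(sample))
```

Lowercasing the directive name is just one way to make comparisons case-insensitive; the default files keep the mixed-case spelling for readability.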
The first things Apache needs from the configuration file are basics like the listening port, server name, the default locations for content, logs, and other important files, and what modules to load. After that, the wider server functionality is configured. This includes access control, virtual hosts, special resource handling, and module-specific directives.
Here are some basic directives you might find in the httpd.conf configuration file:
ServerType standalone
Port 80
ServerAdmin webmaster@oreilly.com
ServerName webnuts.oreilly.com
User nobody
Group nobody
Each directive here specifies a property of the server's configuration and binds it to a default setting or value. Since these directives exist on their own in the configuration file, their context is that of the whole server. Many directives will appear in special subsections that limit their scope. Directives that define subsections are bracketed, XML-like elements. For example:
<Directory /docs>
    Deny From All
</Directory>
This configuration section sets a directive that applies only to requests for a single directory, /docs. Many configuration sections apply to locations of files on the server, such as <Files>, <Location>, and <Directory>. Other configuration sections define virtual servers (<VirtualHost>) or contain directives specific to a module (<IfModule>).
All server configuration can occur in the httpd.conf file, but you may want to allow separate configuration of certain parts of your server; for example, you could let a user configure some aspects of how documents in her directory are served. By default, Apache looks for .htaccess files in every directory it serves a file from. An .htaccess file may contain any configuration directives that the server configuration file permits through the AllowOverride directive. For example, if httpd.conf contained the line:
AllowOverride AuthConfig
most of the directives from the user authorization modules (Auth*) could be used in an .htaccess file to limit access to the files in that directory. This is exactly equivalent to using the same directives within a specified <Directory> section in httpd.conf.
Since .htaccess files affect the directory they are in and any subdirectories, they have a cascading effect on configuration. A directive in a lower-level .htaccess file works only if the AllowOverride setting that applies to that directory in httpd.conf permits it. This cascading places increased load on the server, which, for every request, must search for and parse .htaccess files in the current directory and each of its parent directories. If you want to completely ignore .htaccess files, use AllowOverride None in httpd.conf.
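To make the cost of that search concrete, here is a minimal Python sketch that lists the .htaccess files that could apply to a single requested file. The function name htaccess_candidates and the example path are hypothetical, and the sketch assumes overrides are permitted at every level of the tree.

```python
from pathlib import PurePosixPath


def htaccess_candidates(file_path):
    """List the .htaccess files that could apply to file_path, outermost first."""
    directory = PurePosixPath(file_path).parent
    # .parents runs from the nearest ancestor up to "/", so reverse it
    # to get the order in which the directories nest.
    ancestors = list(directory.parents)[::-1] + [directory]
    return [str(d / ".htaccess") for d in ancestors]


for candidate in htaccess_candidates("/usr/local/www/htdocs/docs/index.html"):
    print(candidate)
```

For the path shown, six locations would be examined for one request, which is why disabling overrides where they are not needed can reduce load.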
On Unix systems, the Apache daemon httpd is normally started as the system superuser (root), which it needs in order to bind to privileged ports such as port 80. This is often done at startup through entries in the system initialization files. On Windows, the Apache service is called apache and runs with administrator privileges.
Once started, Apache's job is to listen for requests on the addresses and ports for which it has been configured. When handling a request from a specific client, Apache spawns a separate process to handle the connection. This spawned process, however, doesn't run as the superuser; for security reasons, it instead runs as a restricted user that serves files to the client.
Apache normally has five such processes waiting for connections; hence, after startup, you will see one process (httpd) running as root and five processes owned by the Apache user ID, which stand ready to service requests. You can reconfigure that number, as well as the minimum and maximum number of idle service processes allowed, with the StartServers, MinSpareServers, and MaxSpareServers directives. Each process handles HTTP requests from clients, such as GET or POST.
All resources available to visiting browsers (HTML documents, images, etc.) reside by default under a single root directory defined by the DocumentRoot directive. This defines the base directory that is prepended to a URL path to locate a file on the server. Most URL mapping is as simple as locating a file under the document root, but more complex mapping can be defined through aliasing, redirection, and URL rewriting using the mod_alias and mod_rewrite modules.
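The simple case of URL mapping, prepending the document root to the URL path, can be sketched in Python as follows. The function name map_url_to_file and the document root shown are hypothetical, and the sketch deliberately ignores aliasing, redirection, and rewriting.

```python
import posixpath


def map_url_to_file(document_root, url_path):
    """Map a URL path to a file path under document_root (simplified).

    Aliases, redirects, and rewrite rules are not modeled here.
    """
    # Collapse "." and ".." segments so the result stays under the root.
    cleaned = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(document_root, cleaned.lstrip("/"))


print(map_url_to_file("/usr/local/www/htdocs", "/docs/index.html"))
# prints /usr/local/www/htdocs/docs/index.html
```

Normalizing the path before joining is a design choice of this sketch: it keeps a request containing ".." segments from resolving to a file outside the document root.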
Webmasters often find the need to restrict some or all of the data on their servers to authorized users. Access can be controlled by requiring username and password information or by restricting the originating IP address of the client request. The mod_access and mod_auth core modules provide basic access control for Apache.
Access control is usually confined to specific directories of the document tree. You can place authorization directives in httpd.conf within <Directory> sections, or within .htaccess files in the restricted directory itself (using AllowOverride AuthConfig).
This example shows the directives used to configure username and password access to a specific directory:
<Directory /projects>
    Options All
    AuthType Basic
    AuthName "Editorial Group"
    AuthUserFile /usr/local/etc/httpd.conf/.htpasswd
    AuthGroupFile /usr/local/etc/httpd.conf/.htgroup
    require group editors
</Directory>
The AuthType directive specifies the type of authentication used. "Basic" authentication describes the simple authorization scheme used by Apache where user password files are created with the htpasswd program. AuthName specifies the authorization "Realm". The realm can describe many different server locations so that an authorized user does not have to re-supply his password information as he navigates. AuthUserFile provides the user/password file location, and AuthGroupFile provides the group file location. require sets the restriction to only members of the group "editors".
The following configuration section limits access to a directory to requests from a specific domain:
<Directory /projects/golf>
    order deny,allow
    deny from all
    allow from .golf.org
</Directory>
A password file is needed for user and group-level authentication. The location and name of the password file are specified with the AuthUserFile directive. The easiest and most common way to create a password file or add passwords is to use the htpasswd program that is distributed with the server. If a password file already exists for a location, you can type:
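The evaluation order in this section can be sketched in Python. With order deny,allow, the deny rules are checked first and the allow rules second, access is permitted by default, and a matching allow overrides a matching deny. The function names below are hypothetical, and the sketch handles only the keyword all and domain-name patterns, not IP addresses or network ranges.

```python
def host_matches(host, pattern):
    """True if pattern is 'all', or host equals or ends with the domain."""
    if pattern.lower() == "all":
        return True
    domain = pattern.lstrip(".").lower()
    host = host.lower()
    # Match whole name components only: www.golf.org, but not minigolf.org.
    return host == domain or host.endswith("." + domain)


def allowed_deny_allow(host, deny, allow):
    """Evaluate 'order deny,allow' for a client hostname."""
    permitted = True  # default when no rule matches
    if any(host_matches(host, p) for p in deny):
        permitted = False
    if any(host_matches(host, p) for p in allow):
        permitted = True  # allow is evaluated last, so it wins
    return permitted


print(allowed_deny_allow("www.golf.org", deny=["all"], allow=[".golf.org"]))
print(allowed_deny_allow("www.tennis.org", deny=["all"], allow=[".golf.org"]))
```

The first call prints True and the second prints False, mirroring the configuration section: every client is denied except those in the golf.org domain.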
htpasswd pathname username
The program then asks you to type the password you want for the given username twice, and the username and encrypted password are stored in the file.
If a password file does not exist yet, you can create one by typing the same command with the -c option (e.g., htpasswd -c pathname username). But be careful, since the -c option will create a new file without checking if one already exists, thereby overwriting any existing passwords.
Password files created with htpasswd are similar to Unix password files. Keep in mind, however, that there is no correspondence between valid users and passwords on a Unix server, and users and passwords on an Apache web server. You do not need an account on the Unix server to access the web server.
You can bundle several users into a single named group by creating a group file. The location and name of the group file are specified with the AuthGroupFile directive. Each line of a group file specifies the group name, followed by a colon, followed by a list of valid usernames that belong to the group:
groupname: username1 username2 username3 ...
Each user in a group needs to be entered into the Apache password file. When a group authentication is required, the server accepts any valid username/password from the group.
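The group file format is easy to read programmatically. The following Python sketch parses it and checks membership; the function names and the usernames in the sample are hypothetical, and the sketch covers only the membership lookup, not the password check the server also performs.

```python
def parse_group_file(text):
    """Map each group name to its list of usernames."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups


def user_in_group(groups, group, username):
    """True if username is listed as a member of group."""
    return username in groups.get(group, [])


groups = parse_group_file("editors: val linda paula\nproduction: darren")
print(user_in_group(groups, "editors", "linda"))
print(user_in_group(groups, "editors", "darren"))
```

Here the first lookup prints True and the second prints False, since darren belongs only to the production group.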
The password-file scheme based on htpasswd is known as the basic authentication method for HTTP servers. Apache allows other types of authentication methods, which are configured with a similar set of directives.
Apache also supports virtual hosting, which allows a single server to answer for multiple IP addresses or hostnames. Virtual hosting may seem complicated, but it is straightforward to set up. In the configuration file, you can group directives that apply only to a particular virtual host. For example, you can specify a separate DocumentRoot directive for each virtual host, so that someone connecting to www.oreilly.com is served one set of documents, while another client connecting to www.onlamp.com receives another, even though the content for each of these sites is served by the same server on the same machine.
To create a virtual server, simply enclose the httpd.conf directives related to that server in a <VirtualHost> section. Here is an example httpd.conf configuration that will set up two virtual servers:
ServerName www.oreilly.com
AccessConfig /dev/null
ResourceConfig /dev/null

<VirtualHost www.oreilly.com>
    ServerAdmin webmaster@oreilly.com
    DocumentRoot /usr/local/www/virtual/htdocs/oreilly
    ServerName www.oreilly.com
    ErrorLog /usr/local/www/virtual/htdocs/oreilly/error_log
    TransferLog /usr/local/www/virtual/htdocs/oreilly/transfer_log
</VirtualHost>

<VirtualHost www.onlamp.com>
    ServerAdmin webmaster@onlamp.com
    DocumentRoot /usr/local/www/virtual/htdocs/onlamp
    ServerName www.onlamp.com
    ErrorLog /usr/local/www/virtual/htdocs/onlamp/error_log
    TransferLog /usr/local/www/virtual/htdocs/onlamp/transfer_log
</VirtualHost>
Apache creates two log files by default: the error log and the access log. The server's error log records any errors the server encounters during execution. The access log records all client requests made to the server. You can set the locations of these files with the ErrorLog and CustomLog directives.
Access logs are highly configurable. The LogFormat directive allows you to specify which data is recorded for each server transaction. For example, the following directive:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
configures the access log to record information in the Common Log Format, which includes such data as the client IP, user ID, time of request, the request command, and the server's response.
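A log written in this format can be split back into its fields with a regular expression. The Python sketch below is one way to do so; the names CLF_PATTERN and parse_clf and the sample log line are hypothetical, and the pattern assumes the request string contains no embedded double quotes.

```python
import re

# One named group per field of "%h %l %u %t \"%r\" %>s %b".
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf(line):
    """Return a dict of Common Log Format fields, or None if no match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


entry = parse_clf(
    '192.0.2.10 - val [10/Oct/2003:13:55:36 -0700] '
    '"GET /docs/index.html HTTP/1.0" 200 2326'
)
print(entry["host"], entry["status"], entry["request"])
```

The size field is kept as a string because the server writes a hyphen rather than a number when no bytes are sent.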
Copyright © 2003 O'Reilly & Associates. All rights reserved.