On occasion, you’ll need to run a piece of code on each and every request that Django handles. This code might need to modify the request before the view handles it, it might need to log information about the request for debugging purposes, and so forth.
You can do this with Django’s middleware framework, which is a set of hooks into Django’s request/response processing. It’s a light, low-level “plug-in” system capable of globally altering both Django’s input and output.
Each middleware component is responsible for doing some specific function. If you’re reading this book linearly (sorry, postmodernists), you’ve seen middleware a number of times already:
This chapter dives deeper into exactly what middleware is and how it works, and explains how you can write your own middleware.
A middleware component is simply a Python class that conforms to a certain API. Before diving into the formal aspects of what that API is, let’s look at a very simple example.
High-traffic sites often need to deploy Django behind a load-balancing proxy (see Chapter 20). This can cause a few small complications, one of which is that every request’s remote IP (request.META["REMOTE_IP"]) will be that of the load balancer, not the actual IP making the request. Load balancers deal with this by setting a special header, X-Forwarded-For, to the actual requesting IP address.
So here’s a small bit of middleware that lets sites running behind a proxy still see the correct IP address in request.META["REMOTE_ADDR"]:
class SetRemoteAddrFromForwardedFor(object): def process_request(self, request): try: real_ip = request.META['HTTP_X_FORWARDED_FOR'] except KeyError: pass else: # HTTP_X_FORWARDED_FOR can be a comma-separated list of IPs. # Take just the first one. real_ip = real_ip.split(",")[0] request.META['REMOTE_ADDR'] = real_ip
If this is installed (see the next section), every request’s X-Forwarded-For value will be automatically inserted into request.META['REMOTE_ADDR']. This means your Django applications don’t need to be concerned with whether they’re behind a load-balancing proxy or not; they can simply access request.META['REMOTE_ADDR'], and that will work whether or not a proxy is being used.
In fact, this is a common enough need that this piece of middleware is a built-in part of Django. It lives in django.middleware.http, and you can read a bit more about it in the next section.
If you’ve read this book straight through, you’ve already seen a number of examples of middleware installation; many of the examples in previous chapters have required certain middleware. For completeness, here’s how to install middleware.
To activate a middleware component, add it to the MIDDLEWARE_CLASSES tuple in your settings module. In MIDDLEWARE_CLASSES, each middleware component is represented by a string: the full Python path to the middleware’s class name. For example, here’s the default MIDDLEWARE_CLASSES created by django-admin.py startproject:
MIDDLEWARE_CLASSES = ( 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.middleware.doc.XViewMiddleware' )
A Django installation doesn’t require any middleware — MIDDLEWARE_CLASSES can be empty, if you’d like — but we recommend that you activate CommonMiddleware, which we explain shortly.
The order is significant. On the request and view phases, Django applies middleware in the order given in MIDDLEWARE_CLASSES, and on the response and exception phases, Django applies middleware in reverse order. That is, Django treats MIDDLEWARE_CLASSES as a sort of “wrapper” around the view function: on the request it walks down the list to the view, and on the response it walks back up. See the section “How Django Processes a Request: Complete Details” in Chapter 3 for a review of the phases.
Now that you know what middleware is and how to install it, let’s take a look at all the available methods that middleware classes can define.
Use __init__() to perform systemwide setup for a given middleware class.
For performance reasons, each activated middleware class is instantiated only once per server process. This means that __init__() is called only once — at server startup — not for individual requests.
A common reason to implement an __init__() method is to check whether the middleware is indeed needed. If __init__() raises django.core.exceptions.MiddlewareNotUsed, then Django will remove the middleware from the middleware stack. You might use this feature to check for some piece of software that the middleware class requires, or check whether the server is running debug mode, or any other such environment situation.
If a middleware class defines an __init__() method, the method should take no arguments beyond the standard self.
This method gets called as soon as the request has been received — before Django has parsed the URL to determine which view to run. It gets passed the HttpRequest object, which you may modify at will.
process_request() should return either None or an HttpResponse object.
This method gets called after the request preprocessor is called and Django has determined which view to execute, but before that view has actually been executed.
The arguments passed to this view are shown in Table 15-1.
Argument | Explanation |
---|---|
request | The HttpRequest object. |
view | The Python function that Django will call to handle this request. This is the actual function object itself, not the name of the function as a string. |
args | The list of positional arguments that will be passed to the view, not including the request argument (which is always the first argument to a view). |
kwargs | The dictionary of keyword arguments that will be passed to the view. |
Just like process_request(), process_view() should return either None or an HttpResponse object.
This method gets called after the view function is called and the response is generated. Here, the processor can modify the content of a response; one obvious use case is content compression, such as gzipping of the request’s HTML.
The parameters should be pretty self-explanatory: request is the request object, and response is the response object returned from the view.
Unlike the request and view preprocessors, which may return None, process_response() must return an HttpResponse object. That response could be the original one passed into the function (possibly modified) or a brand-new one.
This method gets called only if something goes wrong and a view raises an uncaught exception. You can use this hook to send error notifications, dump postmortem information to a log, or even try to recover from the error automatically.
The parameters to this function are the same request object we’ve been dealing with all along, and exception, which is the actual Exception object raised by the view function.
process_exception() should return a either None or an HttpResponse object.
Note
Django ships with a number of middleware classes (discussed in the following section) that make good examples. Reading the code for them should give you a good feel for the power of middleware.
You can also find a number of community-contributed examples on Django’s wiki: http://code.djangoproject.com/wiki/ContributedMiddleware
Django comes with some built-in middleware to deal with common problems, which we discuss in the sections that follow.
Middleware class: django.contrib.auth.middleware.AuthenticationMiddleware.
This middleware enables authentication support. It adds the request.user attribute, representing the currently logged-in user, to every incoming HttpRequest object.
See Chapter 12 for complete details.
Middleware class: django.middleware.common.CommonMiddleware.
This middleware adds a few conveniences for perfectionists:
Forbids access to user agents in the ``DISALLOWED_USER_AGENTS`` setting: If provided, this setting should be a list of compiled regular expression objects that are matched against the user-agent header for each incoming request. Here’s an example snippet from a settings file:
import re DISALLOWED_USER_AGENTS = ( re.compile(r'^OmniExplorer_Bot'), re.compile(r'^Googlebot') )
Note the import re, because DISALLOWED_USER_AGENTS requires its values to be compiled regexes (i.e., the output of re.compile()). The settings file is regular python, so it’s perfectly OK to include Python import statements in it.
Performs URL rewriting based on the ``APPEND_SLASH`` and ``PREPEND_WWW`` settings: If APPEND_SLASH is True, URLs that lack a trailing slash will be redirected to the same URL with a trailing slash, unless the last component in the path contains a period. So foo.com/bar is redirected to foo.com/bar/, but foo.com/bar/file.txt is passed through unchanged.
If PREPEND_WWW is True, URLs that lack a leading “www.” will be redirected to the same URL with a leading “www.”.
Both of these options are meant to normalize URLs. The philosophy is that each URL should exist in one — and only one — place. Technically the URL example.com/bar is distinct from example.com/bar/, which in turn is distinct from www.example.com/bar/. A search-engine indexer would treat these as separate URLs, which is detrimental to your site’s search-engine rankings, so it’s a best practice to normalize URLs.
Handles ETags based on the ``USE_ETAGS`` setting: ETags are an HTTP-level optimization for caching pages conditionally. If USE_ETAGS is set to True, Django will calculate an ETag for each request by MD5-hashing the page content, and it will take care of sending Not Modified responses, if appropriate.
Note there is also a conditional GET middleware, covered shortly, which handles ETags and does a bit more.
Middleware class: django.middleware.gzip.GZipMiddleware.
This middleware automatically compresses content for browsers that understand gzip compression (all modern browsers). This can greatly reduce the amount of bandwidth a Web server consumes. The tradeoff is that it takes a bit of processing time to compress pages.
We usually prefer speed over bandwidth, but if you prefer the reverse, just enable this middleware.
Middleware class: django.middleware.http.ConditionalGetMiddleware.
This middleware provides support for conditional GET operations. If the response has an Last-Modified or ETag or header, and the request has If-None-Match or If-Modified-Since, the response is replaced by an 304 (“Not modified”) response. ETag support depends on on the USE_ETAGS setting and expects the ETag response header to already be set. As discussed above, the ETag header is set by the Common middleware.
It also removes the content from any response to a HEAD request and sets the Date and Content-Length response headers for all requests.
Middleware class: django.middleware.http.SetRemoteAddrFromForwardedFor.
This is the example we examined in the “What’s Middleware?” section earlier. It sets request.META['REMOTE_ADDR'] based on request.META['HTTP_X_FORWARDED_FOR'], if the latter is set. This is useful if you’re sitting behind a reverse proxy that causes each request’s REMOTE_ADDR to be set to 127.0.0.1.
Danger!
This middleware does not validate HTTP_X_FORWARDED_FOR.
If you’re not behind a reverse proxy that sets HTTP_X_FORWARDED_FOR automatically, do not use this middleware. Anybody can spoof the value of HTTP_X_FORWARDED_FOR, and because this sets REMOTE_ADDR based on HTTP_X_FORWARDED_FOR, that means anybody can fake his IP address.
Only use this middleware when you can absolutely trust the value of HTTP_X_FORWARDED_FOR.
Middleware class: django.contrib.sessions.middleware.SessionMiddleware.
This middleware enables session support. See Chapter 12 for details.
Middleware class: django.middleware.cache.CacheMiddleware.
This middleware caches each Django-powered page. This was discussed in detail in Chapter 13.
Middleware class: django.middleware.transaction.TransactionMiddleware.
This middleware binds a database COMMIT or ROLLBACK to the request/response phase. If a view function runs successfully, a COMMIT is issued. If the view raises an exception, a ROLLBACK is issued.
The order of this middleware in the stack is important. Middleware modules running outside of it run with commit-on-save — the default Django behavior. Middleware modules running inside it (coming later in the stack) will be under the same transaction control as the view functions.
See Appendix C for more about information about database transactions.
Middleware class: django.middleware.doc.XViewMiddleware.
This middleware sends custom X-View HTTP headers to HEAD requests that come from IP addresses defined in the INTERNAL_IPS setting. This is used by Django’s automatic documentation system.
Web developers and database-schema designers don’t always have the luxury of starting from scratch. In the next chapter, we’ll cover how to integrate with legacy systems, such as database schemas you’ve inherited from the 1980s.
About this comment system
This site is using a contextual comment system to help us gather targeted feedback about the book. Instead of commenting on an entire chapter, you can leave comments on any indivdual "block" in the chapter. A "block" with comments looks like this:
A "block" is a paragraph, list item, code sample, or other small chunk of content. It'll get highlighted when you select it:
To post a comment on a block, just click in the gutter next to the bit you want to comment on:
As we edit the book, we'll review everyone's comments and roll them into a future version of the book. We'll mark reviewed comments with a little checkmark:
Please make sure to leave a full name (and not a nickname or screenname) if you'd like your contributions acknowledged in print.
Many, many thanks to Jack Slocum; the inspiration and much of the code for the comment system comes from Jack's blog, and this site couldn't have been built without his wonderful
YAHOO.ext
library. Thanks also to Yahoo for YUI itself.