diff -r 261778de26ff -r 620f9b141567 thirdparty/google_appengine/lib/django/docs/cache.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/thirdparty/google_appengine/lib/django/docs/cache.txt Tue Aug 26 21:49:54 2008 +0000 @@ -0,0 +1,543 @@ +======================== +Django's cache framework +======================== + +A fundamental tradeoff in dynamic Web sites is, well, they're dynamic. Each +time a user requests a page, the Web server makes all sorts of calculations -- +from database queries to template rendering to business logic -- to create the +page that your site's visitor sees. This is a lot more expensive, from a +processing-overhead perspective, than your standard read-a-file-off-the-filesystem +server arrangement. + +For most Web applications, this overhead isn't a big deal. Most Web +applications aren't washingtonpost.com or slashdot.org; they're simply small- +to medium-sized sites with so-so traffic. But for medium- to high-traffic +sites, it's essential to cut as much overhead as possible. + +That's where caching comes in. + +To cache something is to save the result of an expensive calculation so that +you don't have to perform the calculation next time. Here's some pseudocode +explaining how this would work for a dynamically generated Web page:: + + given a URL, try finding that page in the cache + if the page is in the cache: + return the cached page + else: + generate the page + save the generated page in the cache (for next time) + return the generated page + +Django comes with a robust cache system that lets you save dynamic pages so +they don't have to be calculated for each request. For convenience, Django +offers different levels of cache granularity: You can cache the output of +specific views, you can cache only the pieces that are difficult to produce, or +you can cache your entire site. + +Django also works well with "upstream" caches, such as Squid +(http://www.squid-cache.org/) and browser-based caches. These are the types of +caches that you don't directly control but to which you can provide hints (via +HTTP headers) about which parts of your site should be cached, and how. + +Setting up the cache +==================== + +The cache system requires a small amount of setup. Namely, you have to tell it +where your cached data should live -- whether in a database, on the filesystem +or directly in memory. This is an important decision that affects your cache's +performance; yes, some cache types are faster than others. + +Your cache preference goes in the ``CACHE_BACKEND`` setting in your settings +file. Here's an explanation of all available values for CACHE_BACKEND. + +Memcached +--------- + +By far the fastest, most efficient type of cache available to Django, Memcached +is an entirely memory-based cache framework originally developed to handle high +loads at LiveJournal.com and subsequently open-sourced by Danga Interactive. +It's used by sites such as Slashdot and Wikipedia to reduce database access and +dramatically increase site performance. + +Memcached is available for free at http://danga.com/memcached/ . It runs as a +daemon and is allotted a specified amount of RAM. All it does is provide an +interface -- a *super-lightning-fast* interface -- for adding, retrieving and +deleting arbitrary data in the cache. All data is stored directly in memory, +so there's no overhead of database or filesystem usage. + +After installing Memcached itself, you'll need to install the Memcached Python +bindings. They're in a single Python module, memcache.py, available at +ftp://ftp.tummy.com/pub/python-memcached/ . If that URL is no longer valid, +just go to the Memcached Web site (http://www.danga.com/memcached/) and get the +Python bindings from the "Client APIs" section. + +To use Memcached with Django, set ``CACHE_BACKEND`` to +``memcached://ip:port/``, where ``ip`` is the IP address of the Memcached +daemon and ``port`` is the port on which Memcached is running. + +In this example, Memcached is running on localhost (127.0.0.1) port 11211:: + + CACHE_BACKEND = 'memcached://127.0.0.1:11211/' + +One excellent feature of Memcached is its ability to share cache over multiple +servers. To take advantage of this feature, include all server addresses in +``CACHE_BACKEND``, separated by semicolons. In this example, the cache is +shared over Memcached instances running on IP address 172.19.26.240 and +172.19.26.242, both on port 11211:: + + CACHE_BACKEND = 'memcached://172.19.26.240:11211;172.19.26.242:11211/' + +Memory-based caching has one disadvantage: Because the cached data is stored in +memory, the data will be lost if your server crashes. Clearly, memory isn't +intended for permanent data storage, so don't rely on memory-based caching as +your only data storage. Actually, none of the Django caching backends should be +used for permanent storage -- they're all intended to be solutions for caching, +not storage -- but we point this out here because memory-based caching is +particularly temporary. + +Database caching +---------------- + +To use a database table as your cache backend, first create a cache table in +your database by running this command:: + + python manage.py createcachetable [cache_table_name] + +...where ``[cache_table_name]`` is the name of the database table to create. +(This name can be whatever you want, as long as it's a valid table name that's +not already being used in your database.) This command creates a single table +in your database that is in the proper format that Django's database-cache +system expects. + +Once you've created that database table, set your ``CACHE_BACKEND`` setting to +``"db://tablename/"``, where ``tablename`` is the name of the database table. +In this example, the cache table's name is ``my_cache_table``: + + CACHE_BACKEND = 'db://my_cache_table' + +Database caching works best if you've got a fast, well-indexed database server. + +Filesystem caching +------------------ + +To store cached items on a filesystem, use the ``"file://"`` cache type for +``CACHE_BACKEND``. For example, to store cached data in ``/var/tmp/django_cache``, +use this setting:: + + CACHE_BACKEND = 'file:///var/tmp/django_cache' + +Note that there are three forward slashes toward the beginning of that example. +The first two are for ``file://``, and the third is the first character of the +directory path, ``/var/tmp/django_cache``. + +The directory path should be absolute -- that is, it should start at the root +of your filesystem. It doesn't matter whether you put a slash at the end of the +setting. + +Make sure the directory pointed-to by this setting exists and is readable and +writable by the system user under which your Web server runs. Continuing the +above example, if your server runs as the user ``apache``, make sure the +directory ``/var/tmp/django_cache`` exists and is readable and writable by the +user ``apache``. + +Local-memory caching +-------------------- + +If you want the speed advantages of in-memory caching but don't have the +capability of running Memcached, consider the local-memory cache backend. This +cache is multi-process and thread-safe. To use it, set ``CACHE_BACKEND`` to +``"locmem:///"``. For example:: + + CACHE_BACKEND = 'locmem:///' + +Simple caching (for development) +-------------------------------- + +A simple, single-process memory cache is available as ``"simple:///"``. This +merely saves cached data in-process, which means it should only be used in +development or testing environments. For example:: + + CACHE_BACKEND = 'simple:///' + +Dummy caching (for development) +------------------------------- + +Finally, Django comes with a "dummy" cache that doesn't actually cache -- it +just implements the cache interface without doing anything. + +This is useful if you have a production site that uses heavy-duty caching in +various places but a development/test environment on which you don't want to +cache. In that case, set ``CACHE_BACKEND`` to ``"dummy:///"`` in the settings +file for your development environment. As a result, your development +environment won't use caching and your production environment still will. + +CACHE_BACKEND arguments +----------------------- + +All caches may take arguments. They're given in query-string style on the +``CACHE_BACKEND`` setting. Valid arguments are: + + timeout + Default timeout, in seconds, to use for the cache. Defaults to 5 + minutes (300 seconds). + + max_entries + For the simple and database backends, the maximum number of entries + allowed in the cache before it is cleaned. Defaults to 300. + + cull_percentage + The percentage of entries that are culled when max_entries is reached. + The actual percentage is 1/cull_percentage, so set cull_percentage=3 to + cull 1/3 of the entries when max_entries is reached. + + A value of 0 for cull_percentage means that the entire cache will be + dumped when max_entries is reached. This makes culling *much* faster + at the expense of more cache misses. + +In this example, ``timeout`` is set to ``60``:: + + CACHE_BACKEND = "memcached://127.0.0.1:11211/?timeout=60" + +In this example, ``timeout`` is ``30`` and ``max_entries`` is ``400``:: + + CACHE_BACKEND = "memcached://127.0.0.1:11211/?timeout=30&max_entries=400" + +Invalid arguments are silently ignored, as are invalid values of known +arguments. + +The per-site cache +================== + +Once the cache is set up, the simplest way to use caching is to cache your +entire site. Just add ``'django.middleware.cache.CacheMiddleware'`` to your +``MIDDLEWARE_CLASSES`` setting, as in this example:: + + MIDDLEWARE_CLASSES = ( + 'django.middleware.cache.CacheMiddleware', + 'django.middleware.common.CommonMiddleware', + ) + +(The order of ``MIDDLEWARE_CLASSES`` matters. See "Order of MIDDLEWARE_CLASSES" +below.) + +Then, add the following required settings to your Django settings file: + +* ``CACHE_MIDDLEWARE_SECONDS`` -- The number of seconds each page should be + cached. +* ``CACHE_MIDDLEWARE_KEY_PREFIX`` -- If the cache is shared across multiple + sites using the same Django installation, set this to the name of the site, + or some other string that is unique to this Django instance, to prevent key + collisions. Use an empty string if you don't care. + +The cache middleware caches every page that doesn't have GET or POST +parameters. Optionally, if the ``CACHE_MIDDLEWARE_ANONYMOUS_ONLY`` setting is +``True``, only anonymous requests (i.e., not those made by a logged-in user) +will be cached. This is a simple and effective way of disabling caching for any +user-specific pages (include Django's admin interface). Note that if you use +``CACHE_MIDDLEWARE_ANONYMOUS_ONLY``, you should make sure you've activated +``AuthenticationMiddleware`` and that ``AuthenticationMiddleware`` appears +before ``CacheMiddleware`` in your ``MIDDLEWARE_CLASSES``. + +Additionally, ``CacheMiddleware`` automatically sets a few headers in each +``HttpResponse``: + +* Sets the ``Last-Modified`` header to the current date/time when a fresh + (uncached) version of the page is requested. +* Sets the ``Expires`` header to the current date/time plus the defined + ``CACHE_MIDDLEWARE_SECONDS``. +* Sets the ``Cache-Control`` header to give a max age for the page -- again, + from the ``CACHE_MIDDLEWARE_SECONDS`` setting. + +See the `middleware documentation`_ for more on middleware. + +.. _`middleware documentation`: ../middleware/ + +The per-view cache +================== + +A more granular way to use the caching framework is by caching the output of +individual views. ``django.views.decorators.cache`` defines a ``cache_page`` +decorator that will automatically cache the view's response for you. It's easy +to use:: + + from django.views.decorators.cache import cache_page + + def slashdot_this(request): + ... + + slashdot_this = cache_page(slashdot_this, 60 * 15) + +Or, using Python 2.4's decorator syntax:: + + @cache_page(60 * 15) + def slashdot_this(request): + ... + +``cache_page`` takes a single argument: the cache timeout, in seconds. In the +above example, the result of the ``slashdot_this()`` view will be cached for 15 +minutes. + +The low-level cache API +======================= + +Sometimes, however, caching an entire rendered page doesn't gain you very much. +For example, you may find it's only necessary to cache the result of an +intensive database query. In cases like this, you can use the low-level cache +API to store objects in the cache with any level of granularity you like. + +The cache API is simple. The cache module, ``django.core.cache``, exports a +``cache`` object that's automatically created from the ``CACHE_BACKEND`` +setting:: + + >>> from django.core.cache import cache + +The basic interface is ``set(key, value, timeout_seconds)`` and ``get(key)``:: + + >>> cache.set('my_key', 'hello, world!', 30) + >>> cache.get('my_key') + 'hello, world!' + +The ``timeout_seconds`` argument is optional and defaults to the ``timeout`` +argument in the ``CACHE_BACKEND`` setting (explained above). + +If the object doesn't exist in the cache, ``cache.get()`` returns ``None``:: + + >>> cache.get('some_other_key') + None + + # Wait 30 seconds for 'my_key' to expire... + + >>> cache.get('my_key') + None + +get() can take a ``default`` argument:: + + >>> cache.get('my_key', 'has expired') + 'has expired' + +There's also a get_many() interface that only hits the cache once. get_many() +returns a dictionary with all the keys you asked for that actually exist in the +cache (and haven't expired):: + + >>> cache.set('a', 1) + >>> cache.set('b', 2) + >>> cache.set('c', 3) + >>> cache.get_many(['a', 'b', 'c']) + {'a': 1, 'b': 2, 'c': 3} + +Finally, you can delete keys explicitly with ``delete()``. This is an easy way +of clearing the cache for a particular object:: + + >>> cache.delete('a') + +That's it. The cache has very few restrictions: You can cache any object that +can be pickled safely, although keys must be strings. + +Upstream caches +=============== + +So far, this document has focused on caching your *own* data. But another type +of caching is relevant to Web development, too: caching performed by "upstream" +caches. These are systems that cache pages for users even before the request +reaches your Web site. + +Here are a few examples of upstream caches: + + * Your ISP may cache certain pages, so if you requested a page from + somedomain.com, your ISP would send you the page without having to access + somedomain.com directly. + + * Your Django Web site may sit behind a Squid Web proxy + (http://www.squid-cache.org/) that caches pages for performance. In this + case, each request first would be handled by Squid, and it'd only be + passed to your application if needed. + + * Your Web browser caches pages, too. If a Web page sends out the right + headers, your browser will use the local (cached) copy for subsequent + requests to that page. + +Upstream caching is a nice efficiency boost, but there's a danger to it: +Many Web pages' contents differ based on authentication and a host of other +variables, and cache systems that blindly save pages based purely on URLs could +expose incorrect or sensitive data to subsequent visitors to those pages. + +For example, say you operate a Web e-mail system, and the contents of the +"inbox" page obviously depend on which user is logged in. If an ISP blindly +cached your site, then the first user who logged in through that ISP would have +his user-specific inbox page cached for subsequent visitors to the site. That's +not cool. + +Fortunately, HTTP provides a solution to this problem: A set of HTTP headers +exist to instruct caching mechanisms to differ their cache contents depending +on designated variables, and to tell caching mechanisms not to cache particular +pages. + +Using Vary headers +================== + +One of these headers is ``Vary``. It defines which request headers a cache +mechanism should take into account when building its cache key. For example, if +the contents of a Web page depend on a user's language preference, the page is +said to "vary on language." + +By default, Django's cache system creates its cache keys using the requested +path -- e.g., ``"/stories/2005/jun/23/bank_robbed/"``. This means every request +to that URL will use the same cached version, regardless of user-agent +differences such as cookies or language preferences. + +That's where ``Vary`` comes in. + +If your Django-powered page outputs different content based on some difference +in request headers -- such as a cookie, or language, or user-agent -- you'll +need to use the ``Vary`` header to tell caching mechanisms that the page output +depends on those things. + +To do this in Django, use the convenient ``vary_on_headers`` view decorator, +like so:: + + from django.views.decorators.vary import vary_on_headers + + # Python 2.3 syntax. + def my_view(request): + ... + my_view = vary_on_headers(my_view, 'User-Agent') + + # Python 2.4 decorator syntax. + @vary_on_headers('User-Agent') + def my_view(request): + ... + +In this case, a caching mechanism (such as Django's own cache middleware) will +cache a separate version of the page for each unique user-agent. + +The advantage to using the ``vary_on_headers`` decorator rather than manually +setting the ``Vary`` header (using something like +``response['Vary'] = 'user-agent'``) is that the decorator adds to the ``Vary`` +header (which may already exist) rather than setting it from scratch. + +You can pass multiple headers to ``vary_on_headers()``:: + + @vary_on_headers('User-Agent', 'Cookie') + def my_view(request): + ... + +Because varying on cookie is such a common case, there's a ``vary_on_cookie`` +decorator. These two views are equivalent:: + + @vary_on_cookie + def my_view(request): + ... + + @vary_on_headers('Cookie') + def my_view(request): + ... + +Also note that the headers you pass to ``vary_on_headers`` are not case +sensitive. ``"User-Agent"`` is the same thing as ``"user-agent"``. + +You can also use a helper function, ``django.utils.cache.patch_vary_headers``, +directly:: + + from django.utils.cache import patch_vary_headers + def my_view(request): + ... + response = render_to_response('template_name', context) + patch_vary_headers(response, ['Cookie']) + return response + +``patch_vary_headers`` takes an ``HttpResponse`` instance as its first argument +and a list/tuple of header names as its second argument. + +For more on Vary headers, see the `official Vary spec`_. + +.. _`official Vary spec`: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44 + +Controlling cache: Using other headers +====================================== + +Another problem with caching is the privacy of data and the question of where +data should be stored in a cascade of caches. + +A user usually faces two kinds of caches: his own browser cache (a private +cache) and his provider's cache (a public cache). A public cache is used by +multiple users and controlled by someone else. This poses problems with +sensitive data: You don't want, say, your banking-account number stored in a +public cache. So Web applications need a way to tell caches which data is +private and which is public. + +The solution is to indicate a page's cache should be "private." To do this in +Django, use the ``cache_control`` view decorator. Example:: + + from django.views.decorators.cache import cache_control + @cache_control(private=True) + def my_view(request): + ... + +This decorator takes care of sending out the appropriate HTTP header behind the +scenes. + +There are a few other ways to control cache parameters. For example, HTTP +allows applications to do the following: + + * Define the maximum time a page should be cached. + * Specify whether a cache should always check for newer versions, only + delivering the cached content when there are no changes. (Some caches + might deliver cached content even if the server page changed -- simply + because the cache copy isn't yet expired.) + +In Django, use the ``cache_control`` view decorator to specify these cache +parameters. In this example, ``cache_control`` tells caches to revalidate the +cache on every access and to store cached versions for, at most, 3600 seconds:: + + from django.views.decorators.cache import cache_control + @cache_control(must_revalidate=True, max_age=3600) + def my_view(request): + ... + +Any valid ``Cache-Control`` HTTP directive is valid in ``cache_control()``. +Here's a full list: + + * ``public=True`` + * ``private=True`` + * ``no_cache=True`` + * ``no_transform=True`` + * ``must_revalidate=True`` + * ``proxy_revalidate=True`` + * ``max_age=num_seconds`` + * ``s_maxage=num_seconds`` + +For explanation of Cache-Control HTTP directives, see the `Cache-Control spec`_. + +(Note that the caching middleware already sets the cache header's max-age with +the value of the ``CACHE_MIDDLEWARE_SETTINGS`` setting. If you use a custom +``max_age`` in a ``cache_control`` decorator, the decorator will take +precedence, and the header values will be merged correctly.) + +.. _`Cache-Control spec`: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9 + +Other optimizations +=================== + +Django comes with a few other pieces of middleware that can help optimize your +apps' performance: + + * ``django.middleware.http.ConditionalGetMiddleware`` adds support for + conditional GET. This makes use of ``ETag`` and ``Last-Modified`` + headers. + + * ``django.middleware.gzip.GZipMiddleware`` compresses content for browsers + that understand gzip compression (all modern browsers). + +Order of MIDDLEWARE_CLASSES +=========================== + +If you use ``CacheMiddleware``, it's important to put it in the right place +within the ``MIDDLEWARE_CLASSES`` setting, because the cache middleware needs +to know which headers by which to vary the cache storage. Middleware always +adds something the ``Vary`` response header when it can. + +Put the ``CacheMiddleware`` after any middlewares that might add something to +the ``Vary`` header. The following middlewares do so: + + * ``SessionMiddleware`` adds ``Cookie`` + * ``GZipMiddleware`` adds ``Accept-Encoding``