parts/django/docs/topics/db/optimization.txt
changeset 69 c6bca38c1cbf
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parts/django/docs/topics/db/optimization.txt	Sat Jan 08 11:20:57 2011 +0530
@@ -0,0 +1,260 @@
+============================
+Database access optimization
+============================
+
+Django's database layer provides various ways to help developers get the most
+out of their databases. This document gathers together links to the relevant
+documentation, and adds various tips, organized under a number of headings that
+outline the steps to take when attempting to optimize your database usage.
+
+Profile first
+=============
+
+As general programming practice, this goes without saying. Find out :ref:`what
+queries you are doing and what they are costing you
+<faq-see-raw-sql-queries>`. You may also want to use an external project like
+django-debug-toolbar_, or a tool that monitors your database directly.
+
+Remember that you may be optimizing for speed or memory or both, depending on
+your requirements. Sometimes optimizing for one will be detrimental to the
+other, but sometimes they will help each other. Also, work that is done by the
+database process might not have the same cost (to you) as the same amount of
+work done in your Python process. It is up to you to decide what your
+priorities are, where the balance must lie, and profile all of these as required
+since this will depend on your application and server.
+
+With everything that follows, remember to profile after every change to ensure
+that the change is a benefit, and a big enough benefit given the decrease in
+readability of your code. **All** of the suggestions below come with the caveat
+that in your circumstances the general principle might not apply, or might even
+be reversed.
+
+.. _django-debug-toolbar: http://robhudson.github.com/django-debug-toolbar/
+
+Use standard DB optimization techniques
+=======================================
+
+...including:
+
+* Indexes. This is a number one priority, *after* you have determined from
+  profiling what indexes should be added. Use
+  :attr:`django.db.models.Field.db_index` to add these from Django.
+
+* Appropriate use of field types.
+
+We will assume you have done the obvious things above. The rest of this document
+focuses on how to use Django in such a way that you are not doing unnecessary
+work. This document also does not address other optimization techniques that
+apply to all expensive operations, such as :doc:`general purpose caching
+</topics/cache>`.
+
+Understand QuerySets
+====================
+
+Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
+performance with simple code. In particular:
+
+Understand QuerySet evaluation
+------------------------------
+
+To avoid performance problems, it is important to understand:
+
+* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
+
+* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
+
+* how :ref:`the data is held in memory <caching-and-querysets>`.
+
+Understand cached attributes
+----------------------------
+
+As well as caching of the whole ``QuerySet``, there is caching of the result of
+attributes on ORM objects. In general, attributes that are not callable will be
+cached. For example, assuming the :ref:`example Weblog models
+<queryset-model-example>`::
+
+  >>> entry = Entry.objects.get(id=1)
+  >>> entry.blog   # Blog object is retrieved at this point
+  >>> entry.blog   # cached version, no DB access
+
+But in general, callable attributes cause DB lookups every time::
+
+  >>> entry = Entry.objects.get(id=1)
+  >>> entry.authors.all()   # query performed
+  >>> entry.authors.all()   # query performed again
+
+Be careful when reading template code - the template system does not allow use
+of parentheses, but will call callables automatically, hiding the above
+distinction.
+
+Be careful with your own custom properties - it is up to you to implement
+caching.
+
+Use the ``with`` template tag
+-----------------------------
+
+To make use of the caching behaviour of ``QuerySet``, you may need to use the
+:ttag:`with` template tag.
+
+Use ``iterator()``
+------------------
+
+When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
+cause a large amount of memory to be used. In this case,
+:meth:`~django.db.models.QuerySet.iterator()` may help.
+
+Do database work in the database rather than in Python
+======================================================
+
+For instance:
+
+* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
+  filtering in the database.
+
+* Use :ref:`F() object query expressions <query-expressions>` to do filtering
+  against other fields within the same model.
+
+* Use :doc:`annotate to do aggregation in the database </topics/db/aggregation>`.
+
+If these aren't enough to generate the SQL you need:
+
+Use ``QuerySet.extra()``
+------------------------
+
+A less portable but more powerful method is
+:meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be
+explicitly added to the query. If that still isn't powerful enough:
+
+Use raw SQL
+-----------
+
+Write your own :doc:`custom SQL to retrieve data or populate models
+</topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
+is writing for you and start from there.
+
+Retrieve everything at once if you know you will need it
+========================================================
+
+Hitting the database multiple times for different parts of a single 'set' of
+data that you will need all parts of is, in general, less efficient than
+retrieving it all in one query. This is particularly important if you have a
+query that is executed in a loop, and could therefore end up doing many database
+queries, when only one was needed. So:
+
+Use ``QuerySet.select_related()``
+---------------------------------
+
+Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
+
+* in view code,
+
+* and in :doc:`managers and default managers </topics/db/managers>` where
+  appropriate. Be aware when your manager is and is not used; sometimes this is
+  tricky so don't make assumptions.
+
+Don't retrieve things you don't need
+====================================
+
+Use ``QuerySet.values()`` and ``values_list()``
+-----------------------------------------------
+
+When you just want a ``dict`` or ``list`` of values, and don't need ORM model
+objects, make appropriate usage of :meth:`~django.db.models.QuerySet.values()`.
+These can be useful for replacing model objects in template code - as long as
+the dicts you supply have the same attributes as those used in the template,
+you are fine.
+
+Use ``QuerySet.defer()`` and ``only()``
+---------------------------------------
+
+Use :meth:`~django.db.models.QuerySet.defer()` and
+:meth:`~django.db.models.QuerySet.only()` if there are database columns you
+know that you won't need (or won't need in most cases) to avoid loading
+them. Note that if you *do* use them, the ORM will have to go and get them in a
+separate query, making this a pessimization if you use it inappropriately.
+
+Use QuerySet.count()
+--------------------
+
+...if you only want the count, rather than doing ``len(queryset)``.
+
+Use QuerySet.exists()
+---------------------
+
+...if you only want to find out if at least one result exists, rather than ``if
+queryset``.
+
+But:
+
+Don't overuse ``count()`` and ``exists()``
+------------------------------------------
+
+If you are going to need other data from the QuerySet, just evaluate it.
+
+For example, assuming an Email class that has a ``body`` attribute and a
+many-to-many relation to User, the following template code is optimal:
+
+.. code-block:: html+django
+
+   {% if display_inbox %}
+     {% with user.emails.all as emails %}
+       {% if emails %}
+         <p>You have {{ emails|length }} email(s)</p>
+         {% for email in emails %}
+           <p>{{ email.body }}</p>
+         {% endfor %}
+       {% else %}
+         <p>No messages today.</p>
+       {% endif %}
+     {% endwith %}
+   {% endif %}
+
+
+It is optimal because:
+
+ 1. Since QuerySets are lazy, this does no database if 'display_inbox' is False.
+
+ #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
+    later use, allowing its cache to be re-used.
+
+ #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
+    which causes the ``user.emails.all()`` query to be run on the database, and
+    at the least the first line to be turned into an ORM object. If there aren't
+    any results, it will return False, otherwise True.
+
+ #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
+    out the rest of the cache without doing another query.
+
+ #. The ``for`` loop iterates over the already filled cache.
+
+In total, this code does either one or zero database queries. The only
+deliberate optimization performed is the use of the ``with`` tag. Using
+``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
+additional queries.
+
+Use ``QuerySet.update()`` and ``delete()``
+------------------------------------------
+
+Rather than retrieve a load of objects, set some values, and save them
+individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
+<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
+<topics-db-queries-delete>` where possible.
+
+Note, however, that these bulk update methods cannot call the ``save()`` or
+``delete()`` methods of individual instances, which means that any custom
+behaviour you have added for these methods will not be executed, including
+anything driven from the normal database object :doc:`signals </ref/signals>`.
+
+Use foreign key values directly
+-------------------------------
+
+If you only need a foreign key value, use the foreign key value that is already on
+the object you've got, rather than getting the whole related object and taking
+its primary key. i.e. do::
+
+   entry.blog_id
+
+instead of::
+
+   entry.blog.id
+