diff -r 5ff1fc726848 -r c6bca38c1cbf parts/django/docs/topics/db/optimization.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/parts/django/docs/topics/db/optimization.txt Sat Jan 08 11:20:57 2011 +0530 @@ -0,0 +1,260 @@ +============================ +Database access optimization +============================ + +Django's database layer provides various ways to help developers get the most +out of their databases. This document gathers together links to the relevant +documentation, and adds various tips, organized under a number of headings that +outline the steps to take when attempting to optimize your database usage. + +Profile first +============= + +As general programming practice, this goes without saying. Find out :ref:`what +queries you are doing and what they are costing you +`. You may also want to use an external project like +django-debug-toolbar_, or a tool that monitors your database directly. + +Remember that you may be optimizing for speed or memory or both, depending on +your requirements. Sometimes optimizing for one will be detrimental to the +other, but sometimes they will help each other. Also, work that is done by the +database process might not have the same cost (to you) as the same amount of +work done in your Python process. It is up to you to decide what your +priorities are, where the balance must lie, and profile all of these as required +since this will depend on your application and server. + +With everything that follows, remember to profile after every change to ensure +that the change is a benefit, and a big enough benefit given the decrease in +readability of your code. **All** of the suggestions below come with the caveat +that in your circumstances the general principle might not apply, or might even +be reversed. + +.. _django-debug-toolbar: http://robhudson.github.com/django-debug-toolbar/ + +Use standard DB optimization techniques +======================================= + +...including: + +* Indexes. This is a number one priority, *after* you have determined from + profiling what indexes should be added. Use + :attr:`django.db.models.Field.db_index` to add these from Django. + +* Appropriate use of field types. + +We will assume you have done the obvious things above. The rest of this document +focuses on how to use Django in such a way that you are not doing unnecessary +work. This document also does not address other optimization techniques that +apply to all expensive operations, such as :doc:`general purpose caching +`. + +Understand QuerySets +==================== + +Understanding :doc:`QuerySets ` is vital to getting good +performance with simple code. In particular: + +Understand QuerySet evaluation +------------------------------ + +To avoid performance problems, it is important to understand: + +* that :ref:`QuerySets are lazy `. + +* when :ref:`they are evaluated `. + +* how :ref:`the data is held in memory `. + +Understand cached attributes +---------------------------- + +As well as caching of the whole ``QuerySet``, there is caching of the result of +attributes on ORM objects. In general, attributes that are not callable will be +cached. For example, assuming the :ref:`example Weblog models +`:: + + >>> entry = Entry.objects.get(id=1) + >>> entry.blog # Blog object is retrieved at this point + >>> entry.blog # cached version, no DB access + +But in general, callable attributes cause DB lookups every time:: + + >>> entry = Entry.objects.get(id=1) + >>> entry.authors.all() # query performed + >>> entry.authors.all() # query performed again + +Be careful when reading template code - the template system does not allow use +of parentheses, but will call callables automatically, hiding the above +distinction. + +Be careful with your own custom properties - it is up to you to implement +caching. + +Use the ``with`` template tag +----------------------------- + +To make use of the caching behaviour of ``QuerySet``, you may need to use the +:ttag:`with` template tag. + +Use ``iterator()`` +------------------ + +When you have a lot of objects, the caching behaviour of the ``QuerySet`` can +cause a large amount of memory to be used. In this case, +:meth:`~django.db.models.QuerySet.iterator()` may help. + +Do database work in the database rather than in Python +====================================================== + +For instance: + +* At the most basic level, use :ref:`filter and exclude ` to do + filtering in the database. + +* Use :ref:`F() object query expressions ` to do filtering + against other fields within the same model. + +* Use :doc:`annotate to do aggregation in the database `. + +If these aren't enough to generate the SQL you need: + +Use ``QuerySet.extra()`` +------------------------ + +A less portable but more powerful method is +:meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be +explicitly added to the query. If that still isn't powerful enough: + +Use raw SQL +----------- + +Write your own :doc:`custom SQL to retrieve data or populate models +`. Use ``django.db.connection.queries`` to find out what Django +is writing for you and start from there. + +Retrieve everything at once if you know you will need it +======================================================== + +Hitting the database multiple times for different parts of a single 'set' of +data that you will need all parts of is, in general, less efficient than +retrieving it all in one query. This is particularly important if you have a +query that is executed in a loop, and could therefore end up doing many database +queries, when only one was needed. So: + +Use ``QuerySet.select_related()`` +--------------------------------- + +Understand :ref:`QuerySet.select_related() ` thoroughly, and use it: + +* in view code, + +* and in :doc:`managers and default managers ` where + appropriate. Be aware when your manager is and is not used; sometimes this is + tricky so don't make assumptions. + +Don't retrieve things you don't need +==================================== + +Use ``QuerySet.values()`` and ``values_list()`` +----------------------------------------------- + +When you just want a ``dict`` or ``list`` of values, and don't need ORM model +objects, make appropriate usage of :meth:`~django.db.models.QuerySet.values()`. +These can be useful for replacing model objects in template code - as long as +the dicts you supply have the same attributes as those used in the template, +you are fine. + +Use ``QuerySet.defer()`` and ``only()`` +--------------------------------------- + +Use :meth:`~django.db.models.QuerySet.defer()` and +:meth:`~django.db.models.QuerySet.only()` if there are database columns you +know that you won't need (or won't need in most cases) to avoid loading +them. Note that if you *do* use them, the ORM will have to go and get them in a +separate query, making this a pessimization if you use it inappropriately. + +Use QuerySet.count() +-------------------- + +...if you only want the count, rather than doing ``len(queryset)``. + +Use QuerySet.exists() +--------------------- + +...if you only want to find out if at least one result exists, rather than ``if +queryset``. + +But: + +Don't overuse ``count()`` and ``exists()`` +------------------------------------------ + +If you are going to need other data from the QuerySet, just evaluate it. + +For example, assuming an Email class that has a ``body`` attribute and a +many-to-many relation to User, the following template code is optimal: + +.. code-block:: html+django + + {% if display_inbox %} + {% with user.emails.all as emails %} + {% if emails %} +

You have {{ emails|length }} email(s)

+ {% for email in emails %} +

{{ email.body }}

+ {% endfor %} + {% else %} +

No messages today.

+ {% endif %} + {% endwith %} + {% endif %} + + +It is optimal because: + + 1. Since QuerySets are lazy, this does no database if 'display_inbox' is False. + + #. Use of ``with`` means that we store ``user.emails.all`` in a variable for + later use, allowing its cache to be re-used. + + #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called, + which causes the ``user.emails.all()`` query to be run on the database, and + at the least the first line to be turned into an ORM object. If there aren't + any results, it will return False, otherwise True. + + #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling + out the rest of the cache without doing another query. + + #. The ``for`` loop iterates over the already filled cache. + +In total, this code does either one or zero database queries. The only +deliberate optimization performed is the use of the ``with`` tag. Using +``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause +additional queries. + +Use ``QuerySet.update()`` and ``delete()`` +------------------------------------------ + +Rather than retrieve a load of objects, set some values, and save them +individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update() +`. Similarly, do :ref:`bulk deletes +` where possible. + +Note, however, that these bulk update methods cannot call the ``save()`` or +``delete()`` methods of individual instances, which means that any custom +behaviour you have added for these methods will not be executed, including +anything driven from the normal database object :doc:`signals `. + +Use foreign key values directly +------------------------------- + +If you only need a foreign key value, use the foreign key value that is already on +the object you've got, rather than getting the whole related object and taking +its primary key. i.e. do:: + + entry.blog_id + +instead of:: + + entry.blog.id +