|
1 ============================ |
|
2 Database access optimization |
|
3 ============================ |
|
4 |
|
5 Django's database layer provides various ways to help developers get the most |
|
6 out of their databases. This document gathers together links to the relevant |
|
7 documentation, and adds various tips, organized under a number of headings that |
|
8 outline the steps to take when attempting to optimize your database usage. |
|
9 |
|
10 Profile first |
|
11 ============= |
|
12 |
|
13 As general programming practice, this goes without saying. Find out :ref:`what |
|
14 queries you are doing and what they are costing you |
|
15 <faq-see-raw-sql-queries>`. You may also want to use an external project like |
|
16 django-debug-toolbar_, or a tool that monitors your database directly. |
|
17 |
|
18 Remember that you may be optimizing for speed or memory or both, depending on |
|
19 your requirements. Sometimes optimizing for one will be detrimental to the |
|
20 other, but sometimes they will help each other. Also, work that is done by the |
|
21 database process might not have the same cost (to you) as the same amount of |
|
22 work done in your Python process. It is up to you to decide what your |
|
23 priorities are, where the balance must lie, and profile all of these as required |
|
24 since this will depend on your application and server. |
|
25 |
|
26 With everything that follows, remember to profile after every change to ensure |
|
27 that the change is a benefit, and a big enough benefit given the decrease in |
|
28 readability of your code. **All** of the suggestions below come with the caveat |
|
29 that in your circumstances the general principle might not apply, or might even |
|
30 be reversed. |
|
31 |
|
32 .. _django-debug-toolbar: http://robhudson.github.com/django-debug-toolbar/ |
|
33 |
|
34 Use standard DB optimization techniques |
|
35 ======================================= |
|
36 |
|
37 ...including: |
|
38 |
|
39 * Indexes. This is a number one priority, *after* you have determined from |
|
40 profiling what indexes should be added. Use |
|
41 :attr:`django.db.models.Field.db_index` to add these from Django. |
|
42 |
|
43 * Appropriate use of field types. |
|
44 |
|
45 We will assume you have done the obvious things above. The rest of this document |
|
46 focuses on how to use Django in such a way that you are not doing unnecessary |
|
47 work. This document also does not address other optimization techniques that |
|
48 apply to all expensive operations, such as :doc:`general purpose caching |
|
49 </topics/cache>`. |
|
50 |
|
51 Understand QuerySets |
|
52 ==================== |
|
53 |
|
54 Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good |
|
55 performance with simple code. In particular: |
|
56 |
|
57 Understand QuerySet evaluation |
|
58 ------------------------------ |
|
59 |
|
60 To avoid performance problems, it is important to understand: |
|
61 |
|
62 * that :ref:`QuerySets are lazy <querysets-are-lazy>`. |
|
63 |
|
64 * when :ref:`they are evaluated <when-querysets-are-evaluated>`. |
|
65 |
|
66 * how :ref:`the data is held in memory <caching-and-querysets>`. |
|
67 |
|
68 Understand cached attributes |
|
69 ---------------------------- |
|
70 |
|
71 As well as caching of the whole ``QuerySet``, there is caching of the result of |
|
72 attributes on ORM objects. In general, attributes that are not callable will be |
|
73 cached. For example, assuming the :ref:`example Weblog models |
|
74 <queryset-model-example>`:: |
|
75 |
|
76 >>> entry = Entry.objects.get(id=1) |
|
77 >>> entry.blog # Blog object is retrieved at this point |
|
78 >>> entry.blog # cached version, no DB access |
|
79 |
|
80 But in general, callable attributes cause DB lookups every time:: |
|
81 |
|
82 >>> entry = Entry.objects.get(id=1) |
|
83 >>> entry.authors.all() # query performed |
|
84 >>> entry.authors.all() # query performed again |
|
85 |
|
86 Be careful when reading template code - the template system does not allow use |
|
87 of parentheses, but will call callables automatically, hiding the above |
|
88 distinction. |
|
89 |
|
90 Be careful with your own custom properties - it is up to you to implement |
|
91 caching. |
|
92 |
|
93 Use the ``with`` template tag |
|
94 ----------------------------- |
|
95 |
|
96 To make use of the caching behaviour of ``QuerySet``, you may need to use the |
|
97 :ttag:`with` template tag. |
|
98 |
|
99 Use ``iterator()`` |
|
100 ------------------ |
|
101 |
|
102 When you have a lot of objects, the caching behaviour of the ``QuerySet`` can |
|
103 cause a large amount of memory to be used. In this case, |
|
104 :meth:`~django.db.models.QuerySet.iterator()` may help. |
|
105 |
|
106 Do database work in the database rather than in Python |
|
107 ====================================================== |
|
108 |
|
109 For instance: |
|
110 |
|
111 * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do |
|
112 filtering in the database. |
|
113 |
|
114 * Use :ref:`F() object query expressions <query-expressions>` to do filtering |
|
115 against other fields within the same model. |
|
116 |
|
117 * Use :doc:`annotate to do aggregation in the database </topics/db/aggregation>`. |
|
118 |
|
119 If these aren't enough to generate the SQL you need: |
|
120 |
|
121 Use ``QuerySet.extra()`` |
|
122 ------------------------ |
|
123 |
|
124 A less portable but more powerful method is |
|
125 :meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be |
|
126 explicitly added to the query. If that still isn't powerful enough: |
|
127 |
|
128 Use raw SQL |
|
129 ----------- |
|
130 |
|
131 Write your own :doc:`custom SQL to retrieve data or populate models |
|
132 </topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django |
|
133 is writing for you and start from there. |
|
134 |
|
135 Retrieve everything at once if you know you will need it |
|
136 ======================================================== |
|
137 |
|
138 Hitting the database multiple times for different parts of a single 'set' of |
|
139 data that you will need all parts of is, in general, less efficient than |
|
140 retrieving it all in one query. This is particularly important if you have a |
|
141 query that is executed in a loop, and could therefore end up doing many database |
|
142 queries, when only one was needed. So: |
|
143 |
|
144 Use ``QuerySet.select_related()`` |
|
145 --------------------------------- |
|
146 |
|
147 Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it: |
|
148 |
|
149 * in view code, |
|
150 |
|
151 * and in :doc:`managers and default managers </topics/db/managers>` where |
|
152 appropriate. Be aware when your manager is and is not used; sometimes this is |
|
153 tricky so don't make assumptions. |
|
154 |
|
155 Don't retrieve things you don't need |
|
156 ==================================== |
|
157 |
|
158 Use ``QuerySet.values()`` and ``values_list()`` |
|
159 ----------------------------------------------- |
|
160 |
|
161 When you just want a ``dict`` or ``list`` of values, and don't need ORM model |
|
162 objects, make appropriate usage of :meth:`~django.db.models.QuerySet.values()`. |
|
163 These can be useful for replacing model objects in template code - as long as |
|
164 the dicts you supply have the same attributes as those used in the template, |
|
165 you are fine. |
|
166 |
|
167 Use ``QuerySet.defer()`` and ``only()`` |
|
168 --------------------------------------- |
|
169 |
|
170 Use :meth:`~django.db.models.QuerySet.defer()` and |
|
171 :meth:`~django.db.models.QuerySet.only()` if there are database columns you |
|
172 know that you won't need (or won't need in most cases) to avoid loading |
|
173 them. Note that if you *do* use them, the ORM will have to go and get them in a |
|
174 separate query, making this a pessimization if you use it inappropriately. |
|
175 |
|
176 Use QuerySet.count() |
|
177 -------------------- |
|
178 |
|
179 ...if you only want the count, rather than doing ``len(queryset)``. |
|
180 |
|
181 Use QuerySet.exists() |
|
182 --------------------- |
|
183 |
|
184 ...if you only want to find out if at least one result exists, rather than ``if |
|
185 queryset``. |
|
186 |
|
187 But: |
|
188 |
|
189 Don't overuse ``count()`` and ``exists()`` |
|
190 ------------------------------------------ |
|
191 |
|
192 If you are going to need other data from the QuerySet, just evaluate it. |
|
193 |
|
194 For example, assuming an Email class that has a ``body`` attribute and a |
|
195 many-to-many relation to User, the following template code is optimal: |
|
196 |
|
197 .. code-block:: html+django |
|
198 |
|
199 {% if display_inbox %} |
|
200 {% with user.emails.all as emails %} |
|
201 {% if emails %} |
|
202 <p>You have {{ emails|length }} email(s)</p> |
|
203 {% for email in emails %} |
|
204 <p>{{ email.body }}</p> |
|
205 {% endfor %} |
|
206 {% else %} |
|
207 <p>No messages today.</p> |
|
208 {% endif %} |
|
209 {% endwith %} |
|
210 {% endif %} |
|
211 |
|
212 |
|
213 It is optimal because: |
|
214 |
|
215 1. Since QuerySets are lazy, this does no database if 'display_inbox' is False. |
|
216 |
|
217 #. Use of ``with`` means that we store ``user.emails.all`` in a variable for |
|
218 later use, allowing its cache to be re-used. |
|
219 |
|
220 #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called, |
|
221 which causes the ``user.emails.all()`` query to be run on the database, and |
|
222 at the least the first line to be turned into an ORM object. If there aren't |
|
223 any results, it will return False, otherwise True. |
|
224 |
|
225 #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling |
|
226 out the rest of the cache without doing another query. |
|
227 |
|
228 #. The ``for`` loop iterates over the already filled cache. |
|
229 |
|
230 In total, this code does either one or zero database queries. The only |
|
231 deliberate optimization performed is the use of the ``with`` tag. Using |
|
232 ``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause |
|
233 additional queries. |
|
234 |
|
235 Use ``QuerySet.update()`` and ``delete()`` |
|
236 ------------------------------------------ |
|
237 |
|
238 Rather than retrieve a load of objects, set some values, and save them |
|
239 individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update() |
|
240 <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes |
|
241 <topics-db-queries-delete>` where possible. |
|
242 |
|
243 Note, however, that these bulk update methods cannot call the ``save()`` or |
|
244 ``delete()`` methods of individual instances, which means that any custom |
|
245 behaviour you have added for these methods will not be executed, including |
|
246 anything driven from the normal database object :doc:`signals </ref/signals>`. |
|
247 |
|
248 Use foreign key values directly |
|
249 ------------------------------- |
|
250 |
|
251 If you only need a foreign key value, use the foreign key value that is already on |
|
252 the object you've got, rather than getting the whole related object and taking |
|
253 its primary key. i.e. do:: |
|
254 |
|
255 entry.blog_id |
|
256 |
|
257 instead of:: |
|
258 |
|
259 entry.blog.id |
|
260 |