|
1 =========== |
|
2 Aggregation |
|
3 =========== |
|
4 |
|
5 .. versionadded:: 1.1 |
|
6 |
|
7 .. currentmodule:: django.db.models |
|
8 |
|
9 The topic guide on :doc:`Django's database-abstraction API </topics/db/queries>` |
|
10 described the way that you can use Django queries that create, |
|
11 retrieve, update and delete individual objects. However, sometimes you will |
|
12 need to retrieve values that are derived by summarizing or *aggregating* a |
|
13 collection of objects. This topic guide describes the ways that aggregate values |
|
14 can be generated and returned using Django queries. |
|
15 |
|
16 Throughout this guide, we'll refer to the following models. These models are |
|
17 used to track the inventory for a series of online bookstores: |
|
18 |
|
19 .. _queryset-model-example: |
|
20 |
|
21 .. code-block:: python |
|
22 |
|
23 class Author(models.Model): |
|
24 name = models.CharField(max_length=100) |
|
25 age = models.IntegerField() |
|
26 friends = models.ManyToManyField('self', blank=True) |
|
27 |
|
28 class Publisher(models.Model): |
|
29 name = models.CharField(max_length=300) |
|
30 num_awards = models.IntegerField() |
|
31 |
|
32 class Book(models.Model): |
|
33 isbn = models.CharField(max_length=9) |
|
34 name = models.CharField(max_length=300) |
|
35 pages = models.IntegerField() |
|
36 price = models.DecimalField(max_digits=10, decimal_places=2) |
|
37 rating = models.FloatField() |
|
38 authors = models.ManyToManyField(Author) |
|
39 publisher = models.ForeignKey(Publisher) |
|
40 pubdate = models.DateField() |
|
41 |
|
42 class Store(models.Model): |
|
43 name = models.CharField(max_length=300) |
|
44 books = models.ManyToManyField(Book) |
|
45 |
|
46 |
|
47 Generating aggregates over a QuerySet |
|
48 ===================================== |
|
49 |
|
50 Django provides two ways to generate aggregates. The first way is to generate |
|
51 summary values over an entire ``QuerySet``. For example, say you wanted to |
|
52 calculate the average price of all books available for sale. Django's query |
|
53 syntax provides a means for describing the set of all books:: |
|
54 |
|
55 >>> Book.objects.all() |
|
56 |
|
57 What we need is a way to calculate summary values over the objects that |
|
58 belong to this ``QuerySet``. This is done by appending an ``aggregate()`` |
|
59 clause onto the ``QuerySet``:: |
|
60 |
|
61 >>> from django.db.models import Avg |
|
62 >>> Book.objects.all().aggregate(Avg('price')) |
|
63 {'price__avg': 34.35} |
|
64 |
|
65 The ``all()`` is redundant in this example, so this could be simplified to:: |
|
66 |
|
67 >>> Book.objects.aggregate(Avg('price')) |
|
68 {'price__avg': 34.35} |
|
69 |
|
70 The argument to the ``aggregate()`` clause describes the aggregate value that |
|
71 we want to compute - in this case, the average of the ``price`` field on the |
|
72 ``Book`` model. A list of the aggregate functions that are available can be |
|
73 found in the :ref:`QuerySet reference <aggregation-functions>`. |
|
74 |
|
75 ``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked, |
|
76 returns a dictionary of name-value pairs. The name is an identifier for the |
|
77 aggregate value; the value is the computed aggregate. The name is |
|
78 automatically generated from the name of the field and the aggregate function. |
|
79 If you want to manually specify a name for the aggregate value, you can do so |
|
80 by providing that name when you specify the aggregate clause:: |
|
81 |
|
82 >>> Book.objects.aggregate(average_price=Avg('price')) |
|
83 {'average_price': 34.35} |
|
84 |
|
85 If you want to generate more than one aggregate, you just add another |
|
86 argument to the ``aggregate()`` clause. So, if we also wanted to know |
|
87 the maximum and minimum price of all books, we would issue the query:: |
|
88 |
|
89 >>> from django.db.models import Avg, Max, Min, Count |
|
90 >>> Book.objects.aggregate(Avg('price'), Max('price'), Min('price')) |
|
91 {'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')} |
|
92 |
|
93 Generating aggregates for each item in a QuerySet |
|
94 ================================================= |
|
95 |
|
96 The second way to generate summary values is to generate an independent |
|
97 summary for each object in a ``QuerySet``. For example, if you are retrieving |
|
98 a list of books, you may want to know how many authors contributed to |
|
99 each book. Each Book has a many-to-many relationship with the Author; we |
|
100 want to summarize this relationship for each book in the ``QuerySet``. |
|
101 |
|
102 Per-object summaries can be generated using the ``annotate()`` clause. |
|
103 When an ``annotate()`` clause is specified, each object in the ``QuerySet`` |
|
104 will be annotated with the specified values. |
|
105 |
|
106 The syntax for these annotations is identical to that used for the |
|
107 ``aggregate()`` clause. Each argument to ``annotate()`` describes an |
|
108 aggregate that is to be calculated. For example, to annotate Books with |
|
109 the number of authors:: |
|
110 |
|
111 # Build an annotated queryset |
|
112 >>> q = Book.objects.annotate(Count('authors')) |
|
113 # Interrogate the first object in the queryset |
|
114 >>> q[0] |
|
115 <Book: The Definitive Guide to Django> |
|
116 >>> q[0].authors__count |
|
117 2 |
|
118 # Interrogate the second object in the queryset |
|
119 >>> q[1] |
|
120 <Book: Practical Django Projects> |
|
121 >>> q[1].authors__count |
|
122 1 |
|
123 |
|
124 As with ``aggregate()``, the name for the annotation is automatically derived |
|
125 from the name of the aggregate function and the name of the field being |
|
126 aggregated. You can override this default name by providing an alias when you |
|
127 specify the annotation:: |
|
128 |
|
129 >>> q = Book.objects.annotate(num_authors=Count('authors')) |
|
130 >>> q[0].num_authors |
|
131 2 |
|
132 >>> q[1].num_authors |
|
133 1 |
|
134 |
|
135 Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output |
|
136 of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be |
|
137 modified using any other ``QuerySet`` operation, including ``filter()``, |
|
138 ``order_by``, or even additional calls to ``annotate()``. |
|
139 |
|
140 Joins and aggregates |
|
141 ==================== |
|
142 |
|
143 So far, we have dealt with aggregates over fields that belong to the |
|
144 model being queried. However, sometimes the value you want to aggregate |
|
145 will belong to a model that is related to the model you are querying. |
|
146 |
|
147 When specifying the field to be aggregated in an aggregate function, Django |
|
148 will allow you to use the same :ref:`double underscore notation |
|
149 <field-lookups-intro>` that is used when referring to related fields in |
|
150 filters. Django will then handle any table joins that are required to retrieve |
|
151 and aggregate the related value. |
|
152 |
|
153 For example, to find the price range of books offered in each store, |
|
154 you could use the annotation:: |
|
155 |
|
156 >>> Store.objects.annotate(min_price=Min('books__price'), max_price=Max('books__price')) |
|
157 |
|
158 This tells Django to retrieve the Store model, join (through the |
|
159 many-to-many relationship) with the Book model, and aggregate on the |
|
160 price field of the book model to produce a minimum and maximum value. |
|
161 |
|
162 The same rules apply to the ``aggregate()`` clause. If you wanted to |
|
163 know the lowest and highest price of any book that is available for sale |
|
164 in a store, you could use the aggregate:: |
|
165 |
|
166 >>> Store.objects.aggregate(min_price=Min('books__price'), max_price=Max('books__price')) |
|
167 |
|
168 Join chains can be as deep as you require. For example, to extract the |
|
169 age of the youngest author of any book available for sale, you could |
|
170 issue the query:: |
|
171 |
|
172 >>> Store.objects.aggregate(youngest_age=Min('books__authors__age')) |
|
173 |
|
174 Aggregations and other QuerySet clauses |
|
175 ======================================= |
|
176 |
|
177 ``filter()`` and ``exclude()`` |
|
178 ------------------------------ |
|
179 |
|
180 Aggregates can also participate in filters. Any ``filter()`` (or |
|
181 ``exclude()``) applied to normal model fields will have the effect of |
|
182 constraining the objects that are considered for aggregation. |
|
183 |
|
184 When used with an ``annotate()`` clause, a filter has the effect of |
|
185 constraining the objects for which an annotation is calculated. For example, |
|
186 you can generate an annotated list of all books that have a title starting |
|
187 with "Django" using the query:: |
|
188 |
|
189 >>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors')) |
|
190 |
|
191 When used with an ``aggregate()`` clause, a filter has the effect of |
|
192 constraining the objects over which the aggregate is calculated. |
|
193 For example, you can generate the average price of all books with a |
|
194 title that starts with "Django" using the query:: |
|
195 |
|
196 >>> Book.objects.filter(name__startswith="Django").aggregate(Avg('price')) |
|
197 |
|
198 Filtering on annotations |
|
199 ~~~~~~~~~~~~~~~~~~~~~~~~ |
|
200 |
|
201 Annotated values can also be filtered. The alias for the annotation can be |
|
202 used in ``filter()`` and ``exclude()`` clauses in the same way as any other |
|
203 model field. |
|
204 |
|
205 For example, to generate a list of books that have more than one author, |
|
206 you can issue the query:: |
|
207 |
|
208 >>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1) |
|
209 |
|
210 This query generates an annotated result set, and then generates a filter |
|
211 based upon that annotation. |
|
212 |
|
213 Order of ``annotate()`` and ``filter()`` clauses |
|
214 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
215 |
|
216 When developing a complex query that involves both ``annotate()`` and |
|
217 ``filter()`` clauses, particular attention should be paid to the order |
|
218 in which the clauses are applied to the ``QuerySet``. |
|
219 |
|
220 When an ``annotate()`` clause is applied to a query, the annotation is |
|
221 computed over the state of the query up to the point where the annotation |
|
222 is requested. The practical implication of this is that ``filter()`` and |
|
223 ``annotate()`` are not commutative operations -- that is, there is a |
|
224 difference between the query:: |
|
225 |
|
226 >>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0) |
|
227 |
|
228 and the query:: |
|
229 |
|
230 >>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book')) |
|
231 |
|
232 Both queries will return a list of Publishers that have at least one good |
|
233 book (i.e., a book with a rating exceeding 3.0). However, the annotation in |
|
234 the first query will provide the total number of all books published by the |
|
235 publisher; the second query will only include good books in the annotated |
|
236 count. In the first query, the annotation precedes the filter, so the |
|
237 filter has no effect on the annotation. In the second query, the filter |
|
238 preceeds the annotation, and as a result, the filter constrains the objects |
|
239 considered when calculating the annotation. |
|
240 |
|
241 ``order_by()`` |
|
242 -------------- |
|
243 |
|
244 Annotations can be used as a basis for ordering. When you |
|
245 define an ``order_by()`` clause, the aggregates you provide can reference |
|
246 any alias defined as part of an ``annotate()`` clause in the query. |
|
247 |
|
248 For example, to order a ``QuerySet`` of books by the number of authors |
|
249 that have contributed to the book, you could use the following query:: |
|
250 |
|
251 >>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors') |
|
252 |
|
253 ``values()`` |
|
254 ------------ |
|
255 |
|
256 Ordinarily, annotations are generated on a per-object basis - an annotated |
|
257 ``QuerySet`` will return one result for each object in the original |
|
258 ``QuerySet``. However, when a ``values()`` clause is used to constrain the |
|
259 columns that are returned in the result set, the method for evaluating |
|
260 annotations is slightly different. Instead of returning an annotated result |
|
261 for each result in the original ``QuerySet``, the original results are |
|
262 grouped according to the unique combinations of the fields specified in the |
|
263 ``values()`` clause. An annotation is then provided for each unique group; |
|
264 the annotation is computed over all members of the group. |
|
265 |
|
266 For example, consider an author query that attempts to find out the average |
|
267 rating of books written by each author: |
|
268 |
|
269 >>> Author.objects.annotate(average_rating=Avg('book__rating')) |
|
270 |
|
271 This will return one result for each author in the database, annotated with |
|
272 their average book rating. |
|
273 |
|
274 However, the result will be slightly different if you use a ``values()`` clause:: |
|
275 |
|
276 >>> Author.objects.values('name').annotate(average_rating=Avg('book__rating')) |
|
277 |
|
278 In this example, the authors will be grouped by name, so you will only get |
|
279 an annotated result for each *unique* author name. This means if you have |
|
280 two authors with the same name, their results will be merged into a single |
|
281 result in the output of the query; the average will be computed as the |
|
282 average over the books written by both authors. |
|
283 |
|
284 Order of ``annotate()`` and ``values()`` clauses |
|
285 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
286 |
|
287 As with the ``filter()`` clause, the order in which ``annotate()`` and |
|
288 ``values()`` clauses are applied to a query is significant. If the |
|
289 ``values()`` clause precedes the ``annotate()``, the annotation will be |
|
290 computed using the grouping described by the ``values()`` clause. |
|
291 |
|
292 However, if the ``annotate()`` clause precedes the ``values()`` clause, |
|
293 the annotations will be generated over the entire query set. In this case, |
|
294 the ``values()`` clause only constrains the fields that are generated on |
|
295 output. |
|
296 |
|
297 For example, if we reverse the order of the ``values()`` and ``annotate()`` |
|
298 clause from our previous example:: |
|
299 |
|
300 >>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name', 'average_rating') |
|
301 |
|
302 This will now yield one unique result for each author; however, only |
|
303 the author's name and the ``average_rating`` annotation will be returned |
|
304 in the output data. |
|
305 |
|
306 You should also note that ``average_rating`` has been explicitly included |
|
307 in the list of values to be returned. This is required because of the |
|
308 ordering of the ``values()`` and ``annotate()`` clause. |
|
309 |
|
310 If the ``values()`` clause precedes the ``annotate()`` clause, any annotations |
|
311 will be automatically added to the result set. However, if the ``values()`` |
|
312 clause is applied after the ``annotate()`` clause, you need to explicitly |
|
313 include the aggregate column. |
|
314 |
|
315 Interaction with default ordering or ``order_by()`` |
|
316 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
317 |
|
318 Fields that are mentioned in the ``order_by()`` part of a queryset (or which |
|
319 are used in the default ordering on a model) are used when selecting the |
|
320 output data, even if they are not otherwise specified in the ``values()`` |
|
321 call. These extra fields are used to group "like" results together and they |
|
322 can make otherwise identical result rows appear to be separate. This shows up, |
|
323 particularly, when counting things. |
|
324 |
|
325 By way of example, suppose you have a model like this:: |
|
326 |
|
327 class Item(models.Model): |
|
328 name = models.CharField(max_length=10) |
|
329 data = models.IntegerField() |
|
330 |
|
331 class Meta: |
|
332 ordering = ["name"] |
|
333 |
|
334 The important part here is the default ordering on the ``name`` field. If you |
|
335 want to count how many times each distinct ``data`` value appears, you might |
|
336 try this:: |
|
337 |
|
338 # Warning: not quite correct! |
|
339 Item.objects.values("data").annotate(Count("id")) |
|
340 |
|
341 ...which will group the ``Item`` objects by their common ``data`` values and |
|
342 then count the number of ``id`` values in each group. Except that it won't |
|
343 quite work. The default ordering by ``name`` will also play a part in the |
|
344 grouping, so this query will group by distinct ``(data, name)`` pairs, which |
|
345 isn't what you want. Instead, you should construct this queryset:: |
|
346 |
|
347 Item.objects.values("data").annotate(Count("id")).order_by() |
|
348 |
|
349 ...clearing any ordering in the query. You could also order by, say, ``data`` |
|
350 without any harmful effects, since that is already playing a role in the |
|
351 query. |
|
352 |
|
353 This behavior is the same as that noted in the queryset documentation for |
|
354 :meth:`~django.db.models.QuerySet.distinct` and the general rule is the same: |
|
355 normally you won't want extra columns playing a part in the result, so clear |
|
356 out the ordering, or at least make sure it's restricted only to those fields |
|
357 you also select in a ``values()`` call. |
|
358 |
|
359 .. note:: |
|
360 You might reasonably ask why Django doesn't remove the extraneous columns |
|
361 for you. The main reason is consistency with ``distinct()`` and other |
|
362 places: Django **never** removes ordering constraints that you have |
|
363 specified (and we can't change those other methods' behavior, as that |
|
364 would violate our :doc:`/misc/api-stability` policy). |
|
365 |
|
366 Aggregating annotations |
|
367 ----------------------- |
|
368 |
|
369 You can also generate an aggregate on the result of an annotation. When you |
|
370 define an ``aggregate()`` clause, the aggregates you provide can reference |
|
371 any alias defined as part of an ``annotate()`` clause in the query. |
|
372 |
|
373 For example, if you wanted to calculate the average number of authors per |
|
374 book you first annotate the set of books with the author count, then |
|
375 aggregate that author count, referencing the annotation field:: |
|
376 |
|
377 >>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors')) |
|
378 {'num_authors__avg': 1.66} |