pytask: comparison parts/django/docs/topics/db/aggregation.txt

equal deleted inserted replaced

-:5ff1fc726848
+:c6bca38c1cbf
+===========
+Aggregation
+===========
+.. versionadded:: 1.1
+.. currentmodule:: django.db.models
+The topic guide on :doc:`Django's database-abstraction API </topics/db/queries>`
+described the way that you can use Django queries that create,
+retrieve, update and delete individual objects. However, sometimes you will
+need to retrieve values that are derived by summarizing or *aggregating* a
+collection of objects. This topic guide describes the ways that aggregate values
+can be generated and returned using Django queries.
+Throughout this guide, we'll refer to the following models. These models are
+used to track the inventory for a series of online bookstores:
+.. _queryset-model-example:
+.. code-block:: python
+class Author(models.Model):
+name = models.CharField(max_length=100)
+age = models.IntegerField()
+friends = models.ManyToManyField('self', blank=True)
+class Publisher(models.Model):
+name = models.CharField(max_length=300)
+num_awards = models.IntegerField()
+class Book(models.Model):
+isbn = models.CharField(max_length=9)
+name = models.CharField(max_length=300)
+pages = models.IntegerField()
+price = models.DecimalField(max_digits=10, decimal_places=2)
+rating = models.FloatField()
+authors = models.ManyToManyField(Author)
+publisher = models.ForeignKey(Publisher)
+pubdate = models.DateField()
+class Store(models.Model):
+name = models.CharField(max_length=300)
+books = models.ManyToManyField(Book)
+Generating aggregates over a QuerySet
+=====================================
+Django provides two ways to generate aggregates. The first way is to generate
+summary values over an entire ``QuerySet``. For example, say you wanted to
+calculate the average price of all books available for sale. Django's query
+syntax provides a means for describing the set of all books::
+>>> Book.objects.all()
+What we need is a way to calculate summary values over the objects that
+belong to this ``QuerySet``. This is done by appending an ``aggregate()``
+clause onto the ``QuerySet``::
+>>> from django.db.models import Avg
+>>> Book.objects.all().aggregate(Avg('price'))
+{'price__avg': 34.35}
+The ``all()`` is redundant in this example, so this could be simplified to::
+>>> Book.objects.aggregate(Avg('price'))
+{'price__avg': 34.35}
+The argument to the ``aggregate()`` clause describes the aggregate value that
+we want to compute - in this case, the average of the ``price`` field on the
+``Book`` model. A list of the aggregate functions that are available can be
+found in the :ref:`QuerySet reference <aggregation-functions>`.
+``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked,
+returns a dictionary of name-value pairs. The name is an identifier for the
+aggregate value; the value is the computed aggregate. The name is
+automatically generated from the name of the field and the aggregate function.
+If you want to manually specify a name for the aggregate value, you can do so
+by providing that name when you specify the aggregate clause::
+>>> Book.objects.aggregate(average_price=Avg('price'))
+{'average_price': 34.35}
+If you want to generate more than one aggregate, you just add another
+argument to the ``aggregate()`` clause. So, if we also wanted to know
+the maximum and minimum price of all books, we would issue the query::
+>>> from django.db.models import Avg, Max, Min, Count
+>>> Book.objects.aggregate(Avg('price'), Max('price'), Min('price'))
+{'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')}
+Generating aggregates for each item in a QuerySet
+=================================================
+The second way to generate summary values is to generate an independent
+summary for each object in a ``QuerySet``. For example, if you are retrieving
+a list of books, you may want to know how many authors contributed to
+each book. Each Book has a many-to-many relationship with the Author; we
+want to summarize this relationship for each book in the ``QuerySet``.
+Per-object summaries can be generated using the ``annotate()`` clause.
+When an ``annotate()`` clause is specified, each object in the ``QuerySet``
+will be annotated with the specified values.
+The syntax for these annotations is identical to that used for the
+``aggregate()`` clause. Each argument to ``annotate()`` describes an
+aggregate that is to be calculated. For example, to annotate Books with
+the number of authors::
+# Build an annotated queryset
+>>> q = Book.objects.annotate(Count('authors'))
+# Interrogate the first object in the queryset
+>>> q[0]
+<Book: The Definitive Guide to Django>
+>>> q[0].authors__count
+2
+# Interrogate the second object in the queryset
+>>> q[1]
+<Book: Practical Django Projects>
+>>> q[1].authors__count
+1
+As with ``aggregate()``, the name for the annotation is automatically derived
+from the name of the aggregate function and the name of the field being
+aggregated. You can override this default name by providing an alias when you
+specify the annotation::
+>>> q = Book.objects.annotate(num_authors=Count('authors'))
+>>> q[0].num_authors
+2
+>>> q[1].num_authors
+1
+Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output
+of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be
+modified using any other ``QuerySet`` operation, including ``filter()``,
+``order_by``, or even additional calls to ``annotate()``.
+Joins and aggregates
+====================
+So far, we have dealt with aggregates over fields that belong to the
+model being queried. However, sometimes the value you want to aggregate
+will belong to a model that is related to the model you are querying.
+When specifying the field to be aggregated in an aggregate function, Django
+will allow you to use the same :ref:`double underscore notation
+<field-lookups-intro>` that is used when referring to related fields in
+filters. Django will then handle any table joins that are required to retrieve
+and aggregate the related value.
+For example, to find the price range of books offered in each store,
+you could use the annotation::
+>>> Store.objects.annotate(min_price=Min('books__price'), max_price=Max('books__price'))
+This tells Django to retrieve the Store model, join (through the
+many-to-many relationship) with the Book model, and aggregate on the
+price field of the book model to produce a minimum and maximum value.
+The same rules apply to the ``aggregate()`` clause. If you wanted to
+know the lowest and highest price of any book that is available for sale
+in a store, you could use the aggregate::
+>>> Store.objects.aggregate(min_price=Min('books__price'), max_price=Max('books__price'))
+Join chains can be as deep as you require. For example, to extract the
+age of the youngest author of any book available for sale, you could
+issue the query::
+>>> Store.objects.aggregate(youngest_age=Min('books__authors__age'))
+Aggregations and other QuerySet clauses
+=======================================
+``filter()`` and ``exclude()``
+------------------------------
+Aggregates can also participate in filters. Any ``filter()`` (or
+``exclude()``) applied to normal model fields will have the effect of
+constraining the objects that are considered for aggregation.
+When used with an ``annotate()`` clause, a filter has the effect of
+constraining the objects for which an annotation is calculated. For example,
+you can generate an annotated list of all books that have a title starting
+with "Django" using the query::
+>>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors'))
+When used with an ``aggregate()`` clause, a filter has the effect of
+constraining the objects over which the aggregate is calculated.
+For example, you can generate the average price of all books with a
+title that starts with "Django" using the query::
+>>> Book.objects.filter(name__startswith="Django").aggregate(Avg('price'))
+Filtering on annotations
+~~~~~~~~~~~~~~~~~~~~~~~~
+Annotated values can also be filtered. The alias for the annotation can be
+used in ``filter()`` and ``exclude()`` clauses in the same way as any other
+model field.
+For example, to generate a list of books that have more than one author,
+you can issue the query::
+>>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1)
+This query generates an annotated result set, and then generates a filter
+based upon that annotation.
+Order of ``annotate()`` and ``filter()`` clauses
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When developing a complex query that involves both ``annotate()`` and
+``filter()`` clauses, particular attention should be paid to the order
+in which the clauses are applied to the ``QuerySet``.
+When an ``annotate()`` clause is applied to a query, the annotation is
+computed over the state of the query up to the point where the annotation
+is requested. The practical implication of this is that ``filter()`` and
+``annotate()`` are not commutative operations -- that is, there is a
+difference between the query::
+>>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0)
+and the query::
+>>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
+Both queries will return a list of Publishers that have at least one good
+book (i.e., a book with a rating exceeding 3.0). However, the annotation in
+the first query will provide the total number of all books published by the
+publisher; the second query will only include good books in the annotated
+count. In the first query, the annotation precedes the filter, so the
+filter has no effect on the annotation. In the second query, the filter
+preceeds the annotation, and as a result, the filter constrains the objects
+considered when calculating the annotation.
+``order_by()``
+--------------
+Annotations can be used as a basis for ordering. When you
+define an ``order_by()`` clause, the aggregates you provide can reference
+any alias defined as part of an ``annotate()`` clause in the query.
+For example, to order a ``QuerySet`` of books by the number of authors
+that have contributed to the book, you could use the following query::
+>>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors')
+``values()``
+------------
+Ordinarily, annotations are generated on a per-object basis - an annotated
+``QuerySet`` will return one result for each object in the original
+``QuerySet``. However, when a ``values()`` clause is used to constrain the
+columns that are returned in the result set, the method for evaluating
+annotations is slightly different. Instead of returning an annotated result
+for each result in the original ``QuerySet``, the original results are
+grouped according to the unique combinations of the fields specified in the
+``values()`` clause. An annotation is then provided for each unique group;
+the annotation is computed over all members of the group.
+For example, consider an author query that attempts to find out the average
+rating of books written by each author:
+>>> Author.objects.annotate(average_rating=Avg('book__rating'))
+This will return one result for each author in the database, annotated with
+their average book rating.
+However, the result will be slightly different if you use a ``values()`` clause::
+>>> Author.objects.values('name').annotate(average_rating=Avg('book__rating'))
+In this example, the authors will be grouped by name, so you will only get
+an annotated result for each *unique* author name. This means if you have
+two authors with the same name, their results will be merged into a single
+result in the output of the query; the average will be computed as the
+average over the books written by both authors.
+Order of ``annotate()`` and ``values()`` clauses
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As with the ``filter()`` clause, the order in which ``annotate()`` and
+``values()`` clauses are applied to a query is significant. If the
+``values()`` clause precedes the ``annotate()``, the annotation will be
+computed using the grouping described by the ``values()`` clause.
+However, if the ``annotate()`` clause precedes the ``values()`` clause,
+the annotations will be generated over the entire query set. In this case,
+the ``values()`` clause only constrains the fields that are generated on
+output.
+For example, if we reverse the order of the ``values()`` and ``annotate()``
+clause from our previous example::
+>>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name', 'average_rating')
+This will now yield one unique result for each author; however, only
+the author's name and the ``average_rating`` annotation will be returned
+in the output data.
+You should also note that ``average_rating`` has been explicitly included
+in the list of values to be returned. This is required because of the
+ordering of the ``values()`` and ``annotate()`` clause.
+If the ``values()`` clause precedes the ``annotate()`` clause, any annotations
+will be automatically added to the result set. However, if the ``values()``
+clause is applied after the ``annotate()`` clause, you need to explicitly
+include the aggregate column.
+Interaction with default ordering or ``order_by()``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Fields that are mentioned in the ``order_by()`` part of a queryset (or which
+are used in the default ordering on a model) are used when selecting the
+output data, even if they are not otherwise specified in the ``values()``
+call. These extra fields are used to group "like" results together and they
+can make otherwise identical result rows appear to be separate. This shows up,
+particularly, when counting things.
+By way of example, suppose you have a model like this::
+class Item(models.Model):
+name = models.CharField(max_length=10)
+data = models.IntegerField()
+class Meta:
+ordering = ["name"]
+The important part here is the default ordering on the ``name`` field. If you
+want to count how many times each distinct ``data`` value appears, you might
+try this::
+# Warning: not quite correct!
+Item.objects.values("data").annotate(Count("id"))
+...which will group the ``Item`` objects by their common ``data`` values and
+then count the number of ``id`` values in each group. Except that it won't
+quite work. The default ordering by ``name`` will also play a part in the
+grouping, so this query will group by distinct ``(data, name)`` pairs, which
+isn't what you want. Instead, you should construct this queryset::
+Item.objects.values("data").annotate(Count("id")).order_by()
+...clearing any ordering in the query. You could also order by, say, ``data``
+without any harmful effects, since that is already playing a role in the
+query.
+This behavior is the same as that noted in the queryset documentation for
+:meth:`~django.db.models.QuerySet.distinct` and the general rule is the same:
+normally you won't want extra columns playing a part in the result, so clear
+out the ordering, or at least make sure it's restricted only to those fields
+you also select in a ``values()`` call.
+.. note::
+You might reasonably ask why Django doesn't remove the extraneous columns
+for you. The main reason is consistency with ``distinct()`` and other
+places: Django **never** removes ordering constraints that you have
+specified (and we can't change those other methods' behavior, as that
+would violate our :doc:`/misc/api-stability` policy).
+Aggregating annotations
+-----------------------
+You can also generate an aggregate on the result of an annotation. When you
+define an ``aggregate()`` clause, the aggregates you provide can reference
+any alias defined as part of an ``annotate()`` clause in the query.
+For example, if you wanted to calculate the average number of authors per
+book you first annotate the set of books with the author count, then
+aggregate that author count, referencing the annotation field::
+>>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors'))
+{'num_authors__avg': 1.66}