parts/django/docs/topics/serialization.txt
changeset 69 c6bca38c1cbf
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parts/django/docs/topics/serialization.txt	Sat Jan 08 11:20:57 2011 +0530
@@ -0,0 +1,402 @@
+==========================
+Serializing Django objects
+==========================
+
+Django's serialization framework provides a mechanism for "translating" Django
+objects into other formats. Usually these other formats will be text-based and
+used for sending Django objects over a wire, but it's possible for a
+serializer to handle any format (text-based or not).
+
+.. seealso::
+
+    If you just want to get some data from your tables into a serialized
+    form, you could use the :djadmin:`dumpdata` management command.
+
+Serializing data
+----------------
+
+At the highest level, serializing data is a very simple operation::
+
+    from django.core import serializers
+    data = serializers.serialize("xml", SomeModel.objects.all())
+
+The arguments to the ``serialize`` function are the format to serialize the data
+to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
+serialize. (Actually, the second argument can be any iterator that yields Django
+objects, but it'll almost always be a QuerySet.)
+
+You can also use a serializer object directly::
+
+    XMLSerializer = serializers.get_serializer("xml")
+    xml_serializer = XMLSerializer()
+    xml_serializer.serialize(queryset)
+    data = xml_serializer.getvalue()
+
+This is useful if you want to serialize data directly to a file-like object
+(which includes an :class:`~django.http.HttpResponse`)::
+
+    out = open("file.xml", "w")
+    xml_serializer.serialize(SomeModel.objects.all(), stream=out)
+
+Subset of fields
+~~~~~~~~~~~~~~~~
+
+If you only want a subset of fields to be serialized, you can
+specify a ``fields`` argument to the serializer::
+
+    from django.core import serializers
+    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))
+
+In this example, only the ``name`` and ``size`` attributes of each model will
+be serialized.
+
+.. note::
+
+    Depending on your model, you may find that it is not possible to
+    deserialize a model that only serializes a subset of its fields. If a
+    serialized object doesn't specify all the fields that are required by a
+    model, the deserializer will not be able to save deserialized instances.
+
+Inherited models
+~~~~~~~~~~~~~~~~
+
+If you have a model that is defined using an :ref:`abstract base class
+<abstract-base-classes>`, you don't have to do anything special to serialize
+that model. Just call the serializer on the object (or objects) that you want to
+serialize, and the output will be a complete representation of the serialized
+object.
+
+However, if you have a model that uses :ref:`multi-table inheritance
+<multi-table-inheritance>`, you also need to serialize all of the base classes
+for the model. This is because only the fields that are locally defined on the
+model will be serialized. For example, consider the following models::
+
+    class Place(models.Model):
+        name = models.CharField(max_length=50)
+
+    class Restaurant(Place):
+        serves_hot_dogs = models.BooleanField()
+
+If you only serialize the Restaurant model::
+
+    data = serializers.serialize('xml', Restaurant.objects.all())
+
+the fields on the serialized output will only contain the ``serves_hot_dogs``
+attribute. The ``name`` attribute of the base class will be ignored.
+
+In order to fully serialize your Restaurant instances, you will need to
+serialize the Place models as well::
+
+    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
+    data = serializers.serialize('xml', all_objects)
+
+Deserializing data
+------------------
+
+Deserializing data is also a fairly simple operation::
+
+    for obj in serializers.deserialize("xml", data):
+        do_something_with(obj)
+
+As you can see, the ``deserialize`` function takes the same format argument as
+``serialize``, a string or stream of data, and returns an iterator.
+
+However, here it gets slightly complicated. The objects returned by the
+``deserialize`` iterator *aren't* simple Django objects. Instead, they are
+special ``DeserializedObject`` instances that wrap a created -- but unsaved --
+object and any associated relationship data.
+
+Nothing is written to the database until you call ``DeserializedObject.save()``.
+
+This ensures that deserializing is a non-destructive operation even if the
+data in your serialized representation doesn't match what's currently in the
+database. Usually, working with these ``DeserializedObject`` instances looks
+something like::
+
+    for deserialized_object in serializers.deserialize("xml", data):
+        if object_should_be_saved(deserialized_object):
+            deserialized_object.save()
+
+In other words, the usual use is to examine the deserialized objects to make
+sure that they are "appropriate" for saving before doing so.  Of course, if you
+trust your data source you could just save the object and move on.
+
+The Django object itself can be inspected as ``deserialized_object.object``.
+
+.. _serialization-formats:
+
+Serialization formats
+---------------------
+
+Django supports a number of serialization formats, some of which require you
+to install third-party Python modules:
+
+    ==========  ==============================================================
+    Identifier  Information
+    ==========  ==============================================================
+    ``xml``     Serializes to and from a simple XML dialect.
+
+    ``json``    Serializes to and from JSON_ (using a version of simplejson_
+                bundled with Django).
+
+    ``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
+                serializer is only available if PyYAML_ is installed.
+    ==========  ==============================================================
+
+.. _json: http://json.org/
+.. _simplejson: http://undefined.org/python/#simplejson
+.. _PyYAML: http://www.pyyaml.org/
+
+Notes for specific serialization formats
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+json
+^^^^
+
+If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
+serializer, you must pass ``ensure_ascii=False`` as a parameter to the
+``serialize()`` call. Otherwise, the output won't be encoded correctly.
+
+For example::
+
+    json_serializer = serializers.get_serializer("json")()
+    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
+
+The Django source code includes the simplejson_ module. However, if you're
+using Python 2.6 or later (which includes a built-in version of the module),
+Django will use the built-in ``json`` module automatically. If you have a
+system-installed version that includes the C-based speedup extension, or your
+system version is more recent than the version shipped with Django (currently,
+2.0.7), the system version will be used instead of the one included with Django.
+
+Be aware that if you're serializing using that module directly, not all Django
+output can be passed unmodified to simplejson. In particular, :ref:`lazy
+translation objects <lazy-translations>` need a `special encoder`_ written for
+them. Something like this will work::
+
+    from django.utils.functional import Promise
+    from django.utils.encoding import force_unicode
+
+    class LazyEncoder(simplejson.JSONEncoder):
+        def default(self, obj):
+            if isinstance(obj, Promise):
+                return force_unicode(obj)
+            return super(LazyEncoder, self).default(obj)
+
+.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html
+
+.. _topics-serialization-natural-keys:
+
+Natural keys
+------------
+
+.. versionadded:: 1.2
+
+   The ability to use natural keys when serializing/deserializing data was
+   added in the 1.2 release.
+
+The default serialization strategy for foreign keys and many-to-many
+relations is to serialize the value of the primary key(s) of the
+objects in the relation. This strategy works well for most types of
+object, but it can cause difficulty in some circumstances.
+
+Consider the case of a list of objects that have a foreign key to
+:class:`ContentType`. If you're going to serialize an object that
+refers to a content type, you need to have a way to refer to that
+content type. Content types are automatically created by Django as
+part of the database synchronization process, so you don't need to
+include content types in a fixture or other serialized data. As a
+result, the primary key of any given content type isn't easy to
+predict -- it will depend on how and when :djadmin:`syncdb` was
+executed to create the content types.
+
+There is also the matter of convenience. An integer id isn't always
+the most convenient way to refer to an object; sometimes, a
+more natural reference would be helpful.
+
+It is for these reasons that Django provides *natural keys*. A natural
+key is a tuple of values that can be used to uniquely identify an
+object instance without using the primary key value.
+
+Deserialization of natural keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the following two models::
+
+    from django.db import models
+
+    class Person(models.Model):
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+    class Book(models.Model):
+        name = models.CharField(max_length=100)
+        author = models.ForeignKey(Person)
+
+Ordinarily, serialized data for ``Book`` would use an integer to refer to
+the author. For example, in JSON, a Book might be serialized as::
+
+    ...
+    {
+        "pk": 1,
+        "model": "store.book",
+        "fields": {
+            "name": "Mostly Harmless",
+            "author": 42
+        }
+    }
+    ...
+
+This isn't a particularly natural way to refer to an author. It
+requires that you know the primary key value for the author; it also
+requires that this primary key value is stable and predictable.
+
+However, if we add natural key handling to Person, the fixture becomes
+much more humane. To add natural key handling, you define a default
+Manager for Person with a ``get_by_natural_key()`` method. In the case
+of a Person, a good natural key might be the pair of first and last
+name::
+
+    from django.db import models
+
+    class PersonManager(models.Manager):
+        def get_by_natural_key(self, first_name, last_name):
+            return self.get(first_name=first_name, last_name=last_name)
+
+    class Person(models.Model):
+        objects = PersonManager()
+
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+Now books can use that natural key to refer to ``Person`` objects::
+
+    ...
+    {
+        "pk": 1,
+        "model": "store.book",
+        "fields": {
+            "name": "Mostly Harmless",
+            "author": ["Douglas", "Adams"]
+        }
+    }
+    ...
+
+When you try to load this serialized data, Django will use the
+``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
+into the primary key of an actual ``Person`` object.
+
+.. note::
+
+    Whatever fields you use for a natural key must be able to uniquely
+    identify an object. This will usually mean that your model will
+    have a uniqueness clause (either ``unique=True`` on a single field, or
+    ``unique_together`` over multiple fields) for the field or fields
+    in your natural key. However, uniqueness doesn't need to be
+    enforced at the database level. If you are certain that a set of
+    fields will be effectively unique, you can still use those fields
+    as a natural key.
+
+Serialization of natural keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+So how do you get Django to emit a natural key when serializing an object?
+First, you need to add another method -- this time to the model itself::
+
+    class Person(models.Model):
+        objects = PersonManager()
+
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        def natural_key(self):
+            return (self.first_name, self.last_name)
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+That method should always return a natural key tuple -- in this
+example, ``(first name, last name)``. Then, when you call
+``serializers.serialize()``, you provide a ``use_natural_keys=True``
+argument::
+
+    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)
+
+When ``use_natural_keys=True`` is specified, Django will use the
+``natural_key()`` method to serialize any reference to objects of the
+type that defines the method.
+
+If you are using :djadmin:`dumpdata` to generate serialized data, use the
+:djadminopt:`--natural` command line flag to generate natural keys.
+
+.. note::
+
+    You don't need to define both ``natural_key()`` and
+    ``get_by_natural_key()``. If you don't want Django to output
+    natural keys during serialization, but you want to retain the
+    ability to load natural keys, then you can opt to not implement
+    the ``natural_key()`` method.
+
+    Conversely, if (for some strange reason) you want Django to output
+    natural keys during serialization, but *not* be able to load those
+    key values, just don't define the ``get_by_natural_key()`` method.
+
+Dependencies during serialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since natural keys rely on database lookups to resolve references, it
+is important that data exists before it is referenced. You can't make
+a *forward reference* with natural keys -- the data you are referencing
+must exist before you include a natural key reference to that data.
+
+To accommodate this limitation, calls to :djadmin:`dumpdata` that use the
+:djadminopt:`--natural` option will serialize any model with a ``natural_key()``
+method before serializing objects that use standard primary keys.
+
+However, this may not always be enough. If your natural key refers to
+another object (by using a foreign key or natural key to another object
+as part of a natural key), then you need to be able to ensure that
+the objects on which a natural key depends occur in the serialized data
+before the natural key requires them.
+
+To control this ordering, you can define dependencies on your
+``natural_key()`` methods. You do this by setting a ``dependencies``
+attribute on the ``natural_key()`` method itself.
+
+For example, consider the ``Permission`` model in ``contrib.auth``.
+The following is a simplified version of the ``Permission`` model::
+
+    class Permission(models.Model):
+        name = models.CharField(max_length=50)
+        content_type = models.ForeignKey(ContentType)
+        codename = models.CharField(max_length=100)
+        # ...
+        def natural_key(self):
+            return (self.codename,) + self.content_type.natural_key()
+
+The natural key for a ``Permission`` is a combination of the codename for the
+``Permission`` and the ``ContentType`` to which the ``Permission`` applies.
+This means that ``ContentType`` must be serialized before ``Permission``. To
+define this dependency, we add one extra line::
+
+    class Permission(models.Model):
+        # ...
+        def natural_key(self):
+            return (self.codename,) + self.content_type.natural_key()
+        natural_key.dependencies = ['contenttypes.contenttype']
+
+This definition ensures that ``ContentType`` models are serialized before
+``Permission`` models. In turn, any object referencing ``Permission`` will
+be serialized after both ``ContentType`` and ``Permission``.