diff -r 5ff1fc726848 -r c6bca38c1cbf parts/django/docs/topics/serialization.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/parts/django/docs/topics/serialization.txt Sat Jan 08 11:20:57 2011 +0530
@@ -0,0 +1,402 @@
+==========================
+Serializing Django objects
+==========================
+
+Django's serialization framework provides a mechanism for "translating" Django
+objects into other formats. Usually these other formats will be text-based and
+used for sending Django objects over a wire, but it's possible for a
+serializer to handle any format (text-based or not).
+
+.. seealso::
+
+    If you just want to get some data from your tables into a serialized
+    form, you could use the :djadmin:`dumpdata` management command.
+
+Serializing data
+----------------
+
+At the highest level, serializing data is a very simple operation::
+
+    from django.core import serializers
+    data = serializers.serialize("xml", SomeModel.objects.all())
+
+The arguments to the ``serialize`` function are the format to serialize the data
+to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
+serialize. (Actually, the second argument can be any iterator that yields Django
+objects, but it'll almost always be a QuerySet).
+
+You can also use a serializer object directly::
+
+    XMLSerializer = serializers.get_serializer("xml")
+    xml_serializer = XMLSerializer()
+    xml_serializer.serialize(queryset)
+    data = xml_serializer.getvalue()
+
+This is useful if you want to serialize data directly to a file-like object
+(which includes an :class:`~django.http.HttpResponse`)::
+
+    out = open("file.xml", "w")
+    xml_serializer.serialize(SomeModel.objects.all(), stream=out)
+
+Subset of fields
+~~~~~~~~~~~~~~~~
+
+If you only want a subset of fields to be serialized, you can
+specify a ``fields`` argument to the serializer::
+
+    from django.core import serializers
+    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name','size'))
+
+In this example, only the ``name`` and ``size`` attributes of each model will
+be serialized.
+
+.. note::
+
+    Depending on your model, you may find that it is not possible to
+    deserialize a model that only serializes a subset of its fields. If a
+    serialized object doesn't specify all the fields that are required by a
+    model, the deserializer will not be able to save deserialized instances.
+
+Inherited Models
+~~~~~~~~~~~~~~~~
+
+If you have a model that is defined using an :ref:`abstract base class
+`, you don't have to do anything special to serialize
+that model. Just call the serializer on the object (or objects) that you want to
+serialize, and the output will be a complete representation of the serialized
+object.
+
+However, if you have a model that uses :ref:`multi-table inheritance
+`, you also need to serialize all of the base classes
+for the model. This is because only the fields that are locally defined on the
+model will be serialized. For example, consider the following models::
+
+    class Place(models.Model):
+        name = models.CharField(max_length=50)
+
+    class Restaurant(Place):
+        serves_hot_dogs = models.BooleanField()
+
+If you only serialize the Restaurant model::
+
+    data = serializers.serialize('xml', Restaurant.objects.all())
+
+the fields on the serialized output will only contain the ``serves_hot_dogs``
+attribute. The ``name`` attribute of the base class will be ignored.
+
+In order to fully serialize your Restaurant instances, you will need to
+serialize the Place models as well::
+
+    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
+    data = serializers.serialize('xml', all_objects)
+
+Deserializing data
+------------------
+
+Deserializing data is also a fairly simple operation::
+
+    for obj in serializers.deserialize("xml", data):
+        do_something_with(obj)
+
+As you can see, the ``deserialize`` function takes the same format argument as
+``serialize``, a string or stream of data, and returns an iterator.
+
+However, here it gets slightly complicated. The objects returned by the
+``deserialize`` iterator *aren't* simple Django objects. Instead, they are
+special ``DeserializedObject`` instances that wrap a created -- but unsaved --
+object and any associated relationship data.
+
+Calling ``DeserializedObject.save()`` saves the object to the database.
+
+This ensures that deserializing is a non-destructive operation even if the
+data in your serialized representation doesn't match what's currently in the
+database. Usually, working with these ``DeserializedObject`` instances looks
+something like::
+
+    for deserialized_object in serializers.deserialize("xml", data):
+        if object_should_be_saved(deserialized_object):
+            deserialized_object.save()
+
+In other words, the usual use is to examine the deserialized objects to make
+sure that they are "appropriate" for saving before doing so. Of course, if you
+trust your data source you could just save the object and move on.
+
+The Django object itself can be inspected as ``deserialized_object.object``.
+
+.. _serialization-formats:
+
+Serialization formats
+---------------------
+
+Django supports a number of serialization formats, some of which require you
+to install third-party Python modules:
+
+    ==========  ==============================================================
+    Identifier  Information
+    ==========  ==============================================================
+    ``xml``     Serializes to and from a simple XML dialect.
+
+    ``json``    Serializes to and from JSON_ (using a version of simplejson_
+                bundled with Django).
+
+    ``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
+                serializer is only available if PyYAML_ is installed.
+    ==========  ==============================================================
+
+.. _json: http://json.org/
+.. _simplejson: http://undefined.org/python/#simplejson
+.. _PyYAML: http://www.pyyaml.org/
+
+Notes for specific serialization formats
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+json
+^^^^
+
+If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
+serializer, you must pass ``ensure_ascii=False`` as a parameter to the
+``serialize()`` call. Otherwise, the output won't be encoded correctly.
+
+For example::
+
+    json_serializer = serializers.get_serializer("json")()
+    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
+
+The Django source code includes the simplejson_ module. However, if you're
+using Python 2.6 or later (which includes a builtin version of the module), Django will
+use the builtin ``json`` module automatically. If you have a system installed
+version that includes the C-based speedup extension, or your system version is
+more recent than the version shipped with Django (currently, 2.0.7), the
+system version will be used instead of the version included with Django.
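
A minimal sketch of what ``ensure_ascii`` changes in the encoded output, using only Python's standard ``json`` module rather than Django's serializer. The sample record is made up for illustration, and the sketch assumes Python 3:

```python
import json

# Hypothetical record containing a non-ASCII character (e-acute),
# written as an escape so the source file itself stays ASCII.
record = {"name": "Caf\u00e9"}

# Default behaviour: non-ASCII characters are emitted as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the character is kept as-is in the output string.
unescaped = json.dumps(record, ensure_ascii=False)

# Both encodings decode back to the same data.
assert json.loads(escaped) == json.loads(unescaped)
```

Both strings carry the same data; the only difference is how the non-ASCII character is written, which matters when the output goes to a stream with a specific encoding.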
+
+Be aware that if you're serializing using simplejson directly, not all Django
+output can be passed unmodified to it. In particular, :ref:`lazy
+translation objects ` need a `special encoder`_ written for
+them. Something like this will work::
+
+    # simplejson as bundled with Django
+    from django.utils import simplejson
+    from django.utils.functional import Promise
+    from django.utils.encoding import force_unicode
+
+    class LazyEncoder(simplejson.JSONEncoder):
+        def default(self, obj):
+            # Force lazy translation objects to unicode strings
+            if isinstance(obj, Promise):
+                return force_unicode(obj)
+            return super(LazyEncoder, self).default(obj)
+
+.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html
+
+.. _topics-serialization-natural-keys:
+
+Natural keys
+------------
+
+.. versionadded:: 1.2
+
+   The ability to use natural keys when serializing/deserializing data was
+   added in the 1.2 release.
+
+The default serialization strategy for foreign keys and many-to-many
+relations is to serialize the value of the primary key(s) of the
+objects in the relation. This strategy works well for most types of
+object, but it can cause difficulty in some circumstances.
+
+Consider the case of a list of objects that have a foreign key on
+:class:`ContentType`. If you're going to serialize an object that
+refers to a content type, you need to have a way to refer to that
+content type. Content Types are automatically created by Django as
+part of the database synchronization process, so you don't need to
+include content types in a fixture or other serialized data. As a
+result, the primary key of any given content type isn't easy to
+predict - it will depend on how and when :djadmin:`syncdb` was
+executed to create the content types.
+
+There is also the matter of convenience. An integer id isn't always
+the most convenient way to refer to an object; sometimes, a
+more natural reference would be helpful.
+
+It is for these reasons that Django provides *natural keys*. A natural
+key is a tuple of values that can be used to uniquely identify an
+object instance without using the primary key value.
+
+Deserialization of natural keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the following two models::
+
+    from django.db import models
+
+    class Person(models.Model):
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+    class Book(models.Model):
+        name = models.CharField(max_length=100)
+        author = models.ForeignKey(Person)
+
+Ordinarily, serialized data for ``Book`` would use an integer to refer to
+the author. For example, in JSON, a Book might be serialized as::
+
+    ...
+    {
+        "pk": 1,
+        "model": "store.book",
+        "fields": {
+            "name": "Mostly Harmless",
+            "author": 42
+        }
+    }
+    ...
+
+This isn't a particularly natural way to refer to an author. It
+requires that you know the primary key value for the author; it also
+requires that this primary key value is stable and predictable.
+
+However, if we add natural key handling to Person, the fixture becomes
+much more humane. To add natural key handling, you define a default
+Manager for Person with a ``get_by_natural_key()`` method. In the case
+of a Person, a good natural key might be the pair of first and last
+name::
+
+    from django.db import models
+
+    class PersonManager(models.Manager):
+        def get_by_natural_key(self, first_name, last_name):
+            return self.get(first_name=first_name, last_name=last_name)
+
+    class Person(models.Model):
+        objects = PersonManager()
+
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+Now books can use that natural key to refer to ``Person`` objects::
+
+    ...
+    {
+        "pk": 1,
+        "model": "store.book",
+        "fields": {
+            "name": "Mostly Harmless",
+            "author": ["Douglas", "Adams"]
+        }
+    }
+    ...
+
+When you try to load this serialized data, Django will use the
+``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
+into the primary key of an actual ``Person`` object.
+
+.. note::
+
+    Whatever fields you use for a natural key must be able to uniquely
+    identify an object. This will usually mean that your model will
+    have a uniqueness clause (either ``unique=True`` on a single field, or
+    ``unique_together`` over multiple fields) for the field or fields
+    in your natural key. However, uniqueness doesn't need to be
+    enforced at the database level. If you are certain that a set of
+    fields will be effectively unique, you can still use those fields
+    as a natural key.
+
+Serialization of natural keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+So how do you get Django to emit a natural key when serializing an object?
+Firstly, you need to add another method -- this time to the model itself::
+
+    class Person(models.Model):
+        objects = PersonManager()
+
+        first_name = models.CharField(max_length=100)
+        last_name = models.CharField(max_length=100)
+
+        birthdate = models.DateField()
+
+        def natural_key(self):
+            return (self.first_name, self.last_name)
+
+        class Meta:
+            unique_together = (('first_name', 'last_name'),)
+
+That method should always return a natural key tuple -- in this
+example, ``(first name, last name)``. Then, when you call
+``serializers.serialize()``, you provide a ``use_natural_keys=True``
+argument::
+
+    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)
+
+When ``use_natural_keys=True`` is specified, Django will use the
+``natural_key()`` method to serialize any reference to objects of the
+type that defines the method.
+
+If you are using :djadmin:`dumpdata` to generate serialized data, you
+use the ``--natural`` command line flag to generate natural keys.
+
+.. note::
+
+    You don't need to define both ``natural_key()`` and
+    ``get_by_natural_key()``. If you don't want Django to output
+    natural keys during serialization, but you want to retain the
+    ability to load natural keys, then you can opt to not implement
+    the ``natural_key()`` method.
+
+    Conversely, if (for some strange reason) you want Django to output
+    natural keys during serialization, but *not* be able to load those
+    key values, just don't define the ``get_by_natural_key()`` method.
+
+Dependencies during serialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since natural keys rely on database lookups to resolve references, it
+is important that data exists before it is referenced. You can't make
+a *forward reference* with natural keys - the data you are referencing
+must exist before you include a natural key reference to that data.
+
+To accommodate this limitation, calls to :djadmin:`dumpdata` that use
+the :djadminopt:`--natural` option will serialize any model with a
+``natural_key()`` method before it serializes normal key objects.
+
+However, this may not always be enough. If your natural key refers to
+another object (by using a foreign key or natural key to another object
+as part of a natural key), then you need to be able to ensure that
+the objects on which a natural key depends occur in the serialized data
+before the natural key requires them.
+
+To control this ordering, you can define dependencies on your
+``natural_key()`` methods. You do this by setting a ``dependencies``
+attribute on the ``natural_key()`` method itself.
+
+For example, consider the ``Permission`` model in ``contrib.auth``.
+The following is a simplified version of the ``Permission`` model::
+
+    class Permission(models.Model):
+        name = models.CharField(max_length=50)
+        content_type = models.ForeignKey(ContentType)
+        codename = models.CharField(max_length=100)
+        # ...
+        def natural_key(self):
+            return (self.codename,) + self.content_type.natural_key()
+
+The natural key for a ``Permission`` is a combination of the codename for the
+``Permission``, and the ``ContentType`` to which the ``Permission`` applies. This means
+that ``ContentType`` must be serialized before ``Permission``. To define this
+dependency, we add one extra line::
+
+    class Permission(models.Model):
+        # ...
+        def natural_key(self):
+            return (self.codename,) + self.content_type.natural_key()
+        natural_key.dependencies = ['contenttypes.contenttype']
+
+This definition ensures that ``ContentType`` models are serialized before
+``Permission`` models. In turn, any object referencing ``Permission`` will
+be serialized after both ``ContentType`` and ``Permission``.
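
The ordering guarantee described above can be pictured as a dependency sort. The following is a minimal sketch in plain Python, not Django's actual implementation; the ``sort_by_dependencies`` helper and the ``auth.group`` entry are assumptions added for illustration:

```python
def sort_by_dependencies(dependencies):
    """Order model labels so that each label comes after its dependencies.

    `dependencies` maps a model label to the list of labels it depends on,
    mirroring the `natural_key.dependencies` lists described in the text.
    """
    ordered = []
    visiting = set()

    def visit(label):
        if label in ordered:
            return
        if label in visiting:
            raise ValueError("circular dependency involving %s" % label)
        visiting.add(label)
        # Emit everything this label depends on before the label itself.
        for dep in dependencies.get(label, []):
            visit(dep)
        visiting.discard(label)
        ordered.append(label)

    # Sorted iteration keeps the result deterministic.
    for label in sorted(dependencies):
        visit(label)
    return ordered


# Hypothetical dependency table: Permission depends on ContentType,
# and a model referencing Permission depends on Permission.
deps = {
    "auth.permission": ["contenttypes.contenttype"],
    "auth.group": ["auth.permission"],
    "contenttypes.contenttype": [],
}
order = sort_by_dependencies(deps)
print(order)
```

A depth-first walk is used here because it is short and makes circular dependencies easy to detect; any topological sort would give an equally valid ordering.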