parts/django/docs/topics/serialization.txt
changeset 307 c6bca38c1cbf
306:5ff1fc726848 307:c6bca38c1cbf
       
     1 ==========================
       
     2 Serializing Django objects
       
     3 ==========================
       
     4 
       
     5 Django's serialization framework provides a mechanism for "translating" Django
       
     6 objects into other formats. Usually these other formats will be text-based and
       
     7 used for sending Django objects over a wire, but it's possible for a
       
     8 serializer to handle any format (text-based or not).
       
     9 
       
    10 .. seealso::
       
    11 
       
    12     If you just want to get some data from your tables into a serialized
       
    13     form, you could use the :djadmin:`dumpdata` management command.
       
    14 
       
    15 Serializing data
       
    16 ----------------
       
    17 
       
    18 At the highest level, serializing data is a very simple operation::
       
    19 
       
    20     from django.core import serializers
       
    21     data = serializers.serialize("xml", SomeModel.objects.all())
       
    22 
       
    23 The arguments to the ``serialize`` function are the format to serialize the data
       
    24 to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
       
    25 serialize. (Actually, the second argument can be any iterator that yields Django
       
     26 objects, but it'll almost always be a QuerySet.)
       
    27 
       
    28 You can also use a serializer object directly::
       
    29 
       
    30     XMLSerializer = serializers.get_serializer("xml")
       
    31     xml_serializer = XMLSerializer()
       
    32     xml_serializer.serialize(queryset)
       
    33     data = xml_serializer.getvalue()
       
    34 
       
    35 This is useful if you want to serialize data directly to a file-like object
       
    36 (which includes an :class:`~django.http.HttpResponse`)::
       
    37 
       
    38     out = open("file.xml", "w")
       
    39     xml_serializer.serialize(SomeModel.objects.all(), stream=out)
       
    40 
       
    41 Subset of fields
       
    42 ~~~~~~~~~~~~~~~~
       
    43 
       
    44 If you only want a subset of fields to be serialized, you can
       
    45 specify a ``fields`` argument to the serializer::
       
    46 
       
    47     from django.core import serializers
       
    48     data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name','size'))
       
    49 
       
    50 In this example, only the ``name`` and ``size`` attributes of each model will
       
    51 be serialized.
       
    52 
       
    53 .. note::
       
    54 
       
    55     Depending on your model, you may find that it is not possible to
       
    56     deserialize a model that only serializes a subset of its fields. If a
       
    57     serialized object doesn't specify all the fields that are required by a
       
    58     model, the deserializer will not be able to save deserialized instances.
       
    59 
       
     60 Inherited models
       
    61 ~~~~~~~~~~~~~~~~
       
    62 
       
    63 If you have a model that is defined using an :ref:`abstract base class
       
    64 <abstract-base-classes>`, you don't have to do anything special to serialize
       
    65 that model. Just call the serializer on the object (or objects) that you want to
       
    66 serialize, and the output will be a complete representation of the serialized
       
    67 object.
       
    68 
       
    69 However, if you have a model that uses :ref:`multi-table inheritance
       
    70 <multi-table-inheritance>`, you also need to serialize all of the base classes
       
    71 for the model. This is because only the fields that are locally defined on the
       
    72 model will be serialized. For example, consider the following models::
       
    73 
       
    74     class Place(models.Model):
       
    75         name = models.CharField(max_length=50)
       
    76 
       
    77     class Restaurant(Place):
       
    78         serves_hot_dogs = models.BooleanField()
       
    79 
       
    80 If you only serialize the Restaurant model::
       
    81 
       
    82     data = serializers.serialize('xml', Restaurant.objects.all())
       
    83 
       
     84 the fields on the serialized output will only contain the ``serves_hot_dogs``
       
     85 attribute. The ``name`` attribute of the base class will be ignored.
       
    86 
       
    87 In order to fully serialize your Restaurant instances, you will need to
       
    88 serialize the Place models as well::
       
    89 
       
    90     all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
       
    91     data = serializers.serialize('xml', all_objects)
       
    92 
       
    93 Deserializing data
       
    94 ------------------
       
    95 
       
    96 Deserializing data is also a fairly simple operation::
       
    97 
       
    98     for obj in serializers.deserialize("xml", data):
       
    99         do_something_with(obj)
       
   100 
       
   101 As you can see, the ``deserialize`` function takes the same format argument as
       
   102 ``serialize``, a string or stream of data, and returns an iterator.
       
   103 
       
   104 However, here it gets slightly complicated. The objects returned by the
       
   105 ``deserialize`` iterator *aren't* simple Django objects. Instead, they are
       
   106 special ``DeserializedObject`` instances that wrap a created -- but unsaved --
       
   107 object and any associated relationship data.
       
   108 
       
   109 Calling ``DeserializedObject.save()`` saves the object to the database.
       
   110 
       
    111 Because nothing is written until you call ``save()``, deserializing is a
       
    112 non-destructive operation even if the data in your serialized representation
       
    113 doesn't match what's currently in the database. Usually, working with these
       
    114 ``DeserializedObject`` instances looks something like::
       
   115 
       
   116     for deserialized_object in serializers.deserialize("xml", data):
       
   117         if object_should_be_saved(deserialized_object):
       
   118             deserialized_object.save()
       
   119 
       
   120 In other words, the usual use is to examine the deserialized objects to make
       
   121 sure that they are "appropriate" for saving before doing so.  Of course, if you
       
   122 trust your data source you could just save the object and move on.
       
   123 
       
   124 The Django object itself can be inspected as ``deserialized_object.object``.
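
The inspect-then-save pattern described above can be illustrated without a
database. The following is a minimal sketch in plain Python, not Django's
implementation: ``PendingRecord``, ``load`` and the dictionary standing in for
the database are all hypothetical names chosen for this example.

```python
class PendingRecord:
    """Hypothetical wrapper: holds an unsaved object until save() is called."""

    def __init__(self, obj, store):
        self.object = obj      # the wrapped object, available for inspection
        self._store = store    # dict standing in for the database

    def save(self):
        # Only an explicit save() writes anything to the store.
        self._store[self.object["pk"]] = self.object


def load(records, store):
    """Yield wrappers lazily; iterating alone never touches the store."""
    for record in records:
        yield PendingRecord(record, store)


database = {1: {"pk": 1, "name": "existing"}}
incoming = [{"pk": 1, "name": "changed"}, {"pk": 2, "name": ""}]

for pending in load(incoming, database):
    # Example acceptance rule (an assumption): skip records with an empty name.
    if pending.object["name"]:
        pending.save()
```

Here the record with an empty name is never written, and simply iterating over
``load(...)`` without calling ``save()`` leaves the store unchanged.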
       
   125 
       
   126 .. _serialization-formats:
       
   127 
       
   128 Serialization formats
       
   129 ---------------------
       
   130 
       
   131 Django supports a number of serialization formats, some of which require you
       
   132 to install third-party Python modules:
       
   133 
       
   134     ==========  ==============================================================
       
   135     Identifier  Information
       
   136     ==========  ==============================================================
       
   137     ``xml``     Serializes to and from a simple XML dialect.
       
   138 
       
   139     ``json``    Serializes to and from JSON_ (using a version of simplejson_
       
   140                 bundled with Django).
       
   141 
       
    142     ``yaml``    Serializes to YAML (YAML Ain't Markup Language). This
       
   143                 serializer is only available if PyYAML_ is installed.
       
   144     ==========  ==============================================================
       
   145 
       
   146 .. _json: http://json.org/
       
   147 .. _simplejson: http://undefined.org/python/#simplejson
       
   148 .. _PyYAML: http://www.pyyaml.org/
       
   149 
       
   150 Notes for specific serialization formats
       
   151 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       
   152 
       
   153 json
       
   154 ^^^^
       
   155 
       
   156 If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
       
   157 serializer, you must pass ``ensure_ascii=False`` as a parameter to the
       
   158 ``serialize()`` call. Otherwise, the output won't be encoded correctly.
       
   159 
       
   160 For example::
       
   161 
       
   162     json_serializer = serializers.get_serializer("json")()
       
   163     json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
       
   164 
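
The effect of ``ensure_ascii`` can be seen with Python's standard ``json``
module on its own, which accepts the same flag. This is a small sketch assuming
Python 3 and no Django; the ``record`` dictionary is made up for illustration.

```python
import json

record = {"name": "Café Zoë"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written through unchanged.
literal = json.dumps(record, ensure_ascii=False)

print(escaped)
print(literal)
```

Both strings decode to the same data; the difference is only in how non-ASCII
characters are represented in the output.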
       
    165 The Django source code includes the simplejson_ module. However, if you're
       
    166 using Python 2.6 or later (which includes a built-in version of the module),
       
    167 Django will use the built-in ``json`` module automatically. If you have a
       
    168 system-installed version that includes the C-based speedup extension, or your
       
    169 system version is more recent than the version shipped with Django (currently,
       
    170 2.0.7), the system version will be used instead of the one bundled with Django.
       
   171 
       
   172 Be aware that if you're serializing using that module directly, not all Django
       
   173 output can be passed unmodified to simplejson. In particular, :ref:`lazy
       
   174 translation objects <lazy-translations>` need a `special encoder`_ written for
       
   175 them. Something like this will work::
       
   176 
       
   177     from django.utils.functional import Promise
       
   178     from django.utils.encoding import force_unicode
       
   179 
       
   180     class LazyEncoder(simplejson.JSONEncoder):
       
   181         def default(self, obj):
       
   182             if isinstance(obj, Promise):
       
   183                 return force_unicode(obj)
       
   184             return super(LazyEncoder, self).default(obj)
       
   185 
       
   186 .. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html
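
The same encoder technique can be tried with the standard library ``json``
module. The sketch below assumes Python 3 and uses a hypothetical
``LazyString`` class as a stand-in for a lazy translation object; it is not
Django's ``Promise``.

```python
import json


class LazyString:
    """Hypothetical stand-in for a lazy object: evaluated only when needed."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        return self._func()


class LazyEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called only for objects json cannot encode itself.
        if isinstance(obj, LazyString):
            return str(obj)
        return super().default(obj)


payload = {"label": LazyString(lambda: "Hello")}
encoded = json.dumps(payload, cls=LazyEncoder)
print(encoded)
```

Without ``cls=LazyEncoder``, ``json.dumps(payload)`` raises ``TypeError``
because the encoder has no rule for the lazy object.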
       
   187 
       
   188 .. _topics-serialization-natural-keys:
       
   189 
       
   190 Natural keys
       
   191 ------------
       
   192 
       
   193 .. versionadded:: 1.2
       
   194 
       
   195    The ability to use natural keys when serializing/deserializing data was
       
   196    added in the 1.2 release.
       
   197 
       
   198 The default serialization strategy for foreign keys and many-to-many
       
   199 relations is to serialize the value of the primary key(s) of the
       
   200 objects in the relation. This strategy works well for most types of
       
   201 object, but it can cause difficulty in some circumstances.
       
   202 
       
    203 Consider the case of a list of objects that have a foreign key to
       
    204 :class:`ContentType`. If you're going to serialize an object that
       
    205 refers to a content type, you need to have a way to refer to that
       
    206 content type. Content types are automatically created by Django as
       
   207 part of the database synchronization process, so you don't need to
       
   208 include content types in a fixture or other serialized data. As a
       
   209 result, the primary key of any given content type isn't easy to
       
    210 predict -- it will depend on how and when :djadmin:`syncdb` was
       
   211 executed to create the content types.
       
   212 
       
   213 There is also the matter of convenience. An integer id isn't always
       
   214 the most convenient way to refer to an object; sometimes, a
       
   215 more natural reference would be helpful.
       
   216 
       
   217 It is for these reasons that Django provides *natural keys*. A natural
       
   218 key is a tuple of values that can be used to uniquely identify an
       
   219 object instance without using the primary key value.
       
   220 
       
   221 Deserialization of natural keys
       
   222 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       
   223 
       
   224 Consider the following two models::
       
   225 
       
   226     from django.db import models
       
   227 
       
   228     class Person(models.Model):
       
   229         first_name = models.CharField(max_length=100)
       
   230         last_name = models.CharField(max_length=100)
       
   231 
       
   232         birthdate = models.DateField()
       
   233 
       
   234         class Meta:
       
   235             unique_together = (('first_name', 'last_name'),)
       
   236 
       
   237     class Book(models.Model):
       
   238         name = models.CharField(max_length=100)
       
   239         author = models.ForeignKey(Person)
       
   240 
       
   241 Ordinarily, serialized data for ``Book`` would use an integer to refer to
       
   242 the author. For example, in JSON, a Book might be serialized as::
       
   243 
       
   244     ...
       
   245     {
       
   246         "pk": 1,
       
   247         "model": "store.book",
       
   248         "fields": {
       
   249             "name": "Mostly Harmless",
       
   250             "author": 42
       
   251         }
       
   252     }
       
   253     ...
       
   254 
       
   255 This isn't a particularly natural way to refer to an author. It
       
   256 requires that you know the primary key value for the author; it also
       
   257 requires that this primary key value is stable and predictable.
       
   258 
       
   259 However, if we add natural key handling to Person, the fixture becomes
       
   260 much more humane. To add natural key handling, you define a default
       
   261 Manager for Person with a ``get_by_natural_key()`` method. In the case
       
   262 of a Person, a good natural key might be the pair of first and last
       
   263 name::
       
   264 
       
   265     from django.db import models
       
   266 
       
   267     class PersonManager(models.Manager):
       
   268         def get_by_natural_key(self, first_name, last_name):
       
   269             return self.get(first_name=first_name, last_name=last_name)
       
   270 
       
   271     class Person(models.Model):
       
   272         objects = PersonManager()
       
   273 
       
   274         first_name = models.CharField(max_length=100)
       
   275         last_name = models.CharField(max_length=100)
       
   276 
       
   277         birthdate = models.DateField()
       
   278 
       
   279         class Meta:
       
   280             unique_together = (('first_name', 'last_name'),)
       
   281 
       
   282 Now books can use that natural key to refer to ``Person`` objects::
       
   283 
       
   284     ...
       
   285     {
       
   286         "pk": 1,
       
   287         "model": "store.book",
       
   288         "fields": {
       
   289             "name": "Mostly Harmless",
       
   290             "author": ["Douglas", "Adams"]
       
   291         }
       
   292     }
       
   293     ...
       
   294 
       
   295 When you try to load this serialized data, Django will use the
       
   296 ``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
       
   297 into the primary key of an actual ``Person`` object.
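
The lookup step can be pictured with a small stand-alone sketch. It uses plain
Python classes and an in-memory index rather than Django models, and
``resolve_author`` is a hypothetical helper, so it shows only the idea and not
Django's actual deserializer.

```python
class Person:
    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name


class PersonManager:
    """In-memory stand-in for a manager with get_by_natural_key()."""

    def __init__(self, people):
        self._by_key = {(p.first_name, p.last_name): p for p in people}

    def get_by_natural_key(self, first_name, last_name):
        return self._by_key[(first_name, last_name)]


def resolve_author(fields, manager):
    """Return the author's primary key from either style of reference."""
    value = fields["author"]
    if isinstance(value, (list, tuple)):
        # Natural key: unpack the values into the manager lookup.
        return manager.get_by_natural_key(*value).pk
    # Otherwise the value is already a primary key.
    return value


people = PersonManager([Person(42, "Douglas", "Adams")])
by_natural_key = resolve_author({"author": ["Douglas", "Adams"]}, people)
by_primary_key = resolve_author({"author": 42}, people)
```

Both calls yield the same primary key, which is why fixtures written either way
can describe the same relation.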
       
   298 
       
   299 .. note::
       
   300 
       
   301     Whatever fields you use for a natural key must be able to uniquely
       
   302     identify an object. This will usually mean that your model will
       
    303     have a uniqueness clause (either ``unique=True`` on a single field, or
       
   304     ``unique_together`` over multiple fields) for the field or fields
       
   305     in your natural key. However, uniqueness doesn't need to be
       
   306     enforced at the database level. If you are certain that a set of
       
   307     fields will be effectively unique, you can still use those fields
       
   308     as a natural key.
       
   309 
       
   310 Serialization of natural keys
       
   311 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       
   312 
       
   313 So how do you get Django to emit a natural key when serializing an object?
       
   314 Firstly, you need to add another method -- this time to the model itself::
       
   315 
       
   316     class Person(models.Model):
       
   317         objects = PersonManager()
       
   318 
       
   319         first_name = models.CharField(max_length=100)
       
   320         last_name = models.CharField(max_length=100)
       
   321 
       
   322         birthdate = models.DateField()
       
   323 
       
   324         def natural_key(self):
       
   325             return (self.first_name, self.last_name)
       
   326 
       
   327         class Meta:
       
   328             unique_together = (('first_name', 'last_name'),)
       
   329 
       
   330 That method should always return a natural key tuple -- in this
       
   331 example, ``(first name, last name)``. Then, when you call
       
   332 ``serializers.serialize()``, you provide a ``use_natural_keys=True``
       
   333 argument::
       
   334 
       
   335     >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)
       
   336 
       
   337 When ``use_natural_keys=True`` is specified, Django will use the
       
   338 ``natural_key()`` method to serialize any reference to objects of the
       
   339 type that defines the method.
       
   340 
       
    341 If you are using :djadmin:`dumpdata` to generate serialized data, use
       
    342 the :djadminopt:`--natural` command-line flag to generate natural keys.
       
   343 
       
   344 .. note::
       
   345 
       
   346     You don't need to define both ``natural_key()`` and
       
   347     ``get_by_natural_key()``. If you don't want Django to output
       
   348     natural keys during serialization, but you want to retain the
       
   349     ability to load natural keys, then you can opt to not implement
       
   350     the ``natural_key()`` method.
       
   351 
       
   352     Conversely, if (for some strange reason) you want Django to output
       
   353     natural keys during serialization, but *not* be able to load those
       
   354     key values, just don't define the ``get_by_natural_key()`` method.
       
   355 
       
   356 Dependencies during serialization
       
   357 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       
   358 
       
   359 Since natural keys rely on database lookups to resolve references, it
       
   360 is important that data exists before it is referenced. You can't make
       
    361 a *forward reference* with natural keys -- the data you are referencing
       
   362 must exist before you include a natural key reference to that data.
       
   363 
       
   364 To accommodate this limitation, calls to :djadmin:`dumpdata` that use
       
   365 the :djadminopt:`--natural` option will serialize any model with a
       
    366 ``natural_key()`` method before serializing standard primary key objects.
       
   367 
       
   368 However, this may not always be enough. If your natural key refers to
       
   369 another object (by using a foreign key or natural key to another object
       
   370 as part of a natural key), then you need to be able to ensure that
       
   371 the objects on which a natural key depends occur in the serialized data
       
   372 before the natural key requires them.
       
   373 
       
   374 To control this ordering, you can define dependencies on your
       
   375 ``natural_key()`` methods. You do this by setting a ``dependencies``
       
   376 attribute on the ``natural_key()`` method itself.
       
   377 
       
   378 For example, consider the ``Permission`` model in ``contrib.auth``.
       
   379 The following is a simplified version of the ``Permission`` model::
       
   380 
       
   381     class Permission(models.Model):
       
   382         name = models.CharField(max_length=50)
       
   383         content_type = models.ForeignKey(ContentType)
       
   384         codename = models.CharField(max_length=100)
       
   385         # ...
       
   386         def natural_key(self):
       
   387             return (self.codename,) + self.content_type.natural_key()
       
   388 
       
   389 The natural key for a ``Permission`` is a combination of the codename for the
       
   390 ``Permission``, and the ``ContentType`` to which the ``Permission`` applies. This means
       
   391 that ``ContentType`` must be serialized before ``Permission``. To define this
       
   392 dependency, we add one extra line::
       
   393 
       
   394     class Permission(models.Model):
       
   395         # ...
       
   396         def natural_key(self):
       
   397             return (self.codename,) + self.content_type.natural_key()
       
   398         natural_key.dependencies = ['contenttypes.contenttype']
       
   399 
       
   400 This definition ensures that ``ContentType`` models are serialized before
       
   401 ``Permission`` models. In turn, any object referencing ``Permission`` will
       
   402 be serialized after both ``ContentType`` and ``Permission``.
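
One way to picture how such declarations could drive the output order is a
depth-first sort over the ``dependencies`` attributes. The sketch below is an
illustration under stated assumptions, not the algorithm Django uses: the
classes are plain Python stand-ins, ``label`` and ``sort_by_dependencies`` are
hypothetical names, and ``Grant`` is an invented model that refers to
``Permission``.

```python
class ContentType:
    label = "contenttypes.contenttype"

    def natural_key(self):
        return ("app_label", "model")  # placeholder values


class Permission:
    label = "auth.permission"

    def natural_key(self):
        return ("codename",)  # placeholder value
    natural_key.dependencies = ["contenttypes.contenttype"]


class Grant:  # hypothetical model that references Permission
    label = "example.grant"

    def natural_key(self):
        return ("name",)  # placeholder value
    natural_key.dependencies = ["auth.permission"]


def sort_by_dependencies(models):
    """Order models so that each one follows everything it depends on."""
    by_label = {model.label: model for model in models}
    ordered, seen = [], set()

    def visit(model, stack=()):
        if model.label in seen:
            return
        if model.label in stack:
            raise ValueError("circular dependency at %s" % model.label)
        for dep in getattr(model.natural_key, "dependencies", []):
            if dep in by_label:
                visit(by_label[dep], stack + (model.label,))
        seen.add(model.label)
        ordered.append(model)

    for model in models:
        visit(model)
    return ordered


order = [m.label for m in sort_by_dependencies([Grant, Permission, ContentType])]
print(order)
```

Whatever order the models are passed in, ``ContentType`` comes out first,
followed by ``Permission`` and then the model that refers to it.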