==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
objects into other formats. Usually these other formats will be text-based and
used for sending Django objects over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
serialize. (Actually, the second argument can be any iterator that yields Django
objects, but it'll almost always be a QuerySet.)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    # The with block closes the file once serialization has finished.
    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to
serialize the Place models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

Because nothing is written until you call ``save()``, deserializing is a
non-destructive operation even if the data in your serialized representation
doesn't match what's currently in the database. Usually, working with these
``DeserializedObject`` instances looks something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. Of course, if you
trust your data source you could just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.

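The inspect-then-save pattern can be tried without a database. The following is only a toy sketch in plain Python: ``PendingObject``, ``load`` and the ``database`` dict are hypothetical stand-ins invented for illustration and are not part of Django. It shows why wrapping unsaved objects keeps loading non-destructive.

```python
# Toy stand-ins only: none of these names are part of Django.
class PendingObject:
    """Wraps an unsaved object; nothing is stored until save() is called."""

    def __init__(self, obj, store):
        self.object = obj    # the wrapped, still-unsaved object
        self._store = store  # a dict playing the role of the database

    def save(self):
        self._store[self.object["pk"]] = self.object


def load(records, store):
    """Yield one wrapper per record without touching the store."""
    for record in records:
        yield PendingObject(record, store)


database = {1: {"pk": 1, "name": "existing row"}}
incoming = [{"pk": 1, "name": ""}, {"pk": 2, "name": "new row"}]

for pending in load(incoming, database):
    # Inspect the wrapped object first; only save records that have a name.
    if pending.object["name"]:
        pending.save()
```

The record with an empty name is never saved, so the existing row with the same key is left untouched.
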
.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_ (using a version of simplejson_
            bundled with Django).

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _simplejson: http://undefined.org/python/#simplejson
.. _PyYAML: http://www.pyyaml.org/

Notes for specific serialization formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

json
^^^^

If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
serializer, you must pass ``ensure_ascii=False`` as a parameter to the
``serialize()`` call. Otherwise, the output won't be encoded correctly.

For example::

    json_serializer = serializers.get_serializer("json")()
    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)

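To see what the ``ensure_ascii`` flag changes in the output, here is a small sketch that uses Python's standard ``json`` module on its own, with no Django involved. It assumes Python 3, where ``str`` is Unicode text; the ``title`` value is made up for illustration.

```python
import json

title = "Caf\u00e9"  # "Café", written with an escape so this source stays ASCII

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps({"name": title})

# With ensure_ascii=False the characters are written through unchanged.
literal = json.dumps({"name": title}, ensure_ascii=False)
```
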
The Django source code includes the simplejson_ module. However, if you're
using Python 2.6 or later (which includes a built-in version of the module),
Django will use the built-in ``json`` module automatically. If you have a
system-installed version that includes the C-based speedup extension, or your
system version is more recent than the version shipped with Django (currently,
2.0.7), the system version will be used instead of the version included with
Django.

Be aware that if you're serializing using that module directly, not all Django
output can be passed unmodified to simplejson. In particular, :ref:`lazy
translation objects <lazy-translations>` need a `special encoder`_ written for
them. Something like this will work::

    from django.utils import simplejson
    from django.utils.functional import Promise
    from django.utils.encoding import force_unicode

    class LazyEncoder(simplejson.JSONEncoder):
        def default(self, obj):
            # Lazy translation objects are Promise instances; force them
            # to unicode so that the encoder can serialize them.
            if isinstance(obj, Promise):
                return force_unicode(obj)
            return super(LazyEncoder, self).default(obj)

.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html

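A custom encoder of this kind is passed to the JSON library through its ``cls`` argument. The sketch below shows the same pattern with the standard ``json`` module on Python 3, assuming nothing from Django: ``LazyText`` is a hypothetical stand-in for a lazy translation object, not Django's ``Promise``.

```python
import json


class LazyText:
    """Hypothetical stand-in for a lazy translation object (not Django's Promise)."""

    def __init__(self, func):
        self._func = func  # called only when the text is actually needed

    def __str__(self):
        return self._func()


class LazyEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(obj, LazyText):
            return str(obj)
        return super().default(obj)


payload = {"label": LazyText(lambda: "Welcome")}
encoded = json.dumps(payload, cls=LazyEncoder)
```

Without ``cls=LazyEncoder``, ``json.dumps(payload)`` raises ``TypeError`` because the lazy object is not a type the encoder knows.
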
.. _topics-serialization-natural-keys:

Natural keys
------------

.. versionadded:: 1.2

    The ability to use natural keys when serializing/deserializing data was
    added in the 1.2 release.

The default serialization strategy for foreign keys and many-to-many
relations is to serialize the value of the primary key(s) of the
objects in the relation. This strategy works well for most types of
object, but it can cause difficulty in some circumstances.

Consider the case of a list of objects that have a foreign key on
:class:`ContentType`. If you're going to serialize an object that
refers to a content type, you need to have a way to refer to that
content type. Content types are automatically created by Django as
part of the database synchronization process, so you don't need to
include content types in a fixture or other serialized data. As a
result, the primary key of any given content type isn't easy to
predict; it will depend on how and when :djadmin:`syncdb` was
executed to create the content types.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.

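The lookup step can be pictured with a plain-Python sketch. This is an illustration under stated assumptions rather than Django's deserializer: ``PersonTable`` is a hypothetical in-memory stand-in for the manager, ``resolve_reference`` is an invented helper, and the second row is made-up sample data.

```python
class PersonTable:
    """Hypothetical in-memory stand-in for a manager with get_by_natural_key()."""

    def __init__(self, rows):
        self._rows = rows  # list of dicts with "pk", "first_name", "last_name"

    def get_by_natural_key(self, first_name, last_name):
        matches = [row for row in self._rows
                   if (row["first_name"], row["last_name"]) == (first_name, last_name)]
        # A natural key is only usable if it identifies exactly one object.
        if len(matches) != 1:
            raise LookupError("natural key must match exactly one row")
        return matches[0]


def resolve_reference(value, table):
    """Return a primary key for either a raw pk or a natural-key list."""
    if isinstance(value, (list, tuple)):
        return table.get_by_natural_key(*value)["pk"]
    return value


people = PersonTable([
    {"pk": 42, "first_name": "Douglas", "last_name": "Adams"},
    {"pk": 43, "first_name": "Terry", "last_name": "Pratchett"},
])
author_pk = resolve_reference(["Douglas", "Adams"], people)
```

A plain integer reference passes through unchanged, which is why fixtures that use primary keys keep loading after a natural key is added.
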
Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So how do you get Django to emit a natural key when serializing an object?
Firstly, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

        class Meta:
            unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you provide a ``use_natural_keys=True``
argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.

If you are using :djadmin:`dumpdata` to generate serialized data, use
the :djadminopt:`--natural` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

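As a rough picture of the choice made for each reference, the following plain-Python sketch emits the natural key when it is requested and available, and falls back to the primary key otherwise. ``Person`` here is an ordinary class standing in for the model, and ``reference_value`` is a hypothetical helper, not a Django function.

```python
import json


class Person:
    """Plain-Python stand-in for the model above (no database involved)."""

    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name

    def natural_key(self):
        return (self.first_name, self.last_name)


def reference_value(related, use_natural_keys=False):
    """Pick the value that represents a related object in the output."""
    if use_natural_keys and hasattr(related, "natural_key"):
        return related.natural_key()
    return related.pk


adams = Person(42, "Douglas", "Adams")
by_pk = json.dumps({"author": reference_value(adams)})
by_natural_key = json.dumps({"author": reference_value(adams, use_natural_keys=True)})
```

JSON has no tuple type, so the tuple returned by ``natural_key()`` appears as a list in the serialized output.
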
Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since natural keys rely on database lookups to resolve references, it
is important that data exists before it is referenced. You can't make
a *forward reference* with natural keys: the data you are referencing
must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before it serializes normal key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, consider the ``Permission`` model in ``contrib.auth``.
The following is a simplified version of the ``Permission`` model::

    class Permission(models.Model):
        name = models.CharField(max_length=50)
        content_type = models.ForeignKey(ContentType)
        codename = models.CharField(max_length=100)
        # ...
        def natural_key(self):
            return (self.codename,) + self.content_type.natural_key()

The natural key for a ``Permission`` is a combination of the codename for the
``Permission``, and the ``ContentType`` to which the ``Permission`` applies.
This means that ``ContentType`` must be serialized before ``Permission``. To
define this dependency, we add one extra line::

    class Permission(models.Model):
        # ...
        def natural_key(self):
            return (self.codename,) + self.content_type.natural_key()
        natural_key.dependencies = ['contenttypes.contenttype']

This definition ensures that ``ContentType`` models are serialized before
``Permission`` models. In turn, any object referencing ``Permission`` will
be serialized after both ``ContentType`` and ``Permission``.
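
The ordering described above amounts to a dependency sort. The sketch below is one simple way to compute such an order in plain Python; it is not Django's implementation. ``sort_by_dependencies`` is a hypothetical helper, and the ``auth.group`` entry is included only to illustrate a model that refers to ``Permission``.

```python
def sort_by_dependencies(dependencies):
    """Order model labels so each one follows everything it depends on.

    `dependencies` maps a model label to the labels it depends on (an empty
    list if it has none). Every label that is depended on must be a key too.
    """
    ordered = []
    remaining = dict(dependencies)
    while remaining:
        # A model is ready once all of its dependencies are already placed.
        ready = sorted(label for label, deps in remaining.items()
                       if all(dep in ordered for dep in deps))
        if not ready:
            raise ValueError("unresolvable dependencies among: %s" % sorted(remaining))
        for label in ready:
            ordered.append(label)
            del remaining[label]
    return ordered


order = sort_by_dependencies({
    "auth.permission": ["contenttypes.contenttype"],
    "contenttypes.contenttype": [],
    "auth.group": ["auth.permission"],
})
```

Here ``order`` lists the content type first, then the permission, then the group, so every natural key lookup can succeed when the data is loaded in that order.
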