parts/django/docs/ref/unicode.txt
author Nishanth Amuluru <nishanth@fossee.in>
Sat, 08 Jan 2011 12:11:30 +0530
changeset 70 dca28aad6760
parent 69 c6bca38c1cbf
permissions -rw-r--r--
corrected the registration templates to suit new design
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
69
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     1
============
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     2
Unicode data
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     3
============
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     4
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     5
.. versionadded:: 1.0
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     6
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     7
Django natively supports Unicode data everywhere. Providing your database can
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     8
somehow store the data, you can safely pass around Unicode strings to
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
     9
templates, models and the database.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    10
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    11
This document tells you what you need to know if you're writing applications
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    12
that use data or templates that are encoded in something other than ASCII.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    13
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    14
Creating the database
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    15
=====================
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    16
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    17
Make sure your database is configured to be able to store arbitrary string
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    18
data. Normally, this means giving it an encoding of UTF-8 or UTF-16. If you use
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    19
a more restrictive encoding -- for example, latin1 (iso8859-1) -- you won't be
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    20
able to store certain characters in the database, and information will be lost.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    21
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    22
 * MySQL users, refer to the `MySQL manual`_ (section 9.1.3.2 for MySQL 5.1)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    23
   for details on how to set or alter the database character set encoding.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    24
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    25
 * PostgreSQL users, refer to the `PostgreSQL manual`_ (section 21.2.2 in
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    26
   PostgreSQL 8) for details on creating databases with the correct encoding.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    27
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    28
 * SQLite users, there is nothing you need to do. SQLite always uses UTF-8
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    29
   for internal encoding.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    30
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    31
.. _MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/charset-database.html
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    32
.. _PostgreSQL manual: http://www.postgresql.org/docs/8.2/static/multibyte.html#AEN24104
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    33
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    34
All of Django's database backends automatically convert Unicode strings into
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    35
the appropriate encoding for talking to the database. They also automatically
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    36
convert strings retrieved from the database into Python Unicode strings. You
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    37
don't even need to tell Django what encoding your database uses: that is
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    38
handled transparently.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    39
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    40
For more, see the section "The database API" below.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    41
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    42
General string handling
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    43
=======================
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    44
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    45
Whenever you use strings with Django -- e.g., in database lookups, template
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    46
rendering or anywhere else -- you have two choices for encoding those strings.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    47
You can use Unicode strings, or you can use normal strings (sometimes called
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    48
"bytestrings") that are encoded using UTF-8.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    49
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    50
.. admonition:: Warning
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    51
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    52
    A bytestring does not carry any information with it about its encoding.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    53
    For that reason, we have to make an assumption, and Django assumes that all
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    54
    bytestrings are in UTF-8.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    55
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    56
    If you pass a string to Django that has been encoded in some other format,
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    57
    things will go wrong in interesting ways. Usually, Django will raise a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    58
    ``UnicodeDecodeError`` at some point.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    59
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    60
If your code only uses ASCII data, it's safe to use your normal strings,
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    61
passing them around at will, because ASCII is a subset of UTF-8.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    62
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    63
Don't be fooled into thinking that if your :setting:`DEFAULT_CHARSET` setting is set
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    64
to something other than ``'utf-8'`` you can use that other encoding in your
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    65
bytestrings! :setting:`DEFAULT_CHARSET` only applies to the strings generated as
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    66
the result of template rendering (and e-mail). Django will always assume UTF-8
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    67
encoding for internal bytestrings. The reason for this is that the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    68
:setting:`DEFAULT_CHARSET` setting is not actually under your control (if you are the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    69
application developer). It's under the control of the person installing and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    70
using your application -- and if that person chooses a different setting, your
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    71
code must still continue to work. Ergo, it cannot rely on that setting.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    72
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    73
In most cases when Django is dealing with strings, it will convert them to
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    74
Unicode strings before doing anything else. So, as a general rule, if you pass
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    75
in a bytestring, be prepared to receive a Unicode string back in the result.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    76
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    77
Translated strings
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    78
------------------
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    79
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    80
Aside from Unicode strings and bytestrings, there's a third type of string-like
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    81
object you may encounter when using Django. The framework's
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    82
internationalization features introduce the concept of a "lazy translation" --
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    83
a string that has been marked as translated but whose actual translation result
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    84
isn't determined until the object is used in a string. This feature is useful
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    85
in cases where the translation locale is unknown until the string is used, even
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    86
though the string might have originally been created when the code was first
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    87
imported.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    88
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    89
Normally, you won't have to worry about lazy translations. Just be aware that
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    90
if you examine an object and it claims to be a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    91
``django.utils.functional.__proxy__`` object, it is a lazy translation.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    92
Calling ``unicode()`` with the lazy translation as the argument will generate a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    93
Unicode string in the current locale.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    94
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    95
For more details about lazy translation objects, refer to the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    96
:doc:`internationalization </topics/i18n/index>` documentation.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    97
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    98
Useful utility functions
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
    99
------------------------
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   100
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   101
Because some string operations come up again and again, Django ships with a few
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   102
useful functions that should make working with Unicode and bytestring objects
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   103
a bit easier.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   104
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   105
Conversion functions
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   106
~~~~~~~~~~~~~~~~~~~~
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   107
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   108
The ``django.utils.encoding`` module contains a few functions that are handy
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   109
for converting back and forth between Unicode and bytestrings.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   110
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   111
    * ``smart_unicode(s, encoding='utf-8', strings_only=False, errors='strict')``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   112
      converts its input to a Unicode string. The ``encoding`` parameter
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   113
      specifies the input encoding. (For example, Django uses this internally
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   114
      when processing form input data, which might not be UTF-8 encoded.) The
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   115
      ``strings_only`` parameter, if set to True, will result in Python
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   116
      numbers, booleans and ``None`` not being converted to a string (they keep
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   117
      their original types). The ``errors`` parameter takes any of the values
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   118
      that are accepted by Python's ``unicode()`` function for its error
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   119
      handling.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   120
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   121
      If you pass ``smart_unicode()`` an object that has a ``__unicode__``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   122
      method, it will use that method to do the conversion.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   123
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   124
    * ``force_unicode(s, encoding='utf-8', strings_only=False,
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   125
      errors='strict')`` is identical to ``smart_unicode()`` in almost all
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   126
      cases. The difference is when the first argument is a :ref:`lazy
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   127
      translation <lazy-translations>` instance. While ``smart_unicode()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   128
      preserves lazy translations, ``force_unicode()`` forces those objects to a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   129
      Unicode string (causing the translation to occur). Normally, you'll want
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   130
      to use ``smart_unicode()``. However, ``force_unicode()`` is useful in
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   131
      template tags and filters that absolutely *must* have a string to work
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   132
      with, not just something that can be converted to a string.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   133
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   134
    * ``smart_str(s, encoding='utf-8', strings_only=False, errors='strict')``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   135
      is essentially the opposite of ``smart_unicode()``. It forces the first
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   136
      argument to a bytestring. The ``strings_only`` parameter has the same
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   137
      behavior as for ``smart_unicode()`` and ``force_unicode()``. This is
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   138
      slightly different semantics from Python's builtin ``str()`` function,
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   139
      but the difference is needed in a few places within Django's internals.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   140
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   141
Normally, you'll only need to use ``smart_unicode()``. Call it as early as
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   142
possible on any input data that might be either Unicode or a bytestring, and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   143
from then on, you can treat the result as always being Unicode.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   144
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   145
URI and IRI handling
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   146
~~~~~~~~~~~~~~~~~~~~
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   147
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   148
Web frameworks have to deal with URLs (which are a type of IRI_). One
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   149
requirement of URLs is that they are encoded using only ASCII characters.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   150
However, in an international environment, you might need to construct a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   151
URL from an IRI_ -- very loosely speaking, a URI that can contain Unicode
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   152
characters. Quoting and converting an IRI to URI can be a little tricky, so
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   153
Django provides some assistance.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   154
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   155
    * The function ``django.utils.encoding.iri_to_uri()`` implements the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   156
      conversion from IRI to URI as required by the specification (`RFC
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   157
      3987`_).
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   158
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   159
    * The functions ``django.utils.http.urlquote()`` and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   160
      ``django.utils.http.urlquote_plus()`` are versions of Python's standard
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   161
      ``urllib.quote()`` and ``urllib.quote_plus()`` that work with non-ASCII
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   162
      characters. (The data is converted to UTF-8 prior to encoding.)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   163
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   164
These two groups of functions have slightly different purposes, and it's
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   165
important to keep them straight. Normally, you would use ``urlquote()`` on the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   166
individual portions of the IRI or URI path so that any reserved characters
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   167
such as '&' or '%' are correctly encoded. Then, you apply ``iri_to_uri()`` to
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   168
the full IRI and it converts any non-ASCII characters to the correct encoded
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   169
values.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   170
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   171
.. note::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   172
    Technically, it isn't correct to say that ``iri_to_uri()`` implements the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   173
    full algorithm in the IRI specification. It doesn't (yet) perform the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   174
    international domain name encoding portion of the algorithm.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   175
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   176
The ``iri_to_uri()`` function will not change ASCII characters that are
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   177
otherwise permitted in a URL. So, for example, the character '%' is not
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   178
further encoded when passed to ``iri_to_uri()``. This means you can pass a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   179
full URL to this function and it will not mess up the query string or anything
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   180
like that.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   181
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   182
An example might clarify things here::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   183
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   184
    >>> urlquote(u'Paris & Orléans')
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   185
    u'Paris%20%26%20Orl%C3%A9ans'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   186
    >>> iri_to_uri(u'/favorites/François/%s' % urlquote(u'Paris & Orléans'))
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   187
    '/favorites/Fran%C3%A7ois/Paris%20%26%20Orl%C3%A9ans'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   188
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   189
If you look carefully, you can see that the portion that was generated by
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   190
``urlquote()`` in the second example was not double-quoted when passed to
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   191
``iri_to_uri()``. This is a very important and useful feature. It means that
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   192
you can construct your IRI without worrying about whether it contains
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   193
non-ASCII characters and then, right at the end, call ``iri_to_uri()`` on the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   194
result.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   195
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   196
The ``iri_to_uri()`` function is also idempotent, which means the following is
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   197
always true::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   198
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   199
    iri_to_uri(iri_to_uri(some_string)) = iri_to_uri(some_string)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   200
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   201
So you can safely call it multiple times on the same IRI without risking
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   202
double-quoting problems.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   203
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   204
.. _URI: http://www.ietf.org/rfc/rfc2396.txt
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   205
.. _IRI: http://www.ietf.org/rfc/rfc3987.txt
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   206
.. _RFC 3987: IRI_
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   207
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   208
Models
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   209
======
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   210
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   211
Because all strings are returned from the database as Unicode strings, model
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   212
fields that are character based (CharField, TextField, URLField, etc) will
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   213
contain Unicode values when Django retrieves data from the database. This
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   214
is *always* the case, even if the data could fit into an ASCII bytestring.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   215
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   216
You can pass in bytestrings when creating a model or populating a field, and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   217
Django will convert it to Unicode when it needs to.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   218
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   219
Choosing between ``__str__()`` and ``__unicode__()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   220
----------------------------------------------------
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   221
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   222
One consequence of using Unicode by default is that you have to take some care
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   223
when printing data from the model.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   224
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   225
In particular, rather than giving your model a ``__str__()`` method, we
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   226
recommended you implement a ``__unicode__()`` method. In the ``__unicode__()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   227
method, you can quite safely return the values of all your fields without
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   228
having to worry about whether they fit into a bytestring or not. (The way
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   229
Python works, the result of ``__str__()`` is *always* a bytestring, even if you
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   230
accidentally try to return a Unicode object).
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   231
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   232
You can still create a ``__str__()`` method on your models if you want, of
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   233
course, but you shouldn't need to do this unless you have a good reason.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   234
Django's ``Model`` base class automatically provides a ``__str__()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   235
implementation that calls ``__unicode__()`` and encodes the result into UTF-8.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   236
This means you'll normally only need to implement a ``__unicode__()`` method
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   237
and let Django handle the coercion to a bytestring when required.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   238
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   239
Taking care in ``get_absolute_url()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   240
-------------------------------------
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   241
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   242
URLs can only contain ASCII characters. If you're constructing a URL from
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   243
pieces of data that might be non-ASCII, be careful to encode the results in a
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   244
way that is suitable for a URL. The ``django.db.models.permalink()`` decorator
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   245
handles this for you automatically.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   246
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   247
If you're constructing a URL manually (i.e., *not* using the ``permalink()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   248
decorator), you'll need to take care of the encoding yourself. In this case,
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   249
use the ``iri_to_uri()`` and ``urlquote()`` functions that were documented
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   250
above_. For example::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   251
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   252
    from django.utils.encoding import iri_to_uri
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   253
    from django.utils.http import urlquote
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   254
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   255
    def get_absolute_url(self):
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   256
        url = u'/person/%s/?x=0&y=0' % urlquote(self.location)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   257
        return iri_to_uri(url)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   258
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   259
This function returns a correctly encoded URL even if ``self.location`` is
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   260
something like "Jack visited Paris & Orléans". (In fact, the ``iri_to_uri()``
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   261
call isn't strictly necessary in the above example, because all the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   262
non-ASCII characters would have been removed in quoting in the first line.)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   263
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   264
.. _above: `URI and IRI handling`_
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   265
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   266
The database API
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   267
================
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   268
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   269
You can pass either Unicode strings or UTF-8 bytestrings as arguments to
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   270
``filter()`` methods and the like in the database API. The following two
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   271
querysets are identical::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   272
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   273
    qs = People.objects.filter(name__contains=u'Å')
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   274
    qs = People.objects.filter(name__contains='\xc3\x85') # UTF-8 encoding of Å
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   275
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   276
Templates
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   277
=========
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   278
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   279
You can use either Unicode or bytestrings when creating templates manually::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   280
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   281
	from django.template import Template
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   282
	t1 = Template('This is a bytestring template.')
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   283
	t2 = Template(u'This is a Unicode template.')
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   284
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   285
But the common case is to read templates from the filesystem, and this creates
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   286
a slight complication: not all filesystems store their data encoded as UTF-8.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   287
If your template files are not stored with a UTF-8 encoding, set the :setting:`FILE_CHARSET`
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   288
setting to the encoding of the files on disk. When Django reads in a template
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   289
file, it will convert the data from this encoding to Unicode. (:setting:`FILE_CHARSET`
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   290
is set to ``'utf-8'`` by default.)
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   291
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   292
The :setting:`DEFAULT_CHARSET` setting controls the encoding of rendered templates.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   293
This is set to UTF-8 by default.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   294
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   295
Template tags and filters
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   296
-------------------------
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   297
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   298
A couple of tips to remember when writing your own template tags and filters:
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   299
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   300
    * Always return Unicode strings from a template tag's ``render()`` method
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   301
      and from template filters.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   302
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   303
    * Use ``force_unicode()`` in preference to ``smart_unicode()`` in these
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   304
      places. Tag rendering and filter calls occur as the template is being
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   305
      rendered, so there is no advantage to postponing the conversion of lazy
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   306
      translation objects into strings. It's easier to work solely with Unicode
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   307
      strings at that point.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   308
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   309
E-mail
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   310
======
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   311
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   312
Django's e-mail framework (in ``django.core.mail``) supports Unicode
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   313
transparently. You can use Unicode data in the message bodies and any headers.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   314
However, you're still obligated to respect the requirements of the e-mail
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   315
specifications, so, for example, e-mail addresses should use only ASCII
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   316
characters.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   317
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   318
The following code example demonstrates that everything except e-mail addresses
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   319
can be non-ASCII::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   320
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   321
    from django.core.mail import EmailMessage
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   322
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   323
    subject = u'My visit to Sør-Trøndelag'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   324
    sender = u'Arnbjörg Ráðormsdóttir <arnbjorg@example.com>'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   325
    recipients = ['Fred <fred@example.com']
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   326
    body = u'...'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   327
    EmailMessage(subject, body, sender, recipients).send()
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   328
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   329
Form submission
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   330
===============
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   331
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   332
HTML form submission is a tricky area. There's no guarantee that the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   333
submission will include encoding information, which means the framework might
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   334
have to guess at the encoding of submitted data.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   335
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   336
Django adopts a "lazy" approach to decoding form data. The data in an
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   337
``HttpRequest`` object is only decoded when you access it. In fact, most of
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   338
the data is not decoded at all. Only the ``HttpRequest.GET`` and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   339
``HttpRequest.POST`` data structures have any decoding applied to them. Those
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   340
two fields will return their members as Unicode data. All other attributes and
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   341
methods of ``HttpRequest`` return data exactly as it was submitted by the
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   342
client.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   343
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   344
By default, the :setting:`DEFAULT_CHARSET` setting is used as the assumed encoding
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   345
for form data. If you need to change this for a particular form, you can set
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   346
the ``encoding`` attribute on an ``HttpRequest`` instance. For example::
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   347
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   348
    def some_view(request):
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   349
        # We know that the data must be encoded as KOI8-R (for some reason).
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   350
        request.encoding = 'koi8-r'
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   351
        ...
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   352
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   353
You can even change the encoding after having accessed ``request.GET`` or
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   354
``request.POST``, and all subsequent accesses will use the new encoding.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   355
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   356
Most developers won't need to worry about changing form encoding, but this is
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   357
a useful feature for applications that talk to legacy systems whose encoding
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   358
you cannot control.
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   359
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   360
Django does not decode the data of file uploads, because that data is normally
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   361
treated as collections of bytes, rather than strings. Any automatic decoding
c6bca38c1cbf Added buildout stuff and made changes accordingly
Nishanth Amuluru <nishanth@fossee.in>
parents:
diff changeset
   362
there would alter the meaning of the stream of bytes.