parts/django/docs/topics/db/sql.txt
author Nishanth Amuluru <nishanth@fossee.in>
Tue, 11 Jan 2011 14:57:16 +0530
changeset 381 da4c6b1cec7d
parent 307 c6bca38c1cbf
permissions -rw-r--r--
add reviewer works now

==========================
Performing raw SQL queries
==========================

.. currentmodule:: django.db.models

When the :doc:`model query APIs </topics/db/queries>` don't go far enough, you
can fall back to writing raw SQL. Django gives you two ways of performing raw
SQL queries: you can use :meth:`Manager.raw()` to `perform raw queries and
return model instances`__, or you can avoid the model layer entirely and
`execute custom SQL directly`__.

__ `performing raw queries`_
__ `executing custom SQL directly`_

Performing raw queries
======================

.. versionadded:: 1.2

The ``raw()`` manager method can be used to perform raw SQL queries that
return model instances:

.. method:: Manager.raw(raw_query, params=None, translations=None)

This method method takes a raw SQL query, executes it, and returns a
:class:`~django.db.models.query.RawQuerySet` instance. This
:class:`~django.db.models.query.RawQuerySet` instance can be iterated
over just like an normal QuerySet to provide object instances.

This is best illustrated with an example. Suppose you've got the following model::

    class Person(models.Model):
        first_name = models.CharField(...)
        last_name = models.CharField(...)
        birth_date = models.DateField(...)

You could then execute custom SQL like so::

    >>> for p in Person.objects.raw('SELECT * FROM myapp_person'):
    ...     print p
    John Smith
    Jane Jones

.. admonition:: Model table names

    Where'd the name of the ``Person`` table come from in that example?

    By default, Django figures out a database table name by joining the
    model's "app label" -- the name you used in ``manage.py startapp`` -- to
    the model's class name, with an underscore between them. In the example
    we've assumed that the ``Person`` model lives in an app named ``myapp``,
    so its table would be ``myapp_person``.

    For more details check out the documentation for the
    :attr:`~Options.db_table` option, which also lets you manually set the
    database table name.

Of course, this example isn't very exciting -- it's exactly the same as
running ``Person.objects.all()``. However, ``raw()`` has a bunch of other
options that make it very powerful.

Mapping query fields to model fields
------------------------------------

``raw()`` automatically maps fields in the query to fields on the model.

The order of fields in your query doesn't matter. In other words, both
of the following queries work identically::

    >>> Person.objects.raw('SELECT id, first_name, last_name, birth_date FROM myapp_person')
    ...
    >>> Person.objects.raw('SELECT last_name, birth_date, first_name, id FROM myapp_person')
    ...

Matching is done by name. This means that you can use SQL's ``AS`` clauses to
map fields in the query to model fields. So if you had some other table that
had ``Person`` data in it, you could easily map it into ``Person`` instances::

    >>> Person.objects.raw('''SELECT first AS first_name,
    ...                              last AS last_name,
    ...                              bd AS birth_date,
    ...                              pk as id,
    ...                       FROM some_other_table''')

As long as the names match, the model instances will be created correctly.

Alternatively, you can map fields in the query to model fields using the
``translations`` argument to ``raw()``. This is a dictionary mapping names of
fields in the query to names of fields on the model. For example, the above
query could also be written::

    >>> name_map = {'first': 'first_name', 'last': 'last_name', 'bd': 'birth_date', 'pk': 'id'}
    >>> Person.objects.raw('SELECT * FROM some_other_table', translations=name_map)

Index lookups
-------------

``raw()`` supports indexing, so if you need only the first result you can
write::

    >>> first_person = Person.objects.raw('SELECT * from myapp_person')[0]

However, the indexing and slicing are not performed at the database level. If
you have a big amount of ``Person`` objects in your database, it is more
efficient to limit the query at the SQL level::

    >>> first_person = Person.objects.raw('SELECT * from myapp_person LIMIT 1')[0]

Deferring model fields
----------------------

Fields may also be left out::

    >>> people = Person.objects.raw('SELECT id, first_name FROM myapp_person')

The ``Person`` objects returned by this query will be deferred model instances
(see :meth:`~django.db.models.QuerySet.defer()`). This means that the fields
that are omitted from the query will be loaded on demand. For example::

    >>> for p in Person.objects.raw('SELECT id, first_name FROM myapp_person'):
    ...     print p.first_name, # This will be retrieved by the original query
    ...     print p.last_name # This will be retrieved on demand
    ...
    John Smith
    Jane Jones

From outward appearances, this looks like the query has retrieved both
the first name and last name. However, this example actually issued 3
queries. Only the first names were retrieved by the raw() query -- the
last names were both retrieved on demand when they were printed.

There is only one field that you can't leave out - the primary key
field. Django uses the primary key to identify model instances, so it
must always be included in a raw query. An ``InvalidQuery`` exception
will be raised if you forget to include the primary key.

Adding annotations
------------------

You can also execute queries containing fields that aren't defined on the
model. For example, we could use `PostgreSQL's age() function`__ to get a list
of people with their ages calculated by the database::

    >>> people = Person.objects.raw('SELECT *, age(birth_date) AS age FROM myapp_person')
    >>> for p in people:
    ...     print "%s is %s." % (p.first_name, p.age)
    John is 37.
    Jane is 42.
    ...

__ http://www.postgresql.org/docs/8.4/static/functions-datetime.html

Passing parameters into ``raw()``
---------------------------------

If you need to perform parameterized queries, you can use the ``params``
argument to ``raw()``::

    >>> lname = 'Doe'
    >>> Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname])

``params`` is a list of parameters. You'll use ``%s`` placeholders in the
query string (regardless of your database engine); they'll be replaced with
parameters from the ``params`` list.

.. warning::

    **Do not use string formatting on raw queries!**

    It's tempting to write the above query as::

        >>> query = 'SELECT * FROM myapp_person WHERE last_name = %s' % lname
        >>> Person.objects.raw(query)

    **Don't.**

    Using the ``params`` list completely protects you from `SQL injection
    attacks`__, a common exploit where attackers inject arbitrary SQL into
    your database. If you use string interpolation, sooner or later you'll
    fall victim to SQL injection. As long as you remember to always use the
    ``params`` list you'll be protected.

__ http://en.wikipedia.org/wiki/SQL_injection

Executing custom SQL directly
=============================

Sometimes even :meth:`Manager.raw` isn't quite enough: you might need to
perform queries that don't map cleanly to models, or directly execute
``UPDATE``, ``INSERT``, or ``DELETE`` queries.

In these cases, you can always access the database directly, routing around
the model layer entirely.

The object ``django.db.connection`` represents the
default database connection, and ``django.db.transaction`` represents the
default database transaction. To use the database connection, call
``connection.cursor()`` to get a cursor object. Then, call
``cursor.execute(sql, [params])`` to execute the SQL and ``cursor.fetchone()``
or ``cursor.fetchall()`` to return the resulting rows. After performing a data
changing operation, you should then call
``transaction.commit_unless_managed()`` to ensure your changes are committed
to the database. If your query is purely a data retrieval operation, no commit
is required. For example::

    def my_custom_sql():
        from django.db import connection, transaction
        cursor = connection.cursor()

        # Data modifying operation - commit required
        cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
        transaction.commit_unless_managed()

        # Data retrieval operation - no commit required
        cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
        row = cursor.fetchone()

        return row

If you are using more than one database you can use
``django.db.connections`` to obtain the connection (and cursor) for a
specific database. ``django.db.connections`` is a dictionary-like
object that allows you to retrieve a specific connection using it's
alias::

    from django.db import connections
    cursor = connections['my_db_alias'].cursor()

.. _transactions-and-raw-sql:

Transactions and raw SQL
------------------------
If you are using transaction decorators (such as ``commit_on_success``) to
wrap your views and provide transaction control, you don't have to make a
manual call to ``transaction.commit_unless_managed()`` -- you can manually
commit if you want to, but you aren't required to, since the decorator will
commit for you. However, if you don't manually commit your changes, you will
need to manually mark the transaction as dirty, using
``transaction.set_dirty()``::

    @commit_on_success
    def my_custom_sql_view(request, value):
        from django.db import connection, transaction
        cursor = connection.cursor()

        # Data modifying operation
        cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [value])

        # Since we modified data, mark the transaction as dirty
        transaction.set_dirty()

        # Data retrieval operation. This doesn't dirty the transaction,
        # so no call to set_dirty() is required.
        cursor.execute("SELECT foo FROM bar WHERE baz = %s", [value])
        row = cursor.fetchone()

        return render_to_response('template.html', {'row': row})

The call to ``set_dirty()`` is made automatically when you use the Django ORM
to make data modifying database calls. However, when you use raw SQL, Django
has no way of knowing if your SQL modifies data or not. The manual call to
``set_dirty()`` ensures that Django knows that there are modifications that
must be committed.

Connections and cursors
-----------------------

``connection`` and ``cursor`` mostly implement the standard `Python DB-API`_
(except when it comes to :doc:`transaction handling </topics/db/transactions>`).
If you're not familiar with the Python DB-API, note that the SQL statement in
``cursor.execute()`` uses placeholders, ``"%s"``, rather than adding parameters
directly within the SQL. If you use this technique, the underlying database
library will automatically add quotes and escaping to your parameter(s) as
necessary. (Also note that Django expects the ``"%s"`` placeholder, *not* the
``"?"`` placeholder, which is used by the SQLite Python bindings. This is for
the sake of consistency and sanity.)

.. _Python DB-API: http://www.python.org/dev/peps/pep-0249/