parts/django/docs/topics/db/multi-db.txt
changeset 69 c6bca38c1cbf
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/parts/django/docs/topics/db/multi-db.txt	Sat Jan 08 11:20:57 2011 +0530
@@ -0,0 +1,574 @@
+==================
+Multiple databases
+==================
+
+.. versionadded:: 1.2
+
+This topic guide describes Django's support for interacting with
+multiple databases. Most of the rest of Django's documentation assumes
+you are interacting with a single database. If you want to interact
+with multiple databases, you'll need to take some additional steps.
+
+Defining your databases
+=======================
+
+The first step to using more than one database with Django is to tell
+Django about the database servers you'll be using. This is done using
+the :setting:`DATABASES` setting. This setting maps database aliases,
+which are a way to refer to a specific database throughout Django, to
+a dictionary of settings for that specific connection. The settings in
+the inner dictionaries are described fully in the :setting:`DATABASES`
+documentation.
+
+Databases can have any alias you choose. However, the alias
+``default`` has special significance. Django uses the database with
+the alias of ``default`` when no other database has been selected. If
+you don't have a ``default`` database, you need to be careful to
+always specify the database that you want to use.
+
+The following is an example ``settings.py`` snippet defining two
+databases -- a default PostgreSQL database and a MySQL database called
+``users``:
+
+.. code-block:: python
+
+    DATABASES = {
+        'default': {
+            'NAME': 'app_data',
+            'ENGINE': 'django.db.backends.postgresql_psycopg2',
+            'USER': 'postgres_user',
+            'PASSWORD': 's3krit'
+        },
+        'users': {
+            'NAME': 'user_data',
+            'ENGINE': 'django.db.backends.mysql',
+            'USER': 'mysql_user',
+            'PASSWORD': 'priv4te'
+        }
+    }
+
+If you attempt to access a database that you haven't defined in your
+:setting:`DATABASES` setting, Django will raise a
+``django.db.utils.ConnectionDoesNotExist`` exception.
+
+Synchronizing your databases
+============================
+
+The :djadmin:`syncdb` management command operates on one database at a
+time. By default, it operates on the ``default`` database, but by
+providing a :djadminopt:`--database` argument, you can tell syncdb to
+synchronize a different database. So, to synchronize all models onto
+all databases in our example, you would need to call::
+
+    $ ./manage.py syncdb
+    $ ./manage.py syncdb --database=users
+
+If you don't want every application to be synchronized onto a
+particular database, you can define a :ref:`database
+router<topics-db-multi-db-routing>` that implements a policy
+constraining the availability of particular models.
+
+Alternatively, if you want fine-grained control of synchronization,
+you can pipe all or part of the output of :djadmin:`sqlall` for a
+particular application directly into your database prompt, like this::
+
+    $ ./manage.py sqlall sales | ./manage.py dbshell
+
+Using other management commands
+-------------------------------
+
+The other ``django-admin.py`` commands that interact with the database
+operate in the same way as :djadmin:`syncdb` -- they only ever operate
+on one database at a time, using :djadminopt:`--database` to control
+the database used.
+
+.. _topics-db-multi-db-routing:
+
+Automatic database routing
+==========================
+
+The easiest way to use multiple databases is to set up a database
+routing scheme. The default routing scheme ensures that objects remain
+'sticky' to their original database (i.e., an object retrieved from
+the ``foo`` database will be saved on the same database). The default
+routing scheme ensures that if a database isn't specified, all queries
+fall back to the ``default`` database.
+
+You don't have to do anything to activate the default routing scheme
+-- it is provided 'out of the box' on every Django project. However,
+if you want to implement more interesting database allocation
+behaviors, you can define and install your own database routers.
+
+Database routers
+----------------
+
+A database Router is a class that provides up to four methods:
+
+.. method:: db_for_read(model, **hints)
+
+    Suggest the database that should be used for read operations for
+    objects of type ``model``.
+
+    If a database operation is able to provide any additional
+    information that might assist in selecting a database, it will be
+    provided in the ``hints`` dictionary. Details on valid hints are
+    provided :ref:`below <topics-db-multi-db-hints>`.
+
+    Returns None if there is no suggestion.
+
+.. method:: db_for_write(model, **hints)
+
+    Suggest the database that should be used for writes of objects of
+    type Model.
+
+    If a database operation is able to provide any additional
+    information that might assist in selecting a database, it will be
+    provided in the ``hints`` dictionary. Details on valid hints are
+    provided :ref:`below <topics-db-multi-db-hints>`.
+
+    Returns None if there is no suggestion.
+
+.. method:: allow_relation(obj1, obj2, **hints)
+
+    Return True if a relation between obj1 and obj2 should be
+    allowed, False if the relation should be prevented, or None if
+    the router has no opinion. This is purely a validation operation,
+    used by foreign key and many to many operations to determine if a
+    relation should be allowed between two objects.
+
+.. method:: allow_syncdb(db, model)
+
+    Determine if the ``model`` should be synchronized onto the
+    database with alias ``db``. Return True if the model should be
+    synchronized, False if it should not be synchronized, or None if
+    the router has no opinion. This method can be used to determine
+    the availability of a model on a given database.
+
+A router doesn't have to provide *all* these methods - it omit one or
+more of them. If one of the methods is omitted, Django will skip that
+router when performing the relevant check.
+
+.. _topics-db-multi-db-hints:
+
+Hints
+~~~~~
+
+The hints received by the database router can be used to decide which
+database should receive a given request.
+
+At present, the only hint that will be provided is ``instance``, an
+object instance that is related to the read or write operation that is
+underway. This might be the instance that is being saved, or it might
+be an instance that is being added in a many-to-many relation. In some
+cases, no instance hint will be provided at all. The router checks for
+the existence of an instance hint, and determine if that hint should be
+used to alter routing behavior.
+
+Using routers
+-------------
+
+Database routers are installed using the :setting:`DATABASE_ROUTERS`
+setting. This setting defines a list of class names, each specifying a
+router that should be used by the master router
+(``django.db.router``).
+
+The master router is used by Django's database operations to allocate
+database usage. Whenever a query needs to know which database to use,
+it calls the master router, providing a model and a hint (if
+available). Django then tries each router in turn until a database
+suggestion can be found. If no suggestion can be found, it tries the
+current ``_state.db`` of the hint instance. If a hint instance wasn't
+provided, or the instance doesn't currently have database state, the
+master router will allocate the ``default`` database.
+
+An example
+----------
+
+.. admonition:: Example purposes only!
+
+    This example is intended as a demonstration of how the router
+    infrastructure can be used to alter database usage. It
+    intentionally ignores some complex issues in order to
+    demonstrate how routers are used.
+
+    This example won't work if any of the models in ``myapp`` contain
+    relationships to models outside of the ``other`` database.
+    :ref:`Cross-database relationships <no_cross_database_relations>`
+    introduce referential integrity problems that Django can't
+    currently handle.
+
+    The master/slave configuration described is also flawed -- it
+    doesn't provide any solution for handling replication lag (i.e.,
+    query inconsistencies introduced because of the time taken for a
+    write to propagate to the slaves). It also doesn't consider the
+    interaction of transactions with the database utilization strategy.
+
+So - what does this mean in practice? Say you want ``myapp`` to
+exist on the ``other`` database, and you want all other models in a
+master/slave relationship between the databases ``master``, ``slave1`` and
+``slave2``. To implement this, you would need 2 routers::
+
+    class MyAppRouter(object):
+        """A router to control all database operations on models in
+        the myapp application"""
+
+        def db_for_read(self, model, **hints):
+            "Point all operations on myapp models to 'other'"
+            if model._meta.app_label == 'myapp':
+                return 'other'
+            return None
+
+        def db_for_write(self, model, **hints):
+            "Point all operations on myapp models to 'other'"
+            if model._meta.app_label == 'myapp':
+                return 'other'
+            return None
+
+        def allow_relation(self, obj1, obj2, **hints):
+            "Allow any relation if a model in myapp is involved"
+            if obj1._meta.app_label == 'myapp' or obj2._meta.app_label == 'myapp':
+                return True
+            return None
+
+        def allow_syncdb(self, db, model):
+            "Make sure the myapp app only appears on the 'other' db"
+            if db == 'other':
+                return model._meta.app_label == 'myapp'
+            elif model._meta.app_label == 'myapp':
+                return False
+            return None
+
+    class MasterSlaveRouter(object):
+        """A router that sets up a simple master/slave configuration"""
+
+        def db_for_read(self, model, **hints):
+            "Point all read operations to a random slave"
+            return random.choice(['slave1','slave2'])
+
+        def db_for_write(self, model, **hints):
+            "Point all write operations to the master"
+            return 'master'
+
+        def allow_relation(self, obj1, obj2, **hints):
+            "Allow any relation between two objects in the db pool"
+            db_list = ('master','slave1','slave2')
+            if obj1._state.db in db_list and obj2._state.db in db_list:
+                return True
+            return None
+
+        def allow_syncdb(self, db, model):
+            "Explicitly put all models on all databases."
+            return True
+
+Then, in your settings file, add the following (substituting ``path.to.`` with
+the actual python path to the module where you define the routers)::
+
+    DATABASE_ROUTERS = ['path.to.MyAppRouter', 'path.to.MasterSlaveRouter']
+
+The order in which routers are processed is significant. Routers will
+be queried in the order the are listed in the
+:setting:`DATABASE_ROUTERS` setting . In this example, the
+``MyAppRouter`` is processed before the ``MasterSlaveRouter``, and as a
+result, decisions concerning the models in ``myapp`` are processed
+before any other decision is made. If the :setting:`DATABASE_ROUTERS`
+setting listed the two routers in the other order,
+``MasterSlaveRouter.allow_syncdb()`` would be processed first. The
+catch-all nature of the MasterSlaveRouter implementation would mean
+that all models would be available on all databases.
+
+With this setup installed, lets run some Django code::
+
+    >>> # This retrieval will be performed on the 'credentials' database
+    >>> fred = User.objects.get(username='fred')
+    >>> fred.first_name = 'Frederick'
+
+    >>> # This save will also be directed to 'credentials'
+    >>> fred.save()
+
+    >>> # These retrieval will be randomly allocated to a slave database
+    >>> dna = Person.objects.get(name='Douglas Adams')
+
+    >>> # A new object has no database allocation when created
+    >>> mh = Book(title='Mostly Harmless')
+
+    >>> # This assignment will consult the router, and set mh onto
+    >>> # the same database as the author object
+    >>> mh.author = dna
+
+    >>> # This save will force the 'mh' instance onto the master database...
+    >>> mh.save()
+
+    >>> # ... but if we re-retrieve the object, it will come back on a slave
+    >>> mh = Book.objects.get(title='Mostly Harmless')
+
+
+Manually selecting a database
+=============================
+
+Django also provides an API that allows you to maintain complete control
+over database usage in your code. A manually specified database allocation
+will take priority over a database allocated by a router.
+
+Manually selecting a database for a ``QuerySet``
+------------------------------------------------
+
+You can select the database for a ``QuerySet`` at any point in the
+``QuerySet`` "chain." Just call ``using()`` on the ``QuerySet`` to get
+another ``QuerySet`` that uses the specified database.
+
+``using()`` takes a single argument: the alias of the database on
+which you want to run the query. For example::
+
+    >>> # This will run on the 'default' database.
+    >>> Author.objects.all()
+
+    >>> # So will this.
+    >>> Author.objects.using('default').all()
+
+    >>> # This will run on the 'other' database.
+    >>> Author.objects.using('other').all()
+
+Selecting a database for ``save()``
+-----------------------------------
+
+Use the ``using`` keyword to ``Model.save()`` to specify to which
+database the data should be saved.
+
+For example, to save an object to the ``legacy_users`` database, you'd
+use this::
+
+    >>> my_object.save(using='legacy_users')
+
+If you don't specify ``using``, the ``save()`` method will save into
+the default database allocated by the routers.
+
+Moving an object from one database to another
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you've saved an instance to one database, it might be tempting to
+use ``save(using=...)`` as a way to migrate the instance to a new
+database. However, if you don't take appropriate steps, this could
+have some unexpected consequences.
+
+Consider the following example::
+
+    >>> p = Person(name='Fred')
+    >>> p.save(using='first')  # (statement 1)
+    >>> p.save(using='second') # (statement 2)
+
+In statement 1, a new ``Person`` object is saved to the ``first``
+database. At this time, ``p`` doesn't have a primary key, so Django
+issues a SQL ``INSERT`` statement. This creates a primary key, and
+Django assigns that primary key to ``p``.
+
+When the save occurs in statement 2, ``p`` already has a primary key
+value, and Django will attempt to use that primary key on the new
+database. If the primary key value isn't in use in the ``second``
+database, then you won't have any problems -- the object will be
+copied to the new database.
+
+However, if the primary key of ``p`` is already in use on the
+``second`` database, the existing object in the ``second`` database
+will be overridden when ``p`` is saved.
+
+You can avoid this in two ways. First, you can clear the primary key
+of the instance. If an object has no primary key, Django will treat it
+as a new object, avoiding any loss of data on the ``second``
+database::
+
+    >>> p = Person(name='Fred')
+    >>> p.save(using='first')
+    >>> p.pk = None # Clear the primary key.
+    >>> p.save(using='second') # Write a completely new object.
+
+The second option is to use the ``force_insert`` option to ``save()``
+to ensure that Django does a SQL ``INSERT``::
+
+    >>> p = Person(name='Fred')
+    >>> p.save(using='first')
+    >>> p.save(using='second', force_insert=True)
+
+This will ensure that the person named ``Fred`` will have the same
+primary key on both databases. If that primary key is already in use
+when you try to save onto the ``second`` database, an error will be
+raised.
+
+Selecting a database to delete from
+-----------------------------------
+
+By default, a call to delete an existing object will be executed on
+the same database that was used to retrieve the object in the first
+place::
+
+    >>> u = User.objects.using('legacy_users').get(username='fred')
+    >>> u.delete() # will delete from the `legacy_users` database
+
+To specify the database from which a model will be deleted, pass a
+``using`` keyword argument to the ``Model.delete()`` method. This
+argument works just like the ``using`` keyword argument to ``save()``.
+
+For example, if you're migrating a user from the ``legacy_users``
+database to the ``new_users`` database, you might use these commands::
+
+    >>> user_obj.save(using='new_users')
+    >>> user_obj.delete(using='legacy_users')
+
+Using managers with multiple databases
+--------------------------------------
+
+Use the ``db_manager()`` method on managers to give managers access to
+a non-default database.
+
+For example, say you have a custom manager method that touches the
+database -- ``User.objects.create_user()``. Because ``create_user()``
+is a manager method, not a ``QuerySet`` method, you can't do
+``User.objects.using('new_users').create_user()``. (The
+``create_user()`` method is only available on ``User.objects``, the
+manager, not on ``QuerySet`` objects derived from the manager.) The
+solution is to use ``db_manager()``, like this::
+
+    User.objects.db_manager('new_users').create_user(...)
+
+``db_manager()`` returns a copy of the manager bound to the database you specify.
+
+Using ``get_query_set()`` with multiple databases
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you're overriding ``get_query_set()`` on your manager, be sure to
+either call the method on the parent (using ``super()``) or do the
+appropriate handling of the ``_db`` attribute on the manager (a string
+containing the name of the database to use).
+
+For example, if you want to return a custom ``QuerySet`` class from
+the ``get_query_set`` method, you could do this::
+
+    class MyManager(models.Manager):
+        def get_query_set(self):
+            qs = CustomQuerySet(self.model)
+            if self._db is not None:
+                qs = qs.using(self._db)
+            return qs
+
+Exposing multiple databases in Django's admin interface
+=======================================================
+
+Django's admin doesn't have any explicit support for multiple
+databases. If you want to provide an admin interface for a model on a
+database other than that that specified by your router chain, you'll
+need to write custom :class:`~django.contrib.admin.ModelAdmin` classes
+that will direct the admin to use a specific database for content.
+
+``ModelAdmin`` objects have four methods that require customization for
+multiple-database support::
+
+    class MultiDBModelAdmin(admin.ModelAdmin):
+        # A handy constant for the name of the alternate database.
+        using = 'other'
+
+        def save_model(self, request, obj, form, change):
+            # Tell Django to save objects to the 'other' database.
+            obj.save(using=self.using)
+
+        def queryset(self, request):
+            # Tell Django to look for objects on the 'other' database.
+            return super(MultiDBModelAdmin, self).queryset(request).using(self.using)
+
+        def formfield_for_foreignkey(self, db_field, request=None, **kwargs):
+            # Tell Django to populate ForeignKey widgets using a query
+            # on the 'other' database.
+            return super(MultiDBModelAdmin, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs)
+
+        def formfield_for_manytomany(self, db_field, request=None, **kwargs):
+            # Tell Django to populate ManyToMany widgets using a query
+            # on the 'other' database.
+            return super(MultiDBModelAdmin, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs)
+
+The implementation provided here implements a multi-database strategy
+where all objects of a given type are stored on a specific database
+(e.g., all ``User`` objects are in the ``other`` database). If your
+usage of multiple databases is more complex, your ``ModelAdmin`` will
+need to reflect that strategy.
+
+Inlines can be handled in a similar fashion. They require three customized methods::
+
+    class MultiDBTabularInline(admin.TabularInline):
+        using = 'other'
+
+        def queryset(self, request):
+            # Tell Django to look for inline objects on the 'other' database.
+            return super(MultiDBTabularInline, self).queryset(request).using(self.using)
+
+        def formfield_for_foreignkey(self, db_field, request=None, **kwargs):
+            # Tell Django to populate ForeignKey widgets using a query
+            # on the 'other' database.
+            return super(MultiDBTabularInline, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs)
+
+        def formfield_for_manytomany(self, db_field, request=None, **kwargs):
+            # Tell Django to populate ManyToMany widgets using a query
+            # on the 'other' database.
+            return super(MultiDBTabularInline, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs)
+
+Once you've written your model admin definitions, they can be
+registered with any ``Admin`` instance::
+
+    from django.contrib import admin
+
+    # Specialize the multi-db admin objects for use with specific models.
+    class BookInline(MultiDBTabularInline):
+        model = Book
+
+    class PublisherAdmin(MultiDBModelAdmin):
+        inlines = [BookInline]
+
+    admin.site.register(Author, MultiDBModelAdmin)
+    admin.site.register(Publisher, PublisherAdmin)
+
+    othersite = admin.Site('othersite')
+    othersite.register(Publisher, MultiDBModelAdmin)
+
+This example sets up two admin sites. On the first site, the
+``Author`` and ``Publisher`` objects are exposed; ``Publisher``
+objects have an tabular inline showing books published by that
+publisher. The second site exposes just publishers, without the
+inlines.
+
+Using raw cursors with multiple databases
+=========================================
+
+If you are using more than one database you can use
+``django.db.connections`` to obtain the connection (and cursor) for a
+specific database. ``django.db.connections`` is a dictionary-like
+object that allows you to retrieve a specific connection using it's
+alias::
+
+    from django.db import connections
+    cursor = connections['my_db_alias'].cursor()
+
+Limitations of multiple databases
+=================================
+
+.. _no_cross_database_relations:
+
+Cross-database relations
+------------------------
+
+Django doesn't currently provide any support for foreign key or
+many-to-many relationships spanning multiple databases. If you
+have used a router to partition models to different databases,
+any foreign key and many-to-many relationships defined by those
+models must be internal to a single database.
+
+This is because of referential integrity. In order to maintain a
+relationship between two objects, Django needs to know that the
+primary key of the related object is valid. If the primary key is
+stored on a separate database, it's not possible to easily evaluate
+the validity of a primary key.
+
+If you're using Postgres, Oracle, or MySQL with InnoDB, this is
+enforced at the database integrity level -- database level key
+constraints prevent the creation of relations that can't be validated.
+
+However, if you're using SQLite or MySQL with MyISAM tables, there is
+no enforced referential integrity; as a result, you may be able to
+'fake' cross database foreign keys. However, this configuration is not
+officially supported by Django.