diff -r 5ff1fc726848 -r c6bca38c1cbf parts/django/docs/topics/db/multi-db.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/parts/django/docs/topics/db/multi-db.txt Sat Jan 08 11:20:57 2011 +0530 @@ -0,0 +1,574 @@ +================== +Multiple databases +================== + +.. versionadded:: 1.2 + +This topic guide describes Django's support for interacting with +multiple databases. Most of the rest of Django's documentation assumes +you are interacting with a single database. If you want to interact +with multiple databases, you'll need to take some additional steps. + +Defining your databases +======================= + +The first step to using more than one database with Django is to tell +Django about the database servers you'll be using. This is done using +the :setting:`DATABASES` setting. This setting maps database aliases, +which are a way to refer to a specific database throughout Django, to +a dictionary of settings for that specific connection. The settings in +the inner dictionaries are described fully in the :setting:`DATABASES` +documentation. + +Databases can have any alias you choose. However, the alias +``default`` has special significance. Django uses the database with +the alias of ``default`` when no other database has been selected. If +you don't have a ``default`` database, you need to be careful to +always specify the database that you want to use. + +The following is an example ``settings.py`` snippet defining two +databases -- a default PostgreSQL database and a MySQL database called +``users``: + +.. code-block:: python + + DATABASES = { + 'default': { + 'NAME': 'app_data', + 'ENGINE': 'django.db.backends.postgresql_psycopg2', + 'USER': 'postgres_user', + 'PASSWORD': 's3krit' + }, + 'users': { + 'NAME': 'user_data', + 'ENGINE': 'django.db.backends.mysql', + 'USER': 'mysql_user', + 'PASSWORD': 'priv4te' + } + } + +If you attempt to access a database that you haven't defined in your +:setting:`DATABASES` setting, Django will raise a +``django.db.utils.ConnectionDoesNotExist`` exception. + +Synchronizing your databases +============================ + +The :djadmin:`syncdb` management command operates on one database at a +time. By default, it operates on the ``default`` database, but by +providing a :djadminopt:`--database` argument, you can tell syncdb to +synchronize a different database. So, to synchronize all models onto +all databases in our example, you would need to call:: + + $ ./manage.py syncdb + $ ./manage.py syncdb --database=users + +If you don't want every application to be synchronized onto a +particular database, you can define a :ref:`database +router` that implements a policy +constraining the availability of particular models. + +Alternatively, if you want fine-grained control of synchronization, +you can pipe all or part of the output of :djadmin:`sqlall` for a +particular application directly into your database prompt, like this:: + + $ ./manage.py sqlall sales | ./manage.py dbshell + +Using other management commands +------------------------------- + +The other ``django-admin.py`` commands that interact with the database +operate in the same way as :djadmin:`syncdb` -- they only ever operate +on one database at a time, using :djadminopt:`--database` to control +the database used. + +.. _topics-db-multi-db-routing: + +Automatic database routing +========================== + +The easiest way to use multiple databases is to set up a database +routing scheme. The default routing scheme ensures that objects remain +'sticky' to their original database (i.e., an object retrieved from +the ``foo`` database will be saved on the same database). The default +routing scheme ensures that if a database isn't specified, all queries +fall back to the ``default`` database. + +You don't have to do anything to activate the default routing scheme +-- it is provided 'out of the box' on every Django project. However, +if you want to implement more interesting database allocation +behaviors, you can define and install your own database routers. + +Database routers +---------------- + +A database Router is a class that provides up to four methods: + +.. method:: db_for_read(model, **hints) + + Suggest the database that should be used for read operations for + objects of type ``model``. + + If a database operation is able to provide any additional + information that might assist in selecting a database, it will be + provided in the ``hints`` dictionary. Details on valid hints are + provided :ref:`below `. + + Returns None if there is no suggestion. + +.. method:: db_for_write(model, **hints) + + Suggest the database that should be used for writes of objects of + type Model. + + If a database operation is able to provide any additional + information that might assist in selecting a database, it will be + provided in the ``hints`` dictionary. Details on valid hints are + provided :ref:`below `. + + Returns None if there is no suggestion. + +.. method:: allow_relation(obj1, obj2, **hints) + + Return True if a relation between obj1 and obj2 should be + allowed, False if the relation should be prevented, or None if + the router has no opinion. This is purely a validation operation, + used by foreign key and many to many operations to determine if a + relation should be allowed between two objects. + +.. method:: allow_syncdb(db, model) + + Determine if the ``model`` should be synchronized onto the + database with alias ``db``. Return True if the model should be + synchronized, False if it should not be synchronized, or None if + the router has no opinion. This method can be used to determine + the availability of a model on a given database. + +A router doesn't have to provide *all* these methods - it omit one or +more of them. If one of the methods is omitted, Django will skip that +router when performing the relevant check. + +.. _topics-db-multi-db-hints: + +Hints +~~~~~ + +The hints received by the database router can be used to decide which +database should receive a given request. + +At present, the only hint that will be provided is ``instance``, an +object instance that is related to the read or write operation that is +underway. This might be the instance that is being saved, or it might +be an instance that is being added in a many-to-many relation. In some +cases, no instance hint will be provided at all. The router checks for +the existence of an instance hint, and determine if that hint should be +used to alter routing behavior. + +Using routers +------------- + +Database routers are installed using the :setting:`DATABASE_ROUTERS` +setting. This setting defines a list of class names, each specifying a +router that should be used by the master router +(``django.db.router``). + +The master router is used by Django's database operations to allocate +database usage. Whenever a query needs to know which database to use, +it calls the master router, providing a model and a hint (if +available). Django then tries each router in turn until a database +suggestion can be found. If no suggestion can be found, it tries the +current ``_state.db`` of the hint instance. If a hint instance wasn't +provided, or the instance doesn't currently have database state, the +master router will allocate the ``default`` database. + +An example +---------- + +.. admonition:: Example purposes only! + + This example is intended as a demonstration of how the router + infrastructure can be used to alter database usage. It + intentionally ignores some complex issues in order to + demonstrate how routers are used. + + This example won't work if any of the models in ``myapp`` contain + relationships to models outside of the ``other`` database. + :ref:`Cross-database relationships ` + introduce referential integrity problems that Django can't + currently handle. + + The master/slave configuration described is also flawed -- it + doesn't provide any solution for handling replication lag (i.e., + query inconsistencies introduced because of the time taken for a + write to propagate to the slaves). It also doesn't consider the + interaction of transactions with the database utilization strategy. + +So - what does this mean in practice? Say you want ``myapp`` to +exist on the ``other`` database, and you want all other models in a +master/slave relationship between the databases ``master``, ``slave1`` and +``slave2``. To implement this, you would need 2 routers:: + + class MyAppRouter(object): + """A router to control all database operations on models in + the myapp application""" + + def db_for_read(self, model, **hints): + "Point all operations on myapp models to 'other'" + if model._meta.app_label == 'myapp': + return 'other' + return None + + def db_for_write(self, model, **hints): + "Point all operations on myapp models to 'other'" + if model._meta.app_label == 'myapp': + return 'other' + return None + + def allow_relation(self, obj1, obj2, **hints): + "Allow any relation if a model in myapp is involved" + if obj1._meta.app_label == 'myapp' or obj2._meta.app_label == 'myapp': + return True + return None + + def allow_syncdb(self, db, model): + "Make sure the myapp app only appears on the 'other' db" + if db == 'other': + return model._meta.app_label == 'myapp' + elif model._meta.app_label == 'myapp': + return False + return None + + class MasterSlaveRouter(object): + """A router that sets up a simple master/slave configuration""" + + def db_for_read(self, model, **hints): + "Point all read operations to a random slave" + return random.choice(['slave1','slave2']) + + def db_for_write(self, model, **hints): + "Point all write operations to the master" + return 'master' + + def allow_relation(self, obj1, obj2, **hints): + "Allow any relation between two objects in the db pool" + db_list = ('master','slave1','slave2') + if obj1._state.db in db_list and obj2._state.db in db_list: + return True + return None + + def allow_syncdb(self, db, model): + "Explicitly put all models on all databases." + return True + +Then, in your settings file, add the following (substituting ``path.to.`` with +the actual python path to the module where you define the routers):: + + DATABASE_ROUTERS = ['path.to.MyAppRouter', 'path.to.MasterSlaveRouter'] + +The order in which routers are processed is significant. Routers will +be queried in the order the are listed in the +:setting:`DATABASE_ROUTERS` setting . In this example, the +``MyAppRouter`` is processed before the ``MasterSlaveRouter``, and as a +result, decisions concerning the models in ``myapp`` are processed +before any other decision is made. If the :setting:`DATABASE_ROUTERS` +setting listed the two routers in the other order, +``MasterSlaveRouter.allow_syncdb()`` would be processed first. The +catch-all nature of the MasterSlaveRouter implementation would mean +that all models would be available on all databases. + +With this setup installed, lets run some Django code:: + + >>> # This retrieval will be performed on the 'credentials' database + >>> fred = User.objects.get(username='fred') + >>> fred.first_name = 'Frederick' + + >>> # This save will also be directed to 'credentials' + >>> fred.save() + + >>> # These retrieval will be randomly allocated to a slave database + >>> dna = Person.objects.get(name='Douglas Adams') + + >>> # A new object has no database allocation when created + >>> mh = Book(title='Mostly Harmless') + + >>> # This assignment will consult the router, and set mh onto + >>> # the same database as the author object + >>> mh.author = dna + + >>> # This save will force the 'mh' instance onto the master database... + >>> mh.save() + + >>> # ... but if we re-retrieve the object, it will come back on a slave + >>> mh = Book.objects.get(title='Mostly Harmless') + + +Manually selecting a database +============================= + +Django also provides an API that allows you to maintain complete control +over database usage in your code. A manually specified database allocation +will take priority over a database allocated by a router. + +Manually selecting a database for a ``QuerySet`` +------------------------------------------------ + +You can select the database for a ``QuerySet`` at any point in the +``QuerySet`` "chain." Just call ``using()`` on the ``QuerySet`` to get +another ``QuerySet`` that uses the specified database. + +``using()`` takes a single argument: the alias of the database on +which you want to run the query. For example:: + + >>> # This will run on the 'default' database. + >>> Author.objects.all() + + >>> # So will this. + >>> Author.objects.using('default').all() + + >>> # This will run on the 'other' database. + >>> Author.objects.using('other').all() + +Selecting a database for ``save()`` +----------------------------------- + +Use the ``using`` keyword to ``Model.save()`` to specify to which +database the data should be saved. + +For example, to save an object to the ``legacy_users`` database, you'd +use this:: + + >>> my_object.save(using='legacy_users') + +If you don't specify ``using``, the ``save()`` method will save into +the default database allocated by the routers. + +Moving an object from one database to another +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you've saved an instance to one database, it might be tempting to +use ``save(using=...)`` as a way to migrate the instance to a new +database. However, if you don't take appropriate steps, this could +have some unexpected consequences. + +Consider the following example:: + + >>> p = Person(name='Fred') + >>> p.save(using='first') # (statement 1) + >>> p.save(using='second') # (statement 2) + +In statement 1, a new ``Person`` object is saved to the ``first`` +database. At this time, ``p`` doesn't have a primary key, so Django +issues a SQL ``INSERT`` statement. This creates a primary key, and +Django assigns that primary key to ``p``. + +When the save occurs in statement 2, ``p`` already has a primary key +value, and Django will attempt to use that primary key on the new +database. If the primary key value isn't in use in the ``second`` +database, then you won't have any problems -- the object will be +copied to the new database. + +However, if the primary key of ``p`` is already in use on the +``second`` database, the existing object in the ``second`` database +will be overridden when ``p`` is saved. + +You can avoid this in two ways. First, you can clear the primary key +of the instance. If an object has no primary key, Django will treat it +as a new object, avoiding any loss of data on the ``second`` +database:: + + >>> p = Person(name='Fred') + >>> p.save(using='first') + >>> p.pk = None # Clear the primary key. + >>> p.save(using='second') # Write a completely new object. + +The second option is to use the ``force_insert`` option to ``save()`` +to ensure that Django does a SQL ``INSERT``:: + + >>> p = Person(name='Fred') + >>> p.save(using='first') + >>> p.save(using='second', force_insert=True) + +This will ensure that the person named ``Fred`` will have the same +primary key on both databases. If that primary key is already in use +when you try to save onto the ``second`` database, an error will be +raised. + +Selecting a database to delete from +----------------------------------- + +By default, a call to delete an existing object will be executed on +the same database that was used to retrieve the object in the first +place:: + + >>> u = User.objects.using('legacy_users').get(username='fred') + >>> u.delete() # will delete from the `legacy_users` database + +To specify the database from which a model will be deleted, pass a +``using`` keyword argument to the ``Model.delete()`` method. This +argument works just like the ``using`` keyword argument to ``save()``. + +For example, if you're migrating a user from the ``legacy_users`` +database to the ``new_users`` database, you might use these commands:: + + >>> user_obj.save(using='new_users') + >>> user_obj.delete(using='legacy_users') + +Using managers with multiple databases +-------------------------------------- + +Use the ``db_manager()`` method on managers to give managers access to +a non-default database. + +For example, say you have a custom manager method that touches the +database -- ``User.objects.create_user()``. Because ``create_user()`` +is a manager method, not a ``QuerySet`` method, you can't do +``User.objects.using('new_users').create_user()``. (The +``create_user()`` method is only available on ``User.objects``, the +manager, not on ``QuerySet`` objects derived from the manager.) The +solution is to use ``db_manager()``, like this:: + + User.objects.db_manager('new_users').create_user(...) + +``db_manager()`` returns a copy of the manager bound to the database you specify. + +Using ``get_query_set()`` with multiple databases +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you're overriding ``get_query_set()`` on your manager, be sure to +either call the method on the parent (using ``super()``) or do the +appropriate handling of the ``_db`` attribute on the manager (a string +containing the name of the database to use). + +For example, if you want to return a custom ``QuerySet`` class from +the ``get_query_set`` method, you could do this:: + + class MyManager(models.Manager): + def get_query_set(self): + qs = CustomQuerySet(self.model) + if self._db is not None: + qs = qs.using(self._db) + return qs + +Exposing multiple databases in Django's admin interface +======================================================= + +Django's admin doesn't have any explicit support for multiple +databases. If you want to provide an admin interface for a model on a +database other than that that specified by your router chain, you'll +need to write custom :class:`~django.contrib.admin.ModelAdmin` classes +that will direct the admin to use a specific database for content. + +``ModelAdmin`` objects have four methods that require customization for +multiple-database support:: + + class MultiDBModelAdmin(admin.ModelAdmin): + # A handy constant for the name of the alternate database. + using = 'other' + + def save_model(self, request, obj, form, change): + # Tell Django to save objects to the 'other' database. + obj.save(using=self.using) + + def queryset(self, request): + # Tell Django to look for objects on the 'other' database. + return super(MultiDBModelAdmin, self).queryset(request).using(self.using) + + def formfield_for_foreignkey(self, db_field, request=None, **kwargs): + # Tell Django to populate ForeignKey widgets using a query + # on the 'other' database. + return super(MultiDBModelAdmin, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs) + + def formfield_for_manytomany(self, db_field, request=None, **kwargs): + # Tell Django to populate ManyToMany widgets using a query + # on the 'other' database. + return super(MultiDBModelAdmin, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs) + +The implementation provided here implements a multi-database strategy +where all objects of a given type are stored on a specific database +(e.g., all ``User`` objects are in the ``other`` database). If your +usage of multiple databases is more complex, your ``ModelAdmin`` will +need to reflect that strategy. + +Inlines can be handled in a similar fashion. They require three customized methods:: + + class MultiDBTabularInline(admin.TabularInline): + using = 'other' + + def queryset(self, request): + # Tell Django to look for inline objects on the 'other' database. + return super(MultiDBTabularInline, self).queryset(request).using(self.using) + + def formfield_for_foreignkey(self, db_field, request=None, **kwargs): + # Tell Django to populate ForeignKey widgets using a query + # on the 'other' database. + return super(MultiDBTabularInline, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs) + + def formfield_for_manytomany(self, db_field, request=None, **kwargs): + # Tell Django to populate ManyToMany widgets using a query + # on the 'other' database. + return super(MultiDBTabularInline, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs) + +Once you've written your model admin definitions, they can be +registered with any ``Admin`` instance:: + + from django.contrib import admin + + # Specialize the multi-db admin objects for use with specific models. + class BookInline(MultiDBTabularInline): + model = Book + + class PublisherAdmin(MultiDBModelAdmin): + inlines = [BookInline] + + admin.site.register(Author, MultiDBModelAdmin) + admin.site.register(Publisher, PublisherAdmin) + + othersite = admin.Site('othersite') + othersite.register(Publisher, MultiDBModelAdmin) + +This example sets up two admin sites. On the first site, the +``Author`` and ``Publisher`` objects are exposed; ``Publisher`` +objects have an tabular inline showing books published by that +publisher. The second site exposes just publishers, without the +inlines. + +Using raw cursors with multiple databases +========================================= + +If you are using more than one database you can use +``django.db.connections`` to obtain the connection (and cursor) for a +specific database. ``django.db.connections`` is a dictionary-like +object that allows you to retrieve a specific connection using it's +alias:: + + from django.db import connections + cursor = connections['my_db_alias'].cursor() + +Limitations of multiple databases +================================= + +.. _no_cross_database_relations: + +Cross-database relations +------------------------ + +Django doesn't currently provide any support for foreign key or +many-to-many relationships spanning multiple databases. If you +have used a router to partition models to different databases, +any foreign key and many-to-many relationships defined by those +models must be internal to a single database. + +This is because of referential integrity. In order to maintain a +relationship between two objects, Django needs to know that the +primary key of the related object is valid. If the primary key is +stored on a separate database, it's not possible to easily evaluate +the validity of a primary key. + +If you're using Postgres, Oracle, or MySQL with InnoDB, this is +enforced at the database integrity level -- database level key +constraints prevent the creation of relations that can't be validated. + +However, if you're using SQLite or MySQL with MyISAM tables, there is +no enforced referential integrity; as a result, you may be able to +'fake' cross database foreign keys. However, this configuration is not +officially supported by Django.