parts/django/docs/topics/http/file-uploads.txt
author Nishanth Amuluru <nishanth@fossee.in>
Tue, 11 Jan 2011 14:57:16 +0530

============
File Uploads
============

.. currentmodule:: django.core.files

.. versionadded:: 1.0

When Django handles a file upload, the file data is placed in
:attr:`request.FILES <django.http.HttpRequest.FILES>` (for more on the
``request`` object see the documentation for :doc:`request and response objects
</ref/request-response>`). This document explains how files are stored on disk
and in memory, and how to customize the default behavior.

Basic file uploads
==================

Consider a simple form containing a :class:`~django.forms.FileField`::

    from django import forms

    class UploadFileForm(forms.Form):
        title = forms.CharField(max_length=50)
        file = forms.FileField()

A view handling this form will receive the file data in
:attr:`request.FILES <django.http.HttpRequest.FILES>`, which is a dictionary
containing a key for each :class:`~django.forms.FileField` (or
:class:`~django.forms.ImageField`, or other :class:`~django.forms.FileField`
subclass) in the form. So the data from the above form would
be accessible as ``request.FILES['file']``.

Note that :attr:`request.FILES <django.http.HttpRequest.FILES>` will only
contain data if the request method was ``POST`` and the ``<form>`` that posted
the request has the attribute ``enctype="multipart/form-data"``. Otherwise,
``request.FILES`` will be empty.

Most of the time, you'll simply pass the file data from ``request`` into the
form as described in :ref:`binding-uploaded-files`. This would look
something like::

    from django.http import HttpResponseRedirect
    from django.shortcuts import render_to_response

    # Imaginary function to handle an uploaded file.
    from somewhere import handle_uploaded_file

    def upload_file(request):
        if request.method == 'POST':
            form = UploadFileForm(request.POST, request.FILES)
            if form.is_valid():
                handle_uploaded_file(request.FILES['file'])
                return HttpResponseRedirect('/success/url/')
        else:
            form = UploadFileForm()
        return render_to_response('upload.html', {'form': form})

Notice that we have to pass :attr:`request.FILES <django.http.HttpRequest.FILES>`
into the form's constructor; this is how file data gets bound into a form.

Handling uploaded files
-----------------------

The final piece of the puzzle is handling the actual file data from
:attr:`request.FILES <django.http.HttpRequest.FILES>`. Each entry in this
dictionary is an ``UploadedFile`` object -- a simple wrapper around an uploaded
file. You'll usually use one of these methods to access the uploaded content:

    ``UploadedFile.read()``
        Read the entire uploaded data from the file. Be careful with this
        method: if the uploaded file is huge it can overwhelm your system if you
        try to read it into memory. You'll probably want to use ``chunks()``
        instead; see below.

    ``UploadedFile.multiple_chunks()``
        Returns ``True`` if the uploaded file is big enough to require
        reading in multiple chunks. By default this will be any file
        larger than 2.5 megabytes, but that's configurable; see below.

    ``UploadedFile.chunks()``
        A generator returning chunks of the file. If ``multiple_chunks()`` is
        ``True``, you should use this method in a loop instead of ``read()``.

        In practice, it's often easiest simply to use ``chunks()`` all the time;
        see the example below.

    ``UploadedFile.name``
        The name of the uploaded file (e.g. ``my_file.txt``).

    ``UploadedFile.size``
        The size, in bytes, of the uploaded file.

There are a few other methods and attributes available on ``UploadedFile``
objects; see `UploadedFile objects`_ for a complete reference.

Putting it all together, here's a common way you might handle an uploaded file::

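    def handle_uploaded_file(f):
        # Write the upload to disk one chunk at a time. The with block
        # closes the destination file even if a write fails.
        with open('some/file/name.txt', 'wb+') as destination:
            for chunk in f.chunks():
                destination.write(chunk)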

Looping over ``UploadedFile.chunks()`` instead of using ``read()`` ensures that
large files don't overwhelm your system's memory.
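
The same loop suits any processing that can be done incrementally. As a
minimal sketch, the function below computes a SHA-256 digest of an upload
without holding the whole file in memory. ``FakeUpload`` is a hypothetical
stand-in, used only so the snippet runs without Django; it assumes nothing
about ``UploadedFile`` beyond the ``chunks()`` method described above.

```python
import hashlib


def digest_uploaded_file(f):
    """Return the hex SHA-256 digest of any object that provides chunks()."""
    digest = hashlib.sha256()
    for chunk in f.chunks():
        digest.update(chunk)
    return digest.hexdigest()


class FakeUpload(object):
    """Hypothetical stand-in for UploadedFile; not part of Django."""

    def __init__(self, data, chunk_size=4):
        self.data = data
        self.chunk_size = chunk_size

    def chunks(self):
        # Yield the data in fixed-size slices, as a chunked upload would.
        for start in range(0, len(self.data), self.chunk_size):
            yield self.data[start:start + self.chunk_size]


print(digest_uploaded_file(FakeUpload(b"hello world")))
```

In a view you would pass ``request.FILES['file']`` to
``digest_uploaded_file`` instead of the stand-in.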

Where uploaded data is stored
-----------------------------

Before you save uploaded files, the data needs to be stored somewhere.

By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold
the entire contents of the upload in memory. This means that saving the file
involves only a read from memory and a write to disk and thus is very fast.

However, if an uploaded file is too large, Django will write the uploaded file
to a temporary file stored in your system's temporary directory. On a Unix-like
platform this means you can expect Django to generate a file called something
like ``/tmp/tmpzfp6I6.upload``. If an upload is large enough, you can watch this
file grow in size as Django streams the data onto disk.

These specifics -- 2.5 megabytes; ``/tmp``; etc. -- are simply "reasonable
defaults". Read on for details on how you can customize or completely replace
upload behavior.

Changing upload handler behavior
--------------------------------

Four settings control Django's file upload behavior:

    :setting:`FILE_UPLOAD_MAX_MEMORY_SIZE`
        The maximum size, in bytes, for files that will be uploaded into memory.
        Files larger than :setting:`FILE_UPLOAD_MAX_MEMORY_SIZE` will be
        streamed to disk.

        Defaults to 2.5 megabytes.

    :setting:`FILE_UPLOAD_TEMP_DIR`
        The directory where uploaded files larger than
        :setting:`FILE_UPLOAD_MAX_MEMORY_SIZE` will be stored.

        Defaults to your system's standard temporary directory (e.g. ``/tmp`` on
        most Unix-like systems).

    :setting:`FILE_UPLOAD_PERMISSIONS`
        The numeric mode (e.g. ``0644``) to set newly uploaded files to. For
        more information about what these modes mean, see the `documentation for
        os.chmod`_.

        If this isn't given or is ``None``, you'll get operating-system
        dependent behavior. On most platforms, temporary files will have a mode
        of ``0600``, and files saved from memory will be saved using the
        system's standard umask.

        .. warning::

            If you're not familiar with file modes, please note that the leading
            ``0`` is very important: it indicates an octal number, which is the
            way that modes must be specified. If you try to use ``644``, you'll
            get totally incorrect behavior.

            **Always prefix the mode with a 0.**

    :setting:`FILE_UPLOAD_HANDLERS`
        The actual handlers for uploaded files. Changing this setting allows
        complete customization -- even replacement -- of Django's upload
        process. See `upload handlers`_, below, for details.

        Defaults to::

            ("django.core.files.uploadhandler.MemoryFileUploadHandler",
             "django.core.files.uploadhandler.TemporaryFileUploadHandler",)

        This means "try to upload to memory first, then fall back to temporary
        files."
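
As an illustration only, a settings module that overrides the first three
settings might look like the sketch below. The values and the directory path
are hypothetical, not recommendations. The mode is written as ``0o644``,
which is the octal spelling accepted by Python 2.6 and later and the only one
accepted by Python 3. The checks at the end would not belong in a real
settings file; they are there to show why a decimal ``644`` is wrong.

```python
import stat

# Illustrative values for a project's settings module.
FILE_UPLOAD_MAX_MEMORY_SIZE = 5 * 2 ** 20         # 5 MB; the default is 2.5 MB (2621440 bytes)
FILE_UPLOAD_TEMP_DIR = '/var/tmp/django_uploads'  # hypothetical path
FILE_UPLOAD_PERMISSIONS = 0o644                   # octal literal: rw-r--r--

# Demonstration only: the decimal literal 644 is a different number.
decimal_mistake = 644
print(FILE_UPLOAD_PERMISSIONS == decimal_mistake)   # False
print(decimal_mistake == 0o1204)                    # True: quite different permission bits
print(stat.filemode(stat.S_IFREG | FILE_UPLOAD_PERMISSIONS))  # -rw-r--r--
```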

.. _documentation for os.chmod: http://docs.python.org/library/os.html#os.chmod

``UploadedFile`` objects
========================

.. class:: UploadedFile

In addition to those inherited from :class:`File`, all ``UploadedFile`` objects
define the following methods/attributes:

    ``UploadedFile.content_type``
        The content-type header uploaded with the file (e.g. ``text/plain`` or
        ``application/pdf``). Like any data supplied by the user, you shouldn't
        trust that the uploaded file is actually this type. You'll still need to
        validate that the file contains the content that the content-type header
        claims -- "trust but verify."

    ``UploadedFile.charset``
        For ``text/*`` content-types, the character set (e.g. ``utf8``) supplied
        by the browser. Again, "trust but verify" is the best policy here.

    ``UploadedFile.temporary_file_path()``
        Only files uploaded onto disk will have this method; it returns the full
        path to the temporary uploaded file.

.. note::

    Like regular Python files, you can read the file line-by-line simply by
    iterating over the uploaded file:

    .. code-block:: python

        for line in uploadedfile:
            do_something_with(line)

    However, *unlike* standard Python files, :class:`UploadedFile` only
    understands ``\n`` (also known as "Unix-style") line endings. If you know
    that you need to handle uploaded files with different line endings, you'll
    need to do so in your view.
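
One way to handle other line endings in a view is to split the chunks
yourself. The helper below is a sketch, not part of Django: the name
``iter_universal_lines`` is hypothetical, and it assumes the chunks are byte
strings such as those produced by ``chunks()``. It treats ``\r\n``, ``\r`` and
``\n`` as line terminators, including a ``\r\n`` pair that is split across two
chunks.

```python
def iter_universal_lines(chunks):
    """Yield lines (without terminators) from an iterable of byte chunks."""
    buffered = b""
    for chunk in chunks:
        buffered += chunk
        # A trailing "\r" may be the first half of "\r\n", so hold it back
        # until the next chunk shows what follows it.
        held = b""
        if buffered.endswith(b"\r"):
            buffered, held = buffered[:-1], b"\r"
        normalized = buffered.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        lines = normalized.split(b"\n")
        # The last element is an unfinished line (possibly empty).
        buffered = lines.pop() + held
        for line in lines:
            yield line
    # Flush whatever is left once the input is exhausted.
    if buffered.endswith(b"\r"):
        yield buffered[:-1]
    elif buffered:
        yield buffered


for line in iter_universal_lines([b"one\r", b"\ntwo\rthree\nfour"]):
    print(line)
```

In a view, the argument would be ``request.FILES['file'].chunks()``.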

Upload Handlers
===============

When a user uploads a file, Django passes off the file data to an *upload
handler* -- a small class that handles file data as it gets uploaded. Upload
handlers are initially defined in the ``FILE_UPLOAD_HANDLERS`` setting, which
defaults to::

    ("django.core.files.uploadhandler.MemoryFileUploadHandler",
     "django.core.files.uploadhandler.TemporaryFileUploadHandler",)

Together the ``MemoryFileUploadHandler`` and ``TemporaryFileUploadHandler``
provide Django's default file upload behavior of reading small files into memory
and large ones onto disk.

You can write custom handlers that customize how Django handles files. You
could, for example, use custom handlers to enforce user-level quotas, compress
data on the fly, render progress bars, and even send data to another storage
location directly without storing it locally.

Modifying upload handlers on the fly
------------------------------------

Sometimes particular views require different upload behavior. In these cases,
you can override upload handlers on a per-request basis by modifying
``request.upload_handlers``. By default, this list will contain the upload
handlers given by ``FILE_UPLOAD_HANDLERS``, but you can modify the list as you
would any other list.

For instance, suppose you've written a ``ProgressBarUploadHandler`` that
provides feedback on upload progress to some sort of AJAX widget. You'd add this
handler to your upload handlers like this::

    request.upload_handlers.insert(0, ProgressBarUploadHandler())

You'd probably want to use ``list.insert()`` in this case (instead of
``append()``) because a progress bar handler would need to run *before* any
other handlers. Remember, the upload handlers are processed in order.

If you want to replace the upload handlers completely, you can just assign a new
list::

    request.upload_handlers = [ProgressBarUploadHandler()]

.. note::

    You can only modify upload handlers *before* accessing
    ``request.POST`` or ``request.FILES`` -- it doesn't make sense to
    change upload handlers after upload handling has already
    started. If you try to modify ``request.upload_handlers`` after
    reading from ``request.POST`` or ``request.FILES``, Django will
    raise an error.

    Thus, you should always modify upload handlers as early in your view as
    possible.

    Also, ``request.POST`` is accessed by
    :class:`~django.middleware.csrf.CsrfViewMiddleware` which is enabled by
    default. This means you will probably need to use
    :func:`~django.views.decorators.csrf.csrf_exempt` on your view to allow you
    to change the upload handlers. Assuming you do need CSRF protection, you
    will then need to use :func:`~django.views.decorators.csrf.csrf_protect` on
    the function that actually processes the request.  Note that this means that
    the handlers may start receiving the file upload before the CSRF checks have
    been done. Example code:

    .. code-block:: python

        from django.views.decorators.csrf import csrf_exempt, csrf_protect

        @csrf_exempt
        def upload_file_view(request):
            request.upload_handlers.insert(0, ProgressBarUploadHandler())
            return _upload_file_view(request)

        @csrf_protect
        def _upload_file_view(request):
            ... # Process request


Writing custom upload handlers
------------------------------

All file upload handlers should be subclasses of
``django.core.files.uploadhandler.FileUploadHandler``. You can define upload
handlers wherever you wish.

Required methods
~~~~~~~~~~~~~~~~

Custom file upload handlers **must** define the following methods:

    ``FileUploadHandler.receive_data_chunk(self, raw_data, start)``
        Receives a "chunk" of data from the file upload.

        ``raw_data`` is a byte string containing the uploaded data.

        ``start`` is the position in the file where this ``raw_data`` chunk
        begins.

        The data you return will get fed into the subsequent upload handlers'
        ``receive_data_chunk`` methods. In this way, one handler can be a
        "filter" for other handlers.

        Return ``None`` from ``receive_data_chunk`` to short-circuit remaining
        upload handlers from getting this chunk. This is useful if you're
        storing the uploaded data yourself and don't want future handlers to
        store a copy of the data.

        If you raise a ``StopUpload`` or a ``SkipFile`` exception, the upload
        will abort or the file will be completely skipped, respectively.

    ``FileUploadHandler.file_complete(self, file_size)``
        Called when a file has finished uploading.

        The handler should return an ``UploadedFile`` object that will be stored
        in ``request.FILES``. Handlers may also return ``None`` to indicate that
        the ``UploadedFile`` object should come from subsequent upload handlers.

Optional methods
~~~~~~~~~~~~~~~~

Custom upload handlers may also define any of the following optional methods or
attributes:

    ``FileUploadHandler.chunk_size``
        Size, in bytes, of the "chunks" Django should store into memory and feed
        into the handler. That is, this attribute controls the size of chunks
        fed into ``FileUploadHandler.receive_data_chunk``.

        For maximum performance the chunk sizes should be divisible by ``4`` and
        should not exceed 2 GB (2\ :sup:`31` bytes) in size. When there are
        multiple chunk sizes provided by multiple handlers, Django will use the
        smallest chunk size defined by any handler.

        The default is 64*2\ :sup:`10` bytes, or 64 KB.

    ``FileUploadHandler.new_file(self, field_name, file_name, content_type, content_length, charset)``
        Callback signaling that a new file upload is starting. This is called
        before any data has been fed to any upload handlers.

        ``field_name`` is a string name of the file ``<input>`` field.

        ``file_name`` is the unicode filename that was provided by the browser.

        ``content_type`` is the MIME type provided by the browser (e.g.
        ``'image/jpeg'``).

        ``content_length`` is the length of the file given by the browser.
        Sometimes this won't be provided and will be ``None``.

        ``charset`` is the character set (e.g. ``utf8``) given by the browser.
        Like ``content_length``, this sometimes won't be provided.

        This method may raise a ``StopFutureHandlers`` exception to prevent
        future handlers from handling this file.

    ``FileUploadHandler.upload_complete(self)``
        Callback signaling that the entire upload (all files) has completed.

    ``FileUploadHandler.handle_raw_input(self, input_data, META, content_length, boundary, encoding)``
        Allows the handler to completely override the parsing of the raw
        HTTP input.

        ``input_data`` is a file-like object that supports ``read()``-ing.

        ``META`` is the same object as ``request.META``.

        ``content_length`` is the length of the data in ``input_data``. Don't
        read more than ``content_length`` bytes from ``input_data``.

        ``boundary`` is the MIME boundary for this request.

        ``encoding`` is the encoding of the request.

        Return ``None`` if you want upload handling to continue, or a tuple of
        ``(POST, FILES)`` if you want to return the new data structures suitable
        for the request directly.
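
Returning to ``chunk_size``: the "smallest chunk size wins" rule can be
sketched as follows. This is an illustration of the rule as stated above
rather than Django's own code; ``effective_chunk_size`` and the two handler
classes are hypothetical names, and the classes carry only the attribute
under discussion.

```python
DEFAULT_CHUNK_SIZE = 64 * 2 ** 10  # 65536 bytes, the documented default


def effective_chunk_size(handlers):
    """Return the smallest chunk_size declared by any handler."""
    return min(getattr(h, "chunk_size", DEFAULT_CHUNK_SIZE) for h in handlers)


class SmallChunks(object):
    chunk_size = 8 * 2 ** 10  # 8 KB, divisible by 4 as recommended


class DefaultChunks(object):
    chunk_size = DEFAULT_CHUNK_SIZE


print(effective_chunk_size([DefaultChunks(), SmallChunks()]))  # 8192
```

With these two handlers installed, every handler would be fed 8 KB chunks,
including the one that asked for 64 KB.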