thirdparty/google_appengine/lib/webob/docs/comment-example.txt
changeset 109 620f9b141567
       
Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb.  The
specific example adds a simple comment form to HTML web pages; any
HTML page served through the middleware gets a comment form added to
it, and shows any existing comments.
       
Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<http://svn.pythonpaste.org/Paste/WebOb/trunk/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.
       
Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps.  This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block::

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block::

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')
       
For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``easy_install
Paste`` to install this).  The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block::

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print 'Serving on http://localhost:%s' % options.port
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print '^C'

I won't explain it here, but basically it takes some options, creates
an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)``, then serves that.
       
The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything.  Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block::

    import os

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way.  You could write it like
this without WebOb:

.. code-block::

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)
       
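To see that the plain version really is a pass-through, here is a small sketch that wraps a trivial application and calls it directly, without a server.  It uses only the Python 3 standard library; ``hello_app``, ``PassThrough`` and ``call_app`` are names made up for this illustration and are not part of ``example.py``.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # Hypothetical wrapped application, used only for this sketch.
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<html><body>Hello</body></html>']


class PassThrough(object):
    """Middleware that forwards the call unchanged (no WebOb)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        return self.app(environ, start_response)


def call_app(app):
    """Call a WSGI app once and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(PassThrough(hello_app))
```

Calling the wrapped and unwrapped application through ``call_app`` should give the same status, headers and body.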
       
But it won't be as convenient later.  First, let's create a little bit
of infrastructure for our middleware.  We need to save and load
per-URL data (the comments themselves).  We'll keep them in pickles,
where each URL has a pickle named after the URL (but URL-quoted, so
``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block::

    import os
    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Quoting every reserved character (including "/")
            # makes the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.
       
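The storage scheme can be tried on its own.  The sketch below is a Python 3 rendering of the same idea as standalone functions (``urllib.parse.quote`` and ``pickle`` stand in for ``urllib.quote`` and ``cPickle``); the temporary directory and the sample comment are assumptions made for the illustration.

```python
import os
import pickle
import tempfile
from urllib.parse import quote


def url_filename(storage_dir, url):
    # safe='' also quotes "/" so the whole URL becomes one filename.
    return os.path.join(storage_dir, quote(url, safe=''))


def save_data(storage_dir, url, data):
    with open(url_filename(storage_dir, url), 'wb') as f:
        pickle.dump(data, f)


def get_data(storage_dir, url):
    filename = url_filename(storage_dir, url)
    if not os.path.exists(filename):
        return []  # no comments stored for this URL yet
    with open(filename, 'rb') as f:
        return pickle.load(f)


storage_dir = tempfile.mkdtemp()
url = 'http://localhost:8080/index.html'
before = get_data(storage_dir, url)
save_data(storage_dir, url, [{'name': 'John Doe', 'comments': 'Great site!'}])
after = get_data(storage_dir, url)
stored_name = os.path.basename(url_filename(storage_dir, url))
```

Here ``stored_name`` is the quoted form of the URL, and ``after`` holds the list that was saved.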
       
Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those.  We don't want to change responses
that were error responses (anything but ``200``), nor do we want to
filter responses that aren't HTML.  So we get:

.. code-block::

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            # ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page.  We
also haven't defined what ``data`` will hold.  Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``.
       
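For comparison, here is roughly what the ``content_type`` and ``status_int`` check amounts to when you only have the raw WSGI status string and header list.  This is a standard-library sketch, not WebOb's code; ``should_add_comments`` is a hypothetical helper, and it assumes (as WebOb's ``content_type`` does, as far as we know) that parameters such as ``charset`` are ignored in the comparison.

```python
def should_add_comments(status, headers):
    """Return True only for 200 responses whose media type is text/html."""
    status_int = int(status.split(None, 1)[0])
    content_type = ''
    for name, value in headers:
        if name.lower() == 'content-type':
            # Drop parameters such as "; charset=utf-8".
            content_type = value.split(';', 1)[0].strip().lower()
    return status_int == 200 and content_type == 'text/html'


checks = [
    should_add_comments('200 OK', [('Content-Type', 'text/html; charset=utf-8')]),
    should_add_comments('404 Not Found', [('Content-Type', 'text/html')]),
    should_add_comments('200 OK', [('Content-Type', 'image/png')]),
]
```

Only the first of the three sample responses qualifies for a comment form.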
       
We'll also need a simple method to add stuff to the page.  We'll use a
regular expression to find the end of the page and put text in:

.. code-block::

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

And then we'll use it like:

.. code-block::

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)
       
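The behaviour of ``add_to_end`` is easy to check in isolation.  The sketch below repeats the same regular expression as a standalone function operating on text strings; the two sample inputs are made up to show both branches (a page with a closing ``</body>`` tag, and a fragment without one).

```python
import re

_end_body_re = re.compile(r'</body.*?>', re.I | re.S)


def add_to_end(html, extra_html):
    """Insert extra_html just before </body>, or append it if there is none."""
    match = _end_body_re.search(html)
    if not match:
        return html + extra_html
    return html[:match.start()] + extra_html + html[match.start():]


# The match is case-insensitive, so </BODY> is found as well.
with_body = add_to_end('<html><body><p>Hi</p></BODY></html>', '<hr>')
without_body = add_to_end('<p>fragment</p>', '<hr>')
```

Because the pattern starts with ``</``, the opening ``<body>`` tag is never matched.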
       
We get the body, update it, and put it back in the response.  This
also updates ``Content-Length``.  Then we define:

.. code-block::

    import time

    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment.  Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in the comment text without quoting, it is susceptible
to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack.  Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.
       
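As an illustration of the "quote it" option, the sketch below escapes the comment text before it goes into the page.  It uses ``html.escape``, the Python 3 standard-library counterpart of ``cgi.escape``; the function name ``format_comment_body`` and the wrapping ``<p>`` element are choices made for this example only.  Note that quoting discards any markup the commenter intended, which is where a cleaner such as ``lxml.html.clean`` would differ.

```python
import html


def format_comment_body(comment):
    """Return the comment text with HTML special characters escaped."""
    return '<p>%s</p>' % html.escape(comment['comments'])


unsafe = {'comments': '<script>alert("hi")</script> & more'}
safe_html = format_comment_body(unsafe)
```

After escaping, the ``<script>`` tag shows up as literal text in the page instead of being executed.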
       
Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments.  To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``.  Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed.  That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it.  The name for the URL without
``PATH_INFO`` is ``req.application_url``.  We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).
       
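The split between ``SCRIPT_NAME`` and ``PATH_INFO`` can be seen with the standard library alone.  In the sketch below, ``path_info_peek`` is a hand-written approximation of looking at the next path segment without consuming it (it is not WebOb's code), the ``/blog`` mount point is hypothetical, and ``wsgiref.util.shift_path_info`` plays the part of a wrapped application that consumes a segment.

```python
from wsgiref.util import shift_path_info


def path_info_peek(environ):
    """Return the next PATH_INFO segment without consuming it."""
    path = environ.get('PATH_INFO', '')
    if not path:
        return None
    return path.lstrip('/').split('/', 1)[0]


environ = {'SCRIPT_NAME': '/blog', 'PATH_INFO': '/.comments/extra'}
peeked = path_info_peek(environ)
before = (environ['SCRIPT_NAME'], environ['PATH_INFO'])

# A wrapped application may consume a segment, which moves it from
# PATH_INFO to SCRIPT_NAME; this is why the base URL is captured early.
shifted = shift_path_info(environ)
after = (environ['SCRIPT_NAME'], environ['PATH_INFO'])
```

Peeking leaves ``environ`` untouched, while shifting changes both keys, so a base URL computed after the wrapped application has run could point somewhere else.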
       
So here's what this all looks like:

.. code-block::

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the URL where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT``, with no
trailing slash).  We use ``req.path_info_peek()`` to look at the next
segment of the URL -- what comes after ``base_url``.  If it is
``.comments`` then we handle it internally and don't pass the request
on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.
       
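The reason for the gzip guard is that the ``</body>`` search would never match inside compressed bytes.  The sketch below shows, with the Python 3 standard library only, what such a guard amounts to for the gzip case; ``decode_gzip`` is a hypothetical helper, not WebOb's implementation, and it ignores other content encodings.

```python
import gzip


def decode_gzip(headers, body):
    """Return (headers, body) with any gzip Content-Encoding removed."""
    encoding = dict((k.lower(), v) for k, v in headers).get('content-encoding')
    if encoding != 'gzip':
        return headers, body
    body = gzip.decompress(body)
    # The old length described the compressed bytes, so replace it.
    headers = [(k, v) for k, v in headers
               if k.lower() not in ('content-encoding', 'content-length')]
    headers.append(('Content-Length', str(len(body))))
    return headers, body


page = b'<html><body>Hello</body></html>'
zipped_headers = [('Content-Type', 'text/html'), ('Content-Encoding', 'gzip')]
new_headers, new_body = decode_gzip(zipped_headers, gzip.compress(page))
```

After decoding, the body is plain HTML again and the headers no longer claim it is compressed.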
       
submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block::

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting.  It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.
       
process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method, then
treat the result as a WSGI application:

.. code-block::

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block::

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block::

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.
       
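To give a feel for that change in logic flow, here is a sketch of response-modifying middleware written against plain WSGI (Python 3, standard library only).  ``hello_app``, ``AppendFooter`` and ``record`` are names invented for this illustration.  The sketch holds back ``start_response`` until the body is final, and it deliberately ignores the ``write()`` callable that ``start_response`` is supposed to return, so it would not work with applications that rely on it.

```python
def hello_app(environ, start_response):
    # Hypothetical wrapped application, used only for this sketch.
    body = b'<html><body>Hello</body></html>'
    start_response('200 OK', [('Content-Type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    return [body]


class AppendFooter(object):
    """Plain-WSGI middleware that appends bytes to every response body."""

    def __init__(self, app, footer):
        self.app = app
        self.footer = footer

    def __call__(self, environ, start_response):
        captured = []

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is final.
            captured[:] = [status, headers]

        app_iter = self.app(environ, capture)
        try:
            body = b''.join(app_iter) + self.footer
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()
        status, headers = captured
        # The length changed, so the old Content-Length must be replaced.
        headers = [(k, v) for k, v in headers if k.lower() != 'content-length']
        headers.append(('Content-Length', str(len(body))))
        start_response(status, headers)
        return [body]


result = {}


def record(status, headers, exc_info=None):
    result['status'], result['headers'] = status, headers


body = b''.join(AppendFooter(hello_app, b'<!-- footer -->')(
    {'REQUEST_METHOD': 'GET'}, record))
```

All of the capturing, joining and header fixing here is what ``req.get_response(self.app)`` and ``resp.body = ...`` take care of in the WebOb version.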
       
Here's the actual processing code:

.. code-block::

    import time

    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError, e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.
       
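The two outcomes (Bad Request or redirect) can be sketched without WebOb as a function that returns a status line and headers.  This is an illustration only: the in-memory ``store`` dictionary stands in for the pickle files, the ``time`` field is left out, and ``urllib.parse.parse_qs`` (Python 3) replaces ``req.params``.

```python
from urllib.parse import parse_qs

REQUIRED = ('url', 'name', 'homepage', 'comments')


def process_comment(form_body, store):
    """Return (status, headers) for a urlencoded comment submission."""
    # Browsers send empty fields as blank values, so keep them.
    fields = parse_qs(form_body, keep_blank_values=True)
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        return '400 Bad Request', [('Content-Type', 'text/plain')]
    comment = dict((key, fields[key][0]) for key in REQUIRED)
    url = comment.pop('url')
    store.setdefault(url, []).append(comment)
    # 303 tells the browser to GET the commented page after the POST.
    return '303 See Other', [('Location', url + '#comment-area')]


store = {}
ok = process_comment(
    'url=http%3A%2F%2Flocalhost%3A8080%2F&name=Jo&homepage=&comments=Hi', store)
bad = process_comment('name=Jo', store)
```

The redirect target ends in ``#comment-area``, the anchor that ``format_comments`` puts in the comments header.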
       
Conclusion
----------

This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.
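
For contrast, a request-modifying middleware can be as small as the following sketch (Python 3, standard library only).  ``show_user`` and ``SetUser`` are hypothetical names, and setting ``REMOTE_USER`` is just an example of a key one might update.

```python
def show_user(environ, start_response):
    # Hypothetical wrapped application that reports what it sees.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ.get('REMOTE_USER', 'anonymous').encode('utf-8')]


class SetUser(object):
    """Request-modifying middleware: sets a key in environ, nothing else."""

    def __init__(self, app, user):
        self.app = app
        self.user = user

    def __call__(self, environ, start_response):
        environ['REMOTE_USER'] = self.user
        return self.app(environ, start_response)


def ignore(status, headers, exc_info=None):
    # Stand-in for the server's start_response; the sketch only needs the body.
    pass


plain = b''.join(show_user({}, ignore))
wrapped = b''.join(SetUser(show_user, 'jdoe')({}, ignore))
```

The response passes through untouched, so there is no need to capture ``start_response`` or rebuild the body.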