Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb. The
specific example adds a simple comment form to HTML web pages; any
page served through the middleware that is HTML gets a comment form
added to it, and shows any existing comments.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<http://svn.pythonpaste.org/Paste/WebOb/trunk/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps. This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block::

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block::

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``easy_install
Paste`` to install this). The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block::

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print('Serving on http://localhost:%s' % options.port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print('^C')

I won't explain it here, but basically it takes some options, creates
an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)`` then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything. Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block::

    import os
    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way. You could write it like
this without WebOb:

.. code-block::

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later. First, let's create a little bit
of infrastructure for our middleware. We need to save and load
per-url data (the comments themselves). We'll keep them in pickles,
where each url has a pickle named after the url (but URL-quoted,
including the slashes, so ``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block::

    import os
    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Quoting every reserved character (even "/") makes the
            # filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.

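The storage helpers can be tried on their own, without WebOb. The
following is a minimal sketch, assuming Python 3 (so
``urllib.parse.quote`` and ``pickle`` stand in for ``urllib.quote``
and ``cPickle``) and a temporary directory in place of
``storage_dir``; the module-level functions are illustrative and not
part of ``example.py``:

.. code-block:: python

    import os
    import pickle
    import tempfile
    from urllib.parse import quote


    def url_filename(storage_dir, url):
        # safe='' percent-encodes "/" and ":" as well, so the quoted
        # URL is a single path component.
        return os.path.join(storage_dir, quote(url, safe=''))


    def save_data(storage_dir, url, data):
        with open(url_filename(storage_dir, url), 'wb') as f:
            pickle.dump(data, f)


    def get_data(storage_dir, url):
        filename = url_filename(storage_dir, url)
        if not os.path.exists(filename):
            return []
        with open(filename, 'rb') as f:
            return pickle.load(f)


    storage_dir = tempfile.mkdtemp()
    url = 'http://localhost:8080/index.html'
    quoted_name = os.path.basename(url_filename(storage_dir, url))
    before = get_data(storage_dir, url)
    save_data(storage_dir, url, [{'name': 'John Doe'}])
    after = get_data(storage_dir, url)

The first lookup returns an empty list because no pickle exists yet
for that URL; after ``save_data`` the same URL maps back to the stored
list.
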
Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those. We don't want to change responses
that were error responses (anything but ``200``), nor do we want to
filter responses that aren't HTML. So we get:

.. code-block::

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page. We
also haven't defined what ``data`` will hold. Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``, plus a ``time`` entry (from ``time.gmtime()``) recorded
when the comment is saved.

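As a sketch of that shape, here is one record built by hand and
round-tripped through ``pickle``, the way the storage methods would
save and reload it. It assumes Python 3 and a ``time`` entry holding
a ``time.struct_time``; the fixed timestamp ``time.gmtime(0)`` is only
there to make the example repeatable:

.. code-block:: python

    import pickle
    import time

    comment = {
        'name': 'John Doe',
        'homepage': 'http://blog.johndoe.com',
        'comments': 'Great site!',
        # A struct_time, as returned by time.gmtime()
        'time': time.gmtime(0),
    }
    data = [comment]

    # What comes back after a save/load cycle
    restored = pickle.loads(pickle.dumps(data))
    formatted = time.strftime('%Y-%m-%d', restored[0]['time'])
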
We'll also need a simple method to add stuff to the page. We'll use a
regular expression to find the end of the page and put text in:

.. code-block::

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

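To see what the regular expression does, here is a standalone sketch
of the same logic as a plain function (assuming text input; the sample
pages are made up):

.. code-block:: python

    import re

    _end_body_re = re.compile(r'</body.*?>', re.I | re.S)


    def add_to_end(html, extra_html):
        """Insert extra_html just before the closing body tag, if any."""
        match = _end_body_re.search(html)
        if not match:
            # No closing body tag: fall back to appending
            return html + extra_html
        return html[:match.start()] + extra_html + html[match.start():]


    page = '<html><body><p>Hello</p></BODY></html>'
    with_body = add_to_end(page, '<hr>')
    fragment = add_to_end('<p>No body tag</p>', '<hr>')

The match is case-insensitive, so ``</BODY>`` is found too, and a page
without a closing body tag simply gets the extra HTML appended.
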
And then we'll use it like:

.. code-block::

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response. This
also updates ``Content-Length``. Then we define:

.. code-block::

    import time
    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment. Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in some text without quoting it, the page is
susceptible to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack. Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

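If plain-text comments are enough, the simplest fix is to escape the
comment body before appending it. A minimal sketch, assuming Python
3's ``html.escape``; the helper name ``format_comment_body`` is
hypothetical and not part of ``example.py``:

.. code-block:: python

    from html import escape


    def format_comment_body(text):
        # Escape all markup first, then keep the author's line breaks
        return escape(text).replace('\n', '<br>\n')


    safe = format_comment_body(
        '<script>alert("hi")</script>\nNice page & thanks')

In ``format_comments`` this would take the place of the bare
``comment['comments']``; the trade-off is that commenters can no
longer use any HTML at all.
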
Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments. To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``. Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed. That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it. The name for the URL without
``PATH_INFO`` is ``req.application_url``. We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

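The split can be illustrated with plain ``environ`` dictionaries. The
two helpers below are simplified stand-ins for
``req.path_info_peek()`` and ``req.application_url`` (they assume
``HTTP_HOST`` is set and skip URL quoting), written only to show which
WSGI keys are involved:

.. code-block:: python

    def peek_segment(environ):
        """Return the next PATH_INFO segment without consuming it."""
        path = environ.get('PATH_INFO', '')
        if not path:
            return None
        return path.lstrip('/').split('/', 1)[0]


    def base_url_of(environ):
        """Scheme, host and SCRIPT_NAME: the URL without PATH_INFO."""
        return '%s://%s%s' % (
            environ['wsgi.url_scheme'],
            environ['HTTP_HOST'],
            environ.get('SCRIPT_NAME', ''))


    # Middleware mounted at the server root
    root = {'wsgi.url_scheme': 'http', 'HTTP_HOST': 'localhost:8080',
            'SCRIPT_NAME': '', 'PATH_INFO': '/.comments'}
    # Middleware mounted under /blog by some outer dispatcher
    mounted = {'wsgi.url_scheme': 'http', 'HTTP_HOST': 'localhost:8080',
               'SCRIPT_NAME': '/blog', 'PATH_INFO': '/index.html'}

    root_peek = peek_segment(root)
    root_base = base_url_of(root)
    mounted_peek = peek_segment(mounted)
    mounted_base = base_url_of(mounted)
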
So here's what this all looks like:

.. code-block::

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the path where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT``). We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after ``base_url``. If it is ``.comments`` then we handle
it internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block::

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting. It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.

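For reference, the POST body the browser sends for this form is
ordinary URL-encoded data. The sketch below builds such a body and
parses it with the Python 3 standard library, which is roughly the
work ``req.params`` does for us; the field values are made up:

.. code-block:: python

    from urllib.parse import parse_qs, urlencode

    # What a browser would send as the request body
    body = urlencode({
        'url': 'http://localhost:8080/index.html',
        'name': 'John Doe',
        'homepage': 'http://blog.johndoe.com',
        'comments': 'Great site!',
    })

    # keep_blank_values=True keeps fields the user left empty, so a
    # missing key really means the field was not submitted at all.
    parsed = parse_qs(body, keep_blank_values=True)
    fields = {key: values[0] for key, values in parsed.items()}
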
process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method then
treat the result as a WSGI application:

.. code-block::

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block::

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block::

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.

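To make the comparison concrete, here is a sketch of
response-modifying middleware written against plain WSGI, without
WebOb. It has to intercept ``start_response``, buffer the body, and
fix ``Content-Length`` by hand. The names ``append_footer``,
``hello_app`` and ``record`` are invented for the example, and the
sketch ignores the legacy ``write()`` callable:

.. code-block:: python

    def append_footer(app, footer):
        """Append ``footer`` (bytes) to successful HTML responses."""
        def middleware(environ, start_response):
            captured = {}

            def capture(status, headers, exc_info=None):
                # Hold on to the status and headers instead of sending them
                captured['status'] = status
                captured['headers'] = headers
                return lambda data: None  # write() is not supported here

            app_iter = app(environ, capture)
            try:
                body = b''.join(app_iter)
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()

            headers = captured['headers']
            lowered = dict((k.lower(), v) for k, v in headers)
            is_html = lowered.get('content-type', '').startswith('text/html')
            if captured['status'].startswith('200') and is_html:
                body += footer
                # The old Content-Length is now wrong; replace it
                headers = [(k, v) for k, v in headers
                           if k.lower() != 'content-length']
                headers.append(('Content-Length', str(len(body))))
            start_response(captured['status'], headers)
            return [body]
        return middleware


    def hello_app(environ, start_response):
        body = b'<p>Hello</p>'
        start_response('200 OK', [('Content-Type', 'text/html'),
                                  ('Content-Length', str(len(body)))])
        return [body]


    seen = {}

    def record(status, headers, exc_info=None):
        seen['status'] = status
        seen['headers'] = dict(headers)

    result = append_footer(hello_app, b'<hr>')({}, record)

With WebOb the same bookkeeping collapses into reading and assigning
``resp.body``.
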
Here's the actual processing code:

.. code-block::

    import time
    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.

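For comparison, a hand-written WSGI application that issues the same
kind of redirect might look like the sketch below. This is not how
``webob.exc`` is implemented, and ``see_other`` and ``record`` are
hypothetical helpers; the sketch only shows the status line and
headers involved:

.. code-block:: python

    def see_other(location):
        """Return a WSGI app that redirects to ``location`` with a 303."""
        def app(environ, start_response):
            body = ('See %s\n' % location).encode('utf-8')
            start_response('303 See Other', [
                ('Location', location),
                ('Content-Type', 'text/plain; charset=utf-8'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app


    seen = {}

    def record(status, headers, exc_info=None):
        seen['status'] = status
        seen['headers'] = dict(headers)

    redirect = see_other('http://localhost:8080/index.html#comment-area')
    result = redirect({}, record)
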
Conclusion
----------

This shows how to make response modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.
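
As a sketch of that simpler case, the hypothetical middleware below
changes one ``environ`` key before passing the request on;
``ForceScheme`` and ``show_scheme`` are invented for the example:

.. code-block:: python

    from wsgiref.util import setup_testing_defaults


    class ForceScheme(object):
        """Rewrite wsgi.url_scheme, then delegate to the wrapped app."""

        def __init__(self, app, scheme='https'):
            self.app = app
            self.scheme = scheme

        def __call__(self, environ, start_response):
            environ['wsgi.url_scheme'] = self.scheme
            return self.app(environ, start_response)


    def show_scheme(environ, start_response):
        body = environ['wsgi.url_scheme'].encode('ascii')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]


    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal http:// request
    app = ForceScheme(show_scheme, scheme='https')
    result = app(environ, lambda status, headers: None)

There is no response to buffer and no header to recompute, which is
why request-modifying middleware stays this short.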