app/feedparser/__init__.py
author Lennard de Rijk <ljvderijk@gmail.com>
Tue, 17 Feb 2009 17:28:54 +0000
changeset 1373 178bd19966fe
parent 151 6f8eb27752dc
permissions -rwxr-xr-x
Fixes the problem of <Entity> Saved not being shown whenever a new entity is created. Patch by: Madhusudan C.S. Reviewed by: Lennard de Rijk
#!/usr/bin/env python
"""Universal feed parser

Handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds

Visit http://feedparser.org/ for the latest version
Visit http://feedparser.org/docs/ for the latest documentation

Required: Python 2.1 or later
Recommended: Python 2.3 or later
Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
"""

__version__ = "4.1"# + "$Revision: 1.92 $"[11:15] + "-cvs"
__license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE."""
__author__ = "Mark Pilgrim <http://diveintomark.org/>"
__contributors__ = ["Jason Diamond <http://injektilo.org/>",
                    "John Beimler <http://john.beimler.org/>",
                    "Fazal Majid <http://www.majid.info/mylos/weblog/>",
                    "Aaron Swartz <http://aaronsw.com/>",
                    "Kevin Marks <http://epeus.blogspot.com/>"]
_debug = 0

# HTTP "User-Agent" header to send to servers when downloading feeds.
# If you are embedding feedparser in a larger application, you should
# change this to your application name and URL.
USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/" % __version__

# HTTP "Accept" header to send to servers when downloading feeds.  If you don't
# want to send an Accept header, set this to None.
ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"
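
A minimal sketch of how the two header constants above might be attached to an outgoing request. It targets Python 3's urllib.request rather than the urllib2 module this file imports, and the helper name build_feed_request and the example URL are assumptions for illustration, not part of feedparser.

```python
import urllib.request

# Values copied from the constants defined above (version string expanded).
USER_AGENT = "UniversalFeedParser/4.1 +http://feedparser.org/"
ACCEPT_HEADER = ("application/atom+xml,application/rdf+xml,application/rss+xml,"
                 "application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1")


def build_feed_request(url, agent=USER_AGENT, accept=ACCEPT_HEADER):
    """Hypothetical helper: build (but do not send) a request for a feed URL."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", agent)
    # Passing None for the Accept value means no Accept header is added.
    if accept is not None:
        request.add_header("Accept", accept)
    return request


if __name__ == "__main__":
    req = build_feed_request("http://example.org/feed.xml")
    # Request stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
    print(req.get_header("Accept"))
```

Nothing is sent over the network here; the request object is only constructed so the headers can be inspected.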

# List of preferred XML parsers, by SAX driver name.  These will be tried first,
# but if they're not installed, Python will keep searching through its own list
# of pre-installed parsers until it finds one that supports everything we need.
PREFERRED_XML_PARSERS = ["drv_libxml2"]

# If you want feedparser to automatically run HTML markup through HTML Tidy, set
# this to 1.  Requires mxTidy <http://www.egenix.com/files/python/mxTidy.html>
# or utidylib <http://utidylib.berlios.de/>.
TIDY_MARKUP = 0

# List of Python interfaces for HTML Tidy, in order of preference.  Only useful
# if TIDY_MARKUP = 1
PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"]

# ---------- required modules (should come with any Python distribution) ----------
import sgmllib, re, sys, copy, urlparse, time, rfc822, types, cgi, urllib, urllib2
try:
    from cStringIO import StringIO as _StringIO
except:
    from StringIO import StringIO as _StringIO

# ---------- optional modules (feedparser will work without these, but with reduced functionality) ----------

# gzip is included with most Python distributions, but may not be available if you compiled your own
try:
    import gzip
except:
    gzip = None
try:
    import zlib
except:
    zlib = None
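
The None sentinels above let later code test for a module before using it. Below is a sketch of that pattern for decoding a compressed response body, written for Python 3 (io.BytesIO instead of cStringIO). The name decode_body is hypothetical, and this is not necessarily how feedparser itself handles compressed responses.

```python
import io

try:
    import gzip
except ImportError:
    gzip = None
try:
    import zlib
except ImportError:
    zlib = None


def decode_body(data, content_encoding):
    """Hypothetical helper: undo gzip/deflate if the matching module exists."""
    if content_encoding == "gzip" and gzip is not None:
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    if content_encoding == "deflate" and zlib is not None:
        # Negative wbits tells zlib to expect a raw deflate stream
        # with no zlib header or checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)
    # Unknown encoding or missing module: hand the bytes back untouched.
    return data
```

When a module is missing, the function degrades to returning the input unchanged, which mirrors the "reduced functionality" described in the comment above.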

# If a real XML parser is available, feedparser will attempt to use it.  feedparser has
# been tested with the built-in SAX parser, PyXML, and libxml2.  On platforms where the
# Python distribution does not come with an XML parser (such as Mac OS X 10.2 and some
# versions of FreeBSD), feedparser will quietly fall back on regex-based parsing.
try:
    import xml.sax
    xml.sax.make_parser(PREFERRED_XML_PARSERS) # test for valid parsers
    from xml.sax.saxutils import escape as _xmlescape
    _XML_AVAILABLE = 1
except:
    _XML_AVAILABLE = 0
    def _xmlescape(data):
        data = data.replace('&', '&amp;')
        data = data.replace('>', '&gt;')
        data = data.replace('<', '&lt;')
        return data

# base64 support for Atom feeds that contain embedded binary data
try:
    import base64, binascii
except:
    base64 = binascii = None

# cjkcodecs and iconv_codec provide support for more character encodings.
# Both are available from http://cjkpython.i18n.org/
try:
    import cjkcodecs.aliases
except:
    pass
try:
    import iconv_codec
except:
    pass

# chardet library auto-detects character encodings
# Download from http://chardet.feedparser.org/
try:
    import chardet
    if _debug:
        import chardet.constants
        chardet.constants._debug = 1
except:
    chardet = None

# ---------- don't touch these ----------
class ThingsNobodyCaresAboutButMe(Exception): pass
class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): pass
class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): pass
class NonXMLContentType(ThingsNobodyCaresAboutButMe): pass
class UndeclaredNamespace(Exception): pass
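
Three of the exception classes above share the base class ThingsNobodyCaresAboutButMe, while UndeclaredNamespace derives directly from Exception. A sketch of what that hierarchy allows a caller to do; the classify function and its "minor"/"namespace" labels are hypothetical and only illustrate catching by base class.

```python
class ThingsNobodyCaresAboutButMe(Exception): pass
class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): pass
class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): pass
class NonXMLContentType(ThingsNobodyCaresAboutButMe): pass
class UndeclaredNamespace(Exception): pass


def classify(exc):
    """Hypothetical helper: sort an exception into a coarse category."""
    try:
        raise exc
    except ThingsNobodyCaresAboutButMe:
        # One clause covers all three subclasses of the shared base.
        return "minor"
    except UndeclaredNamespace:
        # Not a subclass of the shared base, so it needs its own clause.
        return "namespace"


print(classify(CharacterEncodingUnknown("no declared encoding")))
print(classify(UndeclaredNamespace("prefix not bound")))
```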

sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
sgmllib.special = re.compile('<!')
sgmllib.charref = re.compile('&#(x?[0-9A-Fa-f]+)[^0-9A-Fa-f]')
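
The three assignments above replace module-level patterns in sgmllib, which exists only in Python 2. The sketch below exercises two of the same regular expressions on their own with re, so it runs without sgmllib; the variable names mirror the patched attributes but are plain local names, and the sample strings are made up for illustration.

```python
import re

tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
charref = re.compile('&#(x?[0-9A-Fa-f]+)[^0-9A-Fa-f]')

# The character class includes ':', so a namespaced name matches as a whole.
print(tagfind.match('dc:creator rdf:about="x"').group())

# The optional leading 'x' lets the pattern capture hexadecimal
# references as well as decimal ones.
print(charref.match('&#65;').group(1))
print(charref.match('&#x41;').group(1))

# The pattern requires one trailing non-hex character (usually ';'),
# so a reference cut off at the end of the input does not match.
print(charref.match('&#65'))
```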
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   141
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   142
SUPPORTED_VERSIONS = {'': 'unknown',
                      'rss090': 'RSS 0.90',
                      'rss091n': 'RSS 0.91 (Netscape)',
                      'rss091u': 'RSS 0.91 (Userland)',
                      'rss092': 'RSS 0.92',
                      'rss093': 'RSS 0.93',
                      'rss094': 'RSS 0.94',
                      'rss20': 'RSS 2.0',
                      'rss10': 'RSS 1.0',
                      'rss': 'RSS (unknown version)',
                      'atom01': 'Atom 0.1',
                      'atom02': 'Atom 0.2',
                      'atom03': 'Atom 0.3',
                      'atom10': 'Atom 1.0',
                      'atom': 'Atom (unknown version)',
                      'cdf': 'CDF',
                      'hotrss': 'Hot RSS'
                      }

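The table maps short version codes to display names, with the empty string standing for "unknown". A minimal Python 3 sketch of how such a table might be consulted follows; the helper `describe_version` is hypothetical, and only a subset of the entries is copied.

```python
# Subset of the table from the listing, copied for illustration only.
SUPPORTED_VERSIONS = {'': 'unknown',
                      'rss20': 'RSS 2.0',
                      'atom10': 'Atom 1.0',
                      'cdf': 'CDF'}


def describe_version(code):
    """Hypothetical helper: map a version code to its label, falling back to 'unknown'."""
    return SUPPORTED_VERSIONS.get(code, SUPPORTED_VERSIONS[''])


print(describe_version('atom10'))
print(describe_version('not-a-version'))
```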
try:
    UserDict = dict
except NameError:
    # Python 2.1 does not have dict
    from UserDict import UserDict
    def dict(aList):
        rc = {}
        for k, v in aList:
            rc[k] = v
        return rc

class FeedParserDict(UserDict):
    keymap = {'channel': 'feed',
              'items': 'entries',
              'guid': 'id',
              'date': 'updated',
              'date_parsed': 'updated_parsed',
              'description': ['subtitle', 'summary'],
              'url': ['href'],
              'modified': 'updated',
              'modified_parsed': 'updated_parsed',
              'issued': 'published',
              'issued_parsed': 'published_parsed',
              'copyright': 'rights',
              'copyright_detail': 'rights_detail',
              'tagline': 'subtitle',
              'tagline_detail': 'subtitle_detail'}
    def __getitem__(self, key):
        # 'category' and 'categories' are derived from the stored 'tags' list.
        if key == 'category':
            return UserDict.__getitem__(self, 'tags')[0]['term']
        if key == 'categories':
            return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
        realkey = self.keymap.get(key, key)
        # A list-valued alias means: return the first candidate key that is present.
        if type(realkey) == types.ListType:
            for k in realkey:
                if UserDict.has_key(self, k):
                    return UserDict.__getitem__(self, k)
        if UserDict.has_key(self, key):
            return UserDict.__getitem__(self, key)
        return UserDict.__getitem__(self, realkey)

    def __setitem__(self, key, value):
        for k in self.keymap.keys():
            if key == k:
                key = self.keymap[k]
                if type(key) == types.ListType:
                    key = key[0]
        return UserDict.__setitem__(self, key, value)

    def get(self, key, default=None):
        if self.has_key(key):
            return self[key]
        else:
            return default

    def setdefault(self, key, value):
        if not self.has_key(key):
            self[key] = value
        return self[key]

    def has_key(self, key):
        try:
            return hasattr(self, key) or UserDict.has_key(self, key)
        except AttributeError:
            return False

    def __getattr__(self, key):
        try:
            return self.__dict__[key]
        except KeyError:
            pass
        try:
            assert not key.startswith('_')
            return self.__getitem__(key)
        except:
            raise AttributeError, "object has no attribute '%s'" % key

    def __setattr__(self, key, value):
        if key.startswith('_') or key == 'data':
            self.__dict__[key] = value
        else:
            return self.__setitem__(key, value)

    def __contains__(self, key):
        return self.has_key(key)

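The aliasing behaviour of FeedParserDict (legacy keys such as 'guid' resolving to current keys such as 'id', plus attribute-style access) can be sketched in Python 3. This is an illustration under stated assumptions, not the class from the listing: the name `AliasDict` is hypothetical, only part of the keymap is copied, the 'category' shortcuts are omitted, and `__contains__` is simplified to a lookup rather than the `hasattr`-based check used above.

```python
class AliasDict(dict):
    """Hypothetical Python 3 sketch of the key-aliasing idea in FeedParserDict."""

    # Legacy name -> current name(s); a list means "first one present wins".
    keymap = {'channel': 'feed',
              'items': 'entries',
              'guid': 'id',
              'description': ['subtitle', 'summary']}

    def __getitem__(self, key):
        realkey = self.keymap.get(key, key)
        if isinstance(realkey, list):
            for k in realkey:
                if dict.__contains__(self, k):
                    return dict.__getitem__(self, k)
        if dict.__contains__(self, key):
            return dict.__getitem__(self, key)
        if isinstance(realkey, list):
            # Assumption of this sketch: report a missing list alias as KeyError.
            raise KeyError(key)
        return dict.__getitem__(self, realkey)

    def __setitem__(self, key, value):
        realkey = self.keymap.get(key, key)
        if isinstance(realkey, list):
            realkey = realkey[0]  # writes go to the first candidate
        dict.__setitem__(self, realkey, value)

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def get(self, key, default=None):
        return self[key] if key in self else default

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key.startswith('_'):
            raise AttributeError(key)
        try:
            return self[key]
        except KeyError:
            raise AttributeError("object has no attribute '%s'" % key)


entry = AliasDict()
entry['guid'] = 'tag:example.org,2009:1'   # stored under 'id'
entry['summary'] = 'short text'
print(entry['id'], entry.guid, entry['description'])
```

Values passed to the `dict` constructor bypass `__setitem__` in this sketch, so aliases are only normalised on item assignment.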
def zopeCompatibilityHack():
    global FeedParserDict
    del FeedParserDict
    def FeedParserDict(aDict=None):
        rc = {}
        if aDict:
            rc.update(aDict)
        return rc

_ebcdic_to_ascii_map = None
def _ebcdic_to_ascii(s):
    global _ebcdic_to_ascii_map
    # Build the 256-byte translation table on first use and cache it in the global.
    if not _ebcdic_to_ascii_map:
        emap = (
            0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
            16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
            128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
            144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
            32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
            38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
            45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
            186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
            195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,201,
            202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
            209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
            216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
            123,65,66,67,68,69,70,71,72,73,232,233,234,235,236,237,
            125,74,75,76,77,78,79,80,81,82,238,239,240,241,242,243,
            92,159,83,84,85,86,87,88,89,90,244,245,246,247,248,249,
            48,49,50,51,52,53,54,55,56,57,250,251,252,253,254,255
            )
        import string
        _ebcdic_to_ascii_map = string.maketrans( \
            ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
    return s.translate(_ebcdic_to_ascii_map)

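The table-driven translation can be tried out in Python 3, where `bytes.maketrans` takes the place of `string.maketrans`. The sketch below reuses the 256-entry table from the listing; the names `EMAP`, `_TABLE` and `ebcdic_to_ascii` are hypothetical, and the table is built eagerly rather than lazily for brevity.

```python
# Same table as in the listing: index is the EBCDIC byte, value is the output byte.
EMAP = (
    0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
    16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
    128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
    144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
    32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
    38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
    45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
    186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
    195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,201,
    202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
    209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
    216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
    123,65,66,67,68,69,70,71,72,73,232,233,234,235,236,237,
    125,74,75,76,77,78,79,80,81,82,238,239,240,241,242,243,
    92,159,83,84,85,86,87,88,89,90,244,245,246,247,248,249,
    48,49,50,51,52,53,54,55,56,57,250,251,252,253,254,255,
)

# Map every possible input byte (0..255) to its entry in EMAP.
_TABLE = bytes.maketrans(bytes(range(256)), bytes(EMAP))


def ebcdic_to_ascii(data):
    """Hypothetical Python 3 counterpart: translate a bytes object byte by byte."""
    return data.translate(_TABLE)


# The bytes 4C 6F A7 94 93 are looked up individually in the table.
print(ebcdic_to_ascii(b'\x4c\x6f\xa7\x94\x93'))
```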
# Matches a URI scheme followed by '://' and any extra slashes after it.
_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')
def _urljoin(base, uri):
    # Drop the extra slashes (e.g. 'http:////host' becomes 'http://host') before joining.
    uri = _urifixer.sub(r'\1\3', uri)
    return urlparse.urljoin(base, uri)

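The slash clean-up followed by relative resolution can be reproduced in Python 3, where `urljoin` lives in `urllib.parse` rather than `urlparse`. This is a sketch with a hypothetical function name (`join_uri`) and made-up example URLs; the pattern is copied unchanged from the listing.

```python
import re
from urllib.parse import urljoin  # Python 3 location of Python 2's urlparse.urljoin

# Pattern copied from the listing: scheme + '://', extra slashes, then a lazy remainder.
_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')


def join_uri(base, uri):
    """Hypothetical Python 3 counterpart of _urljoin."""
    # Keep group 1 (scheme and '://') and group 3; group 2 (extra slashes) is dropped.
    uri = _urifixer.sub(r'\1\3', uri)
    return urljoin(base, uri)


print(join_uri('http://example.org/feed/', 'http:////example.com/a'))
print(join_uri('http://example.org/feed/', 'entry/1'))
```

Because the third group is lazy, it matches the empty string, so the substitution only rewrites the scheme prefix and leaves the rest of the URI untouched.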
class _FeedParserMixin:
    namespaces = {'': '',
                  'http://backend.userland.com/rss': '',
                  'http://blogs.law.harvard.edu/tech/rss': '',
                  'http://purl.org/rss/1.0/': '',
                  'http://my.netscape.com/rdf/simple/0.9/': '',
                  'http://example.com/newformat#': '',
                  'http://example.com/necho': '',
                  'http://purl.org/echo/': '',
                  'uri/of/echo/namespace#': '',
                  'http://purl.org/pie/': '',
                  'http://purl.org/atom/ns#': '',
                  'http://www.w3.org/2005/Atom': '',
                  'http://purl.org/rss/1.0/modules/rss091#': '',

                  'http://webns.net/mvcb/':                               'admin',
                  'http://purl.org/rss/1.0/modules/aggregation/':         'ag',
                  'http://purl.org/rss/1.0/modules/annotate/':            'annotate',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   306
                  'http://media.tangent.org/rss/1.0/':                    'audio',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   307
                  'http://backend.userland.com/blogChannelModule':        'blogChannel',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   308
                  'http://web.resource.org/cc/':                          'cc',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   309
                  'http://backend.userland.com/creativeCommonsRssModule': 'creativeCommons',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   310
                  'http://purl.org/rss/1.0/modules/company':              'co',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   311
                  'http://purl.org/rss/1.0/modules/content/':             'content',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   312
                  'http://my.theinfo.org/changed/1.0/rss/':               'cp',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   313
                  'http://purl.org/dc/elements/1.1/':                     'dc',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   314
                  'http://purl.org/dc/terms/':                            'dcterms',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   315
                  'http://purl.org/rss/1.0/modules/email/':               'email',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   316
                  'http://purl.org/rss/1.0/modules/event/':               'ev',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   317
                  'http://rssnamespace.org/feedburner/ext/1.0':           'feedburner',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   318
                  'http://freshmeat.net/rss/fm/':                         'fm',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   319
                  'http://xmlns.com/foaf/0.1/':                           'foaf',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   320
                  'http://www.w3.org/2003/01/geo/wgs84_pos#':             'geo',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   321
                  'http://postneo.com/icbm/':                             'icbm',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   322
                  'http://purl.org/rss/1.0/modules/image/':               'image',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   323
                  'http://www.itunes.com/DTDs/PodCast-1.0.dtd':           'itunes',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   324
                  'http://example.com/DTDs/PodCast-1.0.dtd':              'itunes',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   325
                  'http://purl.org/rss/1.0/modules/link/':                'l',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   326
                  'http://search.yahoo.com/mrss':                         'media',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   327
                  'http://madskills.com/public/xml/rss/module/pingback/': 'pingback',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   328
                  'http://prismstandard.org/namespaces/1.2/basic/':       'prism',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   329
                  'http://www.w3.org/1999/02/22-rdf-syntax-ns#':          'rdf',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   330
                  'http://www.w3.org/2000/01/rdf-schema#':                'rdfs',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   331
                  'http://purl.org/rss/1.0/modules/reference/':           'ref',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   332
                  'http://purl.org/rss/1.0/modules/richequiv/':           'reqv',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   333
                  'http://purl.org/rss/1.0/modules/search/':              'search',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   334
                  'http://purl.org/rss/1.0/modules/slash/':               'slash',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   335
                  'http://schemas.xmlsoap.org/soap/envelope/':            'soap',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   336
                  'http://purl.org/rss/1.0/modules/servicestatus/':       'ss',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   337
                  'http://hacks.benhammersley.com/rss/streaming/':        'str',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   338
                  'http://purl.org/rss/1.0/modules/subscription/':        'sub',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   339
                  'http://purl.org/rss/1.0/modules/syndication/':         'sy',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   340
                  'http://purl.org/rss/1.0/modules/taxonomy/':            'taxo',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   341
                  'http://purl.org/rss/1.0/modules/threading/':           'thr',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   342
                  'http://purl.org/rss/1.0/modules/textinput/':           'ti',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   343
                  'http://madskills.com/public/xml/rss/module/trackback/':'trackback',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   344
                  'http://wellformedweb.org/commentAPI/':                 'wfw',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   345
                  'http://purl.org/rss/1.0/modules/wiki/':                'wiki',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   346
                  'http://www.w3.org/1999/xhtml':                         'xhtml',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   347
                  'http://www.w3.org/XML/1998/namespace':                 'xml',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   348
                  'http://schemas.pocketsoap.com/rss/myDescModule/':      'szf'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   349
}
    _matchnamespaces = {}

    can_be_relative_uri = ['link', 'id', 'wfw_comment', 'wfw_commentrss', 'docs', 'url', 'href', 'comments', 'license', 'icon', 'logo']
    can_contain_relative_uris = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
    can_contain_dangerous_markup = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
    html_types = ['text/html', 'application/xhtml+xml']

    def __init__(self, baseuri=None, baselang=None, encoding='utf-8'):
        if _debug: sys.stderr.write('initializing FeedParser\n')
        if not self._matchnamespaces:
            for k, v in self.namespaces.items():
                self._matchnamespaces[k.lower()] = v
        self.feeddata = FeedParserDict() # feed-level data
        self.encoding = encoding # character encoding
        self.entries = [] # list of entry-level data
        self.version = '' # feed type/version, see SUPPORTED_VERSIONS
        self.namespacesInUse = {} # dictionary of namespaces defined by the feed

        # the following are used internally to track state;
        # this is really out of control and should be refactored
        self.infeed = 0
        self.inentry = 0
        self.incontent = 0
        self.intextinput = 0
        self.inimage = 0
        self.inauthor = 0
        self.incontributor = 0
        self.inpublisher = 0
        self.insource = 0
        self.sourcedata = FeedParserDict()
        self.contentparams = FeedParserDict()
        self._summaryKey = None
        self.namespacemap = {}
        self.elementstack = []
        self.basestack = []
        self.langstack = []
        self.baseuri = baseuri or ''
        self.lang = baselang or None
        if baselang:
            self.feeddata['language'] = baselang

    def unknown_starttag(self, tag, attrs):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   392
        if _debug: sys.stderr.write('start %s with %s\n' % (tag, attrs))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   393
        # normalize attrs
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   394
        attrs = [(k.lower(), v) for k, v in attrs]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   395
        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   396
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   397
        # track xml:base and xml:lang
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   398
        attrsD = dict(attrs)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   399
        baseuri = attrsD.get('xml:base', attrsD.get('base')) or self.baseuri
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   400
        self.baseuri = _urljoin(self.baseuri, baseuri)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   401
        lang = attrsD.get('xml:lang', attrsD.get('lang'))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   402
        if lang == '':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   403
            # xml:lang could be explicitly set to '', we need to capture that
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   404
            lang = None
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   405
        elif lang is None:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   406
            # if no xml:lang is specified, use parent lang
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   407
            lang = self.lang
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   408
        if lang:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   409
            if tag in ('feed', 'rss', 'rdf:RDF'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   410
                self.feeddata['language'] = lang
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   411
        self.lang = lang
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   412
        self.basestack.append(self.baseuri)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   413
        self.langstack.append(lang)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   414
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   415
        # track namespaces
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   416
        for prefix, uri in attrs:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   417
            if prefix.startswith('xmlns:'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   418
                self.trackNamespace(prefix[6:], uri)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   419
            elif prefix == 'xmlns':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   420
                self.trackNamespace(None, uri)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   421
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   422
        # track inline content
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   423
        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   424
            # element declared itself as escaped markup, but it isn't really
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   425
            self.contentparams['type'] = 'application/xhtml+xml'
        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
            # Note: probably shouldn't simply recreate localname here, but
            # our namespace handling isn't actually 100% correct in cases where
            # the feed redefines the default namespace (which is actually
            # the usual case for inline content, thanks Sam), so here we
            # cheat and just reconstruct the element based on localname
            # because that compensates for the bugs in our namespace handling.
            # This will horribly munge inline content with non-empty qnames,
            # but nobody actually does that, so I'm not fixing it.
            tag = tag.split(':')[-1]
            return self.handle_data('<%s%s>' % (tag, ''.join([' %s="%s"' % t for t in attrs])), escape=0)

        # match namespaces
        if tag.find(':') != -1:
            prefix, suffix = tag.split(':', 1)
        else:
            prefix, suffix = '', tag
        prefix = self.namespacemap.get(prefix, prefix)
        if prefix:
            prefix = prefix + '_'

        # special hack for better tracking of empty textinput/image elements in ill-formed feeds
        if (not prefix) and tag not in ('title', 'link', 'description', 'name'):
            self.intextinput = 0
        if (not prefix) and tag not in ('title', 'link', 'description', 'url', 'href', 'width', 'height'):
            self.inimage = 0

        # call special handler (if defined) or default handler
        methodname = '_start_' + prefix + suffix
        try:
            method = getattr(self, methodname)
            return method(attrsD)
        except AttributeError:
            return self.push(prefix + suffix, 1)
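
The prefix mapping and handler lookup above can be tried in isolation. The following is a minimal sketch, not feedparser's own class: `MiniDispatcher`, its `start` method, the `calls` list and the sample `namespacemap` entry are all hypothetical names made up for illustration, and the code assumes Python 3. It resolves the handler with a `getattr` default instead of `try`/`except AttributeError`, which is a deliberate variation so that an `AttributeError` raised inside a handler is not mistaken for a missing handler.

```python
class MiniDispatcher(object):
    """Hypothetical stand-in for the parser mixin; not feedparser's class."""

    def __init__(self):
        # Maps prefixes declared in the feed to canonical prefixes (sample data).
        self.namespacemap = {'dublin': 'dc'}
        self.calls = []

    def _start_dc_creator(self, attrs):
        # A "special" handler, found by name through getattr.
        self.calls.append(('special', 'dc_creator'))

    def push(self, element, expecting_text):
        # Default handler used when no special handler exists.
        self.calls.append(('default', element))

    def start(self, tag, attrs):
        if ':' in tag:
            prefix, suffix = tag.split(':', 1)
        else:
            prefix, suffix = '', tag
        prefix = self.namespacemap.get(prefix, prefix)
        if prefix:
            prefix = prefix + '_'
        # Look the handler up first; call it outside any exception handling.
        method = getattr(self, '_start_' + prefix + suffix, None)
        if method is None:
            return self.push(prefix + suffix, 1)
        return method(attrs)


d = MiniDispatcher()
d.start('dublin:creator', {})   # mapped prefix, special handler exists
d.start('unknownns:thing', {})  # unmapped prefix, falls back to push
d.start('title', {})            # no prefix, falls back to push
print(d.calls)
```

The unmapped prefix is kept as-is and joined with an underscore, so unknown namespaced elements still get a stable element name.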

    def unknown_endtag(self, tag):
        if _debug: sys.stderr.write('end %s\n' % tag)
        # match namespaces
        if tag.find(':') != -1:
            prefix, suffix = tag.split(':', 1)
        else:
            prefix, suffix = '', tag
        prefix = self.namespacemap.get(prefix, prefix)
        if prefix:
            prefix = prefix + '_'

        # call special handler (if defined) or default handler
        methodname = '_end_' + prefix + suffix
        try:
            method = getattr(self, methodname)
            method()
        except AttributeError:
            self.pop(prefix + suffix)

        # track inline content
        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
            # element declared itself as escaped markup, but it isn't really
            self.contentparams['type'] = 'application/xhtml+xml'
        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
            tag = tag.split(':')[-1]
            self.handle_data('</%s>' % tag, escape=0)

        # track xml:base and xml:lang going out of scope
        if self.basestack:
            self.basestack.pop()
            if self.basestack and self.basestack[-1]:
                self.baseuri = self.basestack[-1]
        if self.langstack:
            self.langstack.pop()
            if self.langstack: # and (self.langstack[-1] is not None):
                self.lang = self.langstack[-1]
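
The scope tracking at the end of `unknown_endtag` is easier to follow next to its start-tag counterpart, which is not shown in this part of the file. The sketch below is an illustration under assumptions: `ScopeTracker` is a hypothetical class, and its `start` method assumes that every start tag pushes exactly one entry (the base URI in effect for that element). Only the `end` method mirrors the code above. Python 3 is assumed.

```python
class ScopeTracker(object):
    """Hypothetical illustration of xml:base scoping with a stack."""

    def __init__(self, baseuri=''):
        self.baseuri = baseuri
        self.basestack = []

    def start(self, xml_base=None):
        # Assumption: each start tag pushes one entry, whether or not
        # the element carries its own xml:base attribute.
        if xml_base:
            self.baseuri = xml_base
        self.basestack.append(self.baseuri)

    def end(self):
        # Mirrors the end-tag logic: pop this element's entry, then
        # restore the base URI of the enclosing element if there is one.
        if self.basestack:
            self.basestack.pop()
            if self.basestack and self.basestack[-1]:
                self.baseuri = self.basestack[-1]


t = ScopeTracker('http://example.org/')
t.start()                              # <feed>
t.start('http://example.org/blog/')    # <entry xml:base="...">
inner = t.baseuri
t.end()                                # </entry>
outer = t.baseuri
print(inner, outer)
```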

    def handle_charref(self, ref):
        # called for each character reference, e.g. for '&#160;', ref will be '160'
        if not self.elementstack: return
        ref = ref.lower()
        if ref in ('34', '38', '39', '60', '62', 'x22', 'x26', 'x27', 'x3c', 'x3e'):
            text = '&#%s;' % ref
        else:
            if ref[0] == 'x':
                c = int(ref[1:], 16)
            else:
                c = int(ref)
            text = unichr(c).encode('utf-8')
        self.elementstack[-1][2].append(text)
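
As a rough standalone illustration of the rule in `handle_charref`, the sketch below keeps references to the five XML-special characters (`"`, `&`, `'`, `<`, `>`) escaped and decodes everything else. `decode_charref` is a hypothetical helper, not part of the module. It assumes Python 3, so it uses `chr` and returns text, where the method above uses `unichr` and returns UTF-8 encoded bytes.

```python
# References to ", &, ', <, > in decimal and hexadecimal form.
SPECIAL_REFS = ('34', '38', '39', '60', '62',
                'x22', 'x26', 'x27', 'x3c', 'x3e')


def decode_charref(ref):
    """Decode a numeric character reference body such as '160' or 'xA0'."""
    ref = ref.lower()
    if ref in SPECIAL_REFS:
        # Leave markup-significant characters escaped so that later
        # stages cannot confuse them with real markup.
        return '&#%s;' % ref
    if ref.startswith('x'):
        codepoint = int(ref[1:], 16)
    else:
        codepoint = int(ref)
    return chr(codepoint)


print(repr(decode_charref('160')))
print(repr(decode_charref('x3C')))
```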

    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
        if not self.elementstack: return
        if _debug: sys.stderr.write('entering handle_entityref with %s\n' % ref)
        if ref in ('lt', 'gt', 'quot', 'amp', 'apos'):
            text = '&%s;' % ref
        else:
            # entity resolution graciously donated by Aaron Swartz
            def name2cp(k):
                import htmlentitydefs
                if hasattr(htmlentitydefs, 'name2codepoint'): # requires Python 2.3
                    return htmlentitydefs.name2codepoint[k]
                k = htmlentitydefs.entitydefs[k]
                if k.startswith('&#') and k.endswith(';'):
                    return int(k[2:-1]) # not in latin-1
                return ord(k)
            try: name2cp(ref)
            except KeyError: text = '&%s;' % ref
            else: text = unichr(name2cp(ref)).encode('utf-8')
        self.elementstack[-1][2].append(text)
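
The same idea applies to named entities. The sketch below is a hypothetical Python 3 counterpart of `handle_entityref`: `decode_entityref` is an illustrative name, and it uses `html.entities.name2codepoint`, the Python 3 location of the table that the method above reads from `htmlentitydefs`. It looks the name up once rather than twice and returns text instead of UTF-8 bytes.

```python
from html.entities import name2codepoint


def decode_entityref(ref):
    """Decode a named entity body such as 'copy'; unknown names pass through."""
    if ref in ('lt', 'gt', 'quot', 'amp', 'apos'):
        # XML-special entities stay escaped.
        return '&%s;' % ref
    try:
        codepoint = name2codepoint[ref]
    except KeyError:
        # Unknown entity: emit it unchanged rather than dropping it.
        return '&%s;' % ref
    return chr(codepoint)


print(repr(decode_entityref('copy')))
print(repr(decode_entityref('nosuchentity')))
```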

    def handle_data(self, text, escape=1):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        if not self.elementstack: return
        if escape and self.contentparams.get('type') == 'application/xhtml+xml':
            text = _xmlescape(text)
        self.elementstack[-1][2].append(text)

    def handle_comment(self, text):
        # called for each comment, e.g. <!-- insert message here -->
        pass

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        pass

    def handle_decl(self, text):
        pass

    def parse_declaration(self, i):
        # override internal declaration handler to handle CDATA blocks
        if _debug: sys.stderr.write('entering parse_declaration\n')
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1: k = len(self.rawdata)
            self.handle_data(_xmlescape(self.rawdata[i+9:k]), 0)
            return k+3
        else:
            k = self.rawdata.find('>', i)
            if k == -1: return -1 # incomplete declaration; do not rewind to 0
            return k+1
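
The CDATA scan can be exercised without a parser instance. This is a minimal sketch under assumptions: `scan_declaration` is a hypothetical free function, `xml.sax.saxutils.escape` stands in for the module's `_xmlescape` (whose definition is not shown in this part of the file), and the function returns the escaped text instead of passing it to `handle_data`. It reports an unterminated non-CDATA declaration as `-1`. Python 3 is assumed.

```python
from xml.sax.saxutils import escape


def scan_declaration(rawdata, i):
    """Return (text, next_index) for the declaration starting at rawdata[i].

    text is the escaped CDATA payload, or None for other declarations.
    next_index is -1 when a non-CDATA declaration is not terminated.
    """
    if rawdata[i:i + 9] == '<![CDATA[':
        k = rawdata.find(']]>', i)
        if k == -1:
            # Unterminated CDATA: treat the rest of the buffer as payload.
            k = len(rawdata)
        return escape(rawdata[i + 9:k]), k + 3
    k = rawdata.find('>', i)
    if k == -1:
        return None, -1
    return None, k + 1


raw = 'x<![CDATA[a < b & c]]>y'
text, nxt = scan_declaration(raw, 1)
print(text, nxt)
```

Escaping the payload means a CDATA block and the equivalent entity-escaped text end up in the same form downstream.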

    def mapContentType(self, contentType):
        contentType = contentType.lower()
        if contentType == 'text':
            contentType = 'text/plain'
        elif contentType == 'html':
            contentType = 'text/html'
        elif contentType == 'xhtml':
            contentType = 'application/xhtml+xml'
        return contentType

    def trackNamespace(self, prefix, uri):
        loweruri = uri.lower()
        if (prefix, loweruri) == (None, 'http://my.netscape.com/rdf/simple/0.9/') and not self.version:
            self.version = 'rss090'
        if loweruri == 'http://purl.org/rss/1.0/' and not self.version:
            self.version = 'rss10'
        if loweruri == 'http://www.w3.org/2005/atom' and not self.version:
            self.version = 'atom10'
        if loweruri.find('backend.userland.com/rss') != -1:
            # match any backend.userland.com namespace
            uri = 'http://backend.userland.com/rss'
            loweruri = uri
        if self._matchnamespaces.has_key(loweruri):
            self.namespacemap[prefix] = self._matchnamespaces[loweruri]
            self.namespacesInUse[self._matchnamespaces[loweruri]] = uri
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   590
            self.namespacesInUse[prefix or ''] = uri
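
# Illustration (not part of feedparser): a minimal, standalone sketch of the
# bookkeeping trackNamespace performs. KNOWN_NAMESPACES is a hypothetical
# stand-in for self._matchnamespaces, whose real contents are not shown in this
# listing, and NamespaceTracker is an invented name. Only two of the version
# checks are reproduced.

```python
# Hypothetical URI -> standard prefix table (assumption, not feedparser's table).
KNOWN_NAMESPACES = {
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.w3.org/2005/atom': '',
}


class NamespaceTracker(object):
    def __init__(self):
        self.version = ''
        self.namespacemap = {}      # prefix used in the feed -> standard prefix
        self.namespacesInUse = {}   # standard (or raw) prefix -> URI as declared

    def track(self, prefix, uri):
        loweruri = uri.lower()
        # The first recognised namespace decides the version; later ones do not override it.
        if loweruri == 'http://purl.org/rss/1.0/' and not self.version:
            self.version = 'rss10'
        if loweruri == 'http://www.w3.org/2005/atom' and not self.version:
            self.version = 'atom10'
        if loweruri in KNOWN_NAMESPACES:
            standard = KNOWN_NAMESPACES[loweruri]
            self.namespacemap[prefix] = standard
            self.namespacesInUse[standard] = uri
        else:
            # Unknown namespace: keep the feed's own prefix.
            self.namespacesInUse[prefix or ''] = uri


tracker = NamespaceTracker()
tracker.track(None, 'http://www.w3.org/2005/Atom')
tracker.track('dublin', 'http://purl.org/dc/elements/1.1/')
tracker.track('x', 'http://example.com/ns')
```

# Matching is done on the lowercased URI, but the URI is stored as the feed
# declared it, so a feed that spells the Atom namespace with a capital "A" is
# still recognised.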

    def resolveURI(self, uri):
        # Resolve uri against the current base URI (empty string if none is set).
        return _urljoin(self.baseuri or '', uri)
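
# Illustration (not part of feedparser): a sketch of what resolving against a
# base URI looks like. It assumes _urljoin behaves like the standard library's
# urljoin, which this listing does not confirm; resolve_uri is a hypothetical
# free-function version of the method above.

```python
from urllib.parse import urljoin  # Python 3 location; the listing itself targets Python 2


def resolve_uri(baseuri, uri):
    # Mirror the "baseuri or ''" fallback so a missing base leaves uri untouched.
    return urljoin(baseuri or '', uri)


relative = resolve_uri('http://example.org/feeds/atom.xml', 'images/logo.png')
absolute = resolve_uri('http://example.org/feeds/atom.xml', 'http://other.example/a')
no_base = resolve_uri(None, 'images/logo.png')
```

# A relative reference is resolved against the directory of the base document,
# an absolute URI passes through unchanged, and with no base the input is
# returned as given.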

    def decodeEntities(self, element, data):
        # Returns data unchanged.
        return data

    def push(self, element, expectingText):
        # Each stack entry is [element name, expectingText flag, list of text pieces].
        self.elementstack.append([element, expectingText, []])

    def pop(self, element, stripWhitespace=1):
        if not self.elementstack: return
        if self.elementstack[-1][0] != element: return

        element, expectingText, pieces = self.elementstack.pop()
        output = ''.join(pieces)
        if stripWhitespace:
            output = output.strip()
        if not expectingText: return output

        # decode base64 content
        if base64 and self.contentparams.get('base64', 0):
            try:
                output = base64.decodestring(output)
            except binascii.Error:
                pass
            except binascii.Incomplete:
                pass

        # resolve relative URIs
        if (element in self.can_be_relative_uri) and output:
            output = self.resolveURI(output)

        # decode entities within embedded markup
        if not self.contentparams.get('base64', 0):
            output = self.decodeEntities(element, output)

        # remove temporary cruft from contentparams
        try:
            del self.contentparams['mode']
        except KeyError:
            pass
        try:
            del self.contentparams['base64']
        except KeyError:
            pass

        # resolve relative URIs within embedded markup
        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
            if element in self.can_contain_relative_uris:
                output = _resolveRelativeURIs(output, self.baseuri, self.encoding)

        # sanitize embedded markup
        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
            if element in self.can_contain_dangerous_markup:
                output = _sanitizeHTML(output, self.encoding)

        # convert byte strings to unicode using the feed encoding, if one is set
        if self.encoding and type(output) != type(u''):
            try:
                output = unicode(output, self.encoding)
            except:
                pass

        # categories/tags/keywords/whatever are handled in _end_category
        if element == 'category':
            return output

        # store output in appropriate place(s)
        if self.inentry and not self.insource:
            if element == 'content':
                self.entries[-1].setdefault(element, [])
                contentparams = copy.deepcopy(self.contentparams)
                contentparams['value'] = output
                self.entries[-1][element].append(contentparams)
            elif element == 'link':
                self.entries[-1][element] = output
                if output:
                    self.entries[-1]['links'][-1]['href'] = output
            else:
                if element == 'description':
                    element = 'summary'
                self.entries[-1][element] = output
                if self.incontent:
                    contentparams = copy.deepcopy(self.contentparams)
                    contentparams['value'] = output
                    self.entries[-1][element + '_detail'] = contentparams
        elif (self.infeed or self.insource) and (not self.intextinput) and (not self.inimage):
            context = self._getContext()
            if element == 'description':
                element = 'subtitle'
            context[element] = output
            if element == 'link':
                context['links'][-1]['href'] = output
            elif self.incontent:
                contentparams = copy.deepcopy(self.contentparams)
                contentparams['value'] = output
                context[element + '_detail'] = contentparams
        return output
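
# Illustration (not part of feedparser): a reduced sketch of the element stack
# that push and pop maintain. ElementStack and add_text are hypothetical names;
# how text pieces actually reach the stack is not shown in this listing. The
# decoding, URI resolution, sanitizing and storage steps of pop are omitted.

```python
class ElementStack(object):
    def __init__(self):
        self.elementstack = []

    def push(self, element, expecting_text):
        # Same entry shape as above: [element name, expecting-text flag, text pieces].
        self.elementstack.append([element, expecting_text, []])

    def add_text(self, text):
        # Hypothetical helper: append character data to the innermost open element.
        if self.elementstack:
            self.elementstack[-1][2].append(text)

    def pop(self, element, strip_whitespace=True):
        # Ignore end tags that do not match the innermost open element.
        if not self.elementstack or self.elementstack[-1][0] != element:
            return None
        element, expecting_text, pieces = self.elementstack.pop()
        output = ''.join(pieces)
        if strip_whitespace:
            output = output.strip()
        return output


stack = ElementStack()
stack.push('title', 1)
stack.add_text('  Hello, ')
stack.add_text('world  ')
mismatched = stack.pop('link')   # 'link' is not the open element, so nothing is popped
title = stack.pop('title')
```

# Collecting pieces in a list and joining once at pop time avoids repeated
# string concatenation while an element is still open.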

    def pushContent(self, tag, attrsD, defaultContentType, expectingText):
        self.incontent += 1
        self.contentparams = FeedParserDict({
            'type': self.mapContentType(attrsD.get('type', defaultContentType)),
            'language': self.lang,
            'base': self.baseuri})
        self.contentparams['base64'] = self._isBase64(attrsD, self.contentparams)
        self.push(tag, expectingText)

    def popContent(self, tag):
        value = self.pop(tag)
        self.incontent -= 1
        self.contentparams.clear()
        return value

    def _mapToStandardPrefix(self, name):
        # Rewrite 'prefix:suffix' using the prefix recorded in namespacemap, if any.
        colonpos = name.find(':')
        if colonpos != -1:
            prefix = name[:colonpos]
            suffix = name[colonpos+1:]
            prefix = self.namespacemap.get(prefix, prefix)
            name = prefix + ':' + suffix
        return name
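
# Illustration (not part of feedparser): a standalone sketch of the prefix
# rewrite above, taking the mapping as an argument. map_to_standard_prefix and
# the sample namespacemap are hypothetical; str.partition is used in place of
# find and slicing, and splits on the first colon just as the original does.

```python
def map_to_standard_prefix(name, namespacemap):
    prefix, colon, suffix = name.partition(':')
    if not colon:
        # No prefix present: the name is returned untouched.
        return name
    # Unknown prefixes fall back to themselves.
    return namespacemap.get(prefix, prefix) + ':' + suffix


namespacemap = {'dublin': 'dc'}  # hypothetical: the feed declared dc under the prefix "dublin"

mapped = map_to_standard_prefix('dublin:creator', namespacemap)
unprefixed = map_to_standard_prefix('title', namespacemap)
unknown = map_to_standard_prefix('foo:bar', namespacemap)
```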

    def _getAttribute(self, attrsD, name):
        return attrsD.get(self._mapToStandardPrefix(name))

    def _isBase64(self, attrsD, contentparams):
        # An explicit mode="base64" wins; otherwise text and XML types are
        # treated as plain, and every other content type as base64.
        if attrsD.get('mode', '') == 'base64':
            return 1
        if self.contentparams['type'].startswith('text/'):
            return 0
        if self.contentparams['type'].endswith('+xml'):
            return 0
        if self.contentparams['type'].endswith('/xml'):
            return 0
        return 1
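
# Illustration (not part of feedparser): the same decision written as a pure
# function, to make the order of the checks easy to test. is_base64 is a
# hypothetical name, and it takes the content type directly instead of reading
# self.contentparams.

```python
def is_base64(attrs, content_type):
    # An explicit mode attribute overrides the content type.
    if attrs.get('mode', '') == 'base64':
        return True
    # Textual and XML payloads are carried inline, not base64-encoded.
    if content_type.startswith('text/'):
        return False
    if content_type.endswith('+xml') or content_type.endswith('/xml'):
        return False
    # Anything else is assumed to be base64-encoded.
    return True


checks = [
    is_base64({}, 'text/plain'),
    is_base64({'mode': 'base64'}, 'text/plain'),
    is_base64({}, 'application/xhtml+xml'),
    is_base64({}, 'application/xml'),
    is_base64({}, 'image/png'),
]
```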
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   727
    def _itsAnHrefDamnIt(self, attrsD):
        # Collapse the 'url', 'uri', and 'href' attributes into a single
        # 'href' key; 'url' wins when present, then 'uri', then 'href'.
        href = attrsD.get('url', attrsD.get('uri', attrsD.get('href', None)))
        if href:
            try:
                del attrsD['url']
            except KeyError:
                pass
            try:
                del attrsD['uri']
            except KeyError:
                pass
            attrsD['href'] = href
        return attrsD

    def _save(self, key, value):
        # setdefault() keeps any value already stored under this key.
        context = self._getContext()
        context.setdefault(key, value)

    def _start_rss(self, attrsD):
        # Map the 'version' attribute to an internal version string, unless
        # a version has already been determined.
        versionmap = {'0.91': 'rss091u',
                      '0.92': 'rss092',
                      '0.93': 'rss093',
                      '0.94': 'rss094'}
        if not self.version:
            attr_version = attrsD.get('version', '')
            version = versionmap.get(attr_version)
            if version:
                self.version = version
            elif attr_version.startswith('2.'):
                self.version = 'rss20'
            else:
                self.version = 'rss'

    def _start_dlhottitles(self, attrsD):
        self.version = 'hotrss'

    def _start_channel(self, attrsD):
        self.infeed = 1
        self._cdf_common(attrsD)
    _start_feedinfo = _start_channel

    def _cdf_common(self, attrsD):
        # Replay the 'lastmod' and 'href' attributes through the modified
        # and link element handlers, as if they were element content.
        if 'lastmod' in attrsD:
            self._start_modified({})
            self.elementstack[-1][-1] = attrsD['lastmod']
            self._end_modified()
        if 'href' in attrsD:
            self._start_link({})
            self.elementstack[-1][-1] = attrsD['href']
            self._end_link()

    def _start_feed(self, attrsD):
        self.infeed = 1
        # Map the 'version' attribute to an internal version string; any
        # unrecognized or missing value falls back to generic 'atom'.
        versionmap = {'0.1': 'atom01',
                      '0.2': 'atom02',
                      '0.3': 'atom03'}
        if not self.version:
            attr_version = attrsD.get('version')
            version = versionmap.get(attr_version)
            if version:
                self.version = version
            else:
                self.version = 'atom'

    def _end_channel(self):
        self.infeed = 0
    _end_feed = _end_channel

    def _start_image(self, attrsD):
        self.inimage = 1
        self.push('image', 0)
        context = self._getContext()
        context.setdefault('image', FeedParserDict())

    def _end_image(self):
        self.pop('image')
        self.inimage = 0

    def _start_textinput(self, attrsD):
        self.intextinput = 1
        self.push('textinput', 0)
        context = self._getContext()
        context.setdefault('textinput', FeedParserDict())
    _start_textInput = _start_textinput

    def _end_textinput(self):
        self.pop('textinput')
        self.intextinput = 0
    _end_textInput = _end_textinput

    def _start_author(self, attrsD):
        self.inauthor = 1
        self.push('author', 1)
    _start_managingeditor = _start_author
    _start_dc_author = _start_author
    _start_dc_creator = _start_author
    _start_itunes_author = _start_author

    def _end_author(self):
        self.pop('author')
        self.inauthor = 0
        self._sync_author_detail()
    _end_managingeditor = _end_author
    _end_dc_author = _end_author
    _end_dc_creator = _end_author
    _end_itunes_author = _end_author

    def _start_itunes_owner(self, attrsD):
        self.inpublisher = 1
        self.push('publisher', 0)

    def _end_itunes_owner(self):
        self.pop('publisher')
        self.inpublisher = 0
        self._sync_author_detail('publisher')

    def _start_contributor(self, attrsD):
        self.incontributor = 1
        context = self._getContext()
        context.setdefault('contributors', [])
        context['contributors'].append(FeedParserDict())
        self.push('contributor', 0)

    def _end_contributor(self):
        self.pop('contributor')
        self.incontributor = 0

    def _start_dc_contributor(self, attrsD):
        # The element is pushed as 'name' and closed through _end_name(),
        # so its text is saved as the new contributor's name.
        self.incontributor = 1
        context = self._getContext()
        context.setdefault('contributors', [])
        context['contributors'].append(FeedParserDict())
        self.push('name', 0)

    def _end_dc_contributor(self):
        self._end_name()
        self.incontributor = 0

    def _start_name(self, attrsD):
        self.push('name', 0)
    _start_itunes_name = _start_name

    def _end_name(self):
        value = self.pop('name')
        # Store the name on whichever enclosing construct is currently open;
        # the checks run in this fixed order and the first match wins.
        if self.inpublisher:
            self._save_author('name', value, 'publisher')
        elif self.inauthor:
            self._save_author('name', value)
        elif self.incontributor:
            self._save_contributor('name', value)
        elif self.intextinput:
            context = self._getContext()
            context['textinput']['name'] = value
    _end_itunes_name = _end_name

    def _start_width(self, attrsD):
        self.push('width', 0)

    def _end_width(self):
        value = self.pop('width')
        try:
            value = int(value)
        except (TypeError, ValueError):
            # missing or non-numeric width: fall back to 0
            value = 0
        if self.inimage:
            context = self._getContext()
            context['image']['width'] = value

    def _start_height(self, attrsD):
        self.push('height', 0)

    def _end_height(self):
        value = self.pop('height')
        try:
            value = int(value)
        except (TypeError, ValueError):
            # missing or non-numeric height: fall back to 0
            value = 0
        if self.inimage:
            context = self._getContext()
            context['image']['height'] = value

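The integer fallback used by `_end_width` and `_end_height` above can be sketched in isolation. This is a minimal illustration, not part of feedparser; the helper name `parse_dimension` is hypothetical.

```python
def parse_dimension(raw):
    """Return raw as an int, or 0 when it is missing or not numeric.

    Hypothetical helper mirroring the try/except used for image
    width and height; it is not a feedparser function.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        # None raises TypeError, non-numeric text raises ValueError
        return 0


if __name__ == '__main__':
    for sample in ('88', ' 31 ', 'wide', '', None):
        print(repr(sample), parse_dimension(sample))
```

Falling back to 0 keeps a malformed `<width>` or `<height>` from aborting the whole parse, at the cost of not distinguishing "absent" from "invalid".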
    def _start_url(self, attrsD):
        self.push('href', 1)
    _start_homepage = _start_url
    _start_uri = _start_url

    def _end_url(self):
        value = self.pop('href')
        if self.inauthor:
            self._save_author('href', value)
        elif self.incontributor:
            self._save_contributor('href', value)
        elif self.inimage:
            context = self._getContext()
            context['image']['href'] = value
        elif self.intextinput:
            context = self._getContext()
            context['textinput']['link'] = value
    _end_homepage = _end_url
    _end_uri = _end_url

    def _start_email(self, attrsD):
        self.push('email', 0)
    _start_itunes_email = _start_email

    def _end_email(self):
        value = self.pop('email')
        if self.inpublisher:
            self._save_author('email', value, 'publisher')
        elif self.inauthor:
            self._save_author('email', value)
        elif self.incontributor:
            self._save_contributor('email', value)
    _end_itunes_email = _end_email

    def _getContext(self):
        # precedence: source data, then the current entry, then the feed
        if self.insource:
            context = self.sourcedata
        elif self.inentry:
            context = self.entries[-1]
        else:
            context = self.feeddata
        return context

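The precedence in `_getContext` above (source data first, then the current entry, then the feed) can be exercised with a small stand-in. The class `ContextDemo` is hypothetical and uses plain dicts; it only copies the attribute names that appear in the method.

```python
class ContextDemo(object):
    """Hypothetical stand-in holding only the state _getContext reads."""

    def __init__(self):
        self.insource = 0
        self.inentry = 0
        self.feeddata = {}
        self.entries = []
        self.sourcedata = {}

    def get_context(self):
        # Same order of checks as the method above.
        if self.insource:
            return self.sourcedata
        if self.inentry:
            return self.entries[-1]
        return self.feeddata


if __name__ == '__main__':
    demo = ContextDemo()
    demo.get_context()['title'] = 'feed-level title'
    demo.entries.append({})
    demo.inentry = 1
    demo.get_context()['title'] = 'entry-level title'
    print(demo.feeddata, demo.entries)
```

Because every handler writes through the returned dict, the same handler code stores a value on the feed, on an entry, or on an entry's source depending only on the parser flags.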
    def _save_author(self, key, value, prefix='author'):
        context = self._getContext()
        context.setdefault(prefix + '_detail', FeedParserDict())
        context[prefix + '_detail'][key] = value
        self._sync_author_detail()

    def _save_contributor(self, key, value):
        context = self._getContext()
        context.setdefault('contributors', [FeedParserDict()])
        context['contributors'][-1][key] = value

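Both helpers above rely on `dict.setdefault` to create the container on first use and reuse it afterwards. A minimal sketch with plain dicts in place of `FeedParserDict` (an assumption made so the example is self-contained; the function names are hypothetical):

```python
def save_author_field(context, key, value, prefix='author'):
    # Create '<prefix>_detail' on first use, then store the field in it.
    context.setdefault(prefix + '_detail', {})[key] = value


def save_contributor_field(context, key, value):
    # Create a one-element contributors list on first use, then
    # write the field into the most recent contributor.
    context.setdefault('contributors', [{}])[-1][key] = value


if __name__ == '__main__':
    ctx = {}
    save_author_field(ctx, 'name', 'Ann')
    save_author_field(ctx, 'email', 'ann@example.org')
    save_author_field(ctx, 'email', 'pub@example.org', 'publisher')
    save_contributor_field(ctx, 'name', 'Bob')
    print(ctx)
```

Note that `setdefault` evaluates its default argument on every call, so a fresh empty container is built and discarded whenever the key already exists; for small dicts that cost is negligible.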
    def _sync_author_detail(self, key='author'):
        context = self._getContext()
        detail = context.get('%s_detail' % key)
        if detail:
            name = detail.get('name')
            email = detail.get('email')
            if name and email:
                context[key] = '%s (%s)' % (name, email)
            elif name:
                context[key] = name
            elif email:
                context[key] = email
        else:
            author = context.get(key)
            if not author:
                return
            emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author)
            if not emailmatch:
                return
            email = emailmatch.group(0)
            # probably a better way to do the following, but it passes all the tests
            author = author.replace(email, '')
            author = author.replace('()', '')
            author = author.strip()
            if author and (author[0] == '('):
                author = author[1:]
            if author and (author[-1] == ')'):
                author = author[:-1]
            author = author.strip()
            context.setdefault('%s_detail' % key, FeedParserDict())
            context['%s_detail' % key]['name'] = author
            context['%s_detail' % key]['email'] = email

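The `else` branch of `_sync_author_detail` above splits a combined author string into a name and an e-mail address. The sketch below follows the same replace-and-strip steps but, as an assumption for readability, swaps in a much simpler e-mail pattern than the one in the method; `split_author` and `EMAIL_RE` are hypothetical names.

```python
import re

# Simplified pattern for illustration only; the method above uses a
# longer expression that also accepts bracketed IP-style hosts.
EMAIL_RE = re.compile(r'[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+')


def split_author(author):
    """Return (name, email) from a combined string, or None if no e-mail."""
    match = EMAIL_RE.search(author)
    if not match:
        return None
    email = match.group(0)
    # Remove the address, then any empty or dangling parentheses.
    name = author.replace(email, '').replace('()', '').strip()
    if name and name[0] == '(':
        name = name[1:]
    if name and name[-1] == ')':
        name = name[:-1]
    return name.strip(), email


if __name__ == '__main__':
    print(split_author('Mark Pilgrim (mark@example.org)'))
    print(split_author('mark@example.org (Mark Pilgrim)'))
```

Both orderings, name first or address first, yield the same pair, which is why the method strips a leading `(` and a trailing `)` separately rather than matching one fixed layout.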
    def _start_subtitle(self, attrsD):
        self.pushContent('subtitle', attrsD, 'text/plain', 1)
    _start_tagline = _start_subtitle
    _start_itunes_subtitle = _start_subtitle

    def _end_subtitle(self):
        self.popContent('subtitle')
    _end_tagline = _end_subtitle
    _end_itunes_subtitle = _end_subtitle

    def _start_rights(self, attrsD):
        self.pushContent('rights', attrsD, 'text/plain', 1)
    _start_dc_rights = _start_rights
    _start_copyright = _start_rights

    def _end_rights(self):
        self.popContent('rights')
    _end_dc_rights = _end_rights
    _end_copyright = _end_rights

    def _start_item(self, attrsD):
        self.entries.append(FeedParserDict())
        self.push('item', 0)
        self.inentry = 1
        self.guidislink = 0
        id = self._getAttribute(attrsD, 'rdf:about')
        if id:
            context = self._getContext()
            context['id'] = id
        self._cdf_common(attrsD)
    _start_entry = _start_item
    _start_product = _start_item

    def _end_item(self):
        self.pop('item')
        self.inentry = 0
    _end_entry = _end_item

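Class-level aliases such as `_start_entry = _start_item` let several element names share one handler. The sketch below assumes a dispatcher that looks handlers up by name with `getattr`; that dispatcher is not shown in this section, so `TinyHandler` and its `dispatch` method are hypothetical.

```python
class TinyHandler(object):
    """Hypothetical handler class showing name-based dispatch with aliases."""

    def __init__(self):
        self.entries = []
        self.inentry = 0

    def _start_item(self, attrs):
        # One implementation serves <item>, <entry> and <product>.
        self.entries.append({})
        self.inentry = 1
    _start_entry = _start_item
    _start_product = _start_item

    def dispatch(self, tag, attrs=None):
        # Assumed lookup scheme: '_start_' + element name.
        handler = getattr(self, '_start_' + tag, None)
        if handler is None:
            return False
        handler(attrs or {})
        return True


if __name__ == '__main__':
    handler = TinyHandler()
    for tag in ('item', 'entry', 'product', 'unknown'):
        print(tag, handler.dispatch(tag))
    print(len(handler.entries))
```

Aliasing at class level avoids wrapper methods: RSS `<item>`, Atom `<entry>` and `<product>` elements all append a new entry through the same code path.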
    def _start_dc_language(self, attrsD):
        self.push('language', 1)
    _start_language = _start_dc_language

    def _end_dc_language(self):
        self.lang = self.pop('language')
    _end_language = _end_dc_language

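The alias assignments above (for example `_start_language = _start_dc_language`) let one handler serve several element names. A minimal sketch of how such aliases could be reached by name follows. `MiniHandler`, its `handle` method and the `getattr`-based lookup are hypothetical stand-ins for illustration, not feedparser's actual dispatcher.

```python
class MiniHandler:
    """Hypothetical stand-in for the parser class (illustration only)."""

    def __init__(self):
        self.stack = []   # entries are [element, expecting_text, pieces]
        self.lang = None

    def push(self, element, expecting_text):
        self.stack.append([element, expecting_text, []])

    def pop(self, element):
        _name, _expecting_text, pieces = self.stack.pop()
        return ''.join(pieces)

    def _start_dc_language(self, attrs):
        self.push('language', 1)
    # Alias: <language> is handled exactly like <dc:language>.
    _start_language = _start_dc_language

    def _end_dc_language(self):
        self.lang = self.pop('language')
    _end_language = _end_dc_language

    def handle(self, tag, text):
        # Assumed dispatch rule: 'dc:language' maps to the method suffix 'dc_language'.
        suffix = tag.replace(':', '_')
        getattr(self, '_start_' + suffix)({})
        self.stack[-1][2].append(text)
        getattr(self, '_end_' + suffix)()


handler = MiniHandler()
handler.handle('dc:language', 'en')
handler.handle('language', 'fr')
```

Because the alias is a plain class attribute bound to the same function object, both element names run identical code and no wrapper method is needed.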
    def _start_dc_publisher(self, attrsD):
        self.push('publisher', 1)
    _start_webmaster = _start_dc_publisher

    def _end_dc_publisher(self):
        self.pop('publisher')
        self._sync_author_detail('publisher')
    _end_webmaster = _end_dc_publisher

    def _start_published(self, attrsD):
        self.push('published', 1)
    _start_dcterms_issued = _start_published
    _start_issued = _start_published

    def _end_published(self):
        value = self.pop('published')
        self._save('published_parsed', _parse_date(value))
    _end_dcterms_issued = _end_published
    _end_issued = _end_published

    def _start_updated(self, attrsD):
        self.push('updated', 1)
    _start_modified = _start_updated
    _start_dcterms_modified = _start_updated
    _start_pubdate = _start_updated
    _start_dc_date = _start_updated

    def _end_updated(self):
        value = self.pop('updated')
        parsed_value = _parse_date(value)
        self._save('updated_parsed', parsed_value)
    _end_modified = _end_updated
    _end_dcterms_modified = _end_updated
    _end_pubdate = _end_updated
    _end_dc_date = _end_updated

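The date handlers above all follow one pattern: pop the element text, parse it, and save the result under a `*_parsed` key. The sketch below illustrates that pattern for a single date format. `parse_rfc822_date` and `end_updated` are hypothetical names, and the parser here only covers RFC 822 style dates using the standard library; it is a simplified stand-in for `_parse_date`, whose implementation is not shown in this section.

```python
import time
from email.utils import mktime_tz, parsedate_tz


def parse_rfc822_date(value):
    """Return a UTC time.struct_time for an RFC 822 date string, or None."""
    parsed = parsedate_tz(value)
    if parsed is None:
        return None
    # mktime_tz applies the timezone offset, so the result is in UTC.
    return time.gmtime(mktime_tz(parsed))


def end_updated(context, raw_text):
    # Hypothetical equivalent of saving 'updated_parsed' on the current context.
    context['updated_parsed'] = parse_rfc822_date(raw_text)
    return context


ctx = end_updated({}, 'Tue, 17 Feb 2009 18:28:54 +0100')
```

Normalizing to UTC at parse time means consumers can compare dates from different feeds without tracking each feed's original offset.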
    def _start_created(self, attrsD):
        self.push('created', 1)
    _start_dcterms_created = _start_created

    def _end_created(self):
        value = self.pop('created')
        self._save('created_parsed', _parse_date(value))
    _end_dcterms_created = _end_created

    def _start_expirationdate(self, attrsD):
        self.push('expired', 1)

    def _end_expirationdate(self):
        self._save('expired_parsed', _parse_date(self.pop('expired')))

    def _start_cc_license(self, attrsD):
        self.push('license', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('license')

    def _start_creativecommons_license(self, attrsD):
        self.push('license', 1)

    def _end_creativecommons_license(self):
        self.pop('license')

    def _addTag(self, term, scheme, label):
        context = self._getContext()
        # Make sure the 'tags' list exists even when there is nothing to add.
        tags = context.setdefault('tags', [])
        if (not term) and (not scheme) and (not label): return
        value = FeedParserDict({'term': term, 'scheme': scheme, 'label': label})
        # Skip exact duplicates (same term, scheme and label).
        if value not in tags:
            tags.append(value)

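The behaviour of `_addTag` can be tried in isolation with plain dictionaries. This is a minimal sketch: `add_tag` is a hypothetical free function, and an ordinary `dict` stands in for `FeedParserDict`.

```python
def add_tag(context, term, scheme, label):
    """Append a tag to context['tags'] unless it is empty or already present."""
    tags = context.setdefault('tags', [])
    if not term and not scheme and not label:
        return
    tag = {'term': term, 'scheme': scheme, 'label': label}
    if tag not in tags:
        tags.append(tag)


ctx = {}
add_tag(ctx, None, None, None)                       # creates the list, adds nothing
add_tag(ctx, 'python', None, None)
add_tag(ctx, 'python', None, None)                   # exact duplicate, ignored
add_tag(ctx, 'python', 'http://example.org/', None)  # different scheme, kept
```

Note that the duplicate check compares all three fields, so the same term under two schemes yields two tags.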
    def _start_category(self, attrsD):
        if _debug: sys.stderr.write('entering _start_category with %s\n' % repr(attrsD))
        term = attrsD.get('term')
        scheme = attrsD.get('scheme', attrsD.get('domain'))
        label = attrsD.get('label')
        self._addTag(term, scheme, label)
        self.push('category', 1)
    _start_dc_subject = _start_category
    _start_keywords = _start_category

    def _end_itunes_keywords(self):
        for term in self.pop('itunes_keywords').split():
            self._addTag(term, 'http://www.itunes.com/', None)

    def _start_itunes_category(self, attrsD):
        self._addTag(attrsD.get('text'), 'http://www.itunes.com/', None)
        self.push('category', 1)

    def _end_category(self):
        value = self.pop('category')
        if not value: return
        context = self._getContext()
        tags = context['tags']
        if value and len(tags) and not tags[-1]['term']:
            tags[-1]['term'] = value
        else:
            self._addTag(value, None, None)
    _end_dc_subject = _end_category
    _end_keywords = _end_category
    _end_itunes_category = _end_category

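The start and end handlers for categories cooperate: the start handler records whatever the attributes provide, and the end handler fills in a missing term from the element text. A minimal sketch of that interplay follows, using hypothetical free functions and plain dictionaries, and leaving out the duplicate check done by `_addTag`.

```python
def start_category(context, attrs):
    """Record a tag from the attributes, if they carry any information."""
    tags = context.setdefault('tags', [])
    term = attrs.get('term')
    scheme = attrs.get('scheme', attrs.get('domain'))
    label = attrs.get('label')
    if term or scheme or label:
        tags.append({'term': term, 'scheme': scheme, 'label': label})


def end_category(context, text):
    """Use the element text as the term when the attributes gave none."""
    if not text:
        return
    tags = context['tags']
    if tags and not tags[-1]['term']:
        tags[-1]['term'] = text
    else:
        tags.append({'term': text, 'scheme': None, 'label': None})


ctx = {}
# RSS style: <category domain="http://example.org/cats">python</category>
start_category(ctx, {'domain': 'http://example.org/cats'})
end_category(ctx, 'python')
# Atom style: <category term="news"/>
start_category(ctx, {'term': 'news'})
end_category(ctx, '')
# Bare element: <category>misc</category>
start_category(ctx, {})
end_category(ctx, 'misc')
```

Splitting the work this way lets one pair of handlers cover categories whose term arrives as an attribute and categories whose term arrives as text.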
    def _start_cloud(self, attrsD):
        self._getContext()['cloud'] = FeedParserDict(attrsD)

    def _start_link(self, attrsD):
        attrsD.setdefault('rel', 'alternate')
        attrsD.setdefault('type', 'text/html')
        attrsD = self._itsAnHrefDamnIt(attrsD)
        if attrsD.has_key('href'):
            attrsD['href'] = self.resolveURI(attrsD['href'])
        expectingText = self.infeed or self.inentry or self.insource
        context = self._getContext()
        context.setdefault('links', [])
        context['links'].append(FeedParserDict(attrsD))
        if attrsD['rel'] == 'enclosure':
            self._start_enclosure(attrsD)
        if attrsD.has_key('href'):
            expectingText = 0
            if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
                context['link'] = attrsD['href']
        else:
            self.push('link', expectingText)
    _start_producturl = _start_link

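The core of `_start_link` is a pair of defaults (`rel` and `type`) plus a rule for choosing the primary link. The sketch below shows only that part. `start_link` and `HTML_TYPES` are hypothetical names, the contents of `HTML_TYPES` are an assumption about `self.html_types`, and URI resolution, content type mapping and enclosure handling are left out.

```python
# Assumed set of HTML content types; the real value of html_types is not shown here.
HTML_TYPES = ('text/html', 'application/xhtml+xml')


def start_link(context, attrs):
    """Record a link and promote it to context['link'] if it is an HTML alternate."""
    attrs = dict(attrs)                    # work on a copy of the caller's dict
    attrs.setdefault('rel', 'alternate')
    attrs.setdefault('type', 'text/html')
    context.setdefault('links', []).append(attrs)
    if ('href' in attrs and attrs['rel'] == 'alternate'
            and attrs['type'] in HTML_TYPES):
        context['link'] = attrs['href']
    return context


ctx = {}
start_link(ctx, {'href': 'http://example.org/post'})
start_link(ctx, {'href': 'http://example.org/a.mp3',
                 'rel': 'enclosure', 'type': 'audio/mpeg'})
```

Every link is kept in `links`, while `link` holds only the HTML alternate, so simple consumers can read one key and richer consumers can inspect the full list.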
    def _end_link(self):
        value = self.pop('link')
        context = self._getContext()
        if self.intextinput:
            context['textinput']['link'] = value
        if self.inimage:
            context['image']['link'] = value
    _end_producturl = _end_link

    def _start_guid(self, attrsD):
        self.guidislink = (attrsD.get('ispermalink', 'true') == 'true')
        self.push('id', 1)

    def _end_guid(self):
        value = self.pop('id')
        self._save('guidislink', self.guidislink and not self._getContext().has_key('link'))
        if self.guidislink:
            # guid acts as link, but only if 'ispermalink' is not present or is 'true',
            # and only if the item doesn't already have a link element
            self._save('link', value)

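The guid rule described in the comments above can be exercised on its own. In this sketch `end_guid` is a hypothetical free function operating on a plain dictionary, and it assumes that `_save` never overwrites an existing key (modelled here with `dict.setdefault`); that assumption is what keeps an existing link intact.

```python
def end_guid(entry, guid_text, attrs):
    """Store the guid as the id, and as the link when it is a permalink."""
    guid_is_link = attrs.get('ispermalink', 'true') == 'true'
    entry.setdefault('id', guid_text)
    entry.setdefault('guidislink', guid_is_link and 'link' not in entry)
    if guid_is_link:
        # Assumption: saving does not replace a link that is already present.
        entry.setdefault('link', guid_text)
    return entry


plain = end_guid({}, 'http://example.org/1', {})
linked = end_guid({'link': 'http://example.org/real'}, 'http://example.org/2', {})
opaque = end_guid({}, 'tag:example.org,2009:3', {'ispermalink': 'false'})
```

Here `plain` gains a link equal to its guid, `linked` keeps its original link, and `opaque` gets no link at all.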
    def _start_title(self, attrsD):
        self.pushContent('title', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
    _start_dc_title = _start_title
    _start_media_title = _start_title

c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1192
    def _end_title(self):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1193
        value = self.popContent('title')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1194
        context = self._getContext()
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1195
        if self.intextinput:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1196
            context['textinput']['title'] = value
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1197
        elif self.inimage:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1198
            context['image']['title'] = value
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1199
    _end_dc_title = _end_title
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1200
    _end_media_title = _end_title
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1201
    def _start_description(self, attrsD):
        context = self._getContext()
        if context.has_key('summary'):
            # this context already has a summary, so handle this element as content
            self._summaryKey = 'content'
            self._start_content(attrsD)
        else:
            self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)

    def _start_abstract(self, attrsD):
        self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)

    def _end_description(self):
        if self._summaryKey == 'content':
            self._end_content()
        else:
            value = self.popContent('description')
            context = self._getContext()
            if self.intextinput:
                context['textinput']['description'] = value
            elif self.inimage:
                context['image']['description'] = value
        # reset the marker set by _start_description / _start_summary
        self._summaryKey = None
    _end_abstract = _end_description

    def _start_info(self, attrsD):
        self.pushContent('info', attrsD, 'text/plain', 1)
    _start_feedburner_browserfriendly = _start_info

    def _end_info(self):
        self.popContent('info')
    _end_feedburner_browserfriendly = _end_info

    def _start_generator(self, attrsD):
        if attrsD:
            attrsD = self._itsAnHrefDamnIt(attrsD)
            if attrsD.has_key('href'):
                attrsD['href'] = self.resolveURI(attrsD['href'])
        self._getContext()['generator_detail'] = FeedParserDict(attrsD)
        self.push('generator', 1)

    def _end_generator(self):
        value = self.pop('generator')
        context = self._getContext()
        if context.has_key('generator_detail'):
            context['generator_detail']['name'] = value

    def _start_admin_generatoragent(self, attrsD):
        self.push('generator', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('generator')
        self._getContext()['generator_detail'] = FeedParserDict({'href': value})

    def _start_admin_errorreportsto(self, attrsD):
        self.push('errorreportsto', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('errorreportsto')

    def _start_summary(self, attrsD):
        context = self._getContext()
        if context.has_key('summary'):
            self._summaryKey = 'content'
            self._start_content(attrsD)
        else:
            self._summaryKey = 'summary'
            self.pushContent(self._summaryKey, attrsD, 'text/plain', 1)
    _start_itunes_summary = _start_summary

    def _end_summary(self):
        if self._summaryKey == 'content':
            self._end_content()
        else:
            self.popContent(self._summaryKey or 'summary')
        self._summaryKey = None
    _end_itunes_summary = _end_summary

    def _start_enclosure(self, attrsD):
        attrsD = self._itsAnHrefDamnIt(attrsD)
        self._getContext().setdefault('enclosures', []).append(FeedParserDict(attrsD))
        href = attrsD.get('href')
        if href:
            context = self._getContext()
            if not context.get('id'):
                context['id'] = href

    def _start_source(self, attrsD):
        self.insource = 1

    def _end_source(self):
        self.insource = 0
        self._getContext()['source'] = copy.deepcopy(self.sourcedata)
        self.sourcedata.clear()

    def _start_content(self, attrsD):
        self.pushContent('content', attrsD, 'text/plain', 1)
        src = attrsD.get('src')
        if src:
            self.contentparams['src'] = src
        self.push('content', 1)

    def _start_prodlink(self, attrsD):
        self.pushContent('content', attrsD, 'text/html', 1)

    def _start_body(self, attrsD):
        self.pushContent('content', attrsD, 'application/xhtml+xml', 1)
    _start_xhtml_body = _start_body

    def _start_content_encoded(self, attrsD):
        self.pushContent('content', attrsD, 'text/html', 1)
    _start_fullitem = _start_content_encoded

    def _end_content(self):
        copyToDescription = self.mapContentType(self.contentparams.get('type')) in (['text/plain'] + self.html_types)
        value = self.popContent('content')
        if copyToDescription:
            self._save('description', value)
    _end_body = _end_content
    _end_xhtml_body = _end_content
    _end_content_encoded = _end_content
    _end_fullitem = _end_content
    _end_prodlink = _end_content

    def _start_itunes_image(self, attrsD):
        self.push('itunes_image', 0)
        self._getContext()['image'] = FeedParserDict({'href': attrsD.get('href')})
    _start_itunes_link = _start_itunes_image

    def _end_itunes_block(self):
        value = self.pop('itunes_block', 0)
        # store 1 if the popped value equals 'yes', otherwise 0
        self._getContext()['itunes_block'] = (value == 'yes') and 1 or 0

    def _end_itunes_explicit(self):
        value = self.pop('itunes_explicit', 0)
        # store 1 if the popped value equals 'yes', otherwise 0
        self._getContext()['itunes_explicit'] = (value == 'yes') and 1 or 0

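# Illustrative sketch only, not part of feedparser. The two iTunes end
# handlers above turn the element text into an integer flag with the older
# `cond and 1 or 0` idiom. The helper name `_yes_flag_sketch` is hypothetical
# and is assumed here purely to show the same mapping in isolation.
def _yes_flag_sketch(value):
    # `(value == 'yes') and 1 or 0` is safe because the "true" branch (1) is
    # itself truthy; int() of the comparison gives the same 1/0 result.
    return int(value == 'yes')

# Usage, under the same assumption: only the exact string 'yes' maps to 1;
# values such as 'no', 'Yes' or None map to 0.
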
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1340
if _XML_AVAILABLE:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1341
    class _StrictFeedParser(_FeedParserMixin, xml.sax.handler.ContentHandler):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1342
        def __init__(self, baseuri, baselang, encoding):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1343
            if _debug: sys.stderr.write('trying StrictFeedParser\n')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1344
            xml.sax.handler.ContentHandler.__init__(self)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1345
            _FeedParserMixin.__init__(self, baseuri, baselang, encoding)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1346
            self.bozo = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1347
            self.exc = None
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1348
        
        def startPrefixMapping(self, prefix, uri):
            self.trackNamespace(prefix, uri)
        
        def startElementNS(self, name, qname, attrs):
            namespace, localname = name
            lowernamespace = str(namespace or '').lower()
            if lowernamespace.find('backend.userland.com/rss') != -1:
                # match any backend.userland.com namespace
                namespace = 'http://backend.userland.com/rss'
                lowernamespace = namespace
            if qname and qname.find(':') > 0:
                givenprefix = qname.split(':')[0]
            else:
                givenprefix = None
            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
            if givenprefix and (prefix is None or (prefix == '' and lowernamespace == '')) and givenprefix not in self.namespacesInUse:
                raise UndeclaredNamespace("'%s' is not associated with a namespace" % givenprefix)
            if prefix:
                localname = prefix + ':' + localname
            localname = str(localname).lower()
            if _debug: sys.stderr.write('startElementNS: qname = %s, namespace = %s, givenprefix = %s, prefix = %s, attrs = %s, localname = %s\n' % (qname, namespace, givenprefix, prefix, attrs.items(), localname))

            # qname implementation is horribly broken in Python 2.1 (it
            # doesn't report any), and slightly broken in Python 2.2 (it
            # doesn't report the xml: namespace). So we match up namespaces
            # with a known list first, and then possibly override them with
            # the qnames the SAX parser gives us (if indeed it gives us any
            # at all).  Thanks to MatejC for helping me test this and
            # tirelessly telling me that it didn't work yet.
            attrsD = {}
            for (namespace, attrlocalname), attrvalue in attrs._attrs.items():
                lowernamespace = (namespace or '').lower()
                prefix = self._matchnamespaces.get(lowernamespace, '')
                if prefix:
                    attrlocalname = prefix + ':' + attrlocalname
                attrsD[str(attrlocalname).lower()] = attrvalue
            for qname in attrs.getQNames():
                attrsD[str(qname).lower()] = attrs.getValueByQName(qname)
            self.unknown_starttag(localname, attrsD.items())

        def characters(self, text):
            self.handle_data(text)

        def endElementNS(self, name, qname):
            namespace, localname = name
            lowernamespace = str(namespace or '').lower()
            if qname and qname.find(':') > 0:
                givenprefix = qname.split(':')[0]
            else:
                givenprefix = ''
            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
            if prefix:
                localname = prefix + ':' + localname
            localname = str(localname).lower()
            self.unknown_endtag(localname)

        def error(self, exc):
            self.bozo = 1
            self.exc = exc
            
        def fatalError(self, exc):
            self.error(exc)
            raise exc
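# Illustrative sketch only, not part of feedparser: a standalone Python 3
# rendering of the element-name normalization that startElementNS performs
# above. KNOWN_NAMESPACES and normalize_name are hypothetical names; the
# table is a small stand-in for self._matchnamespaces, and the
# undeclared-prefix check is left out for brevity.

```python
# Hypothetical stand-in for the parser's _matchnamespaces table:
# lowercased namespace URI -> canonical prefix ('' means "no prefix").
KNOWN_NAMESPACES = {
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.w3.org/2005/atom': '',
}


def normalize_name(name, qname, known=KNOWN_NAMESPACES):
    """Return the lowercase, prefix-qualified name for a SAX element."""
    namespace, localname = name
    lowernamespace = str(namespace or '').lower()
    # Prefix written in the document, if the SAX parser reported a qname.
    if qname and qname.find(':') > 0:
        givenprefix = qname.split(':')[0]
    else:
        givenprefix = None
    # A known namespace URI wins over whatever prefix the feed used.
    prefix = known.get(lowernamespace, givenprefix)
    if prefix:
        localname = prefix + ':' + localname
    return str(localname).lower()
```

# Under these assumptions, an element declared with an arbitrary prefix for
# a known namespace is reported under the canonical prefix, while an unknown
# namespace keeps the prefix the feed itself used.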

class _BaseHTMLProcessor(sgmllib.SGMLParser):
    elements_no_end_tag = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
      'img', 'input', 'isindex', 'link', 'meta', 'param']
    
    def __init__(self, encoding):
        self.encoding = encoding
        if _debug: sys.stderr.write('entering BaseHTMLProcessor, encoding=%s\n' % self.encoding)
        sgmllib.SGMLParser.__init__(self)
        
    def reset(self):
        self.pieces = []
        sgmllib.SGMLParser.reset(self)

    def _shorttag_replace(self, match):
        tag = match.group(1)
        if tag in self.elements_no_end_tag:
            return '<' + tag + ' />'
        else:
            return '<' + tag + '></' + tag + '>'
        
    def feed(self, data):
        data = re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE).sub(r'&lt;!\1', data)
        #data = re.sub(r'<(\S+?)\s*?/>', self._shorttag_replace, data) # bug [ 1399464 ] Bad regexp for _shorttag_replace
        data = re.sub(r'<([^<\s]+?)\s*/>', self._shorttag_replace, data)
        data = data.replace('&#39;', "'")
        data = data.replace('&#34;', '"')
        if self.encoding and type(data) == type(u''):
            data = data.encode(self.encoding)
        sgmllib.SGMLParser.feed(self, data)

    def normalize_attrs(self, attrs):
        # utility method to be called by descendants
        attrs = [(k.lower(), v) for k, v in attrs]
        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
        return attrs

    def unknown_starttag(self, tag, attrs):
        # called for each start tag
        # attrs is a list of (attr, value) tuples
        # e.g. for <pre class='screen'>, tag='pre', attrs=[('class', 'screen')]
        if _debug: sys.stderr.write('_BaseHTMLProcessor, unknown_starttag, tag=%s\n' % tag)
        uattrs = []
        # thanks to Kevin Marks for this breathtaking hack to deal with (valid) high-bit attribute values in UTF-8 feeds
        for key, value in attrs:
            if type(value) != type(u''):
                value = unicode(value, self.encoding)
            uattrs.append((unicode(key, self.encoding), value))
        strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
        if tag in self.elements_no_end_tag:
            self.pieces.append('<%(tag)s%(strattrs)s />' % locals())
        else:
            self.pieces.append('<%(tag)s%(strattrs)s>' % locals())

    def unknown_endtag(self, tag):
        # called for each end tag, e.g. for </pre>, tag will be 'pre'
        # Reconstruct the original end tag.
        if tag not in self.elements_no_end_tag:
            self.pieces.append("</%(tag)s>" % locals())

    def handle_charref(self, ref):
        # called for each character reference, e.g. for '&#160;', ref will be '160'
        # Reconstruct the original character reference.
        self.pieces.append('&#%(ref)s;' % locals())
        
    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
        # Reconstruct the original entity reference.
        self.pieces.append('&%(ref)s;' % locals())

    def handle_data(self, text):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        # Store the original text verbatim.
        if _debug: sys.stderr.write('_BaseHTMLProcessor, handle_data, text=%s\n' % text)
        self.pieces.append(text)
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1489
    def handle_comment(self, text):
        # called for each HTML comment, e.g. <!-- insert Javascript code here -->
        # Reconstruct the original comment.
        self.pieces.append('<!--%(text)s-->' % locals())

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        # Reconstruct original processing instruction.
        self.pieces.append('<?%(text)s>' % locals())

    def handle_decl(self, text):
        # called for the DOCTYPE, if present, e.g.
        # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        #     "http://www.w3.org/TR/html4/loose.dtd">
        # Reconstruct original DOCTYPE
        self.pieces.append('<!%(text)s>' % locals())

    _new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match
    def _scan_name(self, i, declstartpos):
        rawdata = self.rawdata
        n = len(rawdata)
        if i == n:
            return None, -1
        m = self._new_declname_match(rawdata, i)
        if m:
            s = m.group()
            name = s.strip()
            if (i + len(s)) == n:
                return None, -1  # end of buffer
            return name.lower(), m.end()
        else:
            self.handle_data(rawdata)
#            self.updatepos(declstartpos, i)
            return None, -1

    def output(self):
        '''Return processed HTML as a single string'''
        return ''.join([str(p) for p in self.pieces])

class _LooseFeedParser(_FeedParserMixin, _BaseHTMLProcessor):
    def __init__(self, baseuri, baselang, encoding):
        sgmllib.SGMLParser.__init__(self)
        _FeedParserMixin.__init__(self, baseuri, baselang, encoding)

    def decodeEntities(self, element, data):
        data = data.replace('&#60;', '&lt;')
        data = data.replace('&#x3c;', '&lt;')
        data = data.replace('&#62;', '&gt;')
        data = data.replace('&#x3e;', '&gt;')
        data = data.replace('&#38;', '&amp;')
        data = data.replace('&#x26;', '&amp;')
        data = data.replace('&#34;', '&quot;')
        data = data.replace('&#x22;', '&quot;')
        data = data.replace('&#39;', '&apos;')
        data = data.replace('&#x27;', '&apos;')
        if self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
            data = data.replace('&lt;', '<')
            data = data.replace('&gt;', '>')
            data = data.replace('&amp;', '&')
            data = data.replace('&quot;', '"')
            data = data.replace('&apos;', "'")
        return data
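
# Illustrative sketch, not part of the original module: a standalone
# restatement of the two steps decodeEntities performs above. The helper name
# and its content_type parameter are hypothetical; content_type=None stands in
# for "self.contentparams has no 'type' key".
def _example_decode_entities(data, content_type=None):
    # Step 1: rewrite numeric references to the five XML special characters
    # as their named equivalents.
    for numeric, named in [('&#60;', '&lt;'), ('&#x3c;', '&lt;'),
                           ('&#62;', '&gt;'), ('&#x3e;', '&gt;'),
                           ('&#38;', '&amp;'), ('&#x26;', '&amp;'),
                           ('&#34;', '&quot;'), ('&#x22;', '&quot;'),
                           ('&#39;', '&apos;'), ('&#x27;', '&apos;')]:
        data = data.replace(numeric, named)
    # Step 2: only when a content type is declared and does not end in 'xml',
    # decode the named entities to literal characters, in the same order as
    # the method above.
    if content_type is not None and not content_type.endswith('xml'):
        for named, char in [('&lt;', '<'), ('&gt;', '>'), ('&amp;', '&'),
                            ('&quot;', '"'), ('&apos;', "'")]:
            data = data.replace(named, char)
    return data

# Under these assumptions, _example_decode_entities('a &#60; b', 'text/html')
# gives 'a < b', while the same input with no content type keeps '&lt;'.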

class _RelativeURIResolver(_BaseHTMLProcessor):
    relative_uris = [('a', 'href'),
                     ('applet', 'codebase'),
                     ('area', 'href'),
                     ('blockquote', 'cite'),
                     ('body', 'background'),
                     ('del', 'cite'),
                     ('form', 'action'),
                     ('frame', 'longdesc'),
                     ('frame', 'src'),
                     ('iframe', 'longdesc'),
                     ('iframe', 'src'),
                     ('head', 'profile'),
                     ('img', 'longdesc'),
                     ('img', 'src'),
                     ('img', 'usemap'),
                     ('input', 'src'),
                     ('input', 'usemap'),
                     ('ins', 'cite'),
                     ('link', 'href'),
                     ('object', 'classid'),
                     ('object', 'codebase'),
                     ('object', 'data'),
                     ('object', 'usemap'),
                     ('q', 'cite'),
                     ('script', 'src')]

    def __init__(self, baseuri, encoding):
        _BaseHTMLProcessor.__init__(self, encoding)
        self.baseuri = baseuri

    def resolveURI(self, uri):
        return _urljoin(self.baseuri, uri)

    def unknown_starttag(self, tag, attrs):
        attrs = self.normalize_attrs(attrs)
        attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs]
        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)

def _resolveRelativeURIs(htmlSource, baseURI, encoding):
    if _debug: sys.stderr.write('entering _resolveRelativeURIs\n')
    p = _RelativeURIResolver(baseURI, encoding)
    p.feed(htmlSource)
    return p.output()
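
# Illustrative sketch, not part of the original module: the per-attribute
# rewrite that _RelativeURIResolver.unknown_starttag applies, shown without the
# SGML parser. The helper name and its default (tag, attribute) pairs are
# hypothetical, and the standard library urljoin is assumed as a stand-in for
# this module's _urljoin.
try:
    from urllib.parse import urljoin as _example_urljoin  # Python 3
except ImportError:
    from urlparse import urljoin as _example_urljoin  # Python 2

def _example_resolve_attrs(tag, attrs, baseuri,
                           relative_uris=(('a', 'href'), ('img', 'src'))):
    resolved = []
    for key, value in attrs:
        # Only attributes listed for this tag are treated as URIs; everything
        # else passes through unchanged.
        if (tag, key) in relative_uris:
            value = _example_urljoin(baseuri, value)
        resolved.append((key, value))
    return resolved

# Under these assumptions, resolving [('href', '/about')] on an 'a' tag against
# 'http://example.org/feed/' gives [('href', 'http://example.org/about')].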

class _HTMLSanitizer(_BaseHTMLProcessor):
    acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area', 'b', 'big',
      'blockquote', 'br', 'button', 'caption', 'center', 'cite', 'code', 'col',
      'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt', 'em', 'fieldset',
      'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input',
      'ins', 'kbd', 'label', 'legend', 'li', 'map', 'menu', 'ol', 'optgroup',
      'option', 'p', 'pre', 'q', 's', 'samp', 'select', 'small', 'span', 'strike',
      'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'tfoot', 'th',
      'thead', 'tr', 'tt', 'u', 'ul', 'var']

    acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey',
      'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing',
      'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols',
      'colspan', 'color', 'compact', 'coords', 'datetime', 'dir', 'disabled',
      'enctype', 'for', 'frame', 'headers', 'height', 'href', 'hreflang', 'hspace',
      'id', 'ismap', 'label', 'lang', 'longdesc', 'maxlength', 'media', 'method',
      'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly',
      'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
      'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', 'type',
      'usemap', 'valign', 'value', 'vspace', 'width']

    unacceptable_elements_with_end_tag = ['script', 'applet']

    def reset(self):
        _BaseHTMLProcessor.reset(self)
        self.unacceptablestack = 0

    def unknown_starttag(self, tag, attrs):
        if not tag in self.acceptable_elements:
            if tag in self.unacceptable_elements_with_end_tag:
                self.unacceptablestack += 1
            return
        attrs = self.normalize_attrs(attrs)
        attrs = [(key, value) for key, value in attrs if key in self.acceptable_attributes]
        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)

    def unknown_endtag(self, tag):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1634
        if not tag in self.acceptable_elements:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1635
            if tag in self.unacceptable_elements_with_end_tag:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1636
                self.unacceptablestack -= 1
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1637
            return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1638
        _BaseHTMLProcessor.unknown_endtag(self, tag)

    def handle_pi(self, text):
        pass

    def handle_decl(self, text):
        pass

    def handle_data(self, text):
        if not self.unacceptablestack:
            _BaseHTMLProcessor.handle_data(self, text)

def _sanitizeHTML(htmlSource, encoding):
    p = _HTMLSanitizer(encoding)
    p.feed(htmlSource)
    data = p.output()
    if TIDY_MARKUP:
        # loop through list of preferred Tidy interfaces looking for one that's installed,
        # then set up a common _tidy function to wrap the interface-specific API.
        _tidy = None
        for tidy_interface in PREFERRED_TIDY_INTERFACES:
            try:
                if tidy_interface == "uTidy":
                    from tidy import parseString as _utidy
                    def _tidy(data, **kwargs):
                        return str(_utidy(data, **kwargs))
                    break
                elif tidy_interface == "mxTidy":
                    from mx.Tidy import Tidy as _mxtidy
                    def _tidy(data, **kwargs):
                        nerrors, nwarnings, data, errordata = _mxtidy.tidy(data, **kwargs)
                        return data
                    break
            except:
                # this interface could not be set up; try the next one
                pass
        if _tidy:
            # Tidy is given UTF-8 bytes; decode back afterwards if the input was unicode
            utf8 = type(data) == type(u'')
            if utf8:
                data = data.encode('utf-8')
            data = _tidy(data, output_xhtml=1, numeric_entities=1, wrap=0, char_encoding="utf8")
            if utf8:
                data = unicode(data, 'utf-8')
            # keep only the contents of the <body> element, if one is present
            if data.count('<body'):
                data = data.split('<body', 1)[1]
                if data.count('>'):
                    data = data.split('>', 1)[1]
            if data.count('</body'):
                data = data.split('</body', 1)[0]
    data = data.strip().replace('\r\n', '\n')
    return data

class _FeedURLHandler(urllib2.HTTPDigestAuthHandler, urllib2.HTTPRedirectHandler, urllib2.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        if ((code / 100) == 3) and (code != 304):
            return self.http_error_302(req, fp, code, msg, headers)
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        return infourl

    def http_error_302(self, req, fp, code, msg, headers):
        if headers.dict.has_key('location'):
            infourl = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
        else:
            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        if not hasattr(infourl, 'status'):
            infourl.status = code
        return infourl

    def http_error_301(self, req, fp, code, msg, headers):
        if headers.dict.has_key('location'):
            infourl = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, headers)
        else:
            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        if not hasattr(infourl, 'status'):
            infourl.status = code
        return infourl

    http_error_300 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

    def http_error_401(self, req, fp, code, msg, headers):
        # Check if
        # - server requires digest auth, AND
        # - we tried (unsuccessfully) with basic auth, AND
        # - we're using Python 2.3.3 or later (digest auth is irreparably broken in earlier versions)
        # If all conditions hold, parse authentication information
        # out of the Authorization header we sent the first time
        # (for the username and password) and the WWW-Authenticate
        # header the server sent back (for the realm) and retry
        # the request with the appropriate digest auth headers instead.
        # This evil genius hack has been brought to you by Aaron Swartz.
        host = urlparse.urlparse(req.get_full_url())[1]
        try:
            # compare version tuples; comparing version strings misorders e.g. '2.10' and '2.3.3'
            assert sys.version_info[:3] >= (2, 3, 3)
            assert base64 is not None
            # split on the first colon only, since the password may itself contain colons
            user, passw = base64.decodestring(req.headers['Authorization'].split(' ')[1]).split(':', 1)
            realm = re.findall('realm="([^"]*)"', headers['WWW-Authenticate'])[0]
            self.add_password(realm, host, user, passw)
            retry = self.http_error_auth_reqed('www-authenticate', host, req, headers)
            self.reset_retry_count()
            return retry
        except:
            return self.http_error_default(req, fp, code, msg, headers)

def _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers):
    """URL, filename, or string --> stream

    This function lets you define parsers that take any input source
    (URL, pathname to local or network file, or actual data as a string)
    and deal with it in a uniform manner.  Returned object is guaranteed
    to have all the basic stdio read methods (read, readline, readlines).
    Just .close() the object when you're done with it.

    If the etag argument is supplied, it will be used as the value of an
    If-None-Match request header.

    If the modified argument is supplied, it must be a tuple of 9 integers
    as returned by gmtime() in the standard Python time module. This MUST
    be in GMT (Greenwich Mean Time). The formatted date/time will be used
    as the value of an If-Modified-Since request header.

    If the agent argument is supplied, it will be used as the value of a
    User-Agent request header.

    If the referrer argument is supplied, it will be used as the value of a
    Referer[sic] request header.

    If handlers is supplied, it is a list of handlers used to build a
    urllib2 opener.
    """

    if hasattr(url_file_stream_or_string, 'read'):
        return url_file_stream_or_string

    if url_file_stream_or_string == '-':
        return sys.stdin

    if urlparse.urlparse(url_file_stream_or_string)[0] in ('http', 'https', 'ftp'):
        if not agent:
            agent = USER_AGENT
        # test for inline user:password for basic auth
        auth = None
        if base64:
            urltype, rest = urllib.splittype(url_file_stream_or_string)
            realhost, rest = urllib.splithost(rest)
            if realhost:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1785
                user_passwd, realhost = urllib.splituser(realhost)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1786
                if user_passwd:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1787
                    url_file_stream_or_string = '%s://%s%s' % (urltype, realhost, rest)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1788
                    auth = base64.encodestring(user_passwd).strip()
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1789
        # try to open with urllib2 (to use optional headers)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1790
        request = urllib2.Request(url_file_stream_or_string)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1791
        request.add_header('User-Agent', agent)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1792
        if etag:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1793
            request.add_header('If-None-Match', etag)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1794
        if modified:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1795
            # format into an RFC 1123-compliant timestamp. We can't use
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1796
            # time.strftime() since the %a and %b directives can be affected
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1797
            # by the current locale, but RFC 2616 states that dates must be
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1798
            # in English.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1799
            short_weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1800
            months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1801
            request.add_header('If-Modified-Since', '%s, %02d %s %04d %02d:%02d:%02d GMT' % (short_weekdays[modified[6]], modified[2], months[modified[1] - 1], modified[0], modified[3], modified[4], modified[5]))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1802
        if referrer:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1803
            request.add_header('Referer', referrer)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1804
        if gzip and zlib:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1805
            request.add_header('Accept-encoding', 'gzip, deflate')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1806
        elif gzip:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1807
            request.add_header('Accept-encoding', 'gzip')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1808
        elif zlib:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1809
            request.add_header('Accept-encoding', 'deflate')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1810
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1811
            request.add_header('Accept-encoding', '')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1812
        if auth:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1813
            request.add_header('Authorization', 'Basic %s' % auth)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1814
        if ACCEPT_HEADER:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1815
            request.add_header('Accept', ACCEPT_HEADER)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1816
        request.add_header('A-IM', 'feed') # RFC 3229 support
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1817
        opener = apply(urllib2.build_opener, tuple([_FeedURLHandler()] + handlers))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1818
        opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1819
        try:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1820
            return opener.open(request)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1821
        finally:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1822
            opener.close() # JohnD
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1823
    
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1824
    # try to open with native open function (if url_file_stream_or_string is a filename)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1825
    try:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1826
        return open(url_file_stream_or_string)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1827
    except:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1828
        pass
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1829
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1830
    # treat url_file_stream_or_string as string
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1831
    return _StringIO(str(url_file_stream_or_string))
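
The If-Modified-Since header above is built by hand because `time.strftime()` is locale-sensitive. A minimal sketch of that formatting step in isolation follows; `format_rfc1123` and the two module-level lists are hypothetical names introduced for illustration and are not part of feedparser.

```python
import time

# English day and month names, fixed regardless of the current locale.
SHORT_WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def format_rfc1123(tm):
    """Format a 9-tuple in GMT (as returned by time.gmtime()) as an HTTP date.

    Hypothetical helper: it mirrors the string formatting used for the
    If-Modified-Since header, indexing the tuple the same way
    (tm[6] is the weekday with Monday == 0, tm[1] is the month 1..12).
    """
    return '%s, %02d %s %04d %02d:%02d:%02d GMT' % (
        SHORT_WEEKDAYS[tm[6]], tm[2], MONTHS[tm[1] - 1],
        tm[0], tm[3], tm[4], tm[5])


if __name__ == '__main__':
    print(format_rfc1123(time.gmtime(0)))
```

Because the names come from fixed lists rather than `%a` and `%b`, the result should stay in English even if the process locale is changed.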

_date_handlers = []
def registerDateHandler(func):
    '''Register a date handler function (takes string, returns 9-tuple date in GMT)'''
    _date_handlers.insert(0, func)

# ISO-8601 date parsing routines written by Fazal Majid.
# The ISO 8601 standard is very convoluted and irregular - a full ISO 8601
# parser is beyond the scope of feedparser and would be a worthwhile addition
# to the Python library.
# A single regular expression cannot parse ISO 8601 date formats into groups
# as the standard is highly irregular (for instance is 030104 2003-01-04 or
# 0301-04-01), so we use templates instead.
# Please note the order in templates is significant because we need a
# greedy match.
_iso8601_tmpl = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO',
                'YY-?MM-?DD', 'YY-?OOO', 'YYYY',
                '-YY-?MM', '-OOO', '-YY',
                '--MM-?DD', '--MM',
                '---DD',
                'CC', '']
_iso8601_re = [
    tmpl.replace(
    'YYYY', r'(?P<year>\d{4})').replace(
    'YY', r'(?P<year>\d\d)').replace(
    'MM', r'(?P<month>[01]\d)').replace(
    'DD', r'(?P<day>[0123]\d)').replace(
    'OOO', r'(?P<ordinal>[0123]\d\d)').replace(
    'CC', r'(?P<century>\d\d$)')
    + r'(T?(?P<hour>\d{2}):(?P<minute>\d{2})'
    + r'(:(?P<second>\d{2}))?'
    + r'(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?'
    for tmpl in _iso8601_tmpl]
del tmpl
_iso8601_matches = [re.compile(regex).match for regex in _iso8601_re]
del regex
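
The template approach above can be tried on a reduced set of templates. This is a sketch under stated assumptions: `TEMPLATES`, `template_to_regex` and `first_match` are hypothetical names, only three of the templates are used, and the time-of-day suffix is left out.

```python
import re

# Subset of the templates, most specific first; order matters because
# re.match() accepts a match on a prefix of the string.
TEMPLATES = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY']


def template_to_regex(tmpl):
    """Expand placeholder letters into named groups (hypothetical helper)."""
    return (tmpl.replace('YYYY', r'(?P<year>\d{4})')
                .replace('MM', r'(?P<month>[01]\d)')
                .replace('DD', r'(?P<day>[0123]\d)'))


MATCHERS = [re.compile(template_to_regex(t)).match for t in TEMPLATES]


def first_match(date_string):
    """Return the named groups of the first template that matches, or None."""
    for match in MATCHERS:
        m = match(date_string)
        if m:
            return m.groupdict()
    return None


if __name__ == '__main__':
    print(first_match('20040105'))
    print(first_match('2004-01'))
    # Trying the bare 'YYYY' template first would stop after the year:
    print(re.compile(template_to_regex('YYYY')).match('2004-01-05').groupdict())
```

The last line shows why the list order is significant: a short template placed early matches the leading characters and hides the month and day.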
def _parse_date_iso8601(dateString):
    '''Parse a variety of ISO-8601-compatible formats like 20040105'''
    m = None
    for _iso8601_match in _iso8601_matches:
        m = _iso8601_match(dateString)
        if m: break
    if not m: return
    if m.span() == (0, 0): return
    params = m.groupdict()
    ordinal = params.get('ordinal', 0)
    if ordinal:
        ordinal = int(ordinal)
    else:
        ordinal = 0
    year = params.get('year', '--')
    if not year or year == '--':
        year = time.gmtime()[0]
    elif len(year) == 2:
        # ISO 8601 assumes current century, i.e. 93 -> 2093, NOT 1993
        year = 100 * int(time.gmtime()[0] / 100) + int(year)
    else:
        year = int(year)
    month = params.get('month', '-')
    if not month or month == '-':
        # ordinals are NOT normalized by mktime, we simulate them
        # by setting month=1, day=ordinal
        if ordinal:
            month = 1
        else:
            month = time.gmtime()[1]
    month = int(month)
    day = params.get('day', 0)
    if not day:
        # see above
        if ordinal:
            day = ordinal
        elif params.get('century', 0) or \
                 params.get('year', 0) or params.get('month', 0):
            day = 1
        else:
            day = time.gmtime()[2]
    else:
        day = int(day)
    # special case of the century - is the first year of the 21st century
    # 2000 or 2001 ? The debate goes on...
    if 'century' in params:
        year = (int(params['century']) - 1) * 100 + 1
    # in ISO 8601 most fields are optional
    for field in ['hour', 'minute', 'second', 'tzhour', 'tzmin']:
        if not params.get(field, None):
            params[field] = 0
    hour = int(params.get('hour', 0))
    minute = int(params.get('minute', 0))
    second = int(params.get('second', 0))
    # weekday is normalized by mktime(), we can ignore it
    weekday = 0
    # daylight savings is complex, but not needed for feedparser's purposes
    # as time zones, if specified, include mention of whether it is active
    # (e.g. PST vs. PDT, CET). Using -1 is implementation-dependent and
    # most implementations have DST bugs
    daylight_savings_flag = 0
    tm = [year, month, day, hour, minute, second, weekday,
          ordinal, daylight_savings_flag]
    # ISO 8601 time zone adjustments
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1932
    tz = params.get('tz')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1933
    if tz and tz != 'Z':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1934
        if tz[0] == '-':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1935
            tm[3] += int(params.get('tzhour', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1936
            tm[4] += int(params.get('tzmin', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1937
        elif tz[0] == '+':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1938
            tm[3] -= int(params.get('tzhour', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1939
            tm[4] -= int(params.get('tzmin', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1940
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1941
            return None
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1942
    # Python's time.mktime() is a wrapper around the ANSI C mktime(3c)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1943
    # which is guaranteed to normalize d/m/y/h/m/s.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1944
    # Many implementations have bugs, but we'll pretend they don't.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1945
    return time.localtime(time.mktime(tm))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1946
registerDateHandler(_parse_date_iso8601)
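# Illustration only, not part of feedparser: a minimal sketch of the time zone
# adjustment above, which subtracts a "+hh:mm" offset (or adds a "-hh:mm" one)
# and lets the normalizer carry the out-of-range hour and minute fields.
# The helper name utc_struct_time is hypothetical. The sketch swaps in
# calendar.timegm() for time.mktime() so the result does not depend on the
# local zone of the machine running it; that is an assumption of the example,
# not what the handler above does.

```python
import calendar
import re
import time


def utc_struct_time(year, month, day, hour, minute, second, tz):
    """Hypothetical helper: shift a wall-clock time to UTC.

    tz is '', 'Z', or an ISO 8601 offset such as '+09:00' or '-05:30'.
    Returns a time.struct_time in UTC, or None for an unrecognized offset.
    """
    if tz and tz != 'Z':
        m = re.match(r'^([+-])(\d{2}):?(\d{2})$', tz)
        if not m:
            return None
        # A zone ahead of UTC ('+') is subtracted; one behind ('-') is added.
        sign = 1 if m.group(1) == '-' else -1
        hour += sign * int(m.group(2))
        minute += sign * int(m.group(3))
    # timegm() does plain arithmetic on the fields, so an hour of -7 or a
    # minute of 75 rolls over into the neighbouring day or hour.
    seconds = calendar.timegm((year, month, day, hour, minute, second))
    return time.gmtime(seconds)


t = utc_struct_time(2004, 1, 1, 2, 30, 0, '+09:00')
print(time.strftime('%Y-%m-%dT%H:%M:%SZ', t))
```

# Normalizing after the arithmetic keeps the offset logic to two additions;
# the date rollover is left entirely to the library call.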

# 8-bit date handling routines written by ytrewq1.
_korean_year  = u'\ub144' # b3e2 in euc-kr
_korean_month = u'\uc6d4' # bff9 in euc-kr
_korean_day   = u'\uc77c' # c0cf in euc-kr
_korean_am    = u'\uc624\uc804' # bfc0 c0fc in euc-kr
_korean_pm    = u'\uc624\ud6c4' # bfc0 c8c4 in euc-kr

_korean_onblog_date_re = \
    re.compile('(\d{4})%s\s+(\d{2})%s\s+(\d{2})%s\s+(\d{2}):(\d{2}):(\d{2})' % \
               (_korean_year, _korean_month, _korean_day))
_korean_nate_date_re = \
    re.compile(u'(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{,2}):(\d{,2}):(\d{,2})' % \
               (_korean_am, _korean_pm))
def _parse_date_onblog(dateString):
    '''Parse a string according to the OnBlog 8-bit date format'''
    m = _korean_onblog_date_re.match(dateString)
    if not m: return
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
                 'zonediff': '+09:00'}
    if _debug: sys.stderr.write('OnBlog date parsed as: %s\n' % w3dtfdate)
    return _parse_date_w3dtf(w3dtfdate)
registerDateHandler(_parse_date_onblog)

def _parse_date_nate(dateString):
    '''Parse a string according to the Nate 8-bit date format'''
    m = _korean_nate_date_re.match(dateString)
    if not m: return
    hour = int(m.group(5))
    ampm = m.group(4)
    # 12-hour to 24-hour: 12 AM is hour 0, 12 PM stays 12
    if ampm == _korean_pm and hour < 12:
        hour += 12
    elif ampm == _korean_am and hour == 12:
        hour = 0
    hour = '%02d' % hour
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
                 'hour': hour, 'minute': m.group(6), 'second': m.group(7),\
                 'zonediff': '+09:00'}
    if _debug: sys.stderr.write('Nate date parsed as: %s\n' % w3dtfdate)
    return _parse_date_w3dtf(w3dtfdate)
registerDateHandler(_parse_date_nate)
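# Illustration only, not part of feedparser: the AM/PM step in the Nate handler
# is a 12-hour to 24-hour conversion. A minimal sketch of that conversion
# follows; the helper name to_24_hour is hypothetical. The two edge cases are
# 12 AM, which maps to hour 0, and 12 PM, which stays at 12.

```python
def to_24_hour(hour, is_pm):
    """Hypothetical helper: convert a 12-hour clock reading (1-12) to 0-23."""
    if not 1 <= hour <= 12:
        raise ValueError('expected an hour between 1 and 12, got %r' % (hour,))
    # hour % 12 turns 12 into 0, so midnight and noon need no special case.
    return hour % 12 + (12 if is_pm else 0)


for hour, is_pm in [(12, False), (11, False), (12, True), (11, True)]:
    print('%2d %s -> %02d' % (hour, 'PM' if is_pm else 'AM', to_24_hour(hour, is_pm)))
```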

_mssql_date_re = \
    re.compile('(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(\.\d+)?')
def _parse_date_mssql(dateString):
    '''Parse a string according to the MS SQL date format'''
    m = _mssql_date_re.match(dateString)
    if not m: return
    # the pattern captures no time zone, so the offset below is hard-coded
    # to +09:00, matching the Korean handlers above
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
                 'zonediff': '+09:00'}
    if _debug: sys.stderr.write('MS SQL date parsed as: %s\n' % w3dtfdate)
    return _parse_date_w3dtf(w3dtfdate)
registerDateHandler(_parse_date_mssql)

# Unicode strings for Greek date strings
_greek_months = \
  { \
   u'\u0399\u03b1\u03bd': u'Jan',       # c9e1ed in iso-8859-7
   u'\u03a6\u03b5\u03b2': u'Feb',       # d6e5e2 in iso-8859-7
   u'\u039c\u03ac\u03ce': u'Mar',       # ccdcfe in iso-8859-7
   u'\u039c\u03b1\u03ce': u'Mar',       # cce1fe in iso-8859-7
   u'\u0391\u03c0\u03c1': u'Apr',       # c1f0f1 in iso-8859-7
   u'\u039c\u03ac\u03b9': u'May',       # ccdce9 in iso-8859-7
   u'\u039c\u03b1\u03ca': u'May',       # cce1fa in iso-8859-7
   u'\u039c\u03b1\u03b9': u'May',       # cce1e9 in iso-8859-7
   u'\u0399\u03bf\u03cd\u03bd': u'Jun', # c9effded in iso-8859-7
   u'\u0399\u03bf\u03bd': u'Jun',       # c9efed in iso-8859-7
   u'\u0399\u03bf\u03cd\u03bb': u'Jul', # c9effdeb in iso-8859-7
   u'\u0399\u03bf\u03bb': u'Jul',       # c9efeb in iso-8859-7
   u'\u0391\u03cd\u03b3': u'Aug',       # c1fde3 in iso-8859-7
   u'\u0391\u03c5\u03b3': u'Aug',       # c1f5e3 in iso-8859-7
   u'\u03a3\u03b5\u03c0': u'Sep',       # d3e5f0 in iso-8859-7
   u'\u039f\u03ba\u03c4': u'Oct',       # cfeaf4 in iso-8859-7
   u'\u039d\u03bf\u03ad': u'Nov',       # cdefdd in iso-8859-7
   u'\u039d\u03bf\u03b5': u'Nov',       # cdefe5 in iso-8859-7
   u'\u0394\u03b5\u03ba': u'Dec',       # c4e5ea in iso-8859-7
  }

_greek_wdays = \
  { \
   u'\u039a\u03c5\u03c1': u'Sun', # caf5f1 in iso-8859-7
   u'\u0394\u03b5\u03c5': u'Mon', # c4e5f5 in iso-8859-7
   u'\u03a4\u03c1\u03b9': u'Tue', # d4f1e9 in iso-8859-7
   u'\u03a4\u03b5\u03c4': u'Wed', # d4e5f4 in iso-8859-7
   u'\u03a0\u03b5\u03bc': u'Thu', # d0e5ec in iso-8859-7
   u'\u03a0\u03b1\u03c1': u'Fri', # d0e1f1 in iso-8859-7
   u'\u03a3\u03b1\u03b2': u'Sat', # d3e1e2 in iso-8859-7
  }

_greek_date_format_re = \
    re.compile(u'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)')

def _parse_date_greek(dateString):
    '''Parse a string according to a Greek 8-bit date format.'''
    m = _greek_date_format_re.match(dateString)
    if not m: return
    try:
        wday = _greek_wdays[m.group(1)]
        month = _greek_months[m.group(3)]
    except KeyError:
        # unknown weekday or month abbreviation: not a Greek date
        return
    rfc822date = '%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)s' % \
                 {'wday': wday, 'day': m.group(2), 'month': month, 'year': m.group(4),\
                  'hour': m.group(5), 'minute': m.group(6), 'second': m.group(7),\
                  'zonediff': m.group(8)}
    if _debug: sys.stderr.write('Greek date parsed as: %s\n' % rfc822date)
    return _parse_date_rfc822(rfc822date)
registerDateHandler(_parse_date_greek)
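# Illustration only, not part of feedparser: a self-contained sketch of the
# translation step used by the Greek handler, which maps the localized weekday
# and month abbreviations to English and rebuilds an RFC 822 style string.
# The names GREEK_WDAYS, GREEK_MONTHS and greek_to_rfc822 are hypothetical, the
# tables hold only the entries the example needs, and the sample date string is
# made up for the demonstration.

```python
import re

# One-entry subsets of the lookup tables, enough for the sample below.
GREEK_WDAYS = {'\u039a\u03c5\u03c1': 'Sun'}
GREEK_MONTHS = {'\u0399\u03bf\u03cd\u03bb': 'Jul'}

GREEK_DATE_RE = re.compile(
    r'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)')


def greek_to_rfc822(date_string):
    """Return the date rewritten with English names, or None if it does not fit."""
    m = GREEK_DATE_RE.match(date_string)
    if not m:
        return None
    try:
        wday = GREEK_WDAYS[m.group(1)]
        month = GREEK_MONTHS[m.group(3)]
    except KeyError:
        # Weekday or month is not in the tables, so this is not our format.
        return None
    return '%s, %s %s %s %s:%s:%s %s' % (
        wday, m.group(2), month, m.group(4),
        m.group(5), m.group(6), m.group(7), m.group(8))


sample = '\u039a\u03c5\u03c1, 11 \u0399\u03bf\u03cd\u03bb 2004 12:00:00 +0300'
print(greek_to_rfc822(sample))
```

# Only the two localized tokens are rewritten; the numeric fields and the zone
# pass through untouched, so an ordinary RFC 822 parser can take over.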

# Unicode strings for Hungarian date strings
_hungarian_months = \
  { \
    u'janu\u00e1r':   u'01',  # e1 in iso-8859-2
    u'febru\u00e1ri': u'02',  # e1 in iso-8859-2
    u'm\u00e1rcius':  u'03',  # e1 in iso-8859-2
    u'\u00e1prilis':  u'04',  # e1 in iso-8859-2
    u'm\u00e1ujus':   u'05',  # e1 in iso-8859-2
    u'j\u00fanius':   u'06',  # fa in iso-8859-2
    u'j\u00falius':   u'07',  # fa in iso-8859-2
    u'augusztus':     u'08',
    u'szeptember':    u'09',
    u'okt\u00f3ber':  u'10',  # f3 in iso-8859-2
    u'november':      u'11',
    u'december':      u'12',
  }
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2077
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2078
_hungarian_date_format_re = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2079
  re.compile(u'(\d{4})-([^-]+)-(\d{,2})T(\d{,2}):(\d{2})((\+|-)(\d{,2}:\d{2}))')

def _parse_date_hungarian(dateString):
    '''Parse a string according to a Hungarian 8-bit date format.'''
    m = _hungarian_date_format_re.match(dateString)
    if not m:
        return
    try:
        month = _hungarian_months[m.group(2)]
    except KeyError:
        # Unknown month name: return None so another date handler can try.
        return
    # Zero-pad single-digit day and hour values.
    day = m.group(3)
    if len(day) == 1:
        day = '0' + day
    hour = m.group(4)
    if len(hour) == 1:
        hour = '0' + hour
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)s' % \
                {'year': m.group(1), 'month': month, 'day': day,
                 'hour': hour, 'minute': m.group(5),
                 'zonediff': m.group(6)}
    if _debug: sys.stderr.write('Hungarian date parsed as: %s\n' % w3dtfdate)
    return _parse_date_w3dtf(w3dtfdate)
registerDateHandler(_parse_date_hungarian)
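
The Hungarian handler above only rewrites its input into a W3DTF-style string and leaves the real parsing to `_parse_date_w3dtf`. The following is a minimal, self-contained sketch of that rewriting step. The names `HUNGARIAN_MONTHS`, `HUNGARIAN_DATE_RE`, and `hungarian_to_w3dtf` are hypothetical, the month table is deliberately trimmed to three entries, and the pattern is a slightly stricter variant (`\d{1,2}` instead of `\d{,2}`) chosen for illustration.

```python
import re

# Hypothetical, trimmed month table; the real table lists all twelve months.
HUNGARIAN_MONTHS = {
    u'janu\u00e1r': u'01',
    u'j\u00falius': u'07',
    u'december': u'12',
}

# Assumed variant of the handler's pattern: year-monthname-dayThour:minute+zone.
HUNGARIAN_DATE_RE = re.compile(
    r'(\d{4})-([^-]+)-(\d{1,2})T(\d{1,2}):(\d{2})([+-]\d{1,2}:\d{2})')


def hungarian_to_w3dtf(date_string):
    """Return a W3DTF-style string, or None if the input is not recognized."""
    m = HUNGARIAN_DATE_RE.match(date_string)
    if not m:
        return None
    month = HUNGARIAN_MONTHS.get(m.group(2))
    if month is None:
        return None
    year, day, hour, minute, zone = m.group(1, 3, 4, 5, 6)
    # zfill(2) pads single-digit day and hour values with a leading zero.
    return u'%s-%s-%sT%s:%s%s' % (year, month, day.zfill(2),
                                  hour.zfill(2), minute, zone)


print(hungarian_to_w3dtf(u'2004-j\u00falius-13T9:15-05:00'))
```

Returning `None` for anything unrecognized mirrors how the handlers in this file signal that a later handler should try the string.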

# W3DTF-style date parsing adapted from PyXML xml.utils.iso8601, written by
# Drake and licensed under the Python license.  Removed all range checking
# for month, day, hour, minute, and second, since mktime will normalize
# these later
def _parse_date_w3dtf(dateString):
    def __extract_date(m):
        year = int(m.group('year'))
        if year < 100:
            year = 100 * int(time.gmtime()[0] / 100) + int(year)
        if year < 1000:
            return 0, 0, 0
        julian = m.group('julian')
        if julian:
            julian = int(julian)
            # Integer division keeps the first month guess an int.
            month = julian // 30 + 1
            day = julian % 30 + 1
            jday = None
            # Adjust the guess until its day-of-year matches the requested one.
            while jday != julian:
                t = time.mktime((year, month, day, 0, 0, 0, 0, 0, 0))
                jday = time.gmtime(t)[-2]
                diff = abs(jday - julian)
                if jday > julian:
                    if diff < day:
                        day = day - diff
                    else:
                        month = month - 1
                        day = 31
                elif jday < julian:
                    if day + diff < 28:
                        day = day + diff
                    else:
                        month = month + 1
            return year, month, day
        month = m.group('month')
        day = 1
        if month is None:
            month = 1
        else:
            month = int(month)
            day = m.group('day')
            if day:
                day = int(day)
            else:
                day = 1
        return year, month, day

    def __extract_time(m):
        if not m:
            return 0, 0, 0
        hours = m.group('hours')
        if not hours:
            return 0, 0, 0
        hours = int(hours)
        minutes = int(m.group('minutes'))
        seconds = m.group('seconds')
        if seconds:
            # The pattern allows fractional seconds ('.' or ','); truncate them.
            seconds = int(float(seconds.replace(',', '.')))
        else:
            seconds = 0
        return hours, minutes, seconds

    def __extract_tzd(m):
        '''Return the Time Zone Designator as an offset in seconds from UTC.'''
        if not m:
            return 0
        tzd = m.group('tzd')
        if not tzd:
            return 0
        if tzd == 'Z':
            return 0
        hours = int(m.group('tzdhours'))
        minutes = m.group('tzdminutes')
        if minutes:
            minutes = int(minutes)
        else:
            minutes = 0
        offset = (hours*60 + minutes) * 60
        # The result is added to the local time to reach UTC, so a zone
        # ahead of UTC ('+') yields a negative value.
        if tzd[0] == '+':
            return -offset
        return offset

    __date_re = (r'(?P<year>\d\d\d\d)'
                 r'(?:(?P<dsep>-|)'
                 r'(?:(?P<julian>\d\d\d)'
                 r'|(?P<month>\d\d)(?:(?P=dsep)(?P<day>\d\d))?))?')
    __tzd_re = r'(?P<tzd>[-+](?P<tzdhours>\d\d)(?::?(?P<tzdminutes>\d\d))|Z)'
    __tzd_rx = re.compile(__tzd_re)
    __time_re = (r'(?P<hours>\d\d)(?P<tsep>:|)(?P<minutes>\d\d)'
                 r'(?:(?P=tsep)(?P<seconds>\d\d(?:[.,]\d+)?))?'
                 + __tzd_re)
    __datetime_re = '%s(?:T%s)?' % (__date_re, __time_re)
    __datetime_rx = re.compile(__datetime_re)
    m = __datetime_rx.match(dateString)
    if (m is None) or (m.group() != dateString): return
    gmt = __extract_date(m) + __extract_time(m) + (0, 0, 0)
    if gmt[0] == 0: return
    return time.gmtime(time.mktime(gmt) + __extract_tzd(m) - time.timezone)
registerDateHandler(_parse_date_w3dtf)
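
The W3DTF handler above converts a local timestamp plus a zone designator into UTC by adding a signed offset to an epoch value. The sketch below illustrates that arithmetic on a reduced grammar (full date, full time, mandatory zone). It is not the handler itself: the names `W3DTF_RE`, `tzd_to_seconds`, and `parse_w3dtf_utc` are hypothetical, and it swaps in `calendar.timegm` for the `time.mktime(...) - time.timezone` combination so that the result does not depend on the machine's local time zone.

```python
import calendar
import re
import time

# Reduced, assumed grammar: YYYY-MM-DDThh:mm[:ss] followed by Z or +hh:mm / -hh:mm.
W3DTF_RE = re.compile(
    r'(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)'
    r'T(?P<hours>\d\d):(?P<minutes>\d\d)(?::(?P<seconds>\d\d))?'
    r'(?P<tzd>Z|[-+]\d\d:\d\d)$')


def tzd_to_seconds(tzd):
    """Return the seconds to add to the wall-clock time to reach UTC."""
    if tzd == 'Z':
        return 0
    hours, minutes = int(tzd[1:3]), int(tzd[4:6])
    offset = (hours * 60 + minutes) * 60
    # A zone ahead of UTC ('+') must be subtracted to get back to UTC.
    return -offset if tzd[0] == '+' else offset


def parse_w3dtf_utc(date_string):
    """Return a UTC struct_time, or None if the string does not match."""
    m = W3DTF_RE.match(date_string)
    if m is None:
        return None
    fields = (int(m.group('year')), int(m.group('month')), int(m.group('day')),
              int(m.group('hours')), int(m.group('minutes')),
              int(m.group('seconds') or 0), 0, 0, 0)
    # timegm treats the fields as UTC; the zone correction is applied on top.
    return time.gmtime(calendar.timegm(fields) + tzd_to_seconds(m.group('tzd')))


print(parse_w3dtf_utc('2003-12-31T10:14:55-08:00')[:6])
```

The sign convention matches `__extract_tzd`: the returned value is what must be added to the stated local time, which is why `+hh:mm` produces a negative number.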
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2201
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2202
def _parse_date_rfc822(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2203
    '''Parse an RFC822, RFC1123, RFC2822, or asctime-style date'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2204
    data = dateString.split()
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2205
    if data[0][-1] in (',', '.') or data[0].lower() in rfc822._daynames:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2206
        del data[0]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2207
    if len(data) == 4:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2208
        s = data[3]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2209
        i = s.find('+')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
        if i > 0:
            data[3:] = [s[:i], s[i+1:]]
        else:
            data.append('')
        dateString = " ".join(data)
    if len(data) < 5:
        dateString += ' 00:00:00 GMT'
    tm = rfc822.parsedate_tz(dateString)
    if tm:
        return time.gmtime(rfc822.mktime_tz(tm))
# rfc822.py defines several time zones, but we define some extra ones.
# 'ET' is equivalent to 'EST', etc.
_additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}
rfc822._timezones.update(_additional_timezones)
registerDateHandler(_parse_date_rfc822)
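
# A hedged Python 3 sketch of the same idea: the rfc822 module used above
# exists only in Python 2, and its date helpers live in email.utils in
# Python 3. Instead of patching a private timezone table, this sketch rewrites
# the extra abbreviations into numeric offsets before parsing. The names
# EXTRA_TIMEZONES and parse_rfc822_with_extra_zones are hypothetical and are
# not part of feedparser; only a trailing timezone token is handled.

```python
import time
from email.utils import mktime_tz, parsedate_tz

# Same offsets as _additional_timezones above, in +/-HHMM integer form.
EXTRA_TIMEZONES = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}


def parse_rfc822_with_extra_zones(date_string):
    """Return a UTC struct_time, or None if the date cannot be parsed."""
    parts = date_string.split()
    # If the last token is one of the extra abbreviations, replace it with
    # its numeric offset (for example 'ET' becomes '-0500').
    if parts and parts[-1].upper() in EXTRA_TIMEZONES:
        parts[-1] = '%+05d' % EXTRA_TIMEZONES[parts[-1].upper()]
    parsed = parsedate_tz(' '.join(parts))
    if parsed is None:
        return None
    # mktime_tz applies the offset and yields a UTC timestamp.
    return time.gmtime(mktime_tz(parsed))


if __name__ == '__main__':
    print(parse_rfc822_with_extra_zones('Thu, 01 Jan 2004 19:48:21 ET'))
```

# Substituting a numeric offset keeps the sketch independent of which
# abbreviations the standard library happens to recognize.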

def _parse_date(dateString):
    '''Parses a variety of date formats into a 9-tuple in GMT'''
    # Try each registered handler in turn; the first one that returns a
    # valid 9-tuple wins.  A handler that raises is skipped.
    for handler in _date_handlers:
        try:
            date9tuple = handler(dateString)
            if not date9tuple: continue
            if len(date9tuple) != 9:
                if _debug: sys.stderr.write('date handler function must return 9-tuple\n')
                raise ValueError
            # Validation only: raises if any field is not convertible to int.
            map(int, date9tuple)
            return date9tuple
        except Exception, e:
            if _debug: sys.stderr.write('%s raised %s\n' % (handler.__name__, repr(e)))
    return None
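
# The handler chain above can be sketched in Python 3 as follows. This is a
# minimal illustration, not feedparser's code: the registry _handlers, the
# functions register_handler and parse_date, and the two sample handlers are
# all hypothetical. It also assumes that newly registered handlers are tried
# first, which this section of the source does not show.

```python
import time

_handlers = []  # hypothetical registry; most recently registered first


def register_handler(func):
    """Register a handler that takes a string and returns a 9-tuple or None."""
    _handlers.insert(0, func)
    return func


def parse_date(date_string):
    """Return the first valid 9-tuple produced by a handler, else None."""
    for handler in _handlers:
        try:
            result = handler(date_string)
        except Exception:
            # A failing handler is skipped so the next one gets a chance.
            continue
        if not result or len(result) != 9:
            continue
        return result
    return None


@register_handler
def _iso_date(date_string):
    # time.strptime returns a struct_time, which has exactly nine fields.
    return time.strptime(date_string, '%Y-%m-%d')


@register_handler
def _always_fails(date_string):
    raise RuntimeError('simulated broken handler')


if __name__ == '__main__':
    print(parse_date('2004-01-02'))
    print(parse_date('garbage'))
```

# Swallowing exceptions per handler means one buggy or overly strict parser
# cannot prevent the remaining formats from being tried.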

def _getCharacterEncoding(http_headers, xml_data):
    '''Get the character encoding of the XML document

    http_headers is a dictionary
    xml_data is a raw string (not Unicode)

    This is so much trickier than it sounds, it's not even funny.
    According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type
    is application/xml, application/*+xml,
    application/xml-external-parsed-entity, or application/xml-dtd,
    the encoding given in the charset parameter of the HTTP Content-Type
    takes precedence over the encoding given in the XML prefix within the
    document, and defaults to 'utf-8' if neither is specified.  But, if
    the HTTP Content-Type is text/xml, text/*+xml, or
    text/xml-external-parsed-entity, the encoding given in the XML prefix
    within the document is ALWAYS IGNORED and only the encoding given in
    the charset parameter of the HTTP Content-Type header should be
    respected, and it defaults to 'us-ascii' if not specified.

    Furthermore, discussion on the atom-syntax mailing list with the
    author of RFC 3023 leads me to the conclusion that any document
    served with a Content-Type of text/* and no charset parameter
    must be treated as us-ascii.  (We now do this.)  And also that it
    must always be flagged as non-well-formed.  (We now do this too.)

    If Content-Type is unspecified (input was local file or non-HTTP source)
    or unrecognized (server just got it totally wrong), then go by the
    encoding given in the XML prefix of the document and default to
    'iso-8859-1' as per the HTTP specification (RFC 2616).

    Then, assuming we didn't find a character encoding in the HTTP headers
    (and the HTTP Content-Type allowed us to look in the body), we need
    to sniff the first few bytes of the XML data and try to determine
    whether the encoding is ASCII-compatible.  Section F of the XML
    specification shows the way here:
    http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info

    If the sniffed encoding is not ASCII-compatible, we need to make it
    ASCII-compatible so that we can sniff further into the XML declaration
    to find the encoding attribute, which will tell us the true encoding.

    Of course, none of this guarantees that we will be able to parse the
    feed in the declared character encoding (assuming it was declared
    correctly, which is often not the case).  CJKCodecs and iconv_codec
    help a lot; you should definitely install them if you can.
    http://cjkpython.i18n.org/
    '''

    def _parseHTTPContentType(content_type):
        '''Takes an HTTP Content-Type header and returns (content type, charset)

        If no charset is specified, returns (content type, '')
        If no content type is specified, returns ('', '')
        Both return values are guaranteed to be lowercase strings
        '''
        content_type = content_type or ''
        content_type, params = cgi.parse_header(content_type)
        # Lowercase explicitly so the guarantee in the docstring does not
        # depend on how cgi.parse_header treats case.
        charset = params.get('charset', '').replace("'", '')
        return content_type.lower(), charset.lower()

    sniffed_xml_encoding = ''
    xml_encoding = ''
    true_encoding = ''
    http_content_type, http_encoding = _parseHTTPContentType(http_headers.get('content-type'))
    # Must sniff for non-ASCII-compatible character encodings before
    # searching for the XML declaration.  This heuristic is defined in
    # section F of the XML specification:
    # http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
    try:
        if xml_data[:4] == '\x4c\x6f\xa7\x94':
            # EBCDIC
            xml_data = _ebcdic_to_ascii(xml_data)
        elif xml_data[:4] == '\x00\x3c\x00\x3f':
            # UTF-16BE
            sniffed_xml_encoding = 'utf-16be'
            xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') and (xml_data[2:4] != '\x00\x00'):
            # UTF-16BE with BOM
            sniffed_xml_encoding = 'utf-16be'
            xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
        elif xml_data[:4] == '\x3c\x00\x3f\x00':
            # UTF-16LE
            sniffed_xml_encoding = 'utf-16le'
            xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and (xml_data[2:4] != '\x00\x00'):
            # UTF-16LE with BOM
            sniffed_xml_encoding = 'utf-16le'
            xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
        elif xml_data[:4] == '\x00\x00\x00\x3c':
            # UTF-32BE
            sniffed_xml_encoding = 'utf-32be'
            xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
        elif xml_data[:4] == '\x3c\x00\x00\x00':
            # UTF-32LE
            sniffed_xml_encoding = 'utf-32le'
            xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
        elif xml_data[:4] == '\x00\x00\xfe\xff':
            # UTF-32BE with BOM
            sniffed_xml_encoding = 'utf-32be'
            xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
        elif xml_data[:4] == '\xff\xfe\x00\x00':
            # UTF-32LE with BOM
            sniffed_xml_encoding = 'utf-32le'
            xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
        elif xml_data[:3] == '\xef\xbb\xbf':
            # UTF-8 with BOM
            sniffed_xml_encoding = 'utf-8'
            xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
        else:
            # ASCII-compatible
            pass
        xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
    except Exception:
        # Sniffing is best-effort: on any failure, carry on without a
        # declared encoding.
        xml_encoding_match = None
    if xml_encoding_match:
        xml_encoding = xml_encoding_match.groups()[0].lower()
        if sniffed_xml_encoding and (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode', 'iso-10646-ucs-4', 'ucs-4', 'csucs4', 'utf-16', 'utf-32', 'utf_16', 'utf_32', 'utf16', 'u16')):
            xml_encoding = sniffed_xml_encoding
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2359
    acceptable_content_type = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2360
    application_content_types = ('application/xml', 'application/xml-dtd', 'application/xml-external-parsed-entity')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2361
    text_content_types = ('text/xml', 'text/xml-external-parsed-entity')
    if (http_content_type in application_content_types) or \
       (http_content_type.startswith('application/') and http_content_type.endswith('+xml')):
        acceptable_content_type = 1
        true_encoding = http_encoding or xml_encoding or 'utf-8'
    elif (http_content_type in text_content_types) or \
         (http_content_type.startswith('text/') and http_content_type.endswith('+xml')):
        acceptable_content_type = 1
        true_encoding = http_encoding or 'us-ascii'
    elif http_content_type.startswith('text/'):
        true_encoding = http_encoding or 'us-ascii'
    elif http_headers and (not http_headers.has_key('content-type')):
        true_encoding = xml_encoding or 'iso-8859-1'
    else:
        true_encoding = xml_encoding or 'utf-8'
    return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type
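# Illustrative sketch (Python 3, not part of feedparser): the media-type
# precedence applied at the end of _getCharacterEncoding, restated as a
# standalone function. The name pick_encoding and the
# missing_content_type_header flag are assumptions made for this example;
# the flag stands in for "headers were received but carry no content-type".

```python
APPLICATION_TYPES = ('application/xml', 'application/xml-dtd',
                     'application/xml-external-parsed-entity')
TEXT_TYPES = ('text/xml', 'text/xml-external-parsed-entity')


def pick_encoding(content_type, http_encoding='', xml_encoding='',
                  missing_content_type_header=False):
    """Return (encoding, acceptable) for a declared media type.

    acceptable is True only for XML media types (a bool here, where the
    surrounding module uses 0/1).
    """
    is_application_xml = content_type in APPLICATION_TYPES or (
        content_type.startswith('application/') and content_type.endswith('+xml'))
    is_text_xml = content_type in TEXT_TYPES or (
        content_type.startswith('text/') and content_type.endswith('+xml'))

    if is_application_xml:
        # HTTP charset wins, then the XML declaration, then UTF-8.
        return (http_encoding or xml_encoding or 'utf-8'), True
    if is_text_xml:
        # For text/* XML types the XML declaration is ignored.
        return (http_encoding or 'us-ascii'), True
    if content_type.startswith('text/'):
        return (http_encoding or 'us-ascii'), False
    if missing_content_type_header:
        return (xml_encoding or 'iso-8859-1'), False
    return (xml_encoding or 'utf-8'), False
```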
    
def _toUTF8(data, encoding):
    '''Converts an XML byte stream to UTF-8 and rewrites its XML declaration to match

    data is a raw sequence of bytes (not Unicode) that is presumed to already be in the given encoding
    encoding is a string recognized by encodings.aliases
    '''
    if _debug: sys.stderr.write('entering _toUTF8, trying encoding %s\n' % encoding)
    # strip Byte Order Mark (if present)
    if (len(data) >= 4) and (data[:2] == '\xfe\xff') and (data[2:4] != '\x00\x00'):
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-16be':
                sys.stderr.write('trying utf-16be instead\n')
        encoding = 'utf-16be'
        data = data[2:]
    elif (len(data) >= 4) and (data[:2] == '\xff\xfe') and (data[2:4] != '\x00\x00'):
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-16le':
                sys.stderr.write('trying utf-16le instead\n')
        encoding = 'utf-16le'
        data = data[2:]
    elif data[:3] == '\xef\xbb\xbf':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-8':
                sys.stderr.write('trying utf-8 instead\n')
        encoding = 'utf-8'
        data = data[3:]
    elif data[:4] == '\x00\x00\xfe\xff':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-32be':
                sys.stderr.write('trying utf-32be instead\n')
        encoding = 'utf-32be'
        data = data[4:]
    elif data[:4] == '\xff\xfe\x00\x00':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-32le':
                sys.stderr.write('trying utf-32le instead\n')
        encoding = 'utf-32le'
        data = data[4:]
    newdata = unicode(data, encoding)
    if _debug: sys.stderr.write('successfully converted %s data to unicode\n' % encoding)
    declmatch = re.compile(r'^<\?xml[^>]*?>')
    newdecl = '''<?xml version='1.0' encoding='utf-8'?>'''
    if declmatch.search(newdata):
        newdata = declmatch.sub(newdecl, newdata)
    else:
        newdata = newdecl + u'\n' + newdata
    return newdata.encode('utf-8')
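# Illustrative sketch (Python 3, not part of feedparser): the BOM handling
# and declaration rewrite of _toUTF8, restated for bytes input. The names
# strip_bom and to_utf8 are assumptions made for this example. As in the
# function above, a detected BOM overrides the encoding passed in.

```python
import re

_XML_DECL = re.compile(r'^<\?xml[^>]*?>')
_NEW_DECL = "<?xml version='1.0' encoding='utf-8'?>"


def strip_bom(data, encoding):
    """Return (data_without_bom, encoding), trusting a BOM over `encoding`."""
    # The UTF-16 tests require that the next two bytes are not NUL, so a
    # UTF-32LE BOM (FF FE 00 00) is not mistaken for a UTF-16LE one.
    if len(data) >= 4 and data[:2] == b'\xfe\xff' and data[2:4] != b'\x00\x00':
        return data[2:], 'utf-16be'
    if len(data) >= 4 and data[:2] == b'\xff\xfe' and data[2:4] != b'\x00\x00':
        return data[2:], 'utf-16le'
    if data[:3] == b'\xef\xbb\xbf':
        return data[3:], 'utf-8'
    if data[:4] == b'\x00\x00\xfe\xff':
        return data[4:], 'utf-32be'
    if data[:4] == b'\xff\xfe\x00\x00':
        return data[4:], 'utf-32le'
    return data, encoding


def to_utf8(data, encoding):
    """Decode `data`, force a UTF-8 XML declaration, and re-encode as UTF-8."""
    data, encoding = strip_bom(data, encoding)
    text = data.decode(encoding)
    if _XML_DECL.search(text):
        text = _XML_DECL.sub(_NEW_DECL, text)
    else:
        text = _NEW_DECL + '\n' + text
    return text.encode('utf-8')
```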
def _stripDoctype(data):
    '''Strips DOCTYPE and ENTITY declarations from XML document, returns (rss_version, stripped_data)

    rss_version may be 'rss091n' or None
    stripped_data is the same XML document, minus the DOCTYPE
    '''
    entity_pattern = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
    data = entity_pattern.sub('', data)
    doctype_pattern = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)
    doctype_results = doctype_pattern.findall(data)
    doctype = doctype_results and doctype_results[0] or ''
    if doctype.lower().count('netscape'):
        version = 'rss091n'
    else:
        version = None
    data = doctype_pattern.sub('', data)
    return version, data
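# Illustrative sketch (Python 3, not part of feedparser): the same two-pass
# stripping as _stripDoctype, on text input. The name strip_doctype is an
# assumption made for this example. One visible effect of removing ENTITY
# declarations first: a DOCTYPE with an internal subset no longer contains
# a '>' before its own end, so the simple [^>]*? pattern can match all of it.

```python
import re

_ENTITY = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
_DOCTYPE = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)


def strip_doctype(text):
    """Return (rss_version, text without ENTITY and DOCTYPE declarations)."""
    text = _ENTITY.sub('', text)
    found = _DOCTYPE.findall(text)
    doctype = found[0] if found else ''
    # Only the Netscape RSS 0.91 DTD is recognised from the DOCTYPE.
    version = 'rss091n' if 'netscape' in doctype.lower() else None
    return version, _DOCTYPE.sub('', text)


netscape_feed = ('<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
                 '"http://my.netscape.com/publish/formats/rss-0.91.dtd"><rss/>')
with_subset = '<!DOCTYPE x [<!ENTITY a "b">]><x/>'
```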
    
def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=[]):
    '''Parse a feed from a URL, file, stream, or string'''
    result = FeedParserDict()
    result['feed'] = FeedParserDict()
    result['entries'] = []
    if _XML_AVAILABLE:
        result['bozo'] = 0
    if type(handlers) == types.InstanceType:
        handlers = [handlers]
    try:
        f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers)
        data = f.read()
    except Exception, e:
        result['bozo'] = 1
        result['bozo_exception'] = e
        data = ''
        f = None

    # if feed is gzip-compressed, decompress it
    if f and data and hasattr(f, 'headers'):
        if gzip and f.headers.get('content-encoding', '') == 'gzip':
            try:
                data = gzip.GzipFile(fileobj=_StringIO(data)).read()
            except Exception, e:
                # Some feeds claim to be gzipped but they're not, so
                # we get garbage.  Ideally, we should re-request the
                # feed without the 'Accept-encoding: gzip' header,
                # but we don't.
                result['bozo'] = 1
                result['bozo_exception'] = e
                data = ''
        elif zlib and f.headers.get('content-encoding', '') == 'deflate':
            try:
                data = zlib.decompress(data, -zlib.MAX_WBITS)
            except Exception, e:
                result['bozo'] = 1
                result['bozo_exception'] = e
                data = ''

    # save HTTP headers
    if hasattr(f, 'info'):
        info = f.info()
        result['etag'] = info.getheader('ETag')
        last_modified = info.getheader('Last-Modified')
        if last_modified:
            result['modified'] = _parse_date(last_modified)
    if hasattr(f, 'url'):
        result['href'] = f.url
        result['status'] = 200
    if hasattr(f, 'status'):
        result['status'] = f.status
    if hasattr(f, 'headers'):
        result['headers'] = f.headers.dict
    if hasattr(f, 'close'):
        f.close()

    # there are four encodings to keep track of:
    # - http_encoding is the encoding declared in the Content-Type HTTP header
    # - xml_encoding is the encoding declared in the <?xml declaration
    # - sniffed_xml_encoding is the encoding sniffed from the first 4 bytes of the XML data
    # - result['encoding'] is the actual encoding, as per RFC 3023 and a variety of other conflicting specifications
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2510
    http_headers = result.get('headers', {})
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2511
    result['encoding'], http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2512
        _getCharacterEncoding(http_headers, data)
    if http_headers and (not acceptable_content_type):
        if http_headers.has_key('content-type'):
            bozo_message = '%s is not an XML media type' % http_headers['content-type']
        else:
            bozo_message = 'no Content-type specified'
        result['bozo'] = 1
        result['bozo_exception'] = NonXMLContentType(bozo_message)

    result['version'], data = _stripDoctype(data)

    baseuri = http_headers.get('content-location', result.get('href'))
    baselang = http_headers.get('content-language', None)

    # if server sent 304, we're done
    if result.get('status', 0) == 304:
        result['version'] = ''
        result['debug_message'] = 'The feed has not changed since you last checked, ' + \
            'so the server sent no data.  This is a feature, not a bug!'
        return result

    # if there was a problem downloading, we're done
    if not data:
        return result

    # determine character encoding
    use_strict_parser = 0
    known_encoding = 0
    tried_encodings = []
    # try: HTTP encoding, declared XML encoding, encoding sniffed from BOM
    for proposed_encoding in (result['encoding'], xml_encoding, sniffed_xml_encoding):
        if not proposed_encoding: continue
        if proposed_encoding in tried_encodings: continue
        tried_encodings.append(proposed_encoding)
        try:
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
            break
        except:
            pass
    # if no luck and we have auto-detection library, try that
    if (not known_encoding) and chardet:
        try:
            proposed_encoding = chardet.detect(data)['encoding']
            if proposed_encoding and (proposed_encoding not in tried_encodings):
                tried_encodings.append(proposed_encoding)
                data = _toUTF8(data, proposed_encoding)
                known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck and we haven't tried utf-8 yet, try that
    if (not known_encoding) and ('utf-8' not in tried_encodings):
        try:
            proposed_encoding = 'utf-8'
            tried_encodings.append(proposed_encoding)
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck and we haven't tried windows-1252 yet, try that
    if (not known_encoding) and ('windows-1252' not in tried_encodings):
        try:
            proposed_encoding = 'windows-1252'
            tried_encodings.append(proposed_encoding)
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck, give up
    if not known_encoding:
        result['bozo'] = 1
        result['bozo_exception'] = CharacterEncodingUnknown( \
            'document encoding unknown, I tried ' + \
            '%s, %s, utf-8, and windows-1252 but nothing worked' % \
            (result['encoding'], xml_encoding))
        result['encoding'] = ''
    elif proposed_encoding != result['encoding']:
        result['bozo'] = 1
        result['bozo_exception'] = CharacterEncodingOverride( \
            'document declared as %s, but parsed as %s' % \
            (result['encoding'], proposed_encoding))
        result['encoding'] = proposed_encoding

    if not _XML_AVAILABLE:
        use_strict_parser = 0
    if use_strict_parser:
        # initialize the SAX parser
        feedparser = _StrictFeedParser(baseuri, baselang, 'utf-8')
        saxparser = xml.sax.make_parser(PREFERRED_XML_PARSERS)
        saxparser.setFeature(xml.sax.handler.feature_namespaces, 1)
        saxparser.setContentHandler(feedparser)
        saxparser.setErrorHandler(feedparser)
        source = xml.sax.xmlreader.InputSource()
        source.setByteStream(_StringIO(data))
        if hasattr(saxparser, '_ns_stack'):
            # work around bug in built-in SAX parser (doesn't recognize xml: namespace)
            # PyXML doesn't have this problem, and it doesn't have _ns_stack either
            saxparser._ns_stack.append({'http://www.w3.org/XML/1998/namespace':'xml'})
        try:
            saxparser.parse(source)
        except Exception, e:
            if _debug:
                import traceback
                traceback.print_stack()
                traceback.print_exc()
                sys.stderr.write('xml parsing failed\n')
            result['bozo'] = 1
            result['bozo_exception'] = feedparser.exc or e
            use_strict_parser = 0
    if not use_strict_parser:
        feedparser = _LooseFeedParser(baseuri, baselang, known_encoding and 'utf-8' or '')
        feedparser.feed(data)
    result['feed'] = feedparser.feeddata
    result['entries'] = feedparser.entries
    result['version'] = result['version'] or feedparser.version
    result['namespaces'] = feedparser.namespacesInUse
    return result
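
# The encoding cascade above (declared encodings first, then utf-8, then
# windows-1252, skipping anything already tried) can be sketched in isolation.
# This is a minimal illustration in Python 3 syntax, not the module's own code:
# decode_with_fallbacks is a hypothetical name, it calls bytes.decode where the
# module calls _toUTF8, and it leaves out the optional chardet step.

```python
def decode_with_fallbacks(data, candidates, fallbacks=('utf-8', 'windows-1252')):
    """Return (text, encoding) for the first encoding that decodes data.

    Hypothetical helper for illustration only; it is not part of feedparser.
    Returns (None, '') when every candidate and fallback fails.
    """
    tried = []
    for encoding in list(candidates) + list(fallbacks):
        # Skip missing declarations and encodings that already failed.
        if not encoding or encoding in tried:
            continue
        tried.append(encoding)
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: fall through to the next one.
            continue
    return None, ''


if __name__ == '__main__':
    # The bytes are not valid UTF-8, so the windows-1252 fallback is used.
    print(decode_with_fallbacks(b'caf\xe9', ['utf-8']))
```

# When the winning encoding differs from the declared one, the function above
# records a CharacterEncodingOverride; a caller of this sketch could compare
# the returned encoding against its first candidate to detect the same case.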

if __name__ == '__main__':
    if not sys.argv[1:]:
        print __doc__
        sys.exit(0)
    else:
        urls = sys.argv[1:]
    zopeCompatibilityHack()
    from pprint import pprint
    for url in urls:
        print url
        print
        result = parse(url)
        pprint(result)
        print
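
# The command-line block above pretty-prints the whole result. A caller that
# only wants a one-line status could inspect the keys that parse() sets
# ('status', 'entries', 'version', 'bozo', 'bozo_exception'). The sketch below
# is written in Python 3 syntax and assumes only that dictionary shape;
# describe_result is a hypothetical helper, not part of this module.

```python
def describe_result(result):
    """Summarize a parse()-style result dictionary in one line.

    Hypothetical helper for illustration; it reads only keys that the
    parse() function above is seen to set.
    """
    if result.get('status', 0) == 304:
        # The server reported the feed as unchanged, so there is no data.
        return 'not modified'
    entries = result.get('entries', [])
    summary = '%d entries, version %r' % (len(entries), result.get('version', ''))
    if result.get('bozo'):
        # bozo flags a feed that needed leniency (encoding, content type, XML).
        summary += ', bozo: %s' % result.get('bozo_exception')
    return summary


if __name__ == '__main__':
    print(describe_result({'entries': [], 'version': 'rss20', 'bozo': 0}))
```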
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2644
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2645
#REVISION HISTORY
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2646
#1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements,
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2647
#  added Simon Fell's test suite
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2648
#1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2649
#2.0 - 10/19/2002
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2650
#  JD - use inchannel to watch out for image and textinput elements which can
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2651
#  also contain title, link, and description elements
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2652
#  JD - check for isPermaLink='false' attribute on guid elements
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2653
#  JD - replaced openAnything with open_resource supporting ETag and
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2654
#  If-Modified-Since request headers
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2655
#  JD - parse now accepts etag, modified, agent, and referrer optional
#  arguments
#  JD - modified parse to return a dictionary instead of a tuple so that any
#  etag or modified information can be returned and cached by the caller
#2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything
#  because of etag/modified, return the old etag/modified to the caller to
#  indicate why nothing is being returned
#2.0.2 - 10/21/2002 - JB - added the inchannel to the if statement, otherwise it's
#  useless.  Fixes the problem JD was addressing by adding it.
#2.1 - 11/14/2002 - MAP - added gzip support
#2.2 - 1/27/2003 - MAP - added attribute support, admin:generatorAgent.
#  start_admingeneratoragent is an example of how to handle elements with
#  only attributes, no content.
#2.3 - 6/11/2003 - MAP - added USER_AGENT for default (if caller doesn't specify);
#  also, make sure we send the User-Agent even if urllib2 isn't available.
#  Match any variation of backend.userland.com/rss namespace.
#2.3.1 - 6/12/2003 - MAP - if item has both link and guid, return both as-is.
#2.4 - 7/9/2003 - MAP - added preliminary Pie/Atom/Echo support based on Sam Ruby's
#  snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>; changed
#  project name
#2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree);
#  removed unnecessary urllib code -- urllib2 should always be available anyway;
#  return actual url, status, and full HTTP headers (as result['url'],
#  result['status'], and result['headers']) if parsing a remote feed over HTTP --
#  this should pass all the HTTP tests at <http://diveintomark.org/tests/client/http/>;
#  added the latest namespace-of-the-week for RSS 2.0
#2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom
#  User-Agent (otherwise urllib2 sends two, which confuses some servers)
#2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for
#  inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
#2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or
#  textInput, and also to return the character encoding (if specified)
#2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking
#  nested divs within content (JohnD); fixed missing sys import (JohanS);
#  fixed regular expression to capture XML character encoding (Andrei);
#  added support for Atom 0.3-style links; fixed bug with textInput tracking;
#  added support for cloud (MartijnP); added support for multiple
#  category/dc:subject (MartijnP); normalize content model: 'description' gets
#  description (which can come from description, summary, or full content if no
#  description), 'content' gets dict of base/language/type/value (which can come
#  from content:encoded, xhtml:body, content, or fullitem);
#  fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang
#  tracking; fixed bug tracking unknown tags; fixed bug tracking content when
#  <content> element is not in default namespace (like Pocketsoap feed);
#  resolve relative URLs in link, guid, docs, url, comments, wfw:comment,
#  wfw:commentRSS; resolve relative URLs within embedded HTML markup in
#  description, xhtml:body, content, content:encoded, title, subtitle,
#  summary, info, tagline, and copyright; added support for pingback and
#  trackback namespaces
#2.7 - 1/5/2004 - MAP - really added support for trackback and pingback
#  namespaces, as opposed to 2.6 when I said I did but didn't really;
#  sanitize HTML markup within some elements; added mxTidy support (if
#  installed) to tidy HTML markup within some elements; fixed indentation
#  bug in _parse_date (FazalM); use socket.setdefaulttimeout if available
#  (FazalM); universal date parsing and normalization (FazalM): 'created', 'modified',
#  'issued' are parsed into 9-tuple date format and stored in 'created_parsed',
#  'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified'
#  and vice-versa; 'date_parsed' is duplicated in 'modified_parsed' and vice-versa
#2.7.1 - 1/9/2004 - MAP - fixed bug handling &quot; and &apos;.  fixed memory
#  leak not closing url opener (JohnD); added dc:publisher support (MarekK);
#  added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK)
#2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed <br/> tags in
#  encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL);
#  fixed relative URI processing for guid (skadz); added ICBM support; added
#  base64 support
#2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many
#  blogspot.com sites); added _debug variable
#2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing
#3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available);
#  added several new supported namespaces; fixed bug tracking naked markup in
#  description; added support for enclosure; added support for source; re-added
#  support for cloud which got dropped somehow; added support for expirationDate
#3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking
#  xml:base URI, one for documents that don't define one explicitly and one for
#  documents that define an outer and an inner xml:base that goes out of scope
#  before the end of the document
#3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level
#3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version']
#  will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized;
#  added support for creativeCommons:license and cc:license; added support for
#  full Atom content model in title, tagline, info, copyright, summary; fixed bug
#  with gzip encoding (not always telling server we support it when we do)
#3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail
#  (dictionary of 'name', 'url', 'email'); map author to author_detail if author
#  contains name + email address
#3.0b8 - 1/28/2004 - MAP - added support for contributor
#3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added
#  support for summary
#3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from
#  xml.util.iso8601
#3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain
#  dangerous markup; fiddled with decodeEntities (not right); liberalized
#  date parsing even further
#3.0b12 - 2/6/2004 - MAP - fiddled with decodeEntities (still not right);
#  added support for Atom 0.2 subtitle; added support for Atom content model
#  in copyright; better sanitizing of dangerous HTML elements with end tags
#  (script, frameset)
#3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img,
#  etc.) in embedded markup, in either HTML or XHTML form (<br>, <br/>, <br />)
#3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under
#  Python 2.1
#3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS;
#  fixed bug capturing author and contributor URL; fixed bug resolving relative
#  links in author and contributor URL; fixed bug resolving relative links in
#  generator URL; added support for recognizing RSS 1.0; passed Simon Fell's
#  namespace tests, and included them permanently in the test suite with his
#  permission; fixed namespace handling under Python 2.1
#3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15)
#3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023
#3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei);
#  use libxml2 (if available)
#3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author
#  name was in parentheses; removed ultra-problematic mxTidy support; patch to
#  work around crash in PyXML/expat when encountering invalid entities
#  (MarkMoraes); support for textinput/textInput
#3.0b20 - 4/7/2004 - MAP - added CDF support
#3.0b21 - 4/14/2004 - MAP - added Hot RSS support
#3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in
#  results dict; changed results dict to allow getting values with results.key
#  as well as results[key]; work around embedded illformed HTML with half
#  a DOCTYPE; work around malformed Content-Type header; if character encoding
#  is wrong, try several common ones before falling back to regexes (if this
#  works, bozo_exception is set to CharacterEncodingOverride); fixed character
#  encoding issues in BaseHTMLProcessor by tracking encoding and converting
#  from Unicode to raw strings before feeding data to sgmllib.SGMLParser;
#  convert each value in results to Unicode (if possible), even if using
#  regex-based parsing
#3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain
#  high-bit characters in attributes in embedded HTML in description (thanks
#  Thijs van de Vossen); moved guid, date, and date_parsed to mapped keys in
#  FeedParserDict; tweaked FeedParserDict.has_key to return True if asking
#  about a mapped key
#3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and
#  results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could
#  cause the same encoding to be tried twice (even if it failed the first time);
#  fixed DOCTYPE stripping when DOCTYPE contained entity declarations;
#  better textinput and image tracking in ill-formed RSS 1.0 feeds
#3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed
#  my blink tag tests
#3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that
#  failed to parse utf-16 encoded feeds; made source into a FeedParserDict;
#  duplicate admin:generatorAgent/@rdf:resource in generator_detail.url;
#  added support for image; refactored parse() fallback logic to try other
#  encodings if SAX parsing fails (previously it would only try other encodings
#  if re-encoding failed); remove unichr madness in normalize_attrs now that
#  we're properly tracking encoding in and out of BaseHTMLProcessor; set
#  feed.language from root-level xml:lang; set entry.id from rdf:about;
#  send Accept header
#3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between
#  iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are
#  windows-1252); fixed regression that could cause the same encoding to be
#  tried twice (even if it failed the first time)
#3.0.1 - 6/22/2004 - MAP - default to us-ascii for all text/* content types;
#  recover from malformed content-type header parameter with no equals sign
#  ('text/xml; charset:iso-8859-1')
#3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities
#  to Unicode equivalents in ill-formed feeds (aaronsw); added and
#  passed tests for converting character entities to Unicode equivalents
#  in ill-formed feeds (aaronsw); test for valid parsers when setting
#  XML_AVAILABLE; make version and encoding available when server returns
#  a 304; add handlers parameter to pass arbitrary urllib2 handlers (like
#  digest auth or proxy support); add code to parse username/password
#  out of url and send as basic authentication; expose downloading-related
#  exceptions in bozo_exception (aaronsw); added __contains__ method to
#  FeedParserDict (aaronsw); added publisher_detail (aaronsw)
#3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always
#  convert feed to UTF-8 before passing to XML parser; completely revamped
#  logic for determining character encoding and attempting XML parsing
#  (much faster); increased default timeout to 20 seconds; test for presence
#  of Location header on redirects; added tests for many alternate character
#  encodings; support various EBCDIC encodings; support UTF-16BE and
#  UTF-16LE with or without a BOM; support UTF-8 with a BOM; support
#  UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no
#  XML parsers are available; added support for 'Content-encoding: deflate';
#  send blank 'Accept-encoding: ' header if neither gzip nor zlib modules
#  are available
#3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure
#  problem tracking xml:base and xml:lang if element declares it, child
#  doesn't, first grandchild redeclares it, and second grandchild doesn't;
#  refactored date parsing; defined public registerDateHandler so callers
#  can add support for additional date formats at runtime; added support
#  for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added
#  zopeCompatibilityHack() which turns FeedParserDict into a regular
#  dictionary, required for Zope compatibility, and also makes command-
#  line debugging easier because pprint module formats real dictionaries
#  better than dictionary-like objects; added NonXMLContentType exception,
#  which is stored in bozo_exception when a feed is served with a non-XML
#  media type such as 'text/plain'; respect Content-Language as default
#  language if no xml:lang is present; cloud dict is now FeedParserDict;
#  generator dict is now FeedParserDict; better tracking of xml:lang,
#  including support for xml:lang='' to unset the current language;
#  recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default
#  namespace; don't overwrite final status on redirects (scenarios:
#  redirecting to a URL that returns 304, redirecting to a URL that
#  redirects to another URL with a different type of redirect); add
#  support for HTTP 303 redirects
#4.0 - MAP - support for relative URIs in xml:base attribute; fixed
#  encoding issue with mxTidy (phopkins); preliminary support for RFC 3229;
#  support for Atom 1.0; support for iTunes extensions; new 'tags' for
#  categories/keywords/etc. as array of dict
#  {'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0
#  terminology; parse RFC 822-style dates with no time; lots of other
#  bug fixes
#4.1 - MAP - removed socket timeout; added support for chardet library