app/feedparser/__init__.py
author Madhusudan.C.S <madhusudancs@gmail.com>
Mon, 24 Aug 2009 04:31:23 +0530
changeset 2787 8408741aee63
parent 151 6f8eb27752dc
permissions -rwxr-xr-x
Reverting last 4 patches containing GHOP related views. As Lennard suggested all the model patches should come first followed by the logic and views patches, to make sure nothing committed breaks the existing code after thorough review.
140
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
#!/usr/bin/env python
"""Universal feed parser

Handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds

Visit http://feedparser.org/ for the latest version
Visit http://feedparser.org/docs/ for the latest documentation

Required: Python 2.1 or later
Recommended: Python 2.3 or later
Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
"""

__version__ = "4.1"# + "$Revision: 1.92 $"[11:15] + "-cvs"
__license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE."""
__author__ = "Mark Pilgrim <http://diveintomark.org/>"
__contributors__ = ["Jason Diamond <http://injektilo.org/>",
                    "John Beimler <http://john.beimler.org/>",
                    "Fazal Majid <http://www.majid.info/mylos/weblog/>",
                    "Aaron Swartz <http://aaronsw.com/>",
                    "Kevin Marks <http://epeus.blogspot.com/>"]
_debug = 0

# HTTP "User-Agent" header to send to servers when downloading feeds.
# If you are embedding feedparser in a larger application, you should
# change this to your application name and URL.
USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/" % __version__

# HTTP "Accept" header to send to servers when downloading feeds.  If you don't
# want to send an Accept header, set this to None.
ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"

# List of preferred XML parsers, by SAX driver name.  These will be tried first,
# but if they're not installed, Python will keep searching through its own list
# of pre-installed parsers until it finds one that supports everything we need.
PREFERRED_XML_PARSERS = ["drv_libxml2"]

# If you want feedparser to automatically run HTML markup through HTML Tidy, set
# this to 1.  Requires mxTidy <http://www.egenix.com/files/python/mxTidy.html>
# or utidylib <http://utidylib.berlios.de/>.
TIDY_MARKUP = 0

# List of Python interfaces for HTML Tidy, in order of preference.  Only useful
# if TIDY_MARKUP = 1
PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"]

# ---------- required modules (should come with any Python distribution) ----------
import sgmllib, re, sys, copy, urlparse, time, rfc822, types, cgi, urllib, urllib2
try:
    from cStringIO import StringIO as _StringIO
except:
    from StringIO import StringIO as _StringIO

# ---------- optional modules (feedparser will work without these, but with reduced functionality) ----------

# gzip is included with most Python distributions, but may not be available if you compiled your own
try:
    import gzip
except:
    gzip = None
try:
    import zlib
except:
    zlib = None

# If a real XML parser is available, feedparser will attempt to use it.  feedparser has
# been tested with the built-in SAX parser, PyXML, and libxml2.  On platforms where the
# Python distribution does not come with an XML parser (such as Mac OS X 10.2 and some
# versions of FreeBSD), feedparser will quietly fall back on regex-based parsing.
try:
    import xml.sax
    xml.sax.make_parser(PREFERRED_XML_PARSERS) # test for valid parsers
    from xml.sax.saxutils import escape as _xmlescape
    _XML_AVAILABLE = 1
except:
    _XML_AVAILABLE = 0
    def _xmlescape(data):
        data = data.replace('&', '&amp;')
        data = data.replace('>', '&gt;')
        data = data.replace('<', '&lt;')
        return data

# base64 support for Atom feeds that contain embedded binary data
try:
    import base64, binascii
except:
    base64 = binascii = None

# cjkcodecs and iconv_codec provide support for more character encodings.
# Both are available from http://cjkpython.i18n.org/
try:
    import cjkcodecs.aliases
except:
    pass
try:
    import iconv_codec
except:
    pass

# chardet library auto-detects character encodings
# Download from http://chardet.feedparser.org/
try:
    import chardet
    if _debug:
        import chardet.constants
        chardet.constants._debug = 1
except:
    chardet = None

# ---------- don't touch these ----------
class ThingsNobodyCaresAboutButMe(Exception): pass
class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): pass
class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): pass
class NonXMLContentType(ThingsNobodyCaresAboutButMe): pass
class UndeclaredNamespace(Exception): pass

sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
sgmllib.special = re.compile('<!')
sgmllib.charref = re.compile('&#(x?[0-9A-Fa-f]+)[^0-9A-Fa-f]')

SUPPORTED_VERSIONS = {'': 'unknown',
                      'rss090': 'RSS 0.90',
                      'rss091n': 'RSS 0.91 (Netscape)',
                      'rss091u': 'RSS 0.91 (Userland)',
                      'rss092': 'RSS 0.92',
                      'rss093': 'RSS 0.93',
                      'rss094': 'RSS 0.94',
                      'rss20': 'RSS 2.0',
                      'rss10': 'RSS 1.0',
                      'rss': 'RSS (unknown version)',
                      'atom01': 'Atom 0.1',
                      'atom02': 'Atom 0.2',
                      'atom03': 'Atom 0.3',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   155
                      'atom10': 'Atom 1.0',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   156
                      'atom': 'Atom (unknown version)',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   157
                      'cdf': 'CDF',
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   158
                      'hotrss': 'Hot RSS'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   159
                      }

try:
    UserDict = dict
except NameError:
    # Python 2.1 does not have dict
    from UserDict import UserDict
    def dict(aList):
        rc = {}
        for k, v in aList:
            rc[k] = v
        return rc

class FeedParserDict(UserDict):
    keymap = {'channel': 'feed',
              'items': 'entries',
              'guid': 'id',
              'date': 'updated',
              'date_parsed': 'updated_parsed',
              'description': ['subtitle', 'summary'],
              'url': ['href'],
              'modified': 'updated',
              'modified_parsed': 'updated_parsed',
              'issued': 'published',
              'issued_parsed': 'published_parsed',
              'copyright': 'rights',
              'copyright_detail': 'rights_detail',
              'tagline': 'subtitle',
              'tagline_detail': 'subtitle_detail'}
    def __getitem__(self, key):
        if key == 'category':
            return UserDict.__getitem__(self, 'tags')[0]['term']
        if key == 'categories':
            return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
        realkey = self.keymap.get(key, key)
        if type(realkey) == types.ListType:
            for k in realkey:
                if UserDict.has_key(self, k):
                    return UserDict.__getitem__(self, k)
        if UserDict.has_key(self, key):
            return UserDict.__getitem__(self, key)
        return UserDict.__getitem__(self, realkey)

    def __setitem__(self, key, value):
        for k in self.keymap.keys():
            if key == k:
                key = self.keymap[k]
                if type(key) == types.ListType:
                    key = key[0]
        return UserDict.__setitem__(self, key, value)

    def get(self, key, default=None):
        if self.has_key(key):
            return self[key]
        else:
            return default

    def setdefault(self, key, value):
        if not self.has_key(key):
            self[key] = value
        return self[key]

    def has_key(self, key):
        try:
            return hasattr(self, key) or UserDict.has_key(self, key)
        except AttributeError:
            return False

    def __getattr__(self, key):
        try:
            return self.__dict__[key]
        except KeyError:
            pass
        try:
            assert not key.startswith('_')
            return self.__getitem__(key)
        except:
            raise AttributeError, "object has no attribute '%s'" % key

    def __setattr__(self, key, value):
        if key.startswith('_') or key == 'data':
            self.__dict__[key] = value
        else:
            return self.__setitem__(key, value)

    def __contains__(self, key):
        return self.has_key(key)
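
# Illustrative sketch (not part of feedparser): the key-aliasing idea behind
# FeedParserDict, reduced to a small Python 3 class. The name AliasDict and
# the three-entry keymap are assumptions made for the example; the attribute
# access and the 'category'/'categories' special cases are left out. Unlike
# the class above, an unmatched list alias falls back to a plain KeyError.

```python
class AliasDict(dict):
    """Hypothetical Python 3 reduction of FeedParserDict's key aliasing."""

    keymap = {'channel': 'feed',
              'guid': 'id',
              'description': ['subtitle', 'summary']}

    def __getitem__(self, key):
        realkey = self.keymap.get(key, key)
        if isinstance(realkey, list):
            # A list alias means "first of these names that is present".
            for k in realkey:
                if dict.__contains__(self, k):
                    return dict.__getitem__(self, k)
            realkey = key  # nothing matched: fall back to the literal key
        if dict.__contains__(self, key):
            return dict.__getitem__(self, key)
        return dict.__getitem__(self, realkey)

    def __setitem__(self, key, value):
        # Writes through an old name are stored under the new name;
        # for a list alias the first candidate is used.
        target = self.keymap.get(key, key)
        if isinstance(target, list):
            target = target[0]
        dict.__setitem__(self, target, value)


d = AliasDict()
d['guid'] = 'tag:example.org,2009:1'   # stored under 'id'
d['description'] = 'A summary'         # stored under 'subtitle'
print(sorted(d.keys()))
```

# Old and new names read back the same value, while only the new names are
# ever stored. Values passed to the dict constructor bypass __setitem__, so
# the sketch fills the mapping by item assignment.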

def zopeCompatibilityHack():
    global FeedParserDict
    del FeedParserDict
    def FeedParserDict(aDict=None):
        rc = {}
        if aDict:
            rc.update(aDict)
        return rc

_ebcdic_to_ascii_map = None
def _ebcdic_to_ascii(s):
    global _ebcdic_to_ascii_map
    if not _ebcdic_to_ascii_map:
        emap = (
            0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
            16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
            128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
            144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
            32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
            38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
            45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
            186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
            195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,201,
            202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
            209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
            216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
            123,65,66,67,68,69,70,71,72,73,232,233,234,235,236,237,
            125,74,75,76,77,78,79,80,81,82,238,239,240,241,242,243,
            92,159,83,84,85,86,87,88,89,90,244,245,246,247,248,249,
            48,49,50,51,52,53,54,55,56,57,250,251,252,253,254,255
            )
        import string
        _ebcdic_to_ascii_map = string.maketrans( \
            ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
    return s.translate(_ebcdic_to_ascii_map)

_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')
def _urljoin(base, uri):
    uri = _urifixer.sub(r'\1\3', uri)
    return urlparse.urljoin(base, uri)
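
# Illustrative sketch (not part of feedparser): what _urljoin does to a URI
# with surplus slashes after the scheme, written for Python 3, where
# urlparse.urljoin lives in urllib.parse. The function name join_uri is an
# assumption for the example; the pattern is the one defined above.

```python
import re
from urllib.parse import urljoin

_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')

def join_uri(base, uri):
    # Drop any extra slashes that follow "scheme://" before joining.
    uri = _urifixer.sub(r'\1\3', uri)
    return urljoin(base, uri)

relative = join_uri('http://example.org/feed/', 'entry/1')
repaired = join_uri('http://example.org/feed/', 'http:////example.com/a')
print(relative)
print(repaired)
```

# A relative reference is resolved against the base as usual, while an
# absolute URI with a malformed "scheme:////host" prefix is normalised to
# "scheme://host" first.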
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn

class _FeedParserMixin:
    namespaces = {'': '',
                  'http://backend.userland.com/rss': '',
                  'http://blogs.law.harvard.edu/tech/rss': '',
                  'http://purl.org/rss/1.0/': '',
                  'http://my.netscape.com/rdf/simple/0.9/': '',
                  'http://example.com/newformat#': '',
                  'http://example.com/necho': '',
                  'http://purl.org/echo/': '',
                  'uri/of/echo/namespace#': '',
                  'http://purl.org/pie/': '',
                  'http://purl.org/atom/ns#': '',
                  'http://www.w3.org/2005/Atom': '',
                  'http://purl.org/rss/1.0/modules/rss091#': '',

                  'http://webns.net/mvcb/':                               'admin',
                  'http://purl.org/rss/1.0/modules/aggregation/':         'ag',
                  'http://purl.org/rss/1.0/modules/annotate/':            'annotate',
                  'http://media.tangent.org/rss/1.0/':                    'audio',
                  'http://backend.userland.com/blogChannelModule':        'blogChannel',
                  'http://web.resource.org/cc/':                          'cc',
                  'http://backend.userland.com/creativeCommonsRssModule': 'creativeCommons',
                  'http://purl.org/rss/1.0/modules/company':              'co',
                  'http://purl.org/rss/1.0/modules/content/':             'content',
                  'http://my.theinfo.org/changed/1.0/rss/':               'cp',
                  'http://purl.org/dc/elements/1.1/':                     'dc',
                  'http://purl.org/dc/terms/':                            'dcterms',
                  'http://purl.org/rss/1.0/modules/email/':               'email',
                  'http://purl.org/rss/1.0/modules/event/':               'ev',
                  'http://rssnamespace.org/feedburner/ext/1.0':           'feedburner',
                  'http://freshmeat.net/rss/fm/':                         'fm',
                  'http://xmlns.com/foaf/0.1/':                           'foaf',
                  'http://www.w3.org/2003/01/geo/wgs84_pos#':             'geo',
                  'http://postneo.com/icbm/':                             'icbm',
                  'http://purl.org/rss/1.0/modules/image/':               'image',
                  'http://www.itunes.com/DTDs/PodCast-1.0.dtd':           'itunes',
                  'http://example.com/DTDs/PodCast-1.0.dtd':              'itunes',
                  'http://purl.org/rss/1.0/modules/link/':                'l',
                  'http://search.yahoo.com/mrss':                         'media',
                  'http://madskills.com/public/xml/rss/module/pingback/': 'pingback',
                  'http://prismstandard.org/namespaces/1.2/basic/':       'prism',
                  'http://www.w3.org/1999/02/22-rdf-syntax-ns#':          'rdf',
                  'http://www.w3.org/2000/01/rdf-schema#':                'rdfs',
                  'http://purl.org/rss/1.0/modules/reference/':           'ref',
                  'http://purl.org/rss/1.0/modules/richequiv/':           'reqv',
                  'http://purl.org/rss/1.0/modules/search/':              'search',
                  'http://purl.org/rss/1.0/modules/slash/':               'slash',
                  'http://schemas.xmlsoap.org/soap/envelope/':            'soap',
                  'http://purl.org/rss/1.0/modules/servicestatus/':       'ss',
                  'http://hacks.benhammersley.com/rss/streaming/':        'str',
                  'http://purl.org/rss/1.0/modules/subscription/':        'sub',
                  'http://purl.org/rss/1.0/modules/syndication/':         'sy',
                  'http://purl.org/rss/1.0/modules/taxonomy/':            'taxo',
                  'http://purl.org/rss/1.0/modules/threading/':           'thr',
                  'http://purl.org/rss/1.0/modules/textinput/':           'ti',
                  'http://madskills.com/public/xml/rss/module/trackback/':'trackback',
                  'http://wellformedweb.org/commentAPI/':                 'wfw',
                  'http://purl.org/rss/1.0/modules/wiki/':                'wiki',
                  'http://www.w3.org/1999/xhtml':                         'xhtml',
                  'http://www.w3.org/XML/1998/namespace':                 'xml',
                  'http://schemas.pocketsoap.com/rss/myDescModule/':      'szf'
}
    _matchnamespaces = {}

    can_be_relative_uri = ['link', 'id', 'wfw_comment', 'wfw_commentrss', 'docs', 'url', 'href', 'comments', 'license', 'icon', 'logo']
    can_contain_relative_uris = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
    can_contain_dangerous_markup = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
    html_types = ['text/html', 'application/xhtml+xml']

    def __init__(self, baseuri=None, baselang=None, encoding='utf-8'):
        if _debug: sys.stderr.write('initializing FeedParser\n')
        if not self._matchnamespaces:
            for k, v in self.namespaces.items():
                self._matchnamespaces[k.lower()] = v
        self.feeddata = FeedParserDict() # feed-level data
        self.encoding = encoding # character encoding
        self.entries = [] # list of entry-level data
        self.version = '' # feed type/version, see SUPPORTED_VERSIONS
        self.namespacesInUse = {} # dictionary of namespaces defined by the feed

        # the following are used internally to track state;
        # this is really out of control and should be refactored
        self.infeed = 0
        self.inentry = 0
        self.incontent = 0
        self.intextinput = 0
        self.inimage = 0
        self.inauthor = 0
        self.incontributor = 0
        self.inpublisher = 0
        self.insource = 0
        self.sourcedata = FeedParserDict()
        self.contentparams = FeedParserDict()
        self._summaryKey = None
        self.namespacemap = {}
        self.elementstack = []
        self.basestack = []
        self.langstack = []
        self.baseuri = baseuri or ''
        self.lang = baselang or None
        if baselang:
            self.feeddata['language'] = baselang
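# Illustration only, not part of feedparser: a minimal sketch of the idea
# behind `_matchnamespaces`, which __init__ fills with lower-cased copies of
# the `namespaces` keys so that namespace URIs can be matched without regard
# to case. The trimmed table and the helper `canonical_prefix` are
# hypothetical; how the parser consults the table is not shown in this part
# of the file, so the fallback to the declared prefix is an assumption.

```python
# Hypothetical, trimmed copy of the class-level table: namespace URI -> prefix.
NAMESPACES = {
    'http://www.w3.org/2005/Atom': '',
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.itunes.com/DTDs/PodCast-1.0.dtd': 'itunes',
}

# Same idea as _matchnamespaces: lower-case every key once, up front.
MATCH_NAMESPACES = dict((uri.lower(), prefix) for uri, prefix in NAMESPACES.items())


def canonical_prefix(uri, declared_prefix):
    """Return the table's prefix for uri, or declared_prefix if uri is unknown.

    The fallback is an assumption made for this sketch.
    """
    return MATCH_NAMESPACES.get(uri.lower(), declared_prefix)
```

# With this table, a feed that declares xmlns:foo="HTTP://PURL.ORG/DC/ELEMENTS/1.1/"
# would still have its elements treated as 'dc' elements.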

    def unknown_starttag(self, tag, attrs):
        if _debug: sys.stderr.write('start %s with %s\n' % (tag, attrs))
        # normalize attrs
        attrs = [(k.lower(), v) for k, v in attrs]
        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]

        # track xml:base and xml:lang
        attrsD = dict(attrs)
        baseuri = attrsD.get('xml:base', attrsD.get('base')) or self.baseuri
        self.baseuri = _urljoin(self.baseuri, baseuri)
        lang = attrsD.get('xml:lang', attrsD.get('lang'))
        if lang == '':
            # xml:lang could be explicitly set to '', we need to capture that
            lang = None
        elif lang is None:
            # if no xml:lang is specified, use parent lang
            lang = self.lang
        if lang:
            if tag in ('feed', 'rss', 'rdf:RDF'):
                self.feeddata['language'] = lang
        self.lang = lang
        self.basestack.append(self.baseuri)
        self.langstack.append(lang)
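# Illustration only, not part of feedparser: the xml:lang rules applied above,
# pulled out into a standalone function so the three cases are easy to see.
# The name `resolve_lang` and its signature are hypothetical; `attrs` stands
# for the lower-cased attribute dict (attrsD above) and `parent_lang` for the
# language in effect on the enclosing element (self.lang above).

```python
def resolve_lang(attrs, parent_lang):
    """Return the language in effect for an element, or None."""
    lang = attrs.get('xml:lang', attrs.get('lang'))
    if lang == '':
        # xml:lang="" explicitly clears the inherited language
        return None
    if lang is None:
        # no attribute at all: inherit from the parent element
        return parent_lang
    return lang
```

# Example: resolve_lang({'xml:lang': ''}, 'en') gives None, while
# resolve_lang({}, 'en') gives 'en'.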

        # track namespaces
        for prefix, uri in attrs:
            if prefix.startswith('xmlns:'):
                self.trackNamespace(prefix[6:], uri)
            elif prefix == 'xmlns':
                self.trackNamespace(None, uri)

        # track inline content
        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
            # element declared itself as escaped markup, but it isn't really
            self.contentparams['type'] = 'application/xhtml+xml'
        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
            # Note: probably shouldn't simply recreate localname here, but
            # our namespace handling isn't actually 100% correct in cases where
            # the feed redefines the default namespace (which is actually
            # the usual case for inline content, thanks Sam), so here we
            # cheat and just reconstruct the element based on localname
            # because that compensates for the bugs in our namespace handling.
            # This will horribly munge inline content with non-empty qnames,
            # but nobody actually does that, so I'm not fixing it.
            tag = tag.split(':')[-1]
            return self.handle_data('<%s%s>' % (tag, ''.join([' %s="%s"' % t for t in attrs])), escape=0)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   437
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   438
        # match namespaces
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   439
        if tag.find(':') <> -1:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   440
            prefix, suffix = tag.split(':', 1)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   441
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   442
            prefix, suffix = '', tag
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   443
        prefix = self.namespacemap.get(prefix, prefix)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   444
        if prefix:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   445
            prefix = prefix + '_'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   446
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   447
        # special hack for better tracking of empty textinput/image elements in illformed feeds
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   448
        if (not prefix) and tag not in ('title', 'link', 'description', 'name'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   449
            self.intextinput = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   450
        if (not prefix) and tag not in ('title', 'link', 'description', 'url', 'href', 'width', 'height'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   451
            self.inimage = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   452
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   453
        # call special handler (if defined) or default handler
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   454
        methodname = '_start_' + prefix + suffix
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   455
        try:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   456
            method = getattr(self, methodname)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   457
            return method(attrsD)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   458
        except AttributeError:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   459
            return self.push(prefix + suffix, 1)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   460
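The name-based dispatch at the end of `unknown_starttag` can be illustrated in isolation. The following is a minimal Python 3 sketch, not feedparser code: `MiniDispatcher`, its `start` method, the `calls` list, and the sample `namespacemap` contents are hypothetical names introduced only for illustration.

```python
class MiniDispatcher(object):
    """Hypothetical stand-in for the parser class; not part of feedparser."""

    def __init__(self, namespacemap):
        # maps a prefix as written in the feed to a canonical prefix
        self.namespacemap = namespacemap
        self.calls = []

    def _start_dc_creator(self, attrs):
        # a "special" handler, found by name
        self.calls.append(('special', 'dc_creator'))

    def push(self, element, expecting_text):
        # the default handler, used when no special handler exists
        self.calls.append(('default', element))

    def start(self, tag, attrs):
        # split "prefix:suffix"; tags without a colon have an empty prefix
        if ':' in tag:
            prefix, suffix = tag.split(':', 1)
        else:
            prefix, suffix = '', tag
        # normalize the prefix, then build the handler name
        prefix = self.namespacemap.get(prefix, prefix)
        if prefix:
            prefix = prefix + '_'
        method = getattr(self, '_start_' + prefix + suffix, None)
        if method is None:
            return self.push(prefix + suffix, 1)
        return method(attrs)


dispatcher = MiniDispatcher({'purl': 'dc'})
dispatcher.start('purl:creator', {})  # resolved to _start_dc_creator
dispatcher.start('title', {})         # no handler defined, falls back to push
```

One difference from the method above: the sketch looks the handler up with a `getattr` default, whereas the original wraps both the lookup and the call in `try/except AttributeError`, so an `AttributeError` raised inside a handler also falls through to `push`.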
    def unknown_endtag(self, tag):
        if _debug: sys.stderr.write('end %s\n' % tag)
        # match namespaces
        if tag.find(':') != -1:
            prefix, suffix = tag.split(':', 1)
        else:
            prefix, suffix = '', tag
        prefix = self.namespacemap.get(prefix, prefix)
        if prefix:
            prefix = prefix + '_'

        # call special handler (if defined) or default handler
        methodname = '_end_' + prefix + suffix
        try:
            method = getattr(self, methodname)
            method()
        except AttributeError:
            self.pop(prefix + suffix)

        # track inline content
        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
            # element declared itself as escaped markup, but it isn't really
            self.contentparams['type'] = 'application/xhtml+xml'
        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
            tag = tag.split(':')[-1]
            self.handle_data('</%s>' % tag, escape=0)

        # track xml:base and xml:lang going out of scope
        if self.basestack:
            self.basestack.pop()
            if self.basestack and self.basestack[-1]:
                self.baseuri = self.basestack[-1]
        if self.langstack:
            self.langstack.pop()
            if self.langstack: # and (self.langstack[-1] is not None):
                self.lang = self.langstack[-1]

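The stack handling at the end of `unknown_endtag` restores the value that was in scope before the closed element. A minimal Python 3 sketch of that idea follows. `ScopedValue`, `enter`, and `leave` are hypothetical names, and the push side shown in `enter` is an assumption, since the corresponding start-tag code is not part of this section.

```python
class ScopedValue(object):
    """Hypothetical helper: a value inherited by nested elements."""

    def __init__(self, initial=None):
        self.current = initial
        self.stack = []

    def enter(self, override=None):
        # assumed push side: an element may override the inherited value
        if override is not None:
            self.current = override
        self.stack.append(self.current)

    def leave(self):
        # mirrors the pop logic above: drop this element's entry, then
        # restore whatever the enclosing element had in scope
        if self.stack:
            self.stack.pop()
            if self.stack:
                self.current = self.stack[-1]


lang = ScopedValue('en')
lang.enter()          # <feed>, inherits 'en'
lang.enter('fr')      # <entry xml:lang="fr">
lang.leave()          # </entry>, back to the feed's language
```

The `xml:base` variant above additionally requires the restored entry to be truthy before assigning it to `self.baseuri`; the sketch follows the simpler `xml:lang` branch.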
    def handle_charref(self, ref):
        # called for each character reference, e.g. for '&#160;', ref will be '160'
        if not self.elementstack: return
        ref = ref.lower()
        if ref in ('34', '38', '39', '60', '62', 'x22', 'x26', 'x27', 'x3c', 'x3e'):
            text = '&#%s;' % ref
        else:
            if ref[0] == 'x':
                c = int(ref[1:], 16)
            else:
                c = int(ref)
            text = unichr(c).encode('utf-8')
        self.elementstack[-1][2].append(text)

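The character-reference rule in `handle_charref` can be tried as a standalone function. This is a Python 3 sketch under stated assumptions: `resolve_charref` and `_KEEP_ESCAPED` are hypothetical names, and the function returns a `str` where the Python 2 method above appends UTF-8 encoded bytes to the element stack.

```python
# References to the five XML-significant characters (" & ' < >) are kept
# as references so they cannot be mistaken for markup later on.
_KEEP_ESCAPED = frozenset([
    '34', '38', '39', '60', '62',
    'x22', 'x26', 'x27', 'x3c', 'x3e',
])


def resolve_charref(ref):
    """Resolve the body of a numeric character reference, e.g. '160' or 'xA0'."""
    ref = ref.lower()
    if ref in _KEEP_ESCAPED:
        return '&#%s;' % ref
    if ref.startswith('x'):
        codepoint = int(ref[1:], 16)   # hexadecimal form
    else:
        codepoint = int(ref)           # decimal form
    return chr(codepoint)


print(repr(resolve_charref('160')))
print(repr(resolve_charref('x3C')))
```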
    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
        if not self.elementstack: return
        if _debug: sys.stderr.write('entering handle_entityref with %s\n' % ref)
        if ref in ('lt', 'gt', 'quot', 'amp', 'apos'):
            text = '&%s;' % ref
        else:
            # entity resolution graciously donated by Aaron Swartz
            def name2cp(k):
                import htmlentitydefs
                if hasattr(htmlentitydefs, 'name2codepoint'): # requires Python 2.3
                    return htmlentitydefs.name2codepoint[k]
                k = htmlentitydefs.entitydefs[k]
                if k.startswith('&#') and k.endswith(';'):
                    return int(k[2:-1]) # not in latin-1
                return ord(k)
            try: name2cp(ref)
            except KeyError: text = '&%s;' % ref
            else: text = unichr(name2cp(ref)).encode('utf-8')
        self.elementstack[-1][2].append(text)

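For comparison, the same named-entity lookup can be sketched for Python 3, where the `htmlentitydefs` module is available as `html.entities`. `resolve_entityref` is a hypothetical name, and the function returns a `str` instead of appending UTF-8 bytes as the method above does.

```python
from html.entities import name2codepoint


def resolve_entityref(ref):
    """Resolve the name of an entity reference, e.g. 'copy' for '&copy;'."""
    # the five predefined XML entities stay escaped
    if ref in ('lt', 'gt', 'quot', 'amp', 'apos'):
        return '&%s;' % ref
    try:
        return chr(name2codepoint[ref])
    except KeyError:
        # unknown entity: pass the reference through unchanged
        return '&%s;' % ref


print(repr(resolve_entityref('copy')))
print(repr(resolve_entityref('nosuchentity')))
```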
    def handle_data(self, text, escape=1):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        if not self.elementstack: return
        if escape and self.contentparams.get('type') == 'application/xhtml+xml':
            text = _xmlescape(text)
        self.elementstack[-1][2].append(text)

    def handle_comment(self, text):
        # called for each comment, e.g. <!-- insert message here -->
        pass

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        pass

    def handle_decl(self, text):
        pass

    def parse_declaration(self, i):
        # override internal declaration handler to handle CDATA blocks
        if _debug: sys.stderr.write('entering parse_declaration\n')
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1: k = len(self.rawdata)
            self.handle_data(_xmlescape(self.rawdata[i+9:k]), 0)
            return k+3
        else:
            k = self.rawdata.find('>', i)
            if k == -1:
                # unterminated declaration: return a negative index rather
                # than 0, so the caller does not rescan from the start
                return -1
            return k+1

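The CDATA branch of `parse_declaration` amounts to a small scanning routine. Below is a Python 3 sketch of it as a pure function. `scan_cdata` is a hypothetical name, and using `xml.sax.saxutils.escape` in place of `_xmlescape` is an assumption about what that helper does.

```python
from xml.sax.saxutils import escape

CDATA_OPEN = '<![CDATA['


def scan_cdata(rawdata, i):
    """Return (escaped_text, next_index) for the CDATA section starting at i."""
    if rawdata[i:i + len(CDATA_OPEN)] != CDATA_OPEN:
        raise ValueError('no CDATA section at index %d' % i)
    k = rawdata.find(']]>', i)
    if k == -1:
        # unterminated section: treat the rest of the buffer as its content
        k = len(rawdata)
    # CDATA content is literal text, so markup characters are escaped
    text = escape(rawdata[i + len(CDATA_OPEN):k])
    return text, k + 3


text, next_index = scan_cdata('<![CDATA[a < b]]>tail', 0)
print(text, next_index)
```

Returning the text together with the next index keeps the sketch free of parser state; the method above instead passes the text to `handle_data` with escaping disabled, since it has already been escaped.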
    def mapContentType(self, contentType):
        contentType = contentType.lower()
        if contentType == 'text':
            contentType = 'text/plain'
        elif contentType == 'html':
            contentType = 'text/html'
        elif contentType == 'xhtml':
            contentType = 'application/xhtml+xml'
        return contentType
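The shorthand mapping in `mapContentType` can also be written as a table lookup. This is only an illustrative Python 3 sketch; `map_content_type` and `_SHORTHAND_TYPES` are hypothetical names that do not exist in feedparser.

```python
# shorthand type names mapped to the full MIME types used above
_SHORTHAND_TYPES = {
    'text': 'text/plain',
    'html': 'text/html',
    'xhtml': 'application/xhtml+xml',
}


def map_content_type(content_type):
    """Lower-case a content type and expand the three shorthand names."""
    content_type = content_type.lower()
    # anything that is not a known shorthand is returned lower-cased
    return _SHORTHAND_TYPES.get(content_type, content_type)


print(map_content_type('XHTML'))
print(map_content_type('image/PNG'))
```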
    
    def trackNamespace(self, prefix, uri):
        loweruri = uri.lower()
        # infer the feed version from well-known namespace URIs, unless already set
        if (prefix, loweruri) == (None, 'http://my.netscape.com/rdf/simple/0.9/') and not self.version:
            self.version = 'rss090'
        if loweruri == 'http://purl.org/rss/1.0/' and not self.version:
            self.version = 'rss10'
        if loweruri == 'http://www.w3.org/2005/atom' and not self.version:
            self.version = 'atom10'
        if loweruri.find('backend.userland.com/rss') != -1:
            # match any backend.userland.com namespace
            uri = 'http://backend.userland.com/rss'
            loweruri = uri
        if loweruri in self._matchnamespaces:
            self.namespacemap[prefix] = self._matchnamespaces[loweruri]
            self.namespacesInUse[self._matchnamespaces[loweruri]] = uri
        else:
            self.namespacesInUse[prefix or ''] = uri

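The namespace bookkeeping in trackNamespace can be tried in isolation. The following is a minimal sketch, not the library's code: `KNOWN_NAMESPACES` is a hypothetical stand-in for the parser's `_matchnamespaces` table, `track_namespace` is an illustrative free function, and version detection is left out.

```python
# Hypothetical stand-in for the parser's _matchnamespaces table
# (lower-cased namespace URI -> standard prefix). Entries are assumptions.
KNOWN_NAMESPACES = {
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://backend.userland.com/rss': '',
}


def track_namespace(namespacemap, namespaces_in_use, prefix, uri):
    """Record a declared namespace, mapping known URIs to a standard prefix."""
    loweruri = uri.lower()
    if 'backend.userland.com/rss' in loweruri:
        # collapse every backend.userland.com variant onto one canonical URI
        uri = 'http://backend.userland.com/rss'
        loweruri = uri
    if loweruri in KNOWN_NAMESPACES:
        standard = KNOWN_NAMESPACES[loweruri]
        namespacemap[prefix] = standard      # declared prefix -> standard prefix
        namespaces_in_use[standard] = uri    # standard prefix -> URI as declared
    else:
        namespaces_in_use[prefix or ''] = uri


namespacemap, in_use = {}, {}
track_namespace(namespacemap, in_use, 'dublin', 'http://purl.org/DC/elements/1.1/')
track_namespace(namespacemap, in_use, 'x', 'http://example.org/ns')
track_namespace(namespacemap, in_use, None, 'http://backend.userland.com/rss2')
```

The lookup is case-insensitive, but the URI stored in `in_use` keeps the case the feed declared, except for the userland namespace, which is replaced by the canonical URI.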
    def resolveURI(self, uri):
        return _urljoin(self.baseuri or '', uri)

    def decodeEntities(self, element, data):
        return data

    def push(self, element, expectingText):
        self.elementstack.append([element, expectingText, []])

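resolveURI joins a possibly relative URI against the current base URI. The sketch below assumes that `_urljoin` behaves like the standard library's `urljoin` for ordinary inputs (it is defined elsewhere in this file and may add its own error handling). It is written for Python 3, whereas the listing itself is Python 2; `resolve_uri` is an illustrative name.

```python
from urllib.parse import urljoin


def resolve_uri(baseuri, uri):
    """Resolve uri against baseuri; a missing base leaves uri unchanged."""
    return urljoin(baseuri or '', uri)


base = 'http://example.org/feeds/atom.xml'
relative = resolve_uri(base, 'entries/1.html')
absolute = resolve_uri(base, 'http://other.example/x')
no_base = resolve_uri(None, 'entries/1.html')
```

With these inputs, `relative` is resolved against the directory of the base document, `absolute` is returned as given, and `no_base` stays relative because there is nothing to join it to.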
    def pop(self, element, stripWhitespace=1):
        if not self.elementstack: return
        if self.elementstack[-1][0] != element: return

        element, expectingText, pieces = self.elementstack.pop()
        output = ''.join(pieces)
        if stripWhitespace:
            output = output.strip()
        if not expectingText: return output

        # decode base64 content
        if base64 and self.contentparams.get('base64', 0):
            try:
                output = base64.decodestring(output)
            except binascii.Error:
                pass
            except binascii.Incomplete:
                pass

        # resolve relative URIs
        if (element in self.can_be_relative_uri) and output:
            output = self.resolveURI(output)

        # decode entities within embedded markup
        if not self.contentparams.get('base64', 0):
            output = self.decodeEntities(element, output)

        # remove temporary cruft from contentparams
        try:
            del self.contentparams['mode']
        except KeyError:
            pass
        try:
            del self.contentparams['base64']
        except KeyError:
            pass

        # resolve relative URIs within embedded markup
        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
            if element in self.can_contain_relative_uris:
                output = _resolveRelativeURIs(output, self.baseuri, self.encoding)

        # sanitize embedded markup
        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
            if element in self.can_contain_dangerous_markup:
                output = _sanitizeHTML(output, self.encoding)

        if self.encoding and type(output) != type(u''):
            try:
                output = unicode(output, self.encoding)
            except:
                pass

        # categories/tags/keywords/whatever are handled in _end_category
        if element == 'category':
            return output

        # store output in appropriate place(s)
        if self.inentry and not self.insource:
            if element == 'content':
                self.entries[-1].setdefault(element, [])
                contentparams = copy.deepcopy(self.contentparams)
                contentparams['value'] = output
                self.entries[-1][element].append(contentparams)
            elif element == 'link':
                self.entries[-1][element] = output
                if output:
                    self.entries[-1]['links'][-1]['href'] = output
            else:
                if element == 'description':
                    element = 'summary'
                self.entries[-1][element] = output
                if self.incontent:
                    contentparams = copy.deepcopy(self.contentparams)
                    contentparams['value'] = output
                    self.entries[-1][element + '_detail'] = contentparams
        elif (self.infeed or self.insource) and (not self.intextinput) and (not self.inimage):
            context = self._getContext()
            if element == 'description':
                element = 'subtitle'
            context[element] = output
            if element == 'link':
                context['links'][-1]['href'] = output
            elif self.incontent:
                contentparams = copy.deepcopy(self.contentparams)
                contentparams['value'] = output
                context[element + '_detail'] = contentparams
        return output

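The push/pop pair treats `elementstack` as a stack of `[element, expectingText, pieces]` entries. The sketch below isolates only that stack discipline and leaves out base64 decoding, URI resolution, sanitizing and result storage. The `ElementStack` class and its `handle_data` method are hypothetical names for illustration; the listing above does not show how text pieces are appended.

```python
class ElementStack(object):
    """Collects text pieces per open element, as push/pop do above."""

    def __init__(self):
        self.elementstack = []

    def push(self, element, expecting_text):
        # each entry is [element name, expecting-text flag, collected pieces]
        self.elementstack.append([element, expecting_text, []])

    def handle_data(self, text):
        # assumption: character data goes to the innermost open element
        if self.elementstack:
            self.elementstack[-1][2].append(text)

    def pop(self, element, strip_whitespace=True):
        # ignore the call if the stack is empty or the top entry does not match
        if not self.elementstack or self.elementstack[-1][0] != element:
            return None
        _, _, pieces = self.elementstack.pop()
        output = ''.join(pieces)
        return output.strip() if strip_whitespace else output


stack = ElementStack()
stack.push('title', 1)
stack.handle_data('  Hello')
stack.handle_data(' world  ')
mismatched = stack.pop('summary')   # top of stack is 'title', so nothing happens
title = stack.pop('title')
```

A mismatched close tag leaves the stack untouched, which is how the method above tolerates badly nested elements instead of raising.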
    def pushContent(self, tag, attrsD, defaultContentType, expectingText):
        self.incontent += 1
        self.contentparams = FeedParserDict({
            'type': self.mapContentType(attrsD.get('type', defaultContentType)),
            'language': self.lang,
            'base': self.baseuri})
        self.contentparams['base64'] = self._isBase64(attrsD, self.contentparams)
        self.push(tag, expectingText)

    def popContent(self, tag):
        value = self.pop(tag)
        self.incontent -= 1
        self.contentparams.clear()
        return value

    def _mapToStandardPrefix(self, name):
        colonpos = name.find(':')
        if colonpos != -1:
            prefix = name[:colonpos]
            suffix = name[colonpos+1:]
            prefix = self.namespacemap.get(prefix, prefix)
            name = prefix + ':' + suffix
        return name

    def _getAttribute(self, attrsD, name):
        return attrsD.get(self._mapToStandardPrefix(name))

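_mapToStandardPrefix rewrites a qualified name so that whatever prefix the feed declared is replaced by the parser's standard one. A minimal standalone sketch follows, with the namespace map passed in explicitly. The map contents and the function name `map_to_standard_prefix` are illustrative assumptions.

```python
def map_to_standard_prefix(namespacemap, name):
    """Replace a declared prefix in 'prefix:suffix' with its standard form."""
    colonpos = name.find(':')
    if colonpos != -1:
        prefix, suffix = name[:colonpos], name[colonpos + 1:]
        # unknown prefixes fall through unchanged
        name = namespacemap.get(prefix, prefix) + ':' + suffix
    return name


namespacemap = {'dublin': 'dc'}   # assumed: the feed bound 'dublin' to Dublin Core
mapped = map_to_standard_prefix(namespacemap, 'dublin:creator')
unknown = map_to_standard_prefix(namespacemap, 'foo:bar')
plain = map_to_standard_prefix(namespacemap, 'title')
```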
    def _isBase64(self, attrsD, contentparams):
        if attrsD.get('mode', '') == 'base64':
            return 1
        if self.contentparams['type'].startswith('text/'):
            return 0
        if self.contentparams['type'].endswith('+xml'):
            return 0
        if self.contentparams['type'].endswith('/xml'):
            return 0
        return 1

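The base64 heuristic in _isBase64 depends only on the `mode` attribute and the mapped content type, so it can be sketched as a pure function. The name `is_base64` and the use of booleans instead of 1/0 are illustrative choices, not the file's own.

```python
def is_base64(attrs, content_type):
    """Guess whether element content is base64-encoded."""
    if attrs.get('mode', '') == 'base64':
        return True                      # explicit mode="base64" always wins
    if content_type.startswith('text/'):
        return False                     # textual types are taken literally
    if content_type.endswith('+xml') or content_type.endswith('/xml'):
        return False                     # XML payloads are inline markup
    return True                          # anything else is assumed to be binary


checks = [
    is_base64({}, 'text/plain'),
    is_base64({}, 'application/xhtml+xml'),
    is_base64({}, 'application/xml'),
    is_base64({}, 'image/png'),
    is_base64({'mode': 'base64'}, 'text/plain'),
]
```

The default of treating every non-text, non-XML type as base64 matches the final `return 1` in the method above.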
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   728
    def _itsAnHrefDamnIt(self, attrsD):
        # Collapse the 'url', 'uri', and 'href' attributes into a single
        # 'href' key (lookup precedence: url, then uri, then href).
        href = attrsD.get('url', attrsD.get('uri', attrsD.get('href', None)))
        if href:
            try:
                del attrsD['url']
            except KeyError:
                pass
            try:
                del attrsD['uri']
            except KeyError:
                pass
            attrsD['href'] = href
        return attrsD

    def _save(self, key, value):
        context = self._getContext()
        context.setdefault(key, value)

    def _start_rss(self, attrsD):
        # Map the 'version' attribute to a version string; any 2.x value
        # becomes 'rss20', and anything unrecognized falls back to 'rss'.
        versionmap = {'0.91': 'rss091u',
                      '0.92': 'rss092',
                      '0.93': 'rss093',
                      '0.94': 'rss094'}
        if not self.version:
            attr_version = attrsD.get('version', '')
            version = versionmap.get(attr_version)
            if version:
                self.version = version
            elif attr_version.startswith('2.'):
                self.version = 'rss20'
            else:
                self.version = 'rss'

    def _start_dlhottitles(self, attrsD):
        self.version = 'hotrss'

    def _start_channel(self, attrsD):
        self.infeed = 1
        self._cdf_common(attrsD)
    _start_feedinfo = _start_channel

    def _cdf_common(self, attrsD):
        # Treat the 'lastmod' and 'href' attributes as if they were child
        # elements: open the handler, inject the attribute value as the
        # element content, then close the handler.
        if 'lastmod' in attrsD:
            self._start_modified({})
            self.elementstack[-1][-1] = attrsD['lastmod']
            self._end_modified()
        if 'href' in attrsD:
            self._start_link({})
            self.elementstack[-1][-1] = attrsD['href']
            self._end_link()

    def _start_feed(self, attrsD):
        self.infeed = 1
        versionmap = {'0.1': 'atom01',
                      '0.2': 'atom02',
                      '0.3': 'atom03'}
        if not self.version:
            attr_version = attrsD.get('version')
            version = versionmap.get(attr_version)
            if version:
                self.version = version
            else:
                self.version = 'atom'

    def _end_channel(self):
        self.infeed = 0
    _end_feed = _end_channel

    def _start_image(self, attrsD):
        self.inimage = 1
        self.push('image', 0)
        context = self._getContext()
        context.setdefault('image', FeedParserDict())

    def _end_image(self):
        self.pop('image')
        self.inimage = 0

    def _start_textinput(self, attrsD):
        self.intextinput = 1
        self.push('textinput', 0)
        context = self._getContext()
        context.setdefault('textinput', FeedParserDict())
    _start_textInput = _start_textinput

    def _end_textinput(self):
        self.pop('textinput')
        self.intextinput = 0
    _end_textInput = _end_textinput

    def _start_author(self, attrsD):
        self.inauthor = 1
        self.push('author', 1)
    _start_managingeditor = _start_author
    _start_dc_author = _start_author
    _start_dc_creator = _start_author
    _start_itunes_author = _start_author

    def _end_author(self):
        self.pop('author')
        self.inauthor = 0
        self._sync_author_detail()
    _end_managingeditor = _end_author
    _end_dc_author = _end_author
    _end_dc_creator = _end_author
    _end_itunes_author = _end_author

    def _start_itunes_owner(self, attrsD):
        self.inpublisher = 1
        self.push('publisher', 0)

    def _end_itunes_owner(self):
        self.pop('publisher')
        self.inpublisher = 0
        self._sync_author_detail('publisher')

    def _start_contributor(self, attrsD):
        self.incontributor = 1
        context = self._getContext()
        context.setdefault('contributors', [])
        context['contributors'].append(FeedParserDict())
        self.push('contributor', 0)

    def _end_contributor(self):
        self.pop('contributor')
        self.incontributor = 0

    def _start_dc_contributor(self, attrsD):
        self.incontributor = 1
        context = self._getContext()
        context.setdefault('contributors', [])
        context['contributors'].append(FeedParserDict())
        self.push('name', 0)

    def _end_dc_contributor(self):
        self._end_name()
        self.incontributor = 0

    def _start_name(self, attrsD):
        self.push('name', 0)
    _start_itunes_name = _start_name

    def _end_name(self):
        value = self.pop('name')
        if self.inpublisher:
            self._save_author('name', value, 'publisher')
        elif self.inauthor:
            self._save_author('name', value)
        elif self.incontributor:
            self._save_contributor('name', value)
        elif self.intextinput:
            context = self._getContext()
            context['textinput']['name'] = value
    _end_itunes_name = _end_name

    def _start_width(self, attrsD):
        self.push('width', 0)

    def _end_width(self):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   887
        value = self.pop('width')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   888
        try:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   889
            value = int(value)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   890
        except:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   891
            value = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   892
        if self.inimage:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   893
            context = self._getContext()
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   894
            context['image']['width'] = value
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   895
    def _start_height(self, attrsD):
        self.push('height', 0)

    def _end_height(self):
        value = self.pop('height')
        try:
            value = int(value)
        except:
            value = 0
        if self.inimage:
            context = self._getContext()
            context['image']['height'] = value

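The `_end_width` and `_end_height` handlers share one pattern: coerce the element text to an integer and fall back to 0 when that fails. A minimal standalone sketch of that pattern follows. The helper name `to_int_or_zero` is hypothetical and not part of feedparser, and, unlike the handlers, it narrows the bare `except` to the two exceptions `int()` raises for bad input.

```python
def to_int_or_zero(value):
    """Return int(value), or 0 when value cannot be parsed as an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers text such as 'wide' or ''.
        return 0


print(to_int_or_zero('144'))
print(to_int_or_zero('wide'))
print(to_int_or_zero(None))
```

Catching only these two exceptions keeps the fallback behaviour for malformed feed values while letting unrelated errors, such as a keyboard interrupt, propagate.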
    def _start_url(self, attrsD):
        self.push('href', 1)
    _start_homepage = _start_url
    _start_uri = _start_url

    def _end_url(self):
        value = self.pop('href')
        if self.inauthor:
            self._save_author('href', value)
        elif self.incontributor:
            self._save_contributor('href', value)
        elif self.inimage:
            context = self._getContext()
            context['image']['href'] = value
        elif self.intextinput:
            context = self._getContext()
            context['textinput']['link'] = value
    _end_homepage = _end_url
    _end_uri = _end_url

    def _start_email(self, attrsD):
        self.push('email', 0)
    _start_itunes_email = _start_email

    def _end_email(self):
        value = self.pop('email')
        if self.inpublisher:
            self._save_author('email', value, 'publisher')
        elif self.inauthor:
            self._save_author('email', value)
        elif self.incontributor:
            self._save_contributor('email', value)
    _end_itunes_email = _end_email

    def _getContext(self):
        # Return the dict currently being populated: the source data,
        # the most recent entry, or the feed-level data, in that order.
        if self.insource:
            context = self.sourcedata
        elif self.inentry:
            context = self.entries[-1]
        else:
            context = self.feeddata
        return context

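The precedence in `_getContext` (source data first, then the current entry, then the feed) decides where every handler writes its value. The sketch below reproduces that precedence in isolation. The class `ContextPicker` and its method `get_context` are hypothetical stand-ins for the parser state, and plain dicts are used in place of `FeedParserDict`.

```python
class ContextPicker(object):
    """Hypothetical stand-in for the parser state that _getContext reads."""

    def __init__(self):
        self.insource = 0
        self.inentry = 0
        self.sourcedata = {}
        self.entries = []
        self.feeddata = {}

    def get_context(self):
        # Same precedence as _getContext: source, then entry, then feed.
        if self.insource:
            return self.sourcedata
        if self.inentry:
            return self.entries[-1]
        return self.feeddata


picker = ContextPicker()
picker.get_context()['title'] = 'Feed title'     # no flag set: feed level

picker.entries.append({})
picker.inentry = 1
picker.get_context()['title'] = 'Entry title'    # inside an entry

picker.insource = 1
picker.get_context()['title'] = 'Source title'   # source wins over entry

print(picker.feeddata, picker.entries, picker.sourcedata)
```

Because the same element name can appear at feed, entry, and source level, routing all writes through one lookup keeps each handler free of that bookkeeping.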
    def _save_author(self, key, value, prefix='author'):
        context = self._getContext()
        context.setdefault(prefix + '_detail', FeedParserDict())
        context[prefix + '_detail'][key] = value
        self._sync_author_detail()

    def _save_contributor(self, key, value):
        context = self._getContext()
        context.setdefault('contributors', [FeedParserDict()])
        context['contributors'][-1][key] = value

    def _sync_author_detail(self, key='author'):
        # Keep context[key] (a display string) and context[key + '_detail']
        # (a dict with 'name' and 'email') consistent with each other.
        context = self._getContext()
        detail = context.get('%s_detail' % key)
        if detail:
            # Detail already populated: rebuild the display string from it.
            name = detail.get('name')
            email = detail.get('email')
            if name and email:
                context[key] = '%s (%s)' % (name, email)
            elif name:
                context[key] = name
            elif email:
                context[key] = email
        else:
            # Only the display string exists: split it into name and email.
            author = context.get(key)
            if not author: return
            emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author)
            if not emailmatch: return
            email = emailmatch.group(0)
            # probably a better way to do the following, but it passes all the tests
            author = author.replace(email, '')
            author = author.replace('()', '')
            author = author.strip()
            if author and (author[0] == '('):
                author = author[1:]
            if author and (author[-1] == ')'):
                author = author[:-1]
            author = author.strip()
            context.setdefault('%s_detail' % key, FeedParserDict())
            context['%s_detail' % key]['name'] = author
            context['%s_detail' % key]['email'] = email

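The `else` branch of `_sync_author_detail` splits a combined author string into a name and an email address. The sketch below isolates that step so it can be tried on its own. The function name `split_author` and the constant `EMAIL_RE` are hypothetical; the regular expression is copied from the method, and the cleanup steps follow the same order.

```python
import re

# Same pattern as the one used in _sync_author_detail.
EMAIL_RE = re.compile(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''')


def split_author(author):
    """Return (name, email) parsed from a combined author string, or None."""
    match = EMAIL_RE.search(author)
    if not match:
        return None
    email = match.group(0)
    # Remove the address, then any empty or dangling parentheses around it.
    name = author.replace(email, '').replace('()', '').strip()
    if name and name[0] == '(':
        name = name[1:]
    if name and name[-1] == ')':
        name = name[:-1]
    return name.strip(), email


print(split_author('Jane Doe (jane@example.com)'))
print(split_author('jane@example.com (Jane Doe)'))
print(split_author('Jane Doe'))
```

Both common orderings, name first or address first, yield the same pair. Note that the pattern limits the final domain label to four letters, so an address under a longer top-level domain is only partially matched.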
    def _start_subtitle(self, attrsD):
        self.pushContent('subtitle', attrsD, 'text/plain', 1)
    _start_tagline = _start_subtitle
    _start_itunes_subtitle = _start_subtitle

    def _end_subtitle(self):
        self.popContent('subtitle')
    _end_tagline = _end_subtitle
    _end_itunes_subtitle = _end_subtitle

    def _start_rights(self, attrsD):
        self.pushContent('rights', attrsD, 'text/plain', 1)
    _start_dc_rights = _start_rights
    _start_copyright = _start_rights

    def _end_rights(self):
        self.popContent('rights')
    _end_dc_rights = _end_rights
    _end_copyright = _end_rights

    def _start_item(self, attrsD):
        self.entries.append(FeedParserDict())
        self.push('item', 0)
        self.inentry = 1
        self.guidislink = 0
        id = self._getAttribute(attrsD, 'rdf:about')
        if id:
            context = self._getContext()
            context['id'] = id
        self._cdf_common(attrsD)
    _start_entry = _start_item
    _start_product = _start_item

    def _end_item(self):
        self.pop('item')
        self.inentry = 0
    _end_entry = _end_item

    def _start_dc_language(self, attrsD):
        self.push('language', 1)
    _start_language = _start_dc_language

    def _end_dc_language(self):
        self.lang = self.pop('language')
    _end_language = _end_dc_language

c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1040
    def _start_dc_publisher(self, attrsD):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1041
        self.push('publisher', 1)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1042
    _start_webmaster = _start_dc_publisher
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1043
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1044
    def _end_dc_publisher(self):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1045
        self.pop('publisher')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1046
        self._sync_author_detail('publisher')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1047
    _end_webmaster = _end_dc_publisher
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1048
    def _start_published(self, attrsD):
        self.push('published', 1)
    _start_dcterms_issued = _start_published
    _start_issued = _start_published

    def _end_published(self):
        value = self.pop('published')
        self._save('published_parsed', _parse_date(value))
    _end_dcterms_issued = _end_published
    _end_issued = _end_published

    def _start_updated(self, attrsD):
        self.push('updated', 1)
    _start_modified = _start_updated
    _start_dcterms_modified = _start_updated
    _start_pubdate = _start_updated
    _start_dc_date = _start_updated

    def _end_updated(self):
        value = self.pop('updated')
        parsed_value = _parse_date(value)
        self._save('updated_parsed', parsed_value)
    _end_modified = _end_updated
    _end_dcterms_modified = _end_updated
    _end_pubdate = _end_updated
    _end_dc_date = _end_updated
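
The class-level assignments in the handlers above (for example `_end_pubdate = _end_updated`) bind one handler function to several element names. The following is a minimal sketch of how such aliases could be looked up by name. It assumes a name-based dispatcher; the `MiniHandler` class and its `dispatch_end` method are hypothetical names used only for illustration and are not feedparser's API.

```python
class MiniHandler(object):
    """Hypothetical stand-in for the parser class; not feedparser's API."""

    def __init__(self):
        self.saved = {}

    def _end_updated(self, value):
        # Store the raw value under the canonical key.
        self.saved['updated'] = value

    # Aliases: several element names share one implementation.
    _end_modified = _end_updated
    _end_pubdate = _end_updated
    _end_dc_date = _end_updated

    def dispatch_end(self, tag, value):
        # Map e.g. 'dc:date' to '_end_dc_date' and call it if it is defined.
        name = '_end_' + tag.replace(':', '_').lower()
        method = getattr(self, name, None)
        if method is None:
            return False
        method(value)
        return True


handler = MiniHandler()
handled = handler.dispatch_end('dc:date', '2009-08-24')
unknown = handler.dispatch_end('nosuchtag', 'x')
```

With this layout, supporting another element name that has the same meaning only needs one more alias line; the handler body itself stays untouched.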

    def _start_created(self, attrsD):
        self.push('created', 1)
    _start_dcterms_created = _start_created

    def _end_created(self):
        value = self.pop('created')
        self._save('created_parsed', _parse_date(value))
    _end_dcterms_created = _end_created

    def _start_expirationdate(self, attrsD):
        self.push('expired', 1)

    def _end_expirationdate(self):
        self._save('expired_parsed', _parse_date(self.pop('expired')))

    def _start_cc_license(self, attrsD):
        self.push('license', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('license')
        
    def _start_creativecommons_license(self, attrsD):
        self.push('license', 1)

    def _end_creativecommons_license(self):
        self.pop('license')

    def _addTag(self, term, scheme, label):
        context = self._getContext()
        tags = context.setdefault('tags', [])
        # Skip tags that carry no information at all.
        if not (term or scheme or label):
            return
        value = FeedParserDict({'term': term, 'scheme': scheme, 'label': label})
        # Avoid recording the same tag twice.
        if value not in tags:
            tags.append(value)
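
The deduplication rule used by `_addTag` can be tried in isolation. The sketch below uses plain dictionaries instead of `FeedParserDict` and a free function named `add_tag`; both are simplifications made for illustration and are not part of the module.

```python
def add_tag(tags, term, scheme, label):
    """Append a tag dict to ``tags`` unless it is empty or already present."""
    # A tag with no term, scheme, or label carries no information.
    if not (term or scheme or label):
        return
    value = {'term': term, 'scheme': scheme, 'label': label}
    # Dict equality compares all three fields, so only exact repeats are dropped.
    if value not in tags:
        tags.append(value)


tags = []
add_tag(tags, 'python', None, None)
add_tag(tags, 'python', None, None)                      # exact duplicate, ignored
add_tag(tags, None, None, None)                          # empty, ignored
add_tag(tags, 'python', 'http://www.itunes.com/', None)  # differs by scheme, kept
```

Because the comparison covers the whole dictionary, the same term under two different schemes is kept as two separate tags.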

    def _start_category(self, attrsD):
        if _debug: sys.stderr.write('entering _start_category with %s\n' % repr(attrsD))
        term = attrsD.get('term')
        scheme = attrsD.get('scheme', attrsD.get('domain'))
        label = attrsD.get('label')
        self._addTag(term, scheme, label)
        self.push('category', 1)
    _start_dc_subject = _start_category
    _start_keywords = _start_category
        
    def _end_itunes_keywords(self):
        for term in self.pop('itunes_keywords').split():
            self._addTag(term, 'http://www.itunes.com/', None)
        
    def _start_itunes_category(self, attrsD):
        self._addTag(attrsD.get('text'), 'http://www.itunes.com/', None)
        self.push('category', 1)
        
    def _end_category(self):
        value = self.pop('category')
        if not value: return
        context = self._getContext()
        tags = context['tags']
        # If the most recent tag has no term yet (for example, the start tag
        # carried only a scheme/domain), the element text supplies it;
        # otherwise the text becomes a new tag.
        if tags and not tags[-1]['term']:
            tags[-1]['term'] = value
        else:
            self._addTag(value, None, None)
    _end_dc_subject = _end_category
    _end_keywords = _end_category
    _end_itunes_category = _end_category

    def _start_cloud(self, attrsD):
        self._getContext()['cloud'] = FeedParserDict(attrsD)
        
    def _start_link(self, attrsD):
        attrsD.setdefault('rel', 'alternate')
        attrsD.setdefault('type', 'text/html')
        attrsD = self._itsAnHrefDamnIt(attrsD)
        if attrsD.has_key('href'):
            attrsD['href'] = self.resolveURI(attrsD['href'])
        expectingText = self.infeed or self.inentry or self.insource
        context = self._getContext()
        context.setdefault('links', [])
        context['links'].append(FeedParserDict(attrsD))
        if attrsD['rel'] == 'enclosure':
            self._start_enclosure(attrsD)
        if attrsD.has_key('href'):
            expectingText = 0
            if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
                context['link'] = attrsD['href']
        else:
            self.push('link', expectingText)
    _start_producturl = _start_link
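
The defaulting and selection rules in `_start_link` can be illustrated with a simplified, standalone sketch. It leaves out URI resolution, enclosure handling, and content-type mapping. The function name `record_link` and the `HTML_TYPES` tuple are assumptions made for this example; the real set of HTML types lives in `self.html_types`, which is defined elsewhere in the module.

```python
# Assumed stand-in for self.html_types; the real list is defined elsewhere.
HTML_TYPES = ('text/html', 'application/xhtml+xml')


def record_link(context, attrs):
    """Record a link element's attributes the way _start_link roughly does."""
    attrs = dict(attrs)                      # work on a copy
    attrs.setdefault('rel', 'alternate')     # missing rel means "alternate"
    attrs.setdefault('type', 'text/html')    # missing type means HTML
    context.setdefault('links', []).append(attrs)
    # Only an alternate HTML link with an href becomes the primary 'link'.
    if ('href' in attrs and attrs['rel'] == 'alternate'
            and attrs['type'] in HTML_TYPES):
        context['link'] = attrs['href']
    return context


entry = {}
record_link(entry, {'href': 'http://example.org/post'})
record_link(entry, {'href': 'http://example.org/a.mp3',
                    'rel': 'enclosure', 'type': 'audio/mpeg'})
```

Every link is kept in `links`, while `link` holds only the address of the HTML alternate, so callers that want a single URL do not need to scan the list.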

    def _end_link(self):
        value = self.pop('link')
        context = self._getContext()
        if self.intextinput:
            context['textinput']['link'] = value
        if self.inimage:
            context['image']['link'] = value
    _end_producturl = _end_link

    def _start_guid(self, attrsD):
        self.guidislink = (attrsD.get('ispermalink', 'true') == 'true')
        self.push('id', 1)

    def _end_guid(self):
        value = self.pop('id')
        self._save('guidislink', self.guidislink and not self._getContext().has_key('link'))
        if self.guidislink:
            # guid acts as link, but only if 'ispermalink' is not present or is 'true',
            # and only if the item doesn't already have a link element
            self._save('link', value)
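
The rule described in the `_end_guid` comment can be exercised with plain dictionaries. This sketch assumes that `_save` only sets a key that is not already present (modelled here with `dict.setdefault`) and that popping `'id'` stores the guid under `'id'`; both are assumptions, since neither method appears in this part of the file. The function name `apply_guid` is hypothetical.

```python
def apply_guid(entry, guid, attrs):
    """Apply the guid-as-link fallback to a plain dict ``entry``."""
    # A missing 'ispermalink' attribute counts as 'true'.
    is_permalink = attrs.get('ispermalink', 'true') == 'true'
    entry.setdefault('id', guid)
    # The guid is "the link" only if no explicit link was seen first.
    entry.setdefault('guidislink', is_permalink and 'link' not in entry)
    if is_permalink:
        # setdefault keeps an existing link untouched (assumed _save behaviour).
        entry.setdefault('link', guid)
    return entry


no_link = apply_guid({}, 'http://example.org/1', {})
has_link = apply_guid({'link': 'http://example.org/post'},
                      'http://example.org/1', {})
not_permalink = apply_guid({}, 'tag:example.org,2009:1',
                           {'ispermalink': 'false'})
```

In the first case the guid doubles as the link, in the second the explicit link is kept, and in the third no link is recorded at all.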
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1186
    def _start_title(self, attrsD):
        self.pushContent('title', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
    _start_dc_title = _start_title
    _start_media_title = _start_title

    def _end_title(self):
        value = self.popContent('title')
        context = self._getContext()
        if self.intextinput:
            context['textinput']['title'] = value
        elif self.inimage:
            context['image']['title'] = value
    _end_dc_title = _end_title
    _end_media_title = _end_title

    def _start_description(self, attrsD):
        context = self._getContext()
        if context.has_key('summary'):
            self._summaryKey = 'content'
            self._start_content(attrsD)
        else:
            self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)

    def _start_abstract(self, attrsD):
        self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)

    def _end_description(self):
        if self._summaryKey == 'content':
            self._end_content()
        else:
            value = self.popContent('description')
            context = self._getContext()
            if self.intextinput:
                context['textinput']['description'] = value
            elif self.inimage:
                context['image']['description'] = value
        self._summaryKey = None
    _end_abstract = _end_description

    def _start_info(self, attrsD):
        self.pushContent('info', attrsD, 'text/plain', 1)
    _start_feedburner_browserfriendly = _start_info

    def _end_info(self):
        self.popContent('info')
    _end_feedburner_browserfriendly = _end_info

    def _start_generator(self, attrsD):
        if attrsD:
            attrsD = self._itsAnHrefDamnIt(attrsD)
            if attrsD.has_key('href'):
                attrsD['href'] = self.resolveURI(attrsD['href'])
        self._getContext()['generator_detail'] = FeedParserDict(attrsD)
        self.push('generator', 1)

    def _end_generator(self):
        value = self.pop('generator')
        context = self._getContext()
        if context.has_key('generator_detail'):
            context['generator_detail']['name'] = value

    def _start_admin_generatoragent(self, attrsD):
        self.push('generator', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('generator')
        self._getContext()['generator_detail'] = FeedParserDict({'href': value})

    def _start_admin_errorreportsto(self, attrsD):
        self.push('errorreportsto', 1)
        value = self._getAttribute(attrsD, 'rdf:resource')
        if value:
            self.elementstack[-1][2].append(value)
        self.pop('errorreportsto')

    def _start_summary(self, attrsD):
        context = self._getContext()
        if context.has_key('summary'):
            self._summaryKey = 'content'
            self._start_content(attrsD)
        else:
            self._summaryKey = 'summary'
            self.pushContent(self._summaryKey, attrsD, 'text/plain', 1)
    _start_itunes_summary = _start_summary

    def _end_summary(self):
        if self._summaryKey == 'content':
            self._end_content()
        else:
            self.popContent(self._summaryKey or 'summary')
        self._summaryKey = None
    _end_itunes_summary = _end_summary

    def _start_enclosure(self, attrsD):
        attrsD = self._itsAnHrefDamnIt(attrsD)
        self._getContext().setdefault('enclosures', []).append(FeedParserDict(attrsD))
        href = attrsD.get('href')
        if href:
            context = self._getContext()
            if not context.get('id'):
                context['id'] = href

    def _start_source(self, attrsD):
        self.insource = 1

    def _end_source(self):
        self.insource = 0
        self._getContext()['source'] = copy.deepcopy(self.sourcedata)
        self.sourcedata.clear()

    def _start_content(self, attrsD):
        self.pushContent('content', attrsD, 'text/plain', 1)
        src = attrsD.get('src')
        if src:
            self.contentparams['src'] = src
        self.push('content', 1)

    def _start_prodlink(self, attrsD):
        self.pushContent('content', attrsD, 'text/html', 1)

    def _start_body(self, attrsD):
        self.pushContent('content', attrsD, 'application/xhtml+xml', 1)
    _start_xhtml_body = _start_body

    def _start_content_encoded(self, attrsD):
        self.pushContent('content', attrsD, 'text/html', 1)
    _start_fullitem = _start_content_encoded

    def _end_content(self):
        copyToDescription = self.mapContentType(self.contentparams.get('type')) in (['text/plain'] + self.html_types)
        value = self.popContent('content')
        if copyToDescription:
            self._save('description', value)
    _end_body = _end_content
    _end_xhtml_body = _end_content
    _end_content_encoded = _end_content
    _end_fullitem = _end_content
    _end_prodlink = _end_content

    def _start_itunes_image(self, attrsD):
        self.push('itunes_image', 0)
        self._getContext()['image'] = FeedParserDict({'href': attrsD.get('href')})
    _start_itunes_link = _start_itunes_image

    def _end_itunes_block(self):
        value = self.pop('itunes_block', 0)
        self._getContext()['itunes_block'] = (value == 'yes') and 1 or 0

    def _end_itunes_explicit(self):
        value = self.pop('itunes_explicit', 0)
        self._getContext()['itunes_explicit'] = (value == 'yes') and 1 or 0

c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1340
if _XML_AVAILABLE:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1341
    class _StrictFeedParser(_FeedParserMixin, xml.sax.handler.ContentHandler):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1342
        def __init__(self, baseuri, baselang, encoding):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1343
            if _debug: sys.stderr.write('trying StrictFeedParser\n')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1344
            xml.sax.handler.ContentHandler.__init__(self)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1345
            _FeedParserMixin.__init__(self, baseuri, baselang, encoding)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1346
            self.bozo = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1347
            self.exc = None
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1348
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1349
        def startPrefixMapping(self, prefix, uri):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1350
            self.trackNamespace(prefix, uri)

        def startElementNS(self, name, qname, attrs):
            namespace, localname = name
            lowernamespace = str(namespace or '').lower()
            if lowernamespace.find('backend.userland.com/rss') <> -1:
                # match any backend.userland.com namespace
                namespace = 'http://backend.userland.com/rss'
                lowernamespace = namespace
            if qname and qname.find(':') > 0:
                givenprefix = qname.split(':')[0]
            else:
                givenprefix = None
            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
            if givenprefix and (prefix == None or (prefix == '' and lowernamespace == '')) and not self.namespacesInUse.has_key(givenprefix):
                raise UndeclaredNamespace, "'%s' is not associated with a namespace" % givenprefix
            if prefix:
                localname = prefix + ':' + localname
            localname = str(localname).lower()
            if _debug: sys.stderr.write('startElementNS: qname = %s, namespace = %s, givenprefix = %s, prefix = %s, attrs = %s, localname = %s\n' % (qname, namespace, givenprefix, prefix, attrs.items(), localname))

            # qname implementation is horribly broken in Python 2.1 (it
            # doesn't report any), and slightly broken in Python 2.2 (it
            # doesn't report the xml: namespace). So we match up namespaces
            # with a known list first, and then possibly override them with
            # the qnames the SAX parser gives us (if indeed it gives us any
            # at all).  Thanks to MatejC for helping me test this and
            # tirelessly telling me that it didn't work yet.
            attrsD = {}
            for (namespace, attrlocalname), attrvalue in attrs._attrs.items():
                lowernamespace = (namespace or '').lower()
                prefix = self._matchnamespaces.get(lowernamespace, '')
                if prefix:
                    attrlocalname = prefix + ':' + attrlocalname
                attrsD[str(attrlocalname).lower()] = attrvalue
            for qname in attrs.getQNames():
                attrsD[str(qname).lower()] = attrs.getValueByQName(qname)
            self.unknown_starttag(localname, attrsD.items())

        def characters(self, text):
            self.handle_data(text)

        def endElementNS(self, name, qname):
            namespace, localname = name
            lowernamespace = str(namespace or '').lower()
            if qname and qname.find(':') > 0:
                givenprefix = qname.split(':')[0]
            else:
                givenprefix = ''
            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
            if prefix:
                localname = prefix + ':' + localname
            localname = str(localname).lower()
            self.unknown_endtag(localname)

        def error(self, exc):
            self.bozo = 1
            self.exc = exc

        def fatalError(self, exc):
            self.error(exc)
            raise exc
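
# Illustration only, not part of feedparser: startElementNS and endElementNS
# above pick a tag prefix by looking the lowercased namespace URI up in
# self._matchnamespaces and falling back to the prefix found in the qname.
# The sketch below shows just that lookup. `KNOWN_NAMESPACES` and
# `resolve_tag` are hypothetical names, the table entries are assumed for
# the example, and the UndeclaredNamespace check and the
# backend.userland.com special case are deliberately left out.

```python
# Illustrative table; the real parser builds self._matchnamespaces elsewhere.
KNOWN_NAMESPACES = {
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.w3.org/2005/atom': '',
}


def resolve_tag(namespace, localname, qname=None):
    """Return the lowercased 'prefix:localname', or the bare localname."""
    lowernamespace = (namespace or '').lower()
    if qname and qname.find(':') > 0:
        givenprefix = qname.split(':')[0]
    else:
        givenprefix = None
    # A known namespace wins; otherwise fall back to the document's prefix.
    prefix = KNOWN_NAMESPACES.get(lowernamespace, givenprefix)
    if prefix:
        localname = prefix + ':' + localname
    return localname.lower()


# The document's own prefix ('foo') is replaced by the canonical one ('dc').
print(resolve_tag('http://purl.org/dc/elements/1.1/', 'Creator', 'foo:Creator'))
# An unknown namespace keeps the prefix the document used.
print(resolve_tag('http://example.com/ns', 'thing', 'ex:thing'))
```

# Mapping a namespace to an empty prefix, as in the second table entry,
# yields a bare tag name regardless of the prefix used in the feed.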

class _BaseHTMLProcessor(sgmllib.SGMLParser):
    elements_no_end_tag = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
      'img', 'input', 'isindex', 'link', 'meta', 'param']

    def __init__(self, encoding):
        self.encoding = encoding
        if _debug: sys.stderr.write('entering BaseHTMLProcessor, encoding=%s\n' % self.encoding)
        sgmllib.SGMLParser.__init__(self)

    def reset(self):
        self.pieces = []
        sgmllib.SGMLParser.reset(self)

    def _shorttag_replace(self, match):
        tag = match.group(1)
        if tag in self.elements_no_end_tag:
            return '<' + tag + ' />'
        else:
            return '<' + tag + '></' + tag + '>'

    def feed(self, data):
        data = re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE).sub(r'&lt;!\1', data)
        #data = re.sub(r'<(\S+?)\s*?/>', self._shorttag_replace, data) # bug [ 1399464 ] Bad regexp for _shorttag_replace
        data = re.sub(r'<([^<\s]+?)\s*/>', self._shorttag_replace, data)
        data = data.replace('&#39;', "'")
        data = data.replace('&#34;', '"')
        if self.encoding and type(data) == type(u''):
            data = data.encode(self.encoding)
        sgmllib.SGMLParser.feed(self, data)
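
# Illustration only, not part of feedparser: feed() above pre-processes the
# markup with two regular expressions before handing it to sgmllib. The
# sketch below applies the same two patterns as standalone functions. The
# names `escape_stray_declarations`, `shorttag_replace` and
# `expand_short_tags` are hypothetical.

```python
import re

# Same list as _BaseHTMLProcessor.elements_no_end_tag above.
ELEMENTS_NO_END_TAG = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                       'img', 'input', 'isindex', 'link', 'meta', 'param']


def escape_stray_declarations(data):
    """Escape '<!' unless it starts a DOCTYPE, a comment or a marked section."""
    return re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE).sub(r'&lt;!\1', data)


def shorttag_replace(match):
    """Rewrite one self-closing tag matched by expand_short_tags."""
    tag = match.group(1)
    if tag in ELEMENTS_NO_END_TAG:
        return '<' + tag + ' />'
    return '<' + tag + '></' + tag + '>'


def expand_short_tags(data):
    """Expand attribute-less self-closing tags such as <br/> or <span/>."""
    return re.sub(r'<([^<\s]+?)\s*/>', shorttag_replace, data)


print(escape_stray_declarations('<!foo>'))      # &lt;!foo>
print(escape_stray_declarations('<!-- c -->'))  # <!-- c -->
print(expand_short_tags('<br/>'))               # <br />
print(expand_short_tags('<span/>'))             # <span></span>
```

# Because the captured group excludes whitespace, a self-closing tag that
# carries attributes (for example <a href="x"/>) is left untouched by the
# second pattern.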

    def normalize_attrs(self, attrs):
        # utility method to be called by descendants
        attrs = [(k.lower(), v) for k, v in attrs]
        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
        return attrs
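
# Illustration only, not part of feedparser: a standalone copy of the
# normalize_attrs logic above, written as a plain function so it can be
# tried without the parser class. Attribute names are always lowercased;
# values are lowercased only for 'rel' and 'type'.

```python
def normalize_attrs(attrs):
    """Lowercase attribute names, and the values of 'rel' and 'type' only."""
    attrs = [(k.lower(), v) for k, v in attrs]
    attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
    return attrs


print(normalize_attrs([('REL', 'Alternate'), ('HREF', 'http://Example.com/A')]))
# [('rel', 'alternate'), ('href', 'http://Example.com/A')]
```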

    def unknown_starttag(self, tag, attrs):
        # called for each start tag
        # attrs is a list of (attr, value) tuples
        # e.g. for <pre class='screen'>, tag='pre', attrs=[('class', 'screen')]
        if _debug: sys.stderr.write('_BaseHTMLProcessor, unknown_starttag, tag=%s\n' % tag)
        uattrs = []
        # thanks to Kevin Marks for this breathtaking hack to deal with (valid) high-bit attribute values in UTF-8 feeds
        for key, value in attrs:
            if type(value) != type(u''):
                value = unicode(value, self.encoding)
            uattrs.append((unicode(key, self.encoding), value))
        strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
        if tag in self.elements_no_end_tag:
            self.pieces.append('<%(tag)s%(strattrs)s />' % locals())
        else:
            self.pieces.append('<%(tag)s%(strattrs)s>' % locals())
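
# Illustration only, not part of feedparser: unknown_starttag above rebuilds
# the start tag text from the tag name and the (attr, value) list. The
# sketch below shows the same reconstruction on plain text strings, leaving
# out the byte/unicode conversion the method performs. The name
# `rebuild_starttag` is hypothetical.

```python
# Same list as _BaseHTMLProcessor.elements_no_end_tag above.
ELEMENTS_NO_END_TAG = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                       'img', 'input', 'isindex', 'link', 'meta', 'param']


def rebuild_starttag(tag, attrs):
    """Rebuild a start tag; void elements are emitted in '<tag />' form."""
    strattrs = ''.join(' %s="%s"' % (key, value) for key, value in attrs)
    if tag in ELEMENTS_NO_END_TAG:
        return '<%s%s />' % (tag, strattrs)
    return '<%s%s>' % (tag, strattrs)


print(rebuild_starttag('pre', [('class', 'screen')]))  # <pre class="screen">
print(rebuild_starttag('br', []))                      # <br />
```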

    def unknown_endtag(self, tag):
        # called for each end tag, e.g. for </pre>, tag will be 'pre'
        # Reconstruct the original end tag.
        if tag not in self.elements_no_end_tag:
            self.pieces.append("</%(tag)s>" % locals())

    def handle_charref(self, ref):
        # called for each character reference, e.g. for '&#160;', ref will be '160'
        # Reconstruct the original character reference.
        self.pieces.append('&#%(ref)s;' % locals())

    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
        # Reconstruct the original entity reference.
        self.pieces.append('&%(ref)s;' % locals())

    def handle_data(self, text):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        # Store the original text verbatim.
        if _debug: sys.stderr.write('_BaseHTMLProcessor, handle_data, text=%s\n' % text)
        self.pieces.append(text)

    def handle_comment(self, text):
        # called for each HTML comment, e.g. <!-- insert Javascript code here -->
        # Reconstruct the original comment.
        self.pieces.append('<!--%(text)s-->' % locals())

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        # Reconstruct original processing instruction.
        self.pieces.append('<?%(text)s>' % locals())

    def handle_decl(self, text):
        # called for the DOCTYPE, if present, e.g.
        # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        #     "http://www.w3.org/TR/html4/loose.dtd">
        # Reconstruct original DOCTYPE
        self.pieces.append('<!%(text)s>' % locals())

    _new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match
    def _scan_name(self, i, declstartpos):
        rawdata = self.rawdata
        n = len(rawdata)
        if i == n:
            return None, -1
        m = self._new_declname_match(rawdata, i)
        if m:
            s = m.group()
            name = s.strip()
            if (i + len(s)) == n:
                return None, -1  # end of buffer
            return name.lower(), m.end()
        else:
            self.handle_data(rawdata)
#            self.updatepos(declstartpos, i)
            return None, -1

    def output(self):
        '''Return processed HTML as a single string'''
        return ''.join([str(p) for p in self.pieces])
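
# Illustrative sketch, not part of feedparser: the handlers above rebuild the
# original markup piece by piece, and output() joins the pieces. The same
# round-trip idea is shown here with Python 3's html.parser.HTMLParser as a
# stand-in for sgmllib, which Python 3 no longer ships. RoundTripProcessor and
# round_trip are hypothetical names, and attribute handling is simplified.

```python
from html.parser import HTMLParser


class RoundTripProcessor(HTMLParser):
    """Hypothetical analogue of the handlers above: rebuilds input from events."""

    def __init__(self):
        # convert_charrefs=False keeps handle_charref/handle_entityref firing
        # instead of folding references into handle_data.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    @staticmethod
    def _render_attr(key, value):
        # Attributes without a value (e.g. <input disabled>) arrive as None.
        if value is None:
            return ' %s' % key
        return ' %s="%s"' % (key, value)

    def handle_starttag(self, tag, attrs):
        rendered = ''.join(self._render_attr(k, v) for k, v in attrs)
        self.pieces.append('<%s%s>' % (tag, rendered))

    def handle_endtag(self, tag):
        self.pieces.append('</%s>' % tag)

    def handle_charref(self, name):
        # For '&#160;', name is '160'.
        self.pieces.append('&#%s;' % name)

    def handle_entityref(self, name):
        # For '&copy;', name is 'copy'.
        self.pieces.append('&%s;' % name)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_comment(self, data):
        self.pieces.append('<!--%s-->' % data)

    def handle_pi(self, data):
        self.pieces.append('<?%s>' % data)

    def handle_decl(self, decl):
        self.pieces.append('<!%s>' % decl)

    def output(self):
        """Return the reconstructed markup as a single string."""
        return ''.join(self.pieces)


def round_trip(source):
    parser = RoundTripProcessor()
    parser.feed(source)
    parser.close()
    return parser.output()
```

# For simple input such as '<p>&copy; 2009</p>', round_trip should hand the
# markup back unchanged. Subclasses can then override single handlers to
# rewrite or drop pieces, which is how the classes below build on this one.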

class _LooseFeedParser(_FeedParserMixin, _BaseHTMLProcessor):
    def __init__(self, baseuri, baselang, encoding):
        sgmllib.SGMLParser.__init__(self)
        _FeedParserMixin.__init__(self, baseuri, baselang, encoding)

    def decodeEntities(self, element, data):
        data = data.replace('&#60;', '&lt;')
        data = data.replace('&#x3c;', '&lt;')
        data = data.replace('&#62;', '&gt;')
        data = data.replace('&#x3e;', '&gt;')
        data = data.replace('&#38;', '&amp;')
        data = data.replace('&#x26;', '&amp;')
        data = data.replace('&#34;', '&quot;')
        data = data.replace('&#x22;', '&quot;')
        data = data.replace('&#39;', '&apos;')
        data = data.replace('&#x27;', '&apos;')
        if self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
            data = data.replace('&lt;', '<')
            data = data.replace('&gt;', '>')
            data = data.replace('&amp;', '&')
            data = data.replace('&quot;', '"')
            data = data.replace('&apos;', "'")
        return data
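
# Illustrative sketch, not part of feedparser: decodeEntities works in two
# stages. Numeric references for the five XML special characters are first
# normalized to named entities; the named entities are then turned into
# literal characters only when a content type is known and it does not end in
# 'xml'. The standalone function below mirrors that logic. decode_entities and
# its content_type parameter are hypothetical stand-ins for the method and
# self.contentparams.

```python
# Stage 1: numeric references mapped to named entities (same order as above).
NUMERIC_TO_NAMED = [
    ('&#60;', '&lt;'), ('&#x3c;', '&lt;'),
    ('&#62;', '&gt;'), ('&#x3e;', '&gt;'),
    ('&#38;', '&amp;'), ('&#x26;', '&amp;'),
    ('&#34;', '&quot;'), ('&#x22;', '&quot;'),
    ('&#39;', '&apos;'), ('&#x27;', '&apos;'),
]

# Stage 2: named entities mapped to literal characters.
NAMED_TO_CHAR = [
    ('&lt;', '<'), ('&gt;', '>'), ('&amp;', '&'),
    ('&quot;', '"'), ('&apos;', "'"),
]


def decode_entities(data, content_type=None):
    """Normalize numeric references; fully decode only for non-XML content."""
    for numeric, named in NUMERIC_TO_NAMED:
        data = data.replace(numeric, named)
    # content_type=None plays the role of a missing 'type' key.
    if content_type is not None and not content_type.endswith('xml'):
        for named, char in NAMED_TO_CHAR:
            data = data.replace(named, char)
    return data
```

# Note that the replacements are case-sensitive, so an upper-case hex
# reference such as '&#x3C;' passes through both stages untouched.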

class _RelativeURIResolver(_BaseHTMLProcessor):
    relative_uris = [('a', 'href'),
                     ('applet', 'codebase'),
                     ('area', 'href'),
                     ('blockquote', 'cite'),
                     ('body', 'background'),
                     ('del', 'cite'),
                     ('form', 'action'),
                     ('frame', 'longdesc'),
                     ('frame', 'src'),
                     ('iframe', 'longdesc'),
                     ('iframe', 'src'),
                     ('head', 'profile'),
                     ('img', 'longdesc'),
                     ('img', 'src'),
                     ('img', 'usemap'),
                     ('input', 'src'),
                     ('input', 'usemap'),
                     ('ins', 'cite'),
                     ('link', 'href'),
                     ('object', 'classid'),
                     ('object', 'codebase'),
                     ('object', 'data'),
                     ('object', 'usemap'),
                     ('q', 'cite'),
                     ('script', 'src')]

    def __init__(self, baseuri, encoding):
        _BaseHTMLProcessor.__init__(self, encoding)
        self.baseuri = baseuri

    def resolveURI(self, uri):
        return _urljoin(self.baseuri, uri)

    def unknown_starttag(self, tag, attrs):
        attrs = self.normalize_attrs(attrs)
        attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs]
        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)

def _resolveRelativeURIs(htmlSource, baseURI, encoding):
    if _debug: sys.stderr.write('entering _resolveRelativeURIs\n')
    p = _RelativeURIResolver(baseURI, encoding)
    p.feed(htmlSource)
    return p.output()
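
# Illustrative sketch, not part of feedparser: the resolver above rewrites an
# attribute only when its (tag, attribute) pair is listed in relative_uris.
# The per-tag step can be tried in isolation with Python 3's
# urllib.parse.urljoin, used here as an assumed stand-in for _urljoin.
# RELATIVE_URIS is a shortened list and resolve_attrs is a hypothetical helper.

```python
from urllib.parse import urljoin

# Shortened, illustrative subset of the (tag, attribute) pairs listed above.
RELATIVE_URIS = {('a', 'href'), ('img', 'src'), ('blockquote', 'cite')}


def resolve_attrs(tag, attrs, base_uri):
    """Return attrs with URI-valued attributes made absolute against base_uri."""
    return [
        (key, urljoin(base_uri, value) if (tag, key) in RELATIVE_URIS else value)
        for key, value in attrs
    ]


base = 'http://example.org/feed/index.xml'
print(resolve_attrs('a', [('href', '../x.html'), ('title', 'about')], base))
```

# One difference worth knowing: the original `X and Y or Z` idiom falls back
# to the unresolved value whenever the resolved URI is falsy (for example an
# empty string), while the conditional expression used here does not.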

class _HTMLSanitizer(_BaseHTMLProcessor):
    acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area', 'b', 'big',
      'blockquote', 'br', 'button', 'caption', 'center', 'cite', 'code', 'col',
      'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt', 'em', 'fieldset',
      'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input',
      'ins', 'kbd', 'label', 'legend', 'li', 'map', 'menu', 'ol', 'optgroup',
      'option', 'p', 'pre', 'q', 's', 'samp', 'select', 'small', 'span', 'strike',
      'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'tfoot', 'th',
      'thead', 'tr', 'tt', 'u', 'ul', 'var']

    acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey',
      'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing',
      'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols',
      'colspan', 'color', 'compact', 'coords', 'datetime', 'dir', 'disabled',
      'enctype', 'for', 'frame', 'headers', 'height', 'href', 'hreflang', 'hspace',
      'id', 'ismap', 'label', 'lang', 'longdesc', 'maxlength', 'media', 'method',
      'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly',
      'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
      'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', 'type',
      'usemap', 'valign', 'value', 'vspace', 'width']

    unacceptable_elements_with_end_tag = ['script', 'applet']

    def reset(self):
        _BaseHTMLProcessor.reset(self)
        self.unacceptablestack = 0

    def unknown_starttag(self, tag, attrs):
        if tag not in self.acceptable_elements:
            if tag in self.unacceptable_elements_with_end_tag:
                self.unacceptablestack += 1
            return
        attrs = self.normalize_attrs(attrs)
        attrs = [(key, value) for key, value in attrs if key in self.acceptable_attributes]
        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1632
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1633
    def unknown_endtag(self, tag):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1634
        if not tag in self.acceptable_elements:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1635
            if tag in self.unacceptable_elements_with_end_tag:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1636
                self.unacceptablestack -= 1
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1637
            return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1638
        _BaseHTMLProcessor.unknown_endtag(self, tag)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1639
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1640
    def handle_pi(self, text):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1641
        pass
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1642
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1643
    def handle_decl(self, text):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1644
        pass
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1645
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1646
    def handle_data(self, text):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1647
        if not self.unacceptablestack:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1648
            _BaseHTMLProcessor.handle_data(self, text)
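
The depth-counter idea used by the sanitizer methods above can be tried in isolation. The following is a minimal Python 3 sketch, not the module's own Python 2 code: it assumes the standard library's html.parser.HTMLParser instead of _BaseHTMLProcessor, and the class name, the visible_text helper, and the BLOCKED set are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TextOutsideBlockedTags(HTMLParser):
    """Collect text, skipping anything nested inside a blocked element."""

    # Hypothetical block list for this sketch (an assumption, not the
    # module's actual unacceptable_elements_with_end_tag value).
    BLOCKED = {"script", "applet"}

    def __init__(self):
        super().__init__()
        self.blocked_depth = 0  # plays the role of unacceptablestack
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKED:
            self.blocked_depth += 1

    def handle_endtag(self, tag):
        # Guard against stray end tags driving the counter negative.
        if tag in self.BLOCKED and self.blocked_depth:
            self.blocked_depth -= 1

    def handle_data(self, data):
        # Text is kept only while no blocked element is open.
        if not self.blocked_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the text of html that lies outside every blocked element."""
    parser = TextOutsideBlockedTags()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

Unlike the original methods, this sketch refuses to decrement below zero, so an unmatched end tag cannot leave the counter negative and silently suppress later text. Whether that guard is wanted in the real sanitizer is a design decision this sketch does not settle.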

def _sanitizeHTML(htmlSource, encoding):
    p = _HTMLSanitizer(encoding)
    p.feed(htmlSource)
    data = p.output()
    if TIDY_MARKUP:
        # loop through list of preferred Tidy interfaces looking for one that's installed,
        # then set up a common _tidy function to wrap the interface-specific API.
        _tidy = None
        for tidy_interface in PREFERRED_TIDY_INTERFACES:
            try:
                if tidy_interface == "uTidy":
                    from tidy import parseString as _utidy
                    def _tidy(data, **kwargs):
                        return str(_utidy(data, **kwargs))
                    break
                elif tidy_interface == "mxTidy":
                    from mx.Tidy import Tidy as _mxtidy
                    def _tidy(data, **kwargs):
                        nerrors, nwarnings, data, errordata = _mxtidy.tidy(data, **kwargs)
                        return data
                    break
            except:
                # this interface is not usable; try the next one
                pass
        if _tidy:
            # Tidy works on byte strings: encode unicode input, decode it afterwards
            is_unicode = isinstance(data, unicode)
            if is_unicode:
                data = data.encode('utf-8')
            data = _tidy(data, output_xhtml=1, numeric_entities=1, wrap=0, char_encoding="utf8")
            if is_unicode:
                data = unicode(data, 'utf-8')
            # Tidy returns a whole document; keep only what is inside <body>...</body>
            if data.count('<body'):
                data = data.split('<body', 1)[1]
                if data.count('>'):
                    data = data.split('>', 1)[1]
            if data.count('</body'):
                data = data.split('</body', 1)[0]
    data = data.strip().replace('\r\n', '\n')
    return data
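
The body-extraction step at the end of _sanitizeHTML is plain string splitting and can be exercised without Tidy installed. This is a minimal Python 3 sketch under that assumption. The function name extract_body is hypothetical, and the sketch folds the final strip and newline normalization into the same helper, whereas the original applies them whether or not Tidy ran.

```python
def extract_body(markup):
    """Return the markup between <body ...> and </body>, newline-normalized.

    Input without a <body> element is returned unchanged apart from the
    final strip and CRLF-to-LF conversion.
    """
    if "<body" in markup:
        # Drop everything up to and including the "<body" token...
        markup = markup.split("<body", 1)[1]
        # ...then the rest of the opening tag (attributes and ">").
        if ">" in markup:
            markup = markup.split(">", 1)[1]
    if "</body" in markup:
        # Drop the closing tag and anything after it.
        markup = markup.split("</body", 1)[0]
    return markup.strip().replace("\r\n", "\n")
```

Splitting on the literal `<body` rather than `<body>` is what lets the opening tag carry attributes. The trade-off is that a `>` inside an attribute value would cut the tag short.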

class _FeedURLHandler(urllib2.HTTPDigestAuthHandler, urllib2.HTTPRedirectHandler, urllib2.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        # floor division keeps the 3xx test correct regardless of division mode
        if ((code // 100) == 3) and (code != 304):
            return self.http_error_302(req, fp, code, msg, headers)
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        return infourl

    def http_error_302(self, req, fp, code, msg, headers):
        if 'location' in headers.dict:
            infourl = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
        else:
            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        if not hasattr(infourl, 'status'):
            infourl.status = code
        return infourl

    def http_error_301(self, req, fp, code, msg, headers):
        if 'location' in headers.dict:
            infourl = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, headers)
        else:
            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        if not hasattr(infourl, 'status'):
            infourl.status = code
        return infourl

    http_error_300 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

    def http_error_401(self, req, fp, code, msg, headers):
        # Check if
        # - server requires digest auth, AND
        # - we tried (unsuccessfully) with basic auth, AND
        # - we're using Python 2.3.3 or later (digest auth is irreparably broken in earlier versions)
        # If all conditions hold, parse authentication information
        # out of the Authorization header we sent the first time
        # (for the username and password) and the WWW-Authenticate
        # header the server sent back (for the realm) and retry
        # the request with the appropriate digest auth headers instead.
        # This evil genius hack has been brought to you by Aaron Swartz.
        host = urlparse.urlparse(req.get_full_url())[1]
        try:
            # compare version tuples; comparing version strings orders '2.10' before '2.3.3'
            assert sys.version_info >= (2, 3, 3)
            assert base64 is not None
            # split on the first colon only, so passwords may contain ':'
            user, passw = base64.decodestring(req.headers['Authorization'].split(' ')[1]).split(':', 1)
            realm = re.findall('realm="([^"]*)"', headers['WWW-Authenticate'])[0]
            self.add_password(realm, host, user, passw)
            retry = self.http_error_auth_reqed('www-authenticate', host, req, headers)
            self.reset_retry_count()
            return retry
        except:
            # any failed precondition or missing header falls back to the default handling
            return self.http_error_default(req, fp, code, msg, headers)
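
The two parsing steps that http_error_401 relies on, recovering the credentials from the Basic Authorization header that was already sent and the realm from the server's WWW-Authenticate header, can be sketched on their own. This is a Python 3 illustration rather than the module's Python 2 code. The function names are hypothetical, and it assumes the credentials are UTF-8 text.

```python
import base64
import re


def parse_basic_credentials(authorization):
    """Split a 'Basic <base64>' header value into (user, password)."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    # Assumption for this sketch: credentials are UTF-8 encoded.
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password may contain colons.
    user, _, password = decoded.partition(":")
    return user, password


def parse_realm(www_authenticate):
    """Extract the quoted realm from a WWW-Authenticate header value."""
    match = re.search(r'realm="([^"]*)"', www_authenticate)
    if match is None:
        raise ValueError("no realm in WWW-Authenticate header")
    return match.group(1)
```

With these two values plus the host, a handler has what it needs to register a password and retry the request with digest authentication. The sketch raises ValueError on malformed input, where the original simply falls through to its default error handling.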
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1742
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1743
def _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1744
    """URL, filename, or string --> stream
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1745
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1746
    This function lets you define parsers that take any input source
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1747
    (URL, pathname to local or network file, or actual data as a string)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1748
    and deal with it in a uniform manner.  Returned object is guaranteed
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1749
    to have all the basic stdio read methods (read, readline, readlines).
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1750
    Just .close() the object when you're done with it.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1751
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1752
    If the etag argument is supplied, it will be used as the value of an
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1753
    If-None-Match request header.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1754
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1755
    If the modified argument is supplied, it must be a tuple of 9 integers
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1756
    as returned by gmtime() in the standard Python time module. This MUST
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1757
    be in GMT (Greenwich Mean Time). The formatted date/time will be used
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1758
    as the value of an If-Modified-Since request header.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1759
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1760
    If the agent argument is supplied, it will be used as the value of a
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1761
    User-Agent request header.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1762
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1763
    If the referrer argument is supplied, it will be used as the value of a
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1764
    Referer[sic] request header.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1765
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1766
    If handlers is supplied, it is a list of handlers used to build a
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1767
    urllib2 opener.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1768
    """
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1769
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1770
    if hasattr(url_file_stream_or_string, 'read'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1771
        return url_file_stream_or_string
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1772
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1773
    if url_file_stream_or_string == '-':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1774
        return sys.stdin

    if urlparse.urlparse(url_file_stream_or_string)[0] in ('http', 'https', 'ftp'):
        if not agent:
            agent = USER_AGENT
        # test for inline user:password for basic auth
        auth = None
        if base64:
            urltype, rest = urllib.splittype(url_file_stream_or_string)
            realhost, rest = urllib.splithost(rest)
            if realhost:
                user_passwd, realhost = urllib.splituser(realhost)
                if user_passwd:
                    url_file_stream_or_string = '%s://%s%s' % (urltype, realhost, rest)
                    auth = base64.encodestring(user_passwd).strip()
        # try to open with urllib2 (to use optional headers)
        request = urllib2.Request(url_file_stream_or_string)
        request.add_header('User-Agent', agent)
        if etag:
            request.add_header('If-None-Match', etag)
        if modified:
            # format into an RFC 1123-compliant timestamp. We can't use
            # time.strftime() since the %a and %b directives can be affected
            # by the current locale, but RFC 2616 states that dates must be
            # in English.
            short_weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
            months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
            request.add_header('If-Modified-Since', '%s, %02d %s %04d %02d:%02d:%02d GMT' % (short_weekdays[modified[6]], modified[2], months[modified[1] - 1], modified[0], modified[3], modified[4], modified[5]))
        if referrer:
            request.add_header('Referer', referrer)
        if gzip and zlib:
            request.add_header('Accept-encoding', 'gzip, deflate')
        elif gzip:
            request.add_header('Accept-encoding', 'gzip')
        elif zlib:
            request.add_header('Accept-encoding', 'deflate')
        else:
            request.add_header('Accept-encoding', '')
        if auth:
            request.add_header('Authorization', 'Basic %s' % auth)
        if ACCEPT_HEADER:
            request.add_header('Accept', ACCEPT_HEADER)
        request.add_header('A-IM', 'feed') # RFC 3229 support
        # build the opener with our handler first, then any caller-supplied handlers
        opener = urllib2.build_opener(*([_FeedURLHandler()] + handlers))
        opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
        try:
            return opener.open(request)
        finally:
            opener.close() # JohnD

    # try to open with native open function (if url_file_stream_or_string is a filename)
    try:
        return open(url_file_stream_or_string)
    except Exception:
        # not a readable filename; fall through and treat the argument as feed data
        pass

    # treat url_file_stream_or_string as string
    return _StringIO(str(url_file_stream_or_string))
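
The If-Modified-Since formatting above can be tried in isolation. The following is a minimal sketch, assuming Python 3 and a hypothetical helper name format_http_date (not part of feedparser); it builds the timestamp from fixed English name tables so the result does not depend on the current locale.

```python
import time

# Fixed English names, indexed like time.struct_time fields
# (tm_wday: Monday == 0, tm_mon: January == 1).
SHORT_WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def format_http_date(modified):
    """Format a 9-tuple date in GMT as an RFC 1123 timestamp.

    Hypothetical helper for illustration; the field order matches the
    9-tuples produced by time.gmtime().
    """
    return '%s, %02d %s %04d %02d:%02d:%02d GMT' % (
        SHORT_WEEKDAYS[modified[6]],  # weekday name
        modified[2],                  # day of month
        MONTHS[modified[1] - 1],      # month name
        modified[0],                  # year
        modified[3], modified[4], modified[5])  # hour, minute, second


if __name__ == '__main__':
    print(format_http_date(time.gmtime(0)))  # Thu, 01 Jan 1970 00:00:00 GMT
```

The weekday is taken from the tuple as given, so a caller that builds the tuple by hand is responsible for supplying the correct weekday index.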

_date_handlers = []
def registerDateHandler(func):
    '''Register a date handler function (takes string, returns 9-tuple date in GMT)'''
    _date_handlers.insert(0, func)

# ISO-8601 date parsing routines written by Fazal Majid.
# The ISO 8601 standard is very convoluted and irregular - a full ISO 8601
# parser is beyond the scope of feedparser and would be a worthwhile addition
# to the Python library.
# A single regular expression cannot parse ISO 8601 date formats into groups
# as the standard is highly irregular (for instance is 030104 2003-01-04 or
# 0301-04-01), so we use templates instead.
# Please note the order in templates is significant because we need a
# greedy match.
_iso8601_tmpl = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO',
                'YY-?MM-?DD', 'YY-?OOO', 'YYYY',
                '-YY-?MM', '-OOO', '-YY',
                '--MM-?DD', '--MM',
                '---DD',
                'CC', '']
_iso8601_re = [
    tmpl.replace(
    'YYYY', r'(?P<year>\d{4})').replace(
    'YY', r'(?P<year>\d\d)').replace(
    'MM', r'(?P<month>[01]\d)').replace(
    'DD', r'(?P<day>[0123]\d)').replace(
    'OOO', r'(?P<ordinal>[0123]\d\d)').replace(
    'CC', r'(?P<century>\d\d$)')
    + r'(T?(?P<hour>\d{2}):(?P<minute>\d{2})'
    + r'(:(?P<second>\d{2}))?'
    + r'(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?'
    for tmpl in _iso8601_tmpl]
del tmpl
_iso8601_matches = [re.compile(regex).match for regex in _iso8601_re]
del regex
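
The template expansion above can be explored on a smaller scale. This is a minimal Python 3 sketch, assuming a reduced template list and the hypothetical helper names template_to_regex and first_match; it omits the time and time zone suffix. It shows why template order matters: match() only anchors at the start of the string, so the bare 'YYYY' template would accept the prefix of any longer date and has to come last.

```python
import re

# Reduced template list for illustration; most specific templates first.
TEMPLATES = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO', 'YYYY']


def template_to_regex(tmpl):
    """Expand the placeholder letters of a template into named groups."""
    return (tmpl
            .replace('YYYY', r'(?P<year>\d{4})')
            .replace('MM', r'(?P<month>[01]\d)')
            .replace('DD', r'(?P<day>[0123]\d)')
            .replace('OOO', r'(?P<ordinal>[0123]\d\d)'))


MATCHERS = [re.compile(template_to_regex(t)).match for t in TEMPLATES]


def first_match(date_string):
    """Return the named groups of the first template that matches, else None."""
    for match in MATCHERS:
        m = match(date_string)
        if m:
            return m.groupdict()
    return None


if __name__ == '__main__':
    print(first_match('20040105'))  # year, month and day
    print(first_match('2004-01'))   # year and month
    print(first_match('2004005'))   # year and ordinal day
```

Each compiled pattern only defines the groups its template mentions, so the keys of the returned dictionary also tell the caller which template matched.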
def _parse_date_iso8601(dateString):
    '''Parse a variety of ISO-8601-compatible formats like 20040105'''
    m = None
    for _iso8601_match in _iso8601_matches:
        m = _iso8601_match(dateString)
        if m: break
    if not m: return
    if m.span() == (0, 0): return
    params = m.groupdict()
    ordinal = params.get('ordinal', 0)
    if ordinal:
        ordinal = int(ordinal)
    else:
        ordinal = 0
    year = params.get('year', '--')
    if not year or year == '--':
        year = time.gmtime()[0]
    elif len(year) == 2:
        # ISO 8601 assumes current century, i.e. 93 -> 2093, NOT 1993
        year = 100 * int(time.gmtime()[0] / 100) + int(year)
    else:
        year = int(year)
    month = params.get('month', '-')
    if not month or month == '-':
        # ordinals are NOT normalized by mktime, we simulate them
        # by setting month=1, day=ordinal
        if ordinal:
            month = 1
        else:
            month = time.gmtime()[1]
    month = int(month)
    day = params.get('day', 0)
    if not day:
        # see above
        if ordinal:
            day = ordinal
        elif params.get('century', 0) or \
                 params.get('year', 0) or params.get('month', 0):
            day = 1
        else:
            day = time.gmtime()[2]
    else:
        day = int(day)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1911
    # special case of the century - is the first year of the 21st century
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1912
    # 2000 or 2001 ? The debate goes on...
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1913
    if 'century' in params.keys():
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1914
        year = (int(params['century']) - 1) * 100 + 1
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1915
    # in ISO 8601 most fields are optional
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1916
    for field in ['hour', 'minute', 'second', 'tzhour', 'tzmin']:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1917
        if not params.get(field, None):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1918
            params[field] = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1919
    hour = int(params.get('hour', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1920
    minute = int(params.get('minute', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1921
    second = int(params.get('second', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1922
    # weekday is normalized by mktime(), we can ignore it
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1923
    weekday = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1924
    # daylight savings is complex, but not needed for feedparser's purposes
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1925
    # as time zones, if specified, include mention of whether it is active
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1926
    # (e.g. PST vs. PDT, CET). Using -1 is implementation-dependent and
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1927
    # most implementations have DST bugs
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1928
    daylight_savings_flag = 0
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1929
    tm = [year, month, day, hour, minute, second, weekday,
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1930
          ordinal, daylight_savings_flag]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1931
    # ISO 8601 time zone adjustments
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1932
    tz = params.get('tz')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1933
    if tz and tz != 'Z':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1934
        if tz[0] == '-':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1935
            tm[3] += int(params.get('tzhour', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1936
            tm[4] += int(params.get('tzmin', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1937
        elif tz[0] == '+':
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1938
            tm[3] -= int(params.get('tzhour', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1939
            tm[4] -= int(params.get('tzmin', 0))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1940
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1941
            return None
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1942
    # Python's time.mktime() is a wrapper around the ANSI C mktime(3c)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1943
    # which is guaranteed to normalize d/m/y/h/m/s.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1944
    # Many implementations have bugs, but we'll pretend they don't.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1945
    return time.localtime(time.mktime(tuple(tm)))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1946
registerDateHandler(_parse_date_iso8601)
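The zone adjustment in the function above can be tried in isolation. What follows is a minimal sketch, not feedparser code: `shift_to_utc` is a hypothetical helper, and it uses `calendar.timegm()` in place of `time.mktime()` so that the result does not depend on the machine's local time zone. Like `mktime()`, `timegm()` tolerates out-of-range hour and minute values, which is what lets the offset be applied by plain addition or subtraction.

```python
import calendar
import time


def shift_to_utc(tm, tz_sign, tzhour, tzmin):
    """Return a UTC struct_time for a broken-down time plus a zone offset.

    tm is a 9-item sequence (year, month, day, hour, minute, second,
    weekday, ordinal, dst flag); tz_sign is '+' or '-'.
    Hypothetical helper for illustration only.
    """
    fields = list(tm)
    if tz_sign == '-':
        # Zone is behind UTC: add the offset to reach UTC.
        fields[3] += tzhour
        fields[4] += tzmin
    elif tz_sign == '+':
        # Zone is ahead of UTC: subtract the offset to reach UTC.
        fields[3] -= tzhour
        fields[4] -= tzmin
    else:
        return None
    # timegm() does plain arithmetic on the fields, so an hour of -1
    # or 28 rolls over into the previous or next day.
    return time.gmtime(calendar.timegm(tuple(fields)))


utc = shift_to_utc((2009, 8, 24, 4, 31, 23, 0, 1, 0), '+', 5, 30)
print(time.strftime('%Y-%m-%dT%H:%M:%SZ', utc))
```

The weekday and ordinal slots are passed as placeholders because the conversion recomputes them.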
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1947
    
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1948
# 8-bit date handling routines written by ytrewq1.
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1949
_korean_year  = u'\ub144' # b3e2 in euc-kr
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1950
_korean_month = u'\uc6d4' # bff9 in euc-kr
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1951
_korean_day   = u'\uc77c' # c0cf in euc-kr
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1952
_korean_am    = u'\uc624\uc804' # bfc0 c0fc in euc-kr
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1953
_korean_pm    = u'\uc624\ud6c4' # bfc0 c8c4 in euc-kr
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1954
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1955
_korean_onblog_date_re = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1956
    re.compile(u'(\d{4})%s\s+(\d{2})%s\s+(\d{2})%s\s+(\d{2}):(\d{2}):(\d{2})' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1957
               (_korean_year, _korean_month, _korean_day))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1958
_korean_nate_date_re = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1959
    re.compile(u'(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{,2}):(\d{,2}):(\d{,2})' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1960
               (_korean_am, _korean_pm))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1961
def _parse_date_onblog(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1962
    '''Parse a string according to the OnBlog 8-bit date format'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1963
    m = _korean_onblog_date_re.match(dateString)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1964
    if not m: return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1965
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1966
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1967
                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1968
                 'zonediff': '+09:00'}
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1969
    if _debug: sys.stderr.write('OnBlog date parsed as: %s\n' % w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1970
    return _parse_date_w3dtf(w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1971
registerDateHandler(_parse_date_onblog)
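The OnBlog handler only rewrites the input into a W3DTF string and leaves the actual parsing to `_parse_date_w3dtf`. A standalone sketch of that rewriting step is shown here; the names `ONBLOG_RE` and `onblog_to_w3dtf` and the `zonediff` parameter are assumptions made for the example, not part of feedparser.

```python
import re

# Hangul suffixes for year, month, and day (same code points as above).
KOREAN_YEAR = u'\ub144'
KOREAN_MONTH = u'\uc6d4'
KOREAN_DAY = u'\uc77c'

ONBLOG_RE = re.compile(
    r'(\d{4})%s\s+(\d{2})%s\s+(\d{2})%s\s+(\d{2}):(\d{2}):(\d{2})'
    % (KOREAN_YEAR, KOREAN_MONTH, KOREAN_DAY))


def onblog_to_w3dtf(date_string, zonediff='+09:00'):
    """Rewrite an OnBlog-style date as a W3DTF string, or return None."""
    m = ONBLOG_RE.match(date_string)
    if not m:
        return None
    # Groups are year, month, day, hour, minute, second, in that order.
    return '%s-%s-%sT%s:%s:%s%s' % (m.groups() + (zonediff,))


print(onblog_to_w3dtf(u'2004\ub144 05\uc6d4 28\uc77c  01:31:15'))
```

The offset defaults to Korea Standard Time, matching the value hard-coded in the handler above.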
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1972
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1973
def _parse_date_nate(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1974
    '''Parse a string according to the Nate 8-bit date format'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1975
    m = _korean_nate_date_re.match(dateString)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1976
    if not m: return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1977
    hour = int(m.group(5))
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1978
    ampm = m.group(4)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1979
    if ampm == _korean_pm and hour < 12:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1980
        hour += 12
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1981
    hour = str(hour)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1982
    if len(hour) == 1:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1983
        hour = '0' + hour
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1984
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1985
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1986
                 'hour': hour, 'minute': m.group(6), 'second': m.group(7),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1987
                 'zonediff': '+09:00'}
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1988
    if _debug: sys.stderr.write('Nate date parsed as: %s\n' % w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1989
    return _parse_date_w3dtf(w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1990
registerDateHandler(_parse_date_nate)
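The Nate format carries a Korean AM/PM marker, so the hour has to be converted from a 12-hour to a 24-hour clock before the W3DTF string is built. The sketch below isolates that conversion. `to_24_hour` and `nate_to_w3dtf` are hypothetical names, the treatment of 12 AM as 00 and 12 PM as 12 is this sketch's own choice, and the pattern uses `\d{1,2}` (at least one digit) rather than the `\d{,2}` of the module's pattern.

```python
import re

KOREAN_AM = u'\uc624\uc804'
KOREAN_PM = u'\uc624\ud6c4'

NATE_RE = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{1,2}):(\d{1,2}):(\d{1,2})'
    % (KOREAN_AM, KOREAN_PM))


def to_24_hour(hour, is_pm):
    """Convert a 12-hour clock value to 24-hour form."""
    hour = hour % 12          # 12 AM -> 0, 12 PM -> 0 before the shift
    if is_pm:
        hour += 12            # 1 PM..11 PM -> 13..23, 12 PM -> 12
    return hour


def nate_to_w3dtf(date_string, zonediff='+09:00'):
    """Rewrite a Nate-style date as a W3DTF string, or return None."""
    m = NATE_RE.match(date_string)
    if not m:
        return None
    year, month, day, ampm, hour, minute, second = m.groups()
    hour24 = to_24_hour(int(hour), ampm == KOREAN_PM)
    # Zero-pad the time fields, since the input may use single digits.
    return '%s-%s-%sT%02d:%02d:%02d%s' % (
        year, month, day, hour24, int(minute), int(second), zonediff)


print(nate_to_w3dtf(u'2004-05-25 \uc624\ud6c4 11:23:17'))
```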
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1991
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1992
_mssql_date_re = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1993
    re.compile('(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(\.\d+)?')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1994
def _parse_date_mssql(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1995
    '''Parse a string according to the MS SQL date format'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1996
    m = _mssql_date_re.match(dateString)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1997
    if not m: return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1998
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  1999
                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2000
                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2001
                 'zonediff': '+09:00'}
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2002
    if _debug: sys.stderr.write('MS SQL date parsed as: %s\n' % w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2003
    return _parse_date_w3dtf(w3dtfdate)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2004
registerDateHandler(_parse_date_mssql)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2005
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2006
# Unicode strings for Greek date strings
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2007
_greek_months = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2008
  { \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2009
   u'\u0399\u03b1\u03bd': u'Jan',       # c9e1ed in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2010
   u'\u03a6\u03b5\u03b2': u'Feb',       # d6e5e2 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2011
   u'\u039c\u03ac\u03ce': u'Mar',       # ccdcfe in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2012
   u'\u039c\u03b1\u03ce': u'Mar',       # cce1fe in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2013
   u'\u0391\u03c0\u03c1': u'Apr',       # c1f0f1 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2014
   u'\u039c\u03ac\u03b9': u'May',       # ccdce9 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2015
   u'\u039c\u03b1\u03ca': u'May',       # cce1fa in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2016
   u'\u039c\u03b1\u03b9': u'May',       # cce1e9 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2017
   u'\u0399\u03bf\u03cd\u03bd': u'Jun', # c9effded in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2018
   u'\u0399\u03bf\u03bd': u'Jun',       # c9efed in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2019
   u'\u0399\u03bf\u03cd\u03bb': u'Jul', # c9effdeb in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2020
   u'\u0399\u03bf\u03bb': u'Jul',       # c9efeb in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2021
   u'\u0391\u03cd\u03b3': u'Aug',       # c1fde3 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2022
   u'\u0391\u03c5\u03b3': u'Aug',       # c1f5e3 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2023
   u'\u03a3\u03b5\u03c0': u'Sep',       # d3e5f0 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2024
   u'\u039f\u03ba\u03c4': u'Oct',       # cfeaf4 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2025
   u'\u039d\u03bf\u03ad': u'Nov',       # cdefdd in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2026
   u'\u039d\u03bf\u03b5': u'Nov',       # cdefe5 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2027
   u'\u0394\u03b5\u03ba': u'Dec',       # c4e5ea in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2028
  }
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2029
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2030
_greek_wdays = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2031
  { \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2032
   u'\u039a\u03c5\u03c1': u'Sun', # caf5f1 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2033
   u'\u0394\u03b5\u03c5': u'Mon', # c4e5f5 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2034
   u'\u03a4\u03c1\u03b9': u'Tue', # d4f1e9 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2035
   u'\u03a4\u03b5\u03c4': u'Wed', # d4e5f4 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2036
   u'\u03a0\u03b5\u03bc': u'Thu', # d0e5ec in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2037
   u'\u03a0\u03b1\u03c1': u'Fri', # d0e1f1 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2038
   u'\u03a3\u03b1\u03b2': u'Sat', # d3e1e2 in iso-8859-7
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2039
  }
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2040
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2041
_greek_date_format_re = \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2042
    re.compile(u'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)')
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2043
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2044
def _parse_date_greek(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2045
    '''Parse a string according to a Greek 8-bit date format.'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2046
    m = _greek_date_format_re.match(dateString)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2047
    if not m: return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2048
    try:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2049
        wday = _greek_wdays[m.group(1)]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2050
        month = _greek_months[m.group(3)]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2051
    except KeyError:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2052
        return
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2053
    rfc822date = '%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)s' % \
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2054
                 {'wday': wday, 'day': m.group(2), 'month': month, 'year': m.group(4),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2055
                  'hour': m.group(5), 'minute': m.group(6), 'second': m.group(7),\
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2056
                  'zonediff': m.group(8)}
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2057
    if _debug: sys.stderr.write('Greek date parsed as: %s\n' % rfc822date)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2058
    return _parse_date_rfc822(rfc822date)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2059
registerDateHandler(_parse_date_greek)
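The Greek handler works by table lookup: it replaces the Greek weekday and month abbreviations with their English equivalents and hands the resulting RFC 822 string to `_parse_date_rfc822`. The following sketch shows the lookup step on its own. It carries only a two-entry subset of each table, `greek_to_rfc822` is a hypothetical name, and the standard library's `email.utils.parsedate_tz` stands in for feedparser's RFC 822 parser.

```python
import re
from email.utils import parsedate_tz

# Small subsets of the tables above, enough for the example.
GREEK_WDAYS = {
    u'\u039a\u03c5\u03c1': 'Sun',
    u'\u0394\u03b5\u03c5': 'Mon',
}
GREEK_MONTHS = {
    u'\u0399\u03bf\u03cd\u03bb': 'Jul',
    u'\u0391\u03cd\u03b3': 'Aug',
}

GREEK_DATE_RE = re.compile(
    r'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})'
    r'\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)')


def greek_to_rfc822(date_string):
    """Rewrite a Greek-language date as an RFC 822 string, or return None."""
    m = GREEK_DATE_RE.match(date_string)
    if not m:
        return None
    wday = GREEK_WDAYS.get(m.group(1))
    month = GREEK_MONTHS.get(m.group(3))
    if wday is None or month is None:
        # Unknown abbreviation: let another date handler try.
        return None
    return '%s, %s %s %s %s:%s:%s %s' % (
        wday, m.group(2), month, m.group(4),
        m.group(5), m.group(6), m.group(7), m.group(8))


rfc822 = greek_to_rfc822(
    u'\u039a\u03c5\u03c1, 11 \u0399\u03bf\u03cd\u03bb 2004 12:00:00 EST')
print(rfc822)
print(parsedate_tz(rfc822)[:6])
```

Using `dict.get` instead of a `try`/`except` around the lookups keeps the "unknown abbreviation" case explicit; either form returns nothing for input the tables do not cover.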

# Unicode strings for Hungarian date strings
_hungarian_months = {
    u'janu\u00e1r':   u'01',  # e1 in iso-8859-2
    u'febru\u00e1ri': u'02',  # e1 in iso-8859-2
    u'm\u00e1rcius':  u'03',  # e1 in iso-8859-2
    u'\u00e1prilis':  u'04',  # e1 in iso-8859-2
    u'm\u00e1ujus':   u'05',  # e1 in iso-8859-2
    u'j\u00fanius':   u'06',  # fa in iso-8859-2
    u'j\u00falius':   u'07',  # fa in iso-8859-2
    u'augusztus':     u'08',
    u'szeptember':    u'09',
    u'okt\u00f3ber':  u'10',  # f3 in iso-8859-2
    u'november':      u'11',
    u'december':      u'12',
}

_hungarian_date_format_re = \
  re.compile(u'(\d{4})-([^-]+)-(\d{,2})T(\d{,2}):(\d{2})((\+|-)(\d{,2}:\d{2}))')

def _parse_date_hungarian(dateString):
    '''Parse a string according to a Hungarian 8-bit date format.'''
    m = _hungarian_date_format_re.match(dateString)
    if not m: return
    try:
        month = _hungarian_months[m.group(2)]
    except KeyError:
        # Not a known Hungarian month name; let another handler try.
        return
    # Zero-pad single-digit day and hour so the result is valid W3DTF.
    day = m.group(3)
    if len(day) == 1:
        day = '0' + day
    hour = m.group(4)
    if len(hour) == 1:
        hour = '0' + hour
    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)s' % \
                {'year': m.group(1), 'month': month, 'day': day,
                 'hour': hour, 'minute': m.group(5),
                 'zonediff': m.group(6)}
    if _debug: sys.stderr.write('Hungarian date parsed as: %s\n' % w3dtfdate)
    return _parse_date_w3dtf(w3dtfdate)
registerDateHandler(_parse_date_hungarian)

# W3DTF-style date parsing adapted from PyXML xml.utils.iso8601, written by
# Drake and licensed under the Python license.  Removed all range checking
# for month, day, hour, minute, and second, since mktime will normalize
# these later.
def _parse_date_w3dtf(dateString):
    def __extract_date(m):
        year = int(m.group('year'))
        if year < 100:
            # Two-digit year: assume the current century.
            year = 100 * (time.gmtime()[0] // 100) + year
        if year < 1000:
            return 0, 0, 0
        julian = m.group('julian')
        if julian:
            # Ordinal (day-of-year) date: start from a rough month/day guess
            # and adjust until mktime/gmtime agree on the day of the year.
            julian = int(julian)
            month = julian // 30 + 1
            day = julian % 30 + 1
            jday = None
            while jday != julian:
                t = time.mktime((year, month, day, 0, 0, 0, 0, 0, 0))
                jday = time.gmtime(t)[-2]
                diff = abs(jday - julian)
                if jday > julian:
                    if diff < day:
                        day = day - diff
                    else:
                        month = month - 1
                        day = 31
                elif jday < julian:
                    if day + diff < 28:
                        day = day + diff
                    else:
                        month = month + 1
            return year, month, day
        month = m.group('month')
        day = 1
        if month is None:
            month = 1
        else:
            month = int(month)
            day = m.group('day')
            if day:
                day = int(day)
            else:
                day = 1
        return year, month, day

    def __extract_time(m):
        if not m:
            return 0, 0, 0
        hours = m.group('hours')
        if not hours:
            return 0, 0, 0
        hours = int(hours)
        minutes = int(m.group('minutes'))
        seconds = m.group('seconds')
        if seconds:
            # The time pattern allows fractional seconds ('.' or ','), which a
            # bare int() rejects; drop the fraction before converting.
            seconds = int(float(seconds.replace(',', '.')))
        else:
            seconds = 0
        return hours, minutes, seconds

    def __extract_tzd(m):
        '''Return the Time Zone Designator as an offset in seconds from UTC.'''
        if not m:
            return 0
        tzd = m.group('tzd')
        if not tzd:
            return 0
        if tzd == 'Z':
            return 0
        hours = int(m.group('tzdhours'))
        minutes = m.group('tzdminutes')
        if minutes:
            minutes = int(minutes)
        else:
            minutes = 0
        offset = (hours*60 + minutes) * 60
        if tzd[0] == '+':
            return -offset
        return offset

    __date_re = ('(?P<year>\d\d\d\d)'
                 '(?:(?P<dsep>-|)'
                 '(?:(?P<julian>\d\d\d)'
                 '|(?P<month>\d\d)(?:(?P=dsep)(?P<day>\d\d))?))?')
    __tzd_re = '(?P<tzd>[-+](?P<tzdhours>\d\d)(?::?(?P<tzdminutes>\d\d))|Z)'
    __tzd_rx = re.compile(__tzd_re)
    __time_re = ('(?P<hours>\d\d)(?P<tsep>:|)(?P<minutes>\d\d)'
                 '(?:(?P=tsep)(?P<seconds>\d\d(?:[.,]\d+)?))?'
                 + __tzd_re)
    __datetime_re = '%s(?:T%s)?' % (__date_re, __time_re)
    __datetime_rx = re.compile(__datetime_re)
    m = __datetime_rx.match(dateString)
    if (m is None) or (m.group() != dateString): return
    gmt = __extract_date(m) + __extract_time(m) + (0, 0, 0)
    if gmt[0] == 0: return
    return time.gmtime(time.mktime(gmt) + __extract_tzd(m) - time.timezone)
registerDateHandler(_parse_date_w3dtf)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2201
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2202
def _parse_date_rfc822(dateString):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2203
    '''Parse an RFC822, RFC1123, RFC2822, or asctime-style date'''
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2204
    data = dateString.split()
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2205
    if data[0][-1] in (',', '.') or data[0].lower() in rfc822._daynames:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2206
        del data[0]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2207
    if len(data) == 4:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2208
        s = data[3]
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
        i = s.find('+')
        if i > 0:
            data[3:] = [s[:i], s[i+1:]]
        else:
            data.append('')
        dateString = " ".join(data)
    if len(data) < 5:
        dateString += ' 00:00:00 GMT'
    tm = rfc822.parsedate_tz(dateString)
    if tm:
        return time.gmtime(rfc822.mktime_tz(tm))
# rfc822.py defines several time zones, but we define some extra ones.
# 'ET' is equivalent to 'EST', etc.
_additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}
rfc822._timezones.update(_additional_timezones)
registerDateHandler(_parse_date_rfc822)

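# A minimal Python 3 sketch of the same idea, for readers who do not have the
# Python 2 rfc822 module. It assumes email.utils as the replacement parser;
# EXTRA_TIMEZONES and parse_rfc822_date are hypothetical names, not part of
# feedparser. Instead of patching the parser's private time zone table, it
# rewrites a trailing abbreviation into a numeric offset before parsing.

```python
import time
from email.utils import mktime_tz, parsedate_tz

# Hypothetical table: the same five abbreviations as above, written as
# numeric offsets that parsedate_tz understands directly.
EXTRA_TIMEZONES = {'AT': '-0400', 'ET': '-0500', 'CT': '-0600',
                   'MT': '-0700', 'PT': '-0800'}


def parse_rfc822_date(date_string):
    """Return a UTC struct_time for an RFC 822 date, or None if unparseable."""
    parts = date_string.split()
    # Swap a trailing abbreviation such as 'ET' for its numeric offset.
    if parts and parts[-1].upper() in EXTRA_TIMEZONES:
        parts[-1] = EXTRA_TIMEZONES[parts[-1].upper()]
    tm = parsedate_tz(' '.join(parts))
    if tm is None:
        return None
    # mktime_tz applies the parsed offset, so gmtime yields UTC fields.
    return time.gmtime(mktime_tz(tm))
```

# As in the table above, 'ET' is always read as standard time (UTC-5); the
# sketch makes no attempt to account for daylight saving time.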
def _parse_date(dateString):
    '''Parses a variety of date formats into a 9-tuple in GMT'''
    for handler in _date_handlers:
        try:
            date9tuple = handler(dateString)
            if not date9tuple: continue
            if len(date9tuple) != 9:
                if _debug: sys.stderr.write('date handler function must return 9-tuple\n')
                raise ValueError
            # Sanity check: raises if any field is not int-convertible, so the
            # except clause below moves on to the next handler.
            map(int, date9tuple)
            return date9tuple
        except Exception, e:
            if _debug: sys.stderr.write('%s raised %s\n' % (handler.__name__, repr(e)))
            pass
    return None

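# The precedence rules spelled out in the docstring of _getCharacterEncoding
# below can be condensed into a small decision function. This Python 3 sketch
# follows the docstring's wording only and may differ from what the function
# body does in edge cases; pick_encoding and the two tuples are hypothetical
# names, and the inputs are assumed to be already parsed and lowercased.

```python
APPLICATION_XML_TYPES = ('application/xml', 'application/xml-dtd',
                         'application/xml-external-parsed-entity')


def pick_encoding(media_type, http_charset, xml_charset):
    """Apply the docstring's precedence rules to already-parsed inputs."""
    media_type = (media_type or '').lower()
    is_plus_xml = media_type.endswith('+xml')
    if media_type in APPLICATION_XML_TYPES or (
            media_type.startswith('application/') and is_plus_xml):
        # HTTP charset wins, then the XML declaration, then utf-8.
        return http_charset or xml_charset or 'utf-8'
    if media_type.startswith('text/'):
        # text/xml, text/*+xml and any other text/*: the XML declaration
        # is ignored and the default is us-ascii.
        return http_charset or 'us-ascii'
    # Unspecified or unrecognized Content-Type: go by the XML declaration.
    return xml_charset or 'iso-8859-1'
```

# For example, an application/atom+xml response without a charset parameter
# falls back to the encoding named in the XML declaration, while a text/xml
# response without one is treated as us-ascii whatever the document says.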
def _getCharacterEncoding(http_headers, xml_data):
    '''Get the character encoding of the XML document

    http_headers is a dictionary
    xml_data is a raw string (not Unicode)

    This is so much trickier than it sounds, it's not even funny.
    According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type
    is application/xml, application/*+xml,
    application/xml-external-parsed-entity, or application/xml-dtd,
    the encoding given in the charset parameter of the HTTP Content-Type
    takes precedence over the encoding given in the XML prefix within the
    document, and defaults to 'utf-8' if neither is specified.  But, if
    the HTTP Content-Type is text/xml, text/*+xml, or
    text/xml-external-parsed-entity, the encoding given in the XML prefix
    within the document is ALWAYS IGNORED and only the encoding given in
    the charset parameter of the HTTP Content-Type header should be
    respected, and it defaults to 'us-ascii' if not specified.

    Furthermore, discussion on the atom-syntax mailing list with the
    author of RFC 3023 leads me to the conclusion that any document
    served with a Content-Type of text/* and no charset parameter
    must be treated as us-ascii.  (We now do this.)  And also that it
    must always be flagged as non-well-formed.  (We now do this too.)

    If Content-Type is unspecified (input was local file or non-HTTP source)
    or unrecognized (server just got it totally wrong), then go by the
    encoding given in the XML prefix of the document and default to
    'iso-8859-1' as per the HTTP specification (RFC 2616).

    Then, assuming we didn't find a character encoding in the HTTP headers
    (and the HTTP Content-Type allowed us to look in the body), we need
    to sniff the first few bytes of the XML data and try to determine
    whether the encoding is ASCII-compatible.  Section F of the XML
    specification shows the way here:
    http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info

    If the sniffed encoding is not ASCII-compatible, we need to make it
    ASCII-compatible so that we can sniff further into the XML declaration
    to find the encoding attribute, which will tell us the true encoding.

    Of course, none of this guarantees that we will be able to parse the
    feed in the declared character encoding (assuming it was declared
    correctly, which is often not the case).  CJKCodecs and iconv_codec
    help a lot; you should definitely install them if you can.
    http://cjkpython.i18n.org/
    '''

    def _parseHTTPContentType(content_type):
        '''takes HTTP Content-Type header and returns (content type, charset)

        If no charset is specified, returns (content type, '')
        If no content type is specified, returns ('', '')
        Both return parameters are guaranteed to be lowercase strings
        '''
        content_type = content_type or ''
        content_type, params = cgi.parse_header(content_type)
        # cgi.parse_header() only lowercases parameter names, so lowercase the
        # media type and charset here to honor the guarantee in the docstring.
        charset = params.get('charset', '').replace("'", '')
        return content_type.lower(), charset.lower()

    sniffed_xml_encoding = ''
    xml_encoding = ''
    true_encoding = ''
    http_content_type, http_encoding = _parseHTTPContentType(http_headers.get('content-type'))
    # Must sniff for non-ASCII-compatible character encodings before
    # searching for XML declaration.  This heuristic is defined in
    # section F of the XML specification:
    # http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
    try:
        if xml_data[:4] == '\x4c\x6f\xa7\x94':
            # EBCDIC
            xml_data = _ebcdic_to_ascii(xml_data)
        elif xml_data[:4] == '\x00\x3c\x00\x3f':
            # UTF-16BE
            sniffed_xml_encoding = 'utf-16be'
            xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') and (xml_data[2:4] != '\x00\x00'):
            # UTF-16BE with BOM
            sniffed_xml_encoding = 'utf-16be'
            xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
        elif xml_data[:4] == '\x3c\x00\x3f\x00':
            # UTF-16LE
            sniffed_xml_encoding = 'utf-16le'
            xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and (xml_data[2:4] != '\x00\x00'):
            # UTF-16LE with BOM
            sniffed_xml_encoding = 'utf-16le'
            xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
        elif xml_data[:4] == '\x00\x00\x00\x3c':
            # UTF-32BE
            sniffed_xml_encoding = 'utf-32be'
            xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
        elif xml_data[:4] == '\x3c\x00\x00\x00':
            # UTF-32LE
            sniffed_xml_encoding = 'utf-32le'
            xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
        elif xml_data[:4] == '\x00\x00\xfe\xff':
            # UTF-32BE with BOM
            sniffed_xml_encoding = 'utf-32be'
            xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
        elif xml_data[:4] == '\xff\xfe\x00\x00':
            # UTF-32LE with BOM
            sniffed_xml_encoding = 'utf-32le'
            xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
        elif xml_data[:3] == '\xef\xbb\xbf':
            # UTF-8 with BOM
            sniffed_xml_encoding = 'utf-8'
            xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
        else:
            # ASCII-compatible
            pass
        xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
    except:
        xml_encoding_match = None
    if xml_encoding_match:
        xml_encoding = xml_encoding_match.groups()[0].lower()
        if sniffed_xml_encoding and (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode', 'iso-10646-ucs-4', 'ucs-4', 'csucs4', 'utf-16', 'utf-32', 'utf_16', 'utf_32', 'utf16', 'u16')):
            xml_encoding = sniffed_xml_encoding
    acceptable_content_type = 0
    application_content_types = ('application/xml', 'application/xml-dtd', 'application/xml-external-parsed-entity')
    text_content_types = ('text/xml', 'text/xml-external-parsed-entity')
    if (http_content_type in application_content_types) or \
       (http_content_type.startswith('application/') and http_content_type.endswith('+xml')):
        acceptable_content_type = 1
        true_encoding = http_encoding or xml_encoding or 'utf-8'
    elif (http_content_type in text_content_types) or \
         (http_content_type.startswith('text/') and http_content_type.endswith('+xml')):
        acceptable_content_type = 1
        true_encoding = http_encoding or 'us-ascii'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2370
    elif http_content_type.startswith('text/'):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2371
        true_encoding = http_encoding or 'us-ascii'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2372
    elif http_headers and (not http_headers.has_key('content-type')):
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2373
        true_encoding = xml_encoding or 'iso-8859-1'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2374
    else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2375
        true_encoding = xml_encoding or 'utf-8'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2376
    return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type
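
The branch order above can be restated as a small standalone function. This is only a sketch in Python 3, not part of the module: `pick_encoding`, its parameters, and the two constant names are hypothetical, and `headers` is assumed to be a plain dict with lower-case keys.

```python
# Hypothetical Python 3 restatement of the branches above; not feedparser's API.
APPLICATION_TYPES = ('application/xml', 'application/xml-dtd',
                     'application/xml-external-parsed-entity')
TEXT_TYPES = ('text/xml', 'text/xml-external-parsed-entity')


def pick_encoding(headers, content_type, http_encoding, xml_encoding):
    """Return (true_encoding, acceptable_content_type) for the given inputs."""
    xml_suffix = content_type.endswith('+xml')
    if content_type in APPLICATION_TYPES or (
            content_type.startswith('application/') and xml_suffix):
        # HTTP charset first, then the XML declaration, then utf-8
        return (http_encoding or xml_encoding or 'utf-8'), True
    if content_type in TEXT_TYPES or (
            content_type.startswith('text/') and xml_suffix):
        # the XML declaration is not consulted for text/xml types
        return (http_encoding or 'us-ascii'), True
    if content_type.startswith('text/'):
        return (http_encoding or 'us-ascii'), False
    if headers and 'content-type' not in headers:
        return (xml_encoding or 'iso-8859-1'), False
    return (xml_encoding or 'utf-8'), False
```

For example, `pick_encoding({'content-type': 'text/xml'}, 'text/xml', '', 'utf-8')` falls back to `'us-ascii'` because the declaration inside the document is ignored for `text/xml`.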

def _toUTF8(data, encoding):
    '''Changes an XML data stream on the fly to specify a new encoding

    data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already
    encoding is a string recognized by encodings.aliases
    '''
    if _debug: sys.stderr.write('entering _toUTF8, trying encoding %s\n' % encoding)
    # strip Byte Order Mark (if present)
    if (len(data) >= 4) and (data[:2] == '\xfe\xff') and (data[2:4] != '\x00\x00'):
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-16be':
                sys.stderr.write('trying utf-16be instead\n')
        encoding = 'utf-16be'
        data = data[2:]
    elif (len(data) >= 4) and (data[:2] == '\xff\xfe') and (data[2:4] != '\x00\x00'):
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-16le':
                sys.stderr.write('trying utf-16le instead\n')
        encoding = 'utf-16le'
        data = data[2:]
    elif data[:3] == '\xef\xbb\xbf':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-8':
                sys.stderr.write('trying utf-8 instead\n')
        encoding = 'utf-8'
        data = data[3:]
    elif data[:4] == '\x00\x00\xfe\xff':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-32be':
                sys.stderr.write('trying utf-32be instead\n')
        encoding = 'utf-32be'
        data = data[4:]
    elif data[:4] == '\xff\xfe\x00\x00':
        if _debug:
            sys.stderr.write('stripping BOM\n')
            if encoding != 'utf-32le':
                sys.stderr.write('trying utf-32le instead\n')
        encoding = 'utf-32le'
        data = data[4:]
    newdata = unicode(data, encoding)
    if _debug: sys.stderr.write('successfully converted %s data to unicode\n' % encoding)
    declmatch = re.compile('^<\?xml[^>]*?>')
    newdecl = '''<?xml version='1.0' encoding='utf-8'?>'''
    if declmatch.search(newdata):
        newdata = declmatch.sub(newdecl, newdata)
    else:
        newdata = newdecl + u'\n' + newdata
    return newdata.encode('utf-8')
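
The BOM handling in `_toUTF8` can be tried outside the module with the sketch below. It is a Python 3 restatement under stated assumptions (byte literals instead of Python 2 `str`, no debug output); `to_utf8`, `_DECL`, and `_NEW_DECL` are illustrative names, not part of feedparser.

```python
import re

# Matches a leading XML declaration, as in the function above.
_DECL = re.compile(r'^<\?xml[^>]*?>')
_NEW_DECL = "<?xml version='1.0' encoding='utf-8'?>"


def to_utf8(data: bytes, encoding: str) -> bytes:
    """Decode data, trusting a BOM over `encoding`, and re-encode it as UTF-8."""
    # The UTF-16 checks come first and exclude the UTF-32 LE pattern ff fe 00 00.
    if len(data) >= 4 and data[:2] == b'\xfe\xff' and data[2:4] != b'\x00\x00':
        encoding, data = 'utf-16be', data[2:]
    elif len(data) >= 4 and data[:2] == b'\xff\xfe' and data[2:4] != b'\x00\x00':
        encoding, data = 'utf-16le', data[2:]
    elif data[:3] == b'\xef\xbb\xbf':
        encoding, data = 'utf-8', data[3:]
    elif data[:4] == b'\x00\x00\xfe\xff':
        encoding, data = 'utf-32be', data[4:]
    elif data[:4] == b'\xff\xfe\x00\x00':
        encoding, data = 'utf-32le', data[4:]
    text = data.decode(encoding)
    # Rewrite (or add) the declaration so it agrees with the new byte encoding.
    if _DECL.search(text):
        text = _DECL.sub(_NEW_DECL, text)
    else:
        text = _NEW_DECL + '\n' + text
    return text.encode('utf-8')
```

A document that starts with the bytes `ff fe` is decoded as UTF-16 LE even if the caller passes `'iso-8859-1'`, which is the override the debug messages above describe.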

def _stripDoctype(data):
    '''Strips DOCTYPE from XML document, returns (rss_version, stripped_data)

    rss_version may be 'rss091n' or None
    stripped_data is the same XML document, minus the DOCTYPE
    '''
    entity_pattern = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
    data = entity_pattern.sub('', data)
    doctype_pattern = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)
    doctype_results = doctype_pattern.findall(data)
    doctype = doctype_results and doctype_results[0] or ''
    if doctype.lower().count('netscape'):
        version = 'rss091n'
    else:
        version = None
    data = doctype_pattern.sub('', data)
    return version, data
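
A quick way to see what `_stripDoctype` returns is the Python 3 sketch below. It reuses the two regular expressions from the function above; the name `strip_doctype` and the sample documents are made up for illustration.

```python
import re

_ENTITY = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
_DOCTYPE = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)


def strip_doctype(data):
    """Return (rss_version, data without ENTITY and DOCTYPE declarations)."""
    data = _ENTITY.sub('', data)
    found = _DOCTYPE.findall(data)
    doctype = found[0] if found else ''
    # A Netscape DTD is taken as the marker for Netscape's RSS 0.91.
    version = 'rss091n' if 'netscape' in doctype.lower() else None
    return version, _DOCTYPE.sub('', data)


netscape_doc = ('<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
                '"http://my.netscape.com/publish/formats/rss-0.91.dtd">'
                '<rss version="0.91"/>')
print(strip_doctype(netscape_doc))
print(strip_doctype('<feed/>'))
```

Because both patterns stop at the first `>`, this approach relies on the declarations not containing a literal `>` in their quoted values.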

def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=[]):
    '''Parse a feed from a URL, file, stream, or string'''
    result = FeedParserDict()
    result['feed'] = FeedParserDict()
    result['entries'] = []
    if _XML_AVAILABLE:
        result['bozo'] = 0
    if type(handlers) == types.InstanceType:
        handlers = [handlers]
    try:
        f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers)
        data = f.read()
    except Exception, e:
        result['bozo'] = 1
        result['bozo_exception'] = e
        data = ''
        f = None

    # if feed is gzip- or deflate-compressed, decompress it
    if f and data and hasattr(f, 'headers'):
        if gzip and f.headers.get('content-encoding', '') == 'gzip':
            try:
                data = gzip.GzipFile(fileobj=_StringIO(data)).read()
            except Exception, e:
                # Some feeds claim to be gzipped but they're not, so
                # we get garbage.  Ideally, we should re-request the
                # feed without the 'Accept-encoding: gzip' header,
                # but we don't.
                result['bozo'] = 1
                result['bozo_exception'] = e
                data = ''
        elif zlib and f.headers.get('content-encoding', '') == 'deflate':
            try:
                data = zlib.decompress(data, -zlib.MAX_WBITS)
            except Exception, e:
                result['bozo'] = 1
                result['bozo_exception'] = e
                data = ''

    # save HTTP headers
    if hasattr(f, 'info'):
        info = f.info()
        result['etag'] = info.getheader('ETag')
        last_modified = info.getheader('Last-Modified')
        if last_modified:
            result['modified'] = _parse_date(last_modified)
    if hasattr(f, 'url'):
        result['href'] = f.url
        result['status'] = 200
    if hasattr(f, 'status'):
        result['status'] = f.status
    if hasattr(f, 'headers'):
        result['headers'] = f.headers.dict
    if hasattr(f, 'close'):
        f.close()

    # there are four encodings to keep track of:
    # - http_encoding is the encoding declared in the Content-Type HTTP header
    # - xml_encoding is the encoding declared in the <?xml declaration
    # - sniffed_xml_encoding is the encoding sniffed from the first 4 bytes of the XML data
    # - result['encoding'] is the actual encoding, as per RFC 3023 and a variety of other conflicting specifications
    http_headers = result.get('headers', {})
    result['encoding'], http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type = \
        _getCharacterEncoding(http_headers, data)
    if http_headers and (not acceptable_content_type):
        if http_headers.has_key('content-type'):
            bozo_message = '%s is not an XML media type' % http_headers['content-type']
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2516
        else:
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2517
            bozo_message = 'no Content-type specified'
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2518
        result['bozo'] = 1
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2519
        result['bozo_exception'] = NonXMLContentType(bozo_message)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2520
        
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2521
    result['version'], data = _stripDoctype(data)
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2522
    baseuri = http_headers.get('content-location', result.get('href'))
    baselang = http_headers.get('content-language', None)

    # if server sent 304, we're done
    if result.get('status', 0) == 304:
        result['version'] = ''
        result['debug_message'] = 'The feed has not changed since you last checked, ' + \
            'so the server sent no data.  This is a feature, not a bug!'
        return result

    # if there was a problem downloading, we're done
    if not data:
        return result

    # determine character encoding
    use_strict_parser = 0
    known_encoding = 0
    tried_encodings = []
    # try: HTTP encoding, declared XML encoding, encoding sniffed from BOM
    for proposed_encoding in (result['encoding'], xml_encoding, sniffed_xml_encoding):
        if not proposed_encoding: continue
        if proposed_encoding in tried_encodings: continue
        tried_encodings.append(proposed_encoding)
        try:
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
            break
        except:
            pass
    # if no luck and we have auto-detection library, try that
    if (not known_encoding) and chardet:
        try:
            proposed_encoding = chardet.detect(data)['encoding']
            if proposed_encoding and (proposed_encoding not in tried_encodings):
                tried_encodings.append(proposed_encoding)
                data = _toUTF8(data, proposed_encoding)
                known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck and we haven't tried utf-8 yet, try that
    if (not known_encoding) and ('utf-8' not in tried_encodings):
        try:
            proposed_encoding = 'utf-8'
            tried_encodings.append(proposed_encoding)
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck and we haven't tried windows-1252 yet, try that
    if (not known_encoding) and ('windows-1252' not in tried_encodings):
        try:
            proposed_encoding = 'windows-1252'
            tried_encodings.append(proposed_encoding)
            data = _toUTF8(data, proposed_encoding)
            known_encoding = use_strict_parser = 1
        except:
            pass
    # if still no luck, give up
    if not known_encoding:
        result['bozo'] = 1
        result['bozo_exception'] = CharacterEncodingUnknown( \
            'document encoding unknown, I tried ' + \
            '%s, %s, utf-8, and windows-1252 but nothing worked' % \
            (result['encoding'], xml_encoding))
        result['encoding'] = ''
    elif proposed_encoding != result['encoding']:
        result['bozo'] = 1
        result['bozo_exception'] = CharacterEncodingOverride( \
            'document declared as %s, but parsed as %s' % \
            (result['encoding'], proposed_encoding))
        result['encoding'] = proposed_encoding

    if not _XML_AVAILABLE:
        use_strict_parser = 0
    if use_strict_parser:
        # initialize the SAX parser
        feedparser = _StrictFeedParser(baseuri, baselang, 'utf-8')
        saxparser = xml.sax.make_parser(PREFERRED_XML_PARSERS)
        saxparser.setFeature(xml.sax.handler.feature_namespaces, 1)
        saxparser.setContentHandler(feedparser)
        saxparser.setErrorHandler(feedparser)
        source = xml.sax.xmlreader.InputSource()
        source.setByteStream(_StringIO(data))
        if hasattr(saxparser, '_ns_stack'):
            # work around bug in built-in SAX parser (doesn't recognize xml: namespace)
            # PyXML doesn't have this problem, and it doesn't have _ns_stack either
            saxparser._ns_stack.append({'http://www.w3.org/XML/1998/namespace':'xml'})
        try:
            saxparser.parse(source)
        except Exception, e:
            if _debug:
                import traceback
                traceback.print_stack()
                traceback.print_exc()
                sys.stderr.write('xml parsing failed\n')
            result['bozo'] = 1
            result['bozo_exception'] = feedparser.exc or e
            use_strict_parser = 0
    if not use_strict_parser:
        feedparser = _LooseFeedParser(baseuri, baselang, known_encoding and 'utf-8' or '')
        feedparser.feed(data)
    result['feed'] = feedparser.feeddata
    result['entries'] = feedparser.entries
    result['version'] = result['version'] or feedparser.version
    result['namespaces'] = feedparser.namespacesInUse
    return result
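
# Illustrative sketch only (not part of feedparser): the encoding fallback in
# parse() tries candidates in a fixed order, skips empty or repeated names,
# and stops at the first one that decodes. The helper name
# decode_with_fallbacks is hypothetical, and it uses bytes.decode() as a
# stand-in for _toUTF8(), which does more work (BOM handling, re-encoding).

```python
def decode_with_fallbacks(data, candidates):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper: mirrors the order-of-attempts idea only.
    Returns (None, '') when every candidate fails.
    """
    tried = []
    for encoding in candidates:
        # Skip empty names and names that were already attempted.
        if not encoding or encoding in tried:
            continue
        tried.append(encoding)
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # LookupError covers encoding names Python does not know.
            continue
    return None, ''


# A Latin-1 byte that is invalid UTF-8, so the second candidate is rejected.
raw = u'caf\xe9'.encode('latin-1')
text, used = decode_with_fallbacks(raw, ['', 'utf-8', 'utf-8', 'windows-1252'])
```

# Unlike the bare "except:" clauses above, the sketch catches only decoding
# errors, so unrelated failures would still surface.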
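
# Illustrative sketch only (not part of feedparser): the strict half of the
# strict-then-loose strategy in parse(). A SAX parser runs first; if it raises,
# the exception is kept (as parse() does with bozo_exception) and the caller
# is free to retry with a lenient parser. _TitleCollector and strict_parse are
# hypothetical names, and the handler only collects <title> text.

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader


class _TitleCollector(xml.sax.handler.ContentHandler):
    """Hypothetical minimal handler: collects text inside <title> elements."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == 'title':
            self._in_title = True
            self.titles.append('')

    def characters(self, content):
        # Text may arrive in several pieces, so append to the open title.
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == 'title':
            self._in_title = False


def strict_parse(data):
    """Return (titles, bozo_exception); bozo_exception is None on success."""
    handler = _TitleCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    source = xml.sax.xmlreader.InputSource()
    source.setByteStream(io.BytesIO(data))
    try:
        parser.parse(source)
    except xml.sax.SAXParseException as exc:
        # Ill-formed XML: report it instead of raising, as parse() does.
        return handler.titles, exc
    return handler.titles, None


good = strict_parse(b'<rss><channel><title>Hi</title></channel></rss>')
bad = strict_parse(b'<rss><title>Hi</rss>')
```

# A non-None second element is the cue to fall back to a lenient parser.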
if __name__ == '__main__':
    if not sys.argv[1:]:
        print __doc__
        sys.exit(0)
    else:
        urls = sys.argv[1:]
    zopeCompatibilityHack()
    from pprint import pprint
    for url in urls:
        print url
        print
        result = parse(url)
        pprint(result)
        print
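
# Illustrative sketch only (not part of feedparser): instead of pretty-printing
# the whole result, a caller might inspect the keys that parse() sets above
# ('status', 'bozo', 'bozo_exception', 'entries'). The helper name
# summarize_result is hypothetical, and it accepts any dict-like object, so it
# can be exercised without fetching a feed.

```python
def summarize_result(result):
    """Hypothetical helper: one-line summary of a parse()-style result."""
    # On a 304 response parse() returns early, before 'entries' is set.
    if result.get('status', 0) == 304:
        return 'not modified'
    count = len(result.get('entries', []))
    if result.get('bozo'):
        exc = result.get('bozo_exception')
        return 'bozo (%s): %d entries' % (type(exc).__name__, count)
    return 'ok: %d entries' % count


clean = summarize_result({'bozo': 0, 'entries': [{}, {}, {}]})
flagged = summarize_result(
    {'bozo': 1, 'bozo_exception': ValueError('x'), 'entries': [{}, {}]})
unchanged = summarize_result({'status': 304, 'version': ''})
```

# A bozo result can still carry entries, which is why the count is reported
# in both branches.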
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2644
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2645
#REVISION HISTORY
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2646
#1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements,
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2647
#  added Simon Fell's test suite
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2648
#1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2649
#2.0 - 10/19/2002
#  JD - use inchannel to watch out for image and textinput elements which can
#  also contain title, link, and description elements
#  JD - check for isPermaLink='false' attribute on guid elements
#  JD - replaced openAnything with open_resource supporting ETag and
#  If-Modified-Since request headers
#  JD - parse now accepts etag, modified, agent, and referrer optional
#  arguments
#  JD - modified parse to return a dictionary instead of a tuple so that any
#  etag or modified information can be returned and cached by the caller
#2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything
#  because of etag/modified, return the old etag/modified to the caller to
#  indicate why nothing is being returned
#2.0.2 - 10/21/2002 - JB - added the inchannel to the if statement, otherwise it's
#  useless.  Fixes the problem JD was addressing by adding it.
#2.1 - 11/14/2002 - MAP - added gzip support
#2.2 - 1/27/2003 - MAP - added attribute support, admin:generatorAgent.
#  start_admingeneratoragent is an example of how to handle elements with
#  only attributes, no content.
#2.3 - 6/11/2003 - MAP - added USER_AGENT for default (if caller doesn't specify);
#  also, make sure we send the User-Agent even if urllib2 isn't available.
#  Match any variation of backend.userland.com/rss namespace.
#2.3.1 - 6/12/2003 - MAP - if item has both link and guid, return both as-is.
#2.4 - 7/9/2003 - MAP - added preliminary Pie/Atom/Echo support based on Sam Ruby's
#  snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>; changed
#  project name
#2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree);
#  removed unnecessary urllib code -- urllib2 should always be available anyway;
#  return actual url, status, and full HTTP headers (as result['url'],
#  result['status'], and result['headers']) if parsing a remote feed over HTTP --
#  this should pass all the HTTP tests at <http://diveintomark.org/tests/client/http/>;
#  added the latest namespace-of-the-week for RSS 2.0
#2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom
#  User-Agent (otherwise urllib2 sends two, which confuses some servers)
#2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for
#  inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
#2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or
#  textInput, and also to return the character encoding (if specified)
#2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking
#  nested divs within content (JohnD); fixed missing sys import (JohanS);
#  fixed regular expression to capture XML character encoding (Andrei);
#  added support for Atom 0.3-style links; fixed bug with textInput tracking;
#  added support for cloud (MartijnP); added support for multiple
#  category/dc:subject (MartijnP); normalize content model: 'description' gets
#  description (which can come from description, summary, or full content if no
#  description), 'content' gets dict of base/language/type/value (which can come
#  from content:encoded, xhtml:body, content, or fullitem);
#  fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang
#  tracking; fixed bug tracking unknown tags; fixed bug tracking content when
#  <content> element is not in default namespace (like Pocketsoap feed);
#  resolve relative URLs in link, guid, docs, url, comments, wfw:comment,
#  wfw:commentRSS; resolve relative URLs within embedded HTML markup in
#  description, xhtml:body, content, content:encoded, title, subtitle,
#  summary, info, tagline, and copyright; added support for pingback and
#  trackback namespaces
#2.7 - 1/5/2004 - MAP - really added support for trackback and pingback
#  namespaces, as opposed to 2.6 when I said I did but didn't really;
#  sanitize HTML markup within some elements; added mxTidy support (if
#  installed) to tidy HTML markup within some elements; fixed indentation
#  bug in _parse_date (FazalM); use socket.setdefaulttimeout if available
#  (FazalM); universal date parsing and normalization (FazalM): 'created', 'modified',
#  'issued' are parsed into 9-tuple date format and stored in 'created_parsed',
#  'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified'
#  and vice-versa; 'date_parsed' is duplicated in 'modified_parsed' and vice-versa
#2.7.1 - 1/9/2004 - MAP - fixed bug handling &quot; and &apos;.  fixed memory
#  leak not closing url opener (JohnD); added dc:publisher support (MarekK);
#  added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK)
#2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed <br/> tags in
#  encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL);
#  fixed relative URI processing for guid (skadz); added ICBM support; added
#  base64 support
#2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many
#  blogspot.com sites); added _debug variable
#2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing
#3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available);
#  added several new supported namespaces; fixed bug tracking naked markup in
#  description; added support for enclosure; added support for source; re-added
#  support for cloud which got dropped somehow; added support for expirationDate
#3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking
#  xml:base URI, one for documents that don't define one explicitly and one for
#  documents that define an outer and an inner xml:base that goes out of scope
#  before the end of the document
#3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level
#3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version']
#  will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized;
#  added support for creativeCommons:license and cc:license; added support for
#  full Atom content model in title, tagline, info, copyright, summary; fixed bug
#  with gzip encoding (not always telling server we support it when we do)
#3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail
#  (dictionary of 'name', 'url', 'email'); map author to author_detail if author
#  contains name + email address
#3.0b8 - 1/28/2004 - MAP - added support for contributor
#3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added
#  support for summary
#3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from
#  xml.util.iso8601
#3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain
#  dangerous markup; fiddled with decodeEntities (not right); liberalized
#  date parsing even further
#3.0b12 - 2/6/2004 - MAP - fiddled with decodeEntities (still not right);
#  added support for Atom 0.2 subtitle; added support for Atom content model
#  in copyright; better sanitizing of dangerous HTML elements with end tags
#  (script, frameset)
#3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img,
#  etc.) in embedded markup, in either HTML or XHTML form (<br>, <br/>, <br />)
#3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under
#  Python 2.1
#3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS;
#  fixed bug capturing author and contributor URL; fixed bug resolving relative
#  links in author and contributor URL; fixed bug resolving relative links in
#  generator URL; added support for recognizing RSS 1.0; passed Simon Fell's
#  namespace tests, and included them permanently in the test suite with his
#  permission; fixed namespace handling under Python 2.1
#3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15)
#3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023
#3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei);
#  use libxml2 (if available)
#3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author
#  name was in parentheses; removed ultra-problematic mxTidy support; patch to
#  work around crash in PyXML/expat when encountering invalid entities
#  (MarkMoraes); support for textinput/textInput
#3.0b20 - 4/7/2004 - MAP - added CDF support
#3.0b21 - 4/14/2004 - MAP - added Hot RSS support
#3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in
#  results dict; changed results dict to allow getting values with results.key
#  as well as results[key]; work around embedded illformed HTML with half
#  a DOCTYPE; work around malformed Content-Type header; if character encoding
#  is wrong, try several common ones before falling back to regexes (if this
#  works, bozo_exception is set to CharacterEncodingOverride); fixed character
#  encoding issues in BaseHTMLProcessor by tracking encoding and converting
#  from Unicode to raw strings before feeding data to sgmllib.SGMLParser;
#  convert each value in results to Unicode (if possible), even if using
#  regex-based parsing
#3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2783
#  high-bit characters in attributes in embedded HTML in description (thanks
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2784
#  Thijs van de Vossen); moved guid, date, and date_parsed to mapped keys in
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2785
#  FeedParserDict; tweaked FeedParserDict.has_key to return True if asking
c3d098d6fafa Added feedparser to soc/utils and modified svn:externals for vendor to include feedparser svn
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
  2786
#  about a mapped key
#3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and
#  results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could
#  cause the same encoding to be tried twice (even if it failed the first time);
#  fixed DOCTYPE stripping when DOCTYPE contained entity declarations;
#  better textinput and image tracking in illformed RSS 1.0 feeds
#3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed
#  my blink tag tests
#3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that
#  failed to parse utf-16 encoded feeds; made source into a FeedParserDict;
#  duplicate admin:generatorAgent/@rdf:resource in generator_detail.url;
#  added support for image; refactored parse() fallback logic to try other
#  encodings if SAX parsing fails (previously it would only try other encodings
#  if re-encoding failed); remove unichr madness in normalize_attrs now that
#  we're properly tracking encoding in and out of BaseHTMLProcessor; set
#  feed.language from root-level xml:lang; set entry.id from rdf:about;
#  send Accept header
#3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between
#  iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are
#  windows-1252); fixed regression that could cause the same encoding to be
#  tried twice (even if it failed the first time)
#3.0.1 - 6/22/2004 - MAP - default to us-ascii for all text/* content types;
#  recover from malformed content-type header parameter with no equals sign
#  ('text/xml; charset:iso-8859-1')
#3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities
#  to Unicode equivalents in illformed feeds (aaronsw); added and
#  passed tests for converting character entities to Unicode equivalents
#  in illformed feeds (aaronsw); test for valid parsers when setting
#  XML_AVAILABLE; make version and encoding available when server returns
#  a 304; add handlers parameter to pass arbitrary urllib2 handlers (like
#  digest auth or proxy support); add code to parse username/password
#  out of url and send as basic authentication; expose downloading-related
#  exceptions in bozo_exception (aaronsw); added __contains__ method to
#  FeedParserDict (aaronsw); added publisher_detail (aaronsw)
#3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always
#  convert feed to UTF-8 before passing to XML parser; completely revamped
#  logic for determining character encoding and attempting XML parsing
#  (much faster); increased default timeout to 20 seconds; test for presence
#  of Location header on redirects; added tests for many alternate character
#  encodings; support various EBCDIC encodings; support UTF-16BE and
#  UTF-16LE with or without a BOM; support UTF-8 with a BOM; support
#  UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no
#  XML parsers are available; added support for 'Content-encoding: deflate';
#  send blank 'Accept-encoding: ' header if neither gzip nor zlib modules
#  are available
#3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure
#  problem tracking xml:base and xml:lang if element declares it, child
#  doesn't, first grandchild redeclares it, and second grandchild doesn't;
#  refactored date parsing; defined public registerDateHandler so callers
#  can add support for additional date formats at runtime; added support
#  for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added
#  zopeCompatibilityHack() which turns FeedParserDict into a regular
#  dictionary, required for Zope compatibility, and also makes command-
#  line debugging easier because pprint module formats real dictionaries
#  better than dictionary-like objects; added NonXMLContentType exception,
#  which is stored in bozo_exception when a feed is served with a non-XML
#  media type such as 'text/plain'; respect Content-Language as default
#  language if no xml:lang is present; cloud dict is now FeedParserDict;
#  generator dict is now FeedParserDict; better tracking of xml:lang,
#  including support for xml:lang='' to unset the current language;
#  recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default
#  namespace; don't overwrite final status on redirects (scenarios:
#  redirecting to a URL that returns 304, redirecting to a URL that
#  redirects to another URL with a different type of redirect); add
#  support for HTTP 303 redirects
#4.0 - MAP - support for relative URIs in xml:base attribute; fixed
#  encoding issue with mxTidy (phopkins); preliminary support for RFC 3229;
#  support for Atom 1.0; support for iTunes extensions; new 'tags' for
#  categories/keywords/etc. as array of dict
#  {'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0
#  terminology; parse RFC 822-style dates with no time; lots of other
#  bug fixes
#4.1 - MAP - removed socket timeout; added support for chardet library