app/htmlsanitizer/BeautifulSoupTests.py
author Mario Ferraro <fadinlight@gmail.com>
Sun, 15 Nov 2009 22:12:20 +0100
changeset 3093 d1be59b6b627
parent 2323 b3daada52dd3
permissions -rw-r--r--
GMaps-related JS changed to use the new google namespace. Google is going to permanently change the way its services are loaded, so better to stay safe. This commit also shows uses of the new melange.js module. Fixes Issue 634.
2323
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
# -*- coding: utf-8 -*-
"""Unit tests for Beautiful Soup.

These tests make sure Beautiful Soup works as it should. If you
find a bug in Beautiful Soup, the best way to express it is as a test
case like this that fails."""

import re
import unittest
from BeautifulSoup import *

class SoupTest(unittest.TestCase):

    def assertSoupEquals(self, toParse, rep=None, c=BeautifulSoup,
                         encoding=None):
        """Parse the given text and make sure its string rep is the other
        given text."""
        if rep is None:
            rep = toParse
        obj = c(toParse)
        if encoding is None:
            rep2 = obj.decode()
        else:
            rep2 = obj.encode(encoding)
        self.assertEqual(rep2, rep)

class FollowThatTag(SoupTest):

    "Tests the various ways of fetching tags from a soup."

    def setUp(self):
        ml = """
        <a id="x">1</a>
        <A id="a">2</a>
        <b id="b">3</a>
        <b href="foo" id="x">4</a>
        <ac width=100>4</ac>"""
        self.soup = BeautifulStoneSoup(ml)

    def testFindAllByName(self):
        matching = self.soup('a')
        self.assertEqual(len(matching), 2)
        self.assertEqual(matching[0].name, 'a')
        self.assertEqual(matching, self.soup.findAll('a'))
        self.assertEqual(matching, self.soup.findAll(SoupStrainer('a')))

    def testFindAllByAttribute(self):
        matching = self.soup.findAll(id='x')
        self.assertEqual(len(matching), 2)
        self.assertEqual(matching[0].name, 'a')
        self.assertEqual(matching[1].name, 'b')

        matching2 = self.soup.findAll(attrs={'id' : 'x'})
        self.assertEqual(matching, matching2)

        strainer = SoupStrainer(attrs={'id' : 'x'})
        self.assertEqual(matching, self.soup.findAll(strainer))

        self.assertEqual(len(self.soup.findAll(id=None)), 1)

        self.assertEqual(len(self.soup.findAll(width=100)), 1)
        self.assertEqual(len(self.soup.findAll(junk=None)), 5)
        self.assertEqual(len(self.soup.findAll(junk=[1, None])), 5)

        self.assertEqual(len(self.soup.findAll(junk=re.compile('.*'))), 0)
        self.assertEqual(len(self.soup.findAll(junk=True)), 0)

        self.assertEqual(len(self.soup.findAll(junk=True)), 0)
        self.assertEqual(len(self.soup.findAll(href=True)), 1)

    def testFindallByClass(self):
        soup = BeautifulSoup('<a>Foo</a><a class="1">Bar</a>')
        self.assertEqual(soup.find('a', '1').string, "Bar")

    def testFindAllByList(self):
        matching = self.soup(['a', 'ac'])
        self.assertEqual(len(matching), 3)

    def testFindAllByHash(self):
        matching = self.soup({'a' : True, 'b' : True})
        self.assertEqual(len(matching), 4)

    def testFindAllText(self):
        soup = BeautifulSoup("<html>\xbb</html>")
        self.assertEqual(soup.findAll(text=re.compile('.*')),
                         [u'\xbb'])

    def testFindAllByRE(self):
        import re
        r = re.compile('a.*')
        self.assertEqual(len(self.soup(r)), 3)

    def testFindAllByMethod(self):
        def matchTagWhereIDMatchesName(tag):
            return tag.name == tag.get('id')

        matching = self.soup.findAll(matchTagWhereIDMatchesName)
        self.assertEqual(len(matching), 2)
        self.assertEqual(matching[0].name, 'a')

    def testParents(self):
        soup = BeautifulSoup('<ul id="foo"></ul><ul id="foo"><ul><ul id="foo" a="b"><b>Blah')
        b = soup.b
        self.assertEquals(len(b.findParents('ul', {'id' : 'foo'})), 2)
        self.assertEquals(b.findParent('ul')['a'], 'b')

    PROXIMITY_TEST = BeautifulSoup('<b id="1"><b id="2"><b id="3"><b id="4">')

    def testNext(self):
        soup = self.PROXIMITY_TEST
        b = soup.find('b', {'id' : 2})
        self.assertEquals(b.findNext('b')['id'], '3')
        self.assertEquals(b.findNext('b')['id'], '3')
        self.assertEquals(len(b.findAllNext('b')), 2)
        self.assertEquals(len(b.findAllNext('b', {'id' : 4})), 1)

    def testPrevious(self):
        soup = self.PROXIMITY_TEST
        b = soup.find('b', {'id' : 3})
        self.assertEquals(b.findPrevious('b')['id'], '2')
        self.assertEquals(b.findPrevious('b')['id'], '2')
        self.assertEquals(len(b.findAllPrevious('b')), 2)
        self.assertEquals(len(b.findAllPrevious('b', {'id' : 2})), 1)


    SIBLING_TEST = BeautifulSoup('<blockquote id="1"><blockquote id="1.1"></blockquote></blockquote><blockquote id="2"><blockquote id="2.1"></blockquote></blockquote><blockquote id="3"><blockquote id="3.1"></blockquote></blockquote><blockquote id="4">')

    def testNextSibling(self):
        soup = self.SIBLING_TEST
        tag = 'blockquote'
        b = soup.find(tag, {'id' : 2})
        self.assertEquals(b.findNext(tag)['id'], '2.1')
        self.assertEquals(b.findNextSibling(tag)['id'], '3')
        self.assertEquals(b.findNextSibling(tag)['id'], '3')
        self.assertEquals(len(b.findNextSiblings(tag)), 2)
        self.assertEquals(len(b.findNextSiblings(tag, {'id' : 4})), 1)

    def testPreviousSibling(self):
        soup = self.SIBLING_TEST
        tag = 'blockquote'
        b = soup.find(tag, {'id' : 3})
        self.assertEquals(b.findPrevious(tag)['id'], '2.1')
        self.assertEquals(b.findPreviousSibling(tag)['id'], '2')
        self.assertEquals(b.findPreviousSibling(tag)['id'], '2')
        self.assertEquals(len(b.findPreviousSiblings(tag)), 2)
        self.assertEquals(len(b.findPreviousSiblings(tag, id=1)), 1)

    def testTextNavigation(self):
        soup = BeautifulSoup('Foo<b>Bar</b><i id="1"><b>Baz<br />Blee<hr id="1"/></b></i>Blargh')
        baz = soup.find(text='Baz')
        self.assertEquals(baz.findParent("i")['id'], '1')
        self.assertEquals(baz.findNext(text='Blee'), 'Blee')
        self.assertEquals(baz.findNextSibling(text='Blee'), 'Blee')
        self.assertEquals(baz.findNextSibling(text='Blargh'), None)
        self.assertEquals(baz.findNextSibling('hr')['id'], '1')

class SiblingRivalry(SoupTest):
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   157
    "Tests the nextSibling and previousSibling navigation."
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   158
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   159
    def testSiblings(self):
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   160
        soup = BeautifulSoup("<ul><li>1<p>A</p>B<li>2<li>3</ul>")
        secondLI = soup.find('li').nextSibling
        self.assert_(secondLI.name == 'li' and secondLI.string == '2')
        self.assertEquals(soup.find(text='1').nextSibling.name, 'p')
        self.assertEquals(soup.find('p').nextSibling, 'B')
        self.assertEquals(soup.find('p').nextSibling.previousSibling.nextSibling, 'B')

class TagsAreObjectsToo(SoupTest):
    "Tests the various built-in functions of Tag objects."

    def testLen(self):
        soup = BeautifulSoup("<top>1<b>2</b>3</top>")
        self.assertEquals(len(soup.top), 3)

class StringEmUp(SoupTest):
    "Tests the use of 'string' as an alias for a tag's only content."

    def testString(self):
        s = BeautifulSoup("<b>foo</b>")
        self.assertEquals(s.b.string, 'foo')

    def testLackOfString(self):
        s = BeautifulSoup("<b>f<i>e</i>o</b>")
        self.assert_(not s.b.string)

class ThatsMyLimit(SoupTest):
    "Tests the limit argument."

    def testBasicLimits(self):
        s = BeautifulSoup('<br id="1" /><br id="1" /><br id="1" /><br id="1" />')
        self.assertEquals(len(s.findAll('br')), 4)
        self.assertEquals(len(s.findAll('br', limit=2)), 2)
        self.assertEquals(len(s('br', limit=2)), 2)

class OnlyTheLonely(SoupTest):
    "Tests the parseOnlyThese argument to the constructor."
    def setUp(self):
        x = []
        for i in range(1,6):
            x.append('<a id="%s">' % i)
            for j in range(100,103):
                x.append('<b id="%s.%s">Content %s.%s</b>' % (i,j, i,j))
            x.append('</a>')
        self.x = ''.join(x)

    def testOnly(self):
        strainer = SoupStrainer("b")
        soup = BeautifulSoup(self.x, parseOnlyThese=strainer)
        self.assertEquals(len(soup), 15)

        strainer = SoupStrainer(id=re.compile("100.*"))
        soup = BeautifulSoup(self.x, parseOnlyThese=strainer)
        self.assertEquals(len(soup), 5)

        strainer = SoupStrainer(text=re.compile("10[01].*"))
        soup = BeautifulSoup(self.x, parseOnlyThese=strainer)
        self.assertEquals(len(soup), 10)

        strainer = SoupStrainer(text=lambda x: x[8] == '3')
        soup = BeautifulSoup(self.x, parseOnlyThese=strainer)
        self.assertEquals(len(soup), 3)

class PickleMeThis(SoupTest):
    "Testing features like pickle and deepcopy."

    def setUp(self):
        self.page = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Beautiful Soup: We called him Tortoise because he taught us.</title>
<link rev="made" href="mailto:leonardr@segfault.org">
<meta name="Description" content="Beautiful Soup: an HTML parser optimized for screen-scraping.">
<meta name="generator" content="Markov Approximation 1.4 (module: leonardr)">
<meta name="author" content="Leonard Richardson">
</head>
<body>
<a href="foo">foo</a>
<a href="foo"><b>bar</b></a>
</body>
</html>"""

        self.soup = BeautifulSoup(self.page)

    def testPickle(self):
        import pickle
        dumped = pickle.dumps(self.soup, 2)
        loaded = pickle.loads(dumped)
        self.assertEqual(loaded.__class__, BeautifulSoup)
        self.assertEqual(loaded.decode(), self.soup.decode())

    def testDeepcopy(self):
        from copy import deepcopy
        deepcopy(BeautifulSoup("<a></a>"))
        copied = deepcopy(self.soup)
        self.assertEqual(copied.decode(), self.soup.decode())

    def testUnicodePickle(self):
        import cPickle as pickle
        html = "<b>" + chr(0xc3) + "</b>"
        soup = BeautifulSoup(html)
        dumped = pickle.dumps(soup, pickle.HIGHEST_PROTOCOL)
        loaded = pickle.loads(dumped)
        self.assertEqual(loaded.decode(), soup.decode())


class WriteOnlyCode(SoupTest):
    "Testing the modification of the tree."

    def testModifyAttributes(self):
        soup = BeautifulSoup('<a id="1"></a>')
        soup.a['id'] = 2
        self.assertEqual(soup.decode(), '<a id="2"></a>')
        del(soup.a['id'])
        self.assertEqual(soup.decode(), '<a></a>')
        soup.a['id2'] = 'foo'
        self.assertEqual(soup.decode(), '<a id2="foo"></a>')

    def testNewTagCreation(self):
        "Makes sure tags don't step on each others' toes."
        soup = BeautifulSoup()
        a = Tag(soup, 'a')
        ol = Tag(soup, 'ol')
        a['href'] = 'http://foo.com/'
        self.assertRaises(KeyError, lambda : ol['href'])

    def testTagReplacement(self):
        # Make sure you can replace an element with itself.
        text = "<a><b></b><c>Foo<d></d></c></a><a><e></e></a>"
        soup = BeautifulSoup(text)
        c = soup.c
        soup.c.replaceWith(c)
        self.assertEquals(soup.decode(), text)

        # A very simple case
        soup = BeautifulSoup("<b>Argh!</b>")
        soup.find(text="Argh!").replaceWith("Hooray!")
        newText = soup.find(text="Hooray!")
        b = soup.b
        self.assertEqual(newText.previous, b)
        self.assertEqual(newText.parent, b)
        self.assertEqual(newText.previous.next, newText)
        self.assertEqual(newText.next, None)

        # A more complex case
        soup = BeautifulSoup("<a><b>Argh!</b><c></c><d></d></a>")
        soup.b.insert(1, "Hooray!")
        newText = soup.find(text="Hooray!")
        self.assertEqual(newText.previous, "Argh!")
        self.assertEqual(newText.previous.next, newText)

        self.assertEqual(newText.previousSibling, "Argh!")
        self.assertEqual(newText.previousSibling.nextSibling, newText)

        self.assertEqual(newText.nextSibling, None)
        self.assertEqual(newText.next, soup.c)

        text = "<html>There's <b>no</b> business like <b>show</b> business</html>"
        soup = BeautifulSoup(text)
        no, show = soup.findAll('b')
        show.replaceWith(no)
        self.assertEquals(soup.decode(), "<html>There's  business like <b>no</b> business</html>")

        # Even more complex
        soup = BeautifulSoup("<a><b>Find</b><c>lady!</c><d></d></a>")
        tag = Tag(soup, 'magictag')
        tag.insert(0, "the")
        soup.a.insert(1, tag)

        b = soup.b
        c = soup.c
        theText = tag.find(text=True)
        findText = b.find(text="Find")

        self.assertEqual(findText.next, tag)
        self.assertEqual(tag.previous, findText)
        self.assertEqual(b.nextSibling, tag)
        self.assertEqual(tag.previousSibling, b)
        self.assertEqual(tag.nextSibling, c)
        self.assertEqual(c.previousSibling, tag)

        self.assertEqual(theText.next, c)
        self.assertEqual(c.previous, theText)

        # Aand... incredibly complex.
        soup = BeautifulSoup("""<a>We<b>reserve<c>the</c><d>right</d></b></a><e>to<f>refuse</f><g>service</g></e>""")
        f = soup.f
        a = soup.a
        c = soup.c
        e = soup.e
        weText = a.find(text="We")
        soup.b.replaceWith(soup.f)
        self.assertEqual(soup.decode(), "<a>We<f>refuse</f></a><e>to<g>service</g></e>")

        self.assertEqual(f.previous, weText)
        self.assertEqual(weText.next, f)
        self.assertEqual(f.previousSibling, weText)
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   358
        self.assertEqual(f.nextSibling, None)
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   359
        self.assertEqual(weText.nextSibling, f)

    def testAppend(self):
        doc = "<p>Don't leave me <b>here</b>.</p> <p>Don't leave me.</p>"
        soup = BeautifulSoup(doc)
        second_para = soup('p')[1]
        bold = soup.find('b')
        soup('p')[1].append(soup.find('b'))
        self.assertEqual(bold.parent, second_para)
        self.assertEqual(soup.decode(),
                         "<p>Don't leave me .</p> "
                         "<p>Don't leave me.<b>here</b></p>")

    def testTagExtraction(self):
        # A very simple case
        text = '<html><div id="nav">Nav crap</div>Real content here.</html>'
        soup = BeautifulSoup(text)
        extracted = soup.find("div", id="nav").extract()
        self.assertEqual(soup.decode(), "<html>Real content here.</html>")
        self.assertEqual(extracted.decode(), '<div id="nav">Nav crap</div>')

        # A simple case, a more complex test.
        text = "<doc><a>1<b>2</b></a><a>i<b>ii</b></a><a>A<b>B</b></a></doc>"
        soup = BeautifulStoneSoup(text)
        doc = soup.doc
        numbers, roman, letters = soup("a")

        self.assertEqual(roman.parent, doc)
        oldPrevious = roman.previous
        endOfThisTag = roman.nextSibling.previous
        self.assertEqual(oldPrevious, "2")
        self.assertEqual(roman.next, "i")
        self.assertEqual(endOfThisTag, "ii")
        self.assertEqual(roman.previousSibling, numbers)
        self.assertEqual(roman.nextSibling, letters)

        roman.extract()
        self.assertEqual(roman.parent, None)
        self.assertEqual(roman.previous, None)
        self.assertEqual(roman.next, "i")
        self.assertEqual(letters.previous, '2')
        self.assertEqual(roman.previousSibling, None)
        self.assertEqual(roman.nextSibling, None)
        self.assertEqual(endOfThisTag.next, None)
        self.assertEqual(roman.b.contents[0].next, None)
        self.assertEqual(numbers.nextSibling, letters)
        self.assertEqual(letters.previousSibling, numbers)
        self.assertEqual(len(doc.contents), 2)
        self.assertEqual(doc.contents[0], numbers)
        self.assertEqual(doc.contents[1], letters)

        # A more complex case.
        text = "<a>1<b>2<c>Hollywood, baby!</c></b></a>3"
        soup = BeautifulStoneSoup(text)
        one = soup.find(text="1")
        three = soup.find(text="3")
        toExtract = soup.b
        soup.b.extract()
        self.assertEqual(one.next, three)
        self.assertEqual(three.previous, one)
        self.assertEqual(one.parent.nextSibling, three)
        self.assertEqual(three.previousSibling, soup.a)

class TheManWithoutAttributes(SoupTest):
    "Test attribute access"

    def testHasKey(self):
        text = "<foo attr='bar'>"
        self.assertTrue(BeautifulSoup(text).foo.has_key('attr'))

class QuoteMeOnThat(SoupTest):
    "Test quoting"
    def testQuotedAttributeValues(self):
        self.assertSoupEquals("<foo attr='bar'></foo>",
                              '<foo attr="bar"></foo>')

        text = """<foo attr='bar "brawls" happen'>a</foo>"""
        soup = BeautifulSoup(text)
        self.assertEqual(soup.decode(), text)

        soup.foo['attr'] = 'Brawls happen at "Bob\'s Bar"'
        newText = """<foo attr='Brawls happen at "Bob&squot;s Bar"'>a</foo>"""
        self.assertSoupEquals(soup.decode(), newText)

        self.assertSoupEquals('<this is="really messed up & stuff">',
                              '<this is="really messed up &amp; stuff"></this>')



class YoureSoLiteral(SoupTest):
    "Test literal mode."
    def testLiteralMode(self):
        text = "<script>if (i<imgs.length)</script><b>Foo</b>"
        soup = BeautifulSoup(text)
        self.assertEqual(soup.script.contents[0], "if (i<imgs.length)")
        self.assertEqual(soup.b.contents[0], "Foo")

    def testTextArea(self):
        text = "<textarea><b>This is an example of an HTML tag</b><&<&</textarea>"
        soup = BeautifulSoup(text)
        self.assertEqual(soup.textarea.contents[0],
                         "<b>This is an example of an HTML tag</b><&<&")

class OperatorOverload(SoupTest):
    "Our operators do it all! Call now!"

    def testTagNameAsFind(self):
        "Tests that referencing a tag name as a member delegates to find()."
        soup = BeautifulSoup('<b id="1">foo<i>bar</i></b><b>Red herring</b>')
        self.assertEqual(soup.b.i, soup.find('b').find('i'))
        self.assertEqual(soup.b.i.string, 'bar')
        self.assertEqual(soup.b['id'], '1')
        self.assertEqual(soup.b.contents[0], 'foo')
        self.assertTrue(not soup.a)

        # Test the .fooTag variant of .foo.
        self.assertEqual(soup.bTag.iTag.string, 'bar')
        self.assertEqual(soup.b.iTag.string, 'bar')
        self.assertEqual(soup.find('b').find('i'), soup.bTag.iTag)

class NestableEgg(SoupTest):
    """Here we test tag nesting. TEST THE NEST, DUDE! X-TREME!"""

    def testParaInsideBlockquote(self):
        soup = BeautifulSoup('<blockquote><p><b>Foo</blockquote><p>Bar')
        self.assertEqual(soup.blockquote.p.b.string, 'Foo')
        self.assertEqual(soup.blockquote.b.string, 'Foo')
        self.assertEqual(soup.find('p', recursive=False).string, 'Bar')

    def testNestedTables(self):
        text = """<table id="1"><tr><td>Here's another table:
        <table id="2"><tr><td>Juicy text</td></tr></table></td></tr></table>"""
        soup = BeautifulSoup(text)
        self.assertEqual(soup.table.table.td.string, 'Juicy text')
        self.assertEqual(len(soup.findAll('table')), 2)
        self.assertEqual(len(soup.table.findAll('table')), 1)
        self.assertEqual(soup.find('table', {'id': 2}).parent.parent.parent.name,
                         'table')

        text = "<table><tr><td><div><table>Foo</table></div></td></tr></table>"
        soup = BeautifulSoup(text)
        self.assertEqual(soup.table.tr.td.div.table.contents[0], "Foo")

        text = """<table><thead><tr>Foo</tr></thead><tbody><tr>Bar</tr></tbody>
        <tfoot><tr>Baz</tr></tfoot></table>"""
        soup = BeautifulSoup(text)
        self.assertEqual(soup.table.thead.tr.contents[0], "Foo")

    def testBadNestedTables(self):
        soup = BeautifulSoup("<table><tr><table><tr id='nested'>")
        self.assertEqual(soup.table.tr.table.tr['id'], 'nested')
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   510
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   511
class CleanupOnAisleFour(SoupTest):
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   512
    """Here we test cleanup of text that breaks HTMLParser or is just
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   513
    obnoxious."""

    def testSelfClosingtag(self):
        self.assertEqual(BeautifulSoup("Foo<br/>Bar").find('br').decode(),
                         '<br />')

        self.assertSoupEquals('<p>test1<br/>test2</p>',
                              '<p>test1<br />test2</p>')

        # Without a hint, an unknown tag is treated as a container.
        text = '<p>test1<selfclosing>test2'
        soup = BeautifulStoneSoup(text)
        self.assertEqual(soup.decode(),
                         '<p>test1<selfclosing>test2</selfclosing></p>')

        # With selfClosingTags, the same tag is rendered as empty.
        soup = BeautifulStoneSoup(text, selfClosingTags='selfclosing')
        self.assertEqual(soup.decode(),
                         '<p>test1<selfclosing />test2</p>')

    def testSelfClosingTagOrNot(self):
        # <link> is kept as a container by BeautifulStoneSoup but is
        # rendered as a self-closing tag by BeautifulSoup.
        text = "<item><link>http://foo.com/</link></item>"
        self.assertEqual(BeautifulStoneSoup(text).decode(), text)
        self.assertEqual(BeautifulSoup(text).decode(),
                         '<item><link />http://foo.com/</item>')

    def testBooleanAttributes(self):
        text = "<td nowrap>foo</td>"
        self.assertSoupEquals(text, text)

    def testCData(self):
        xml = "<root>foo<![CDATA[foobar]]>bar</root>"
        self.assertSoupEquals(xml, xml)
        r = re.compile("foo.*bar")
        soup = BeautifulSoup(xml)
        self.assertEquals(soup.find(text=r).string, "foobar")
        self.assertEquals(soup.find(text=r).__class__, CData)

    def testComments(self):
        xml = "foo<!--foobar-->baz"
        self.assertSoupEquals(xml)
        r = re.compile("foo.*bar")
        soup = BeautifulSoup(xml)
        self.assertEquals(soup.find(text=r).string, "foobar")
        self.assertEquals(soup.find(text="foobar").__class__, Comment)

    def testDeclaration(self):
        xml = "foo<!DOCTYPE foobar>baz"
        self.assertSoupEquals(xml)
        r = re.compile(".*foo.*bar")
        soup = BeautifulSoup(xml)
        text = "DOCTYPE foobar"
        self.assertEquals(soup.find(text=r).string, text)
        self.assertEquals(soup.find(text=text).__class__, Declaration)

        namespaced_doctype = ('<!DOCTYPE xsl:stylesheet SYSTEM "htmlent.dtd">'
                              '<html>foo</html>')
        soup = BeautifulSoup(namespaced_doctype)
        self.assertEquals(soup.contents[0],
                          'DOCTYPE xsl:stylesheet SYSTEM "htmlent.dtd"')
        self.assertEquals(soup.html.contents[0], 'foo')

    def testEntityConversions(self):
        text = "&lt;&lt;sacr&eacute;&#32;bleu!&gt;&gt;"
        soup = BeautifulStoneSoup(text)
        self.assertSoupEquals(text)

        xmlEnt = BeautifulStoneSoup.XML_ENTITIES
        htmlEnt = BeautifulStoneSoup.HTML_ENTITIES
        xhtmlEnt = BeautifulStoneSoup.XHTML_ENTITIES

        soup = BeautifulStoneSoup(text, convertEntities=xmlEnt)
        self.assertEquals(soup.decode(), "<<sacr&eacute; bleu!>>")

        soup = BeautifulStoneSoup(text, convertEntities=xmlEnt)
        self.assertEquals(soup.decode(), "<<sacr&eacute; bleu!>>")

        soup = BeautifulStoneSoup(text, convertEntities=htmlEnt)
        self.assertEquals(soup.decode(), u"<<sacr\xe9 bleu!>>")

        # Make sure the "XML", "HTML", and "XHTML" settings work.
        text = "&lt;&trade;&apos;"
        soup = BeautifulStoneSoup(text, convertEntities=xmlEnt)
        self.assertEquals(soup.decode(), u"<&trade;'")

        soup = BeautifulStoneSoup(text, convertEntities=htmlEnt)
        self.assertEquals(soup.decode(), u"<\u2122&apos;")

        soup = BeautifulStoneSoup(text, convertEntities=xhtmlEnt)
        self.assertEquals(soup.decode(), u"<\u2122'")

    def testNonBreakingSpaces(self):
        soup = BeautifulSoup("<a>&nbsp;&nbsp;</a>",
                             convertEntities=BeautifulStoneSoup.HTML_ENTITIES)
        self.assertEquals(soup.decode(), u"<a>\xa0\xa0</a>")

    def testWhitespaceInDeclaration(self):
        self.assertSoupEquals('<! DOCTYPE>', '<!DOCTYPE>')

    def testJunkInDeclaration(self):
        self.assertSoupEquals('<! Foo = -8>a', '<!Foo = -8>a')

    def testIncompleteDeclaration(self):
        self.assertSoupEquals('a<!b <p>c')

    def testEntityReplacement(self):
        self.assertSoupEquals('<b>hello&nbsp;there</b>')

    def testEntitiesInAttributeValues(self):
        self.assertSoupEquals('<x t="x&#241;">', '<x t="x\xc3\xb1"></x>',
                              encoding='utf-8')
        self.assertSoupEquals('<x t="x&#xf1;">', '<x t="x\xc3\xb1"></x>',
                              encoding='utf-8')

        soup = BeautifulSoup('<x t="&gt;&trade;">',
                             convertEntities=BeautifulStoneSoup.HTML_ENTITIES)
        self.assertEquals(soup.decode(), u'<x t="&gt;\u2122"></x>')

        uri = "http://crummy.com?sacr&eacute;&amp;bleu"
        link = '<a href="%s"></a>' % uri

        soup = BeautifulSoup(link, convertEntities=BeautifulSoup.HTML_ENTITIES)
        self.assertEquals(soup.decode(),
                          link.replace("&eacute;", u"\xe9"))

        uri = "http://crummy.com?sacr&eacute;&bleu"
        link = '<a href="%s"></a>' % uri
        soup = BeautifulSoup(link, convertEntities=BeautifulSoup.HTML_ENTITIES)
        self.assertEquals(soup.a['href'],
                          uri.replace("&eacute;", u"\xe9"))

    def testNakedAmpersands(self):
        html = {'convertEntities': BeautifulStoneSoup.HTML_ENTITIES}
        soup = BeautifulStoneSoup("AT&T ", **html)
        self.assertEquals(soup.decode(), 'AT&amp;T ')

        nakedAmpersandInASentence = "AT&T was Ma Bell"
        soup = BeautifulStoneSoup(nakedAmpersandInASentence, **html)
        self.assertEquals(soup.decode(),
                          nakedAmpersandInASentence.replace('&', '&amp;'))

        invalidURL = '<a href="http://example.org?a=1&b=2;3">foo</a>'
        validURL = invalidURL.replace('&', '&amp;')
        soup = BeautifulStoneSoup(invalidURL)
        self.assertEquals(soup.decode(), validURL)

        soup = BeautifulStoneSoup(validURL)
        self.assertEquals(soup.decode(), validURL)


class EncodeRed(SoupTest):
    """Tests encoding conversion, Unicode conversion, and Microsoft
    smart quote fixes."""

    def testUnicodeDammitStandalone(self):
        # \x92 is the Windows-1252 right single quotation mark (a
        # Microsoft smart quote); it should come out as an entity.
        markup = "<foo>\x92</foo>"
        dammit = UnicodeDammit(markup)
        self.assertEquals(dammit.unicode, "<foo>&#x2019;</foo>")

        hebrew = "\xed\xe5\xec\xf9"
        dammit = UnicodeDammit(hebrew, ["iso-8859-8"])
        self.assertEquals(dammit.unicode, u'\u05dd\u05d5\u05dc\u05e9')
        self.assertEquals(dammit.originalEncoding, 'iso-8859-8')
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   674
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   675
    def testGarbageInGarbageOut(self):
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   676
        ascii = "<foo>a</foo>"
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   677
        asciiSoup = BeautifulStoneSoup(ascii)
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   678
        self.assertEquals(ascii, asciiSoup.decode())
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   679
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   680
        unicodeData = u"<foo>\u00FC</foo>"
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   681
        utf8 = unicodeData.encode("utf-8")
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   682
        self.assertEquals(utf8, '<foo>\xc3\xbc</foo>')
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   683
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   684
        unicodeSoup = BeautifulStoneSoup(unicodeData)
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   685
        self.assertEquals(unicodeData, unicodeSoup.decode())
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   686
        self.assertEquals(unicodeSoup.foo.string, u'\u00FC')
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   687
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   688
        utf8Soup = BeautifulStoneSoup(utf8, fromEncoding='utf-8')
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   689
        self.assertEquals(utf8, utf8Soup.encode('utf-8'))
b3daada52dd3 Add BeautifulSoup Python HTML/XML parser to Melange repository.
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
diff changeset
   690
        self.assertEquals(utf8Soup.originalEncoding, "utf-8")
        utf8Soup = BeautifulStoneSoup(unicodeData)
        self.assertEquals(utf8, utf8Soup.encode('utf-8'))
        self.assertEquals(utf8Soup.originalEncoding, None)
    def testHandleInvalidCodec(self):
        for bad_encoding in ['.utf8', '...', 'utF---16.!']:
            soup = BeautifulSoup(u"Räksmörgås".encode("utf-8"),
                                 fromEncoding=bad_encoding)
            self.assertEquals(soup.originalEncoding, 'utf-8')
    def testUnicodeSearch(self):
        html = u'<html><body><h1>Räksmörgås</h1></body></html>'
        soup = BeautifulSoup(html)
        self.assertEqual(soup.find(text=u'Räksmörgås'), u'Räksmörgås')
    def testRewrittenXMLHeader(self):
        euc_jp = '<?xml version="1.0 encoding="euc-jp"?>\n<foo>\n\xa4\xb3\xa4\xec\xa4\xcfEUC-JP\xa4\xc7\xa5\xb3\xa1\xbc\xa5\xc7\xa5\xa3\xa5\xf3\xa5\xb0\xa4\xb5\xa4\xec\xa4\xbf\xc6\xfc\xcb\xdc\xb8\xec\xa4\xce\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb\xa4\xc7\xa4\xb9\xa1\xa3\n</foo>\n'
        utf8 = "<?xml version='1.0' encoding='utf-8'?>\n<foo>\n\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xafEUC-JP\xe3\x81\xa7\xe3\x82\xb3\xe3\x83\xbc\xe3\x83\x87\xe3\x82\xa3\xe3\x83\xb3\xe3\x82\xb0\xe3\x81\x95\xe3\x82\x8c\xe3\x81\x9f\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\xe3\x81\xae\xe3\x83\x95\xe3\x82\xa1\xe3\x82\xa4\xe3\x83\xab\xe3\x81\xa7\xe3\x81\x99\xe3\x80\x82\n</foo>\n"
        soup = BeautifulStoneSoup(euc_jp)
        if soup.originalEncoding != "euc-jp":
            raise Exception("Test failed when parsing euc-jp document. "
                            "If you're running Python >=2.4, or you have "
                            "cjkcodecs installed, this is a real problem. "
                            "Otherwise, ignore it.")
        self.assertEquals(soup.originalEncoding, "euc-jp")
        self.assertEquals(soup.renderContents('utf-8'), utf8)
        old_text = "<?xml encoding='windows-1252'><foo>\x92</foo>"
        new_text = "<?xml version='1.0' encoding='utf-8'?><foo>&rsquo;</foo>"
        self.assertSoupEquals(old_text, new_text)
    def testRewrittenMetaTag(self):
        no_shift_jis_html = '''<html><head>\n<meta http-equiv="Content-language" content="ja" /></head><body><pre>\n\x82\xb1\x82\xea\x82\xcdShift-JIS\x82\xc5\x83R\x81[\x83f\x83B\x83\x93\x83O\x82\xb3\x82\xea\x82\xbd\x93\xfa\x96{\x8c\xea\x82\xcc\x83t\x83@\x83C\x83\x8b\x82\xc5\x82\xb7\x81B\n</pre></body></html>'''
        soup = BeautifulSoup(no_shift_jis_html)
        # Beautiful Soup used to try to rewrite the meta tag even if the
        # meta tag got filtered out by the strainer. This test makes
        # sure that doesn't happen.
        strainer = SoupStrainer('pre')
        soup = BeautifulSoup(no_shift_jis_html, parseOnlyThese=strainer)
        self.assertEquals(soup.contents[0].name, 'pre')
        meta_tag = ('<meta content="text/html; charset=x-sjis" '
                    'http-equiv="Content-type" />')
        shift_jis_html = (
            '<html><head>\n%s\n'
            '<meta http-equiv="Content-language" content="ja" />'
            '</head><body><pre>\n'
            '\x82\xb1\x82\xea\x82\xcdShift-JIS\x82\xc5\x83R\x81[\x83f'
            '\x83B\x83\x93\x83O\x82\xb3\x82\xea\x82\xbd\x93\xfa\x96{\x8c'
            '\xea\x82\xcc\x83t\x83@\x83C\x83\x8b\x82\xc5\x82\xb7\x81B\n'
            '</pre></body></html>') % meta_tag
        soup = BeautifulSoup(shift_jis_html)
        if soup.originalEncoding != "shift-jis":
            raise Exception("Test failed when parsing shift-jis document "
                            "with meta tag '%s'. "
                            "If you're running Python >=2.4, or you have "
                            "cjkcodecs installed, this is a real problem. "
                            "Otherwise, ignore it." % meta_tag)
        self.assertEquals(soup.originalEncoding, "shift-jis")
        content_type_tag = soup.meta['content']
        self.assertEquals(content_type_tag[content_type_tag.find('charset='):],
                          'charset=%SOUP-ENCODING%')
        content_type = str(soup.meta)
        index = content_type.find('charset=')
        self.assertEqual(content_type[index:index+len('charset=utf-8')],
                         'charset=utf-8')
        content_type = soup.meta.encode('shift-jis')
        index = content_type.find('charset=')
        self.assertEqual(content_type[index:index+len('charset=shift-jis')],
                         'charset=shift-jis'.encode())
        self.assertEquals(soup.encode('utf-8'), (
                '<html><head>\n'
                '<meta content="text/html; charset=utf-8" '
                'http-equiv="Content-type" />\n'
                '<meta http-equiv="Content-language" content="ja" />'
                '</head><body><pre>\n'
                '\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xafShift-JIS\xe3\x81\xa7\xe3'
                '\x82\xb3\xe3\x83\xbc\xe3\x83\x87\xe3\x82\xa3\xe3\x83\xb3\xe3'
                '\x82\xb0\xe3\x81\x95\xe3\x82\x8c\xe3\x81\x9f\xe6\x97\xa5\xe6'
                '\x9c\xac\xe8\xaa\x9e\xe3\x81\xae\xe3\x83\x95\xe3\x82\xa1\xe3'
                '\x82\xa4\xe3\x83\xab\xe3\x81\xa7\xe3\x81\x99\xe3\x80\x82\n'
                '</pre></body></html>'))
        self.assertEquals(soup.encode("shift-jis"),
                          shift_jis_html.replace('x-sjis'.encode(),
                                                 'shift-jis'.encode()))
        isolatin = """<html><meta http-equiv="Content-type" content="text/html; charset=ISO-Latin-1" />Sacr\xe9 bleu!</html>"""
        soup = BeautifulSoup(isolatin)
        utf8 = isolatin.replace("ISO-Latin-1".encode(), "utf-8".encode())
        utf8 = utf8.replace("\xe9", "\xc3\xa9")
        self.assertSoupEquals(soup.encode("utf-8"), utf8, encoding='utf-8')
    def testHebrew(self):
        iso_8859_8 = '<HEAD>\n<TITLE>Hebrew (ISO 8859-8) in Visual Directionality</TITLE>\n\n\n\n</HEAD>\n<BODY>\n<H1>Hebrew (ISO 8859-8) in Visual Directionality</H1>\n\xed\xe5\xec\xf9\n</BODY>\n'
        utf8 = '<head>\n<title>Hebrew (ISO 8859-8) in Visual Directionality</title>\n</head>\n<body>\n<h1>Hebrew (ISO 8859-8) in Visual Directionality</h1>\n\xd7\x9d\xd7\x95\xd7\x9c\xd7\xa9\n</body>\n'
        soup = BeautifulStoneSoup(iso_8859_8, fromEncoding="iso-8859-8")
        self.assertEquals(soup.encode('utf-8'), utf8)
    def testSmartQuotesNotSoSmartAnymore(self):
        self.assertSoupEquals("\x91Foo\x92 <!--blah-->",
                              '&lsquo;Foo&rsquo; <!--blah-->')
    def testDontConvertSmartQuotesWhenAlsoConvertingEntities(self):
        smartQuotes = "Il a dit, \x8BSacr&eacute; bl&#101;u!\x9b"
        soup = BeautifulSoup(smartQuotes)
        self.assertEquals(soup.decode(),
                          'Il a dit, &lsaquo;Sacr&eacute; bl&#101;u!&rsaquo;')
        soup = BeautifulSoup(smartQuotes, convertEntities="html")
        self.assertEquals(soup.encode('utf-8'),
                          'Il a dit, \xe2\x80\xb9Sacr\xc3\xa9 bleu!\xe2\x80\xba')
    def testDontSeeSmartQuotesWhereThereAreNone(self):
        utf_8 = "\343\202\261\343\203\274\343\202\277\343\202\244 Watch"
        self.assertSoupEquals(utf_8, encoding='utf-8')


class Whitewash(SoupTest):
    """Test whitespace preservation."""
    def testPreservedWhitespace(self):
        self.assertSoupEquals("<pre>   </pre>")
        self.assertSoupEquals("<pre> woo  </pre>")
    def testCollapsedWhitespace(self):
        self.assertSoupEquals("<p>   </p>", "<p> </p>")


if __name__ == '__main__':
    unittest.main()