app/django/utils/stopwords.py
author Todd Larsen <tlarsen@google.com>
Wed, 21 Jan 2009 00:27:39 +0000
changeset 858 e79e7a22326f
parent 54 03e267d67478
permissions -rw-r--r--
Add an export() view, and implement it as text/text for Document. For every Model except Document, the public() view is displayed for any attempts to access the export() view. Currently, the permissions for export() are the same as for public(). This seems reasonable for Document, since anyone could extract the raw HTML from the page source anyway. The permissions should probably be different for other types of exports, such as vCard or iCard exports of profiles, CSV exports of lists, etc. Patch by: Todd Larsen Review by: to-be-reviewed

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

def strip_stopwords(sentence):
    "Removes stopwords - also normalizes whitespace"
    words = sentence.split()
    sentence = []
    for word in words:
        if word.lower() not in stopwords:
            sentence.append(word)
    return u' '.join(sentence)