app/django/utils/stopwords.py
author Todd Larsen <tlarsen@google.com>
Thu, 25 Sep 2008 17:17:11 +0000
changeset 202 b8b4a83788d4
parent 54 03e267d67478
permissions -rw-r--r--
A key_name controller module to collect all of the name...() functions that compose Model entity key names, plus some minor changes to other controller modules to illustrate the proposed use. Patch by: Todd Larsen Review by: Pawel Solyga Review URL: http://codereviews.googleopensourceprograms.com/804 Review URL: http://codereviews.googleopensourceprograms.com/804

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

def strip_stopwords(sentence):
    "Removes stopwords - also normalizes whitespace"
    words = sentence.split()
    sentence = []
    for word in words:
        if word.lower() not in stopwords:
            sentence.append(word)
    return u' '.join(sentence)