app/django/utils/stopwords.py
author Madhusudan C.S. <madhusudancs@gmail.com>
Fri, 05 Jun 2009 21:27:03 +0200
changeset 2397 d943fa182fae
parent 54 03e267d67478
permissions -rw-r--r--
Moved the GHOP module into the modules package. This also includes moving the templates and content into their respective places inside the Soc folder. This is to avoid adding every folder to the app.yaml file.
Patch by: Madhusudan C.S. and Lennard de Rijk
Reviewed by: Lennard de Rijk

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

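def strip_stopwords(sentence):
    """Remove stopwords from a sentence and normalize its whitespace."""
    # split() with no arguments collapses runs of whitespace, so joining
    # the kept words with single spaces normalizes the spacing as well.
    kept = [word for word in sentence.split()
            if word.lower() not in stopwords]
    return u' '.join(kept)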
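
# The performance note above compares a list against a set for the
# stopword lookup. For reference, a minimal sketch of what a set-based
# variant might look like follows. STOPWORDS_SET and strip_stopwords_set
# are hypothetical names used only for illustration; they are not part of
# this module, and the note above reports that the list version measured
# faster for a word list this small.

```python
# Hypothetical set-based variant; STOPWORDS_SET and strip_stopwords_set
# are illustrative names, not part of the original module.
STOPWORDS_SET = frozenset(
    "i a an are as at be by for from how in is it of on or "
    "that the this to was what when where".split()
)


def strip_stopwords_set(sentence):
    """Remove stopwords and normalize whitespace, using set membership."""
    return " ".join(
        word for word in sentence.split()
        if word.lower() not in STOPWORDS_SET
    )


# Matching is case-insensitive, and repeated spaces collapse to one.
print(strip_stopwords_set("This is   the way to  Mountain View"))
# prints: way Mountain View
```

# Either variant should return the same result for the same input; which
# one is faster is best checked by measuring on the target interpreter.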