app/django/utils/stopwords.py
author Sverre Rabbelier <srabbelier@gmail.com>
Mon, 03 Nov 2008 19:47:02 +0000
changeset 439 0658c3c9a9dc
parent 54 03e267d67478
permissions -rw-r--r--
Minor fixes needed for generic key name We no longer try to retreive an entity when there are unset fields. This sort of makes 'getIfFields' obsolete, since we check if fields now anyway. This is needed because getKeyFieldsFromDict expects the fields to be set. Also a minor fix in a Django template so that the generic 'edit' page has a working delete button again.

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

def strip_stopwords(sentence):
    "Removes stopwords - also normalizes whitespace"
    words = sentence.split()
    sentence = []
    for word in words:
        if word.lower() not in stopwords:
            sentence.append(word)
    return u' '.join(sentence)