app/django/utils/stopwords.py
author Madhusudan C.S <madhusudancs@gmail.com>
Sat, 25 Jul 2009 01:09:46 +0530
changeset 2679 0ede2f3adbc1
parent 54 03e267d67478
permissions -rw-r--r--
Adds to Melange a tags framework based on taggable-mixin. The taggable-mixin allowed only tag per Datastore model. This is extended framework allows any arbitrary number of tags per Datastore model. Also, now one can define different models for different Tag types which are all inherited from the base Tag model provided by taggable-mixin. The GHOPTask model makes use of 2 tags per model, one for difficulty and the other for task_type, both using the tags framework. Reviewed by: Paweł Sołyga

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

def strip_stopwords(sentence):
    "Removes stopwords - also normalizes whitespace"
    words = sentence.split()
    sentence = []
    for word in words:
        if word.lower() not in stopwords:
            sentence.append(word)
    return u' '.join(sentence)