app/django/utils/stopwords.py
author Pawel Solyga <Pawel.Solyga@gmail.com>
Sun, 08 Mar 2009 00:00:13 +0000
changeset 1731 254375a57d62
parent 54 03e267d67478
permissions -rw-r--r--
Add json2.js to the repository under the json folder, and update the build script and app.yaml.template files. This is a pretty useful set of functions for JSON manipulation in JavaScript; we will mostly use the stringify function. This code is in the public domain and comes from http://www.json.org/js.html. Patch by: Pawel Solyga Reviewed by: to-be-reviewed

# Performance note: I benchmarked this code using a set instead of
# a list for the stopwords and was surprised to find that the list
# performed /better/ than the set - maybe because it's only a small
# list.

stopwords = '''
i
a
an
are
as
at
be
by
for
from
how
in
is
it
of
on
or
that
the
this
to
was
what
when
where
'''.split()

def strip_stopwords(sentence):
    """Remove stopwords from ``sentence``; also normalize whitespace."""
    words = sentence.split()
    # Matching is case-insensitive, but kept words retain their original case.
    kept = [word for word in words if word.lower() not in stopwords]
    return u' '.join(kept)
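

# A minimal sketch, not part of the original module, for re-checking the
# performance note at the top of this file. It uses the standard-library
# timeit module to time membership tests against a list and a frozenset
# built from the same words. The helper name _time_membership and its
# parameters are hypothetical, chosen for illustration only. Timings vary
# with the Python version, the number of words, and the probe words, so no
# particular winner is assumed here.
import timeit


def _time_membership(words, probes, number=100000):
    """Return (list_seconds, set_seconds) for checking every probe word.

    Each timed call performs one ``in`` test per probe word; that call is
    repeated ``number`` times for each container.
    """
    as_list = list(words)
    as_set = frozenset(words)

    def check_list():
        # Linear scan of the list for each probe word.
        return [probe in as_list for probe in probes]

    def check_set():
        # Hash lookup in the frozenset for each probe word.
        return [probe in as_set for probe in probes]

    return (timeit.timeit(check_list, number=number),
            timeit.timeit(check_set, number=number))

# Usage, e.g. from an interactive session after importing this module:
#     list_secs, set_secs = _time_membership(stopwords, ['the', 'django', 'where'])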