author | Pawel Solyga <Pawel.Solyga@gmail.com> |
Thu, 12 Feb 2009 12:30:36 +0000 | |
changeset 1278 | a7766286a7be |
parent 297 | 35211afcd563 |
child 2309 | be1b94099f2d |
permissions | -rwxr-xr-x |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
1 |
#!/usr/bin/env python |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
2 |
# |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
3 |
# Copyright 2007 Google Inc. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
4 |
# |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
5 |
# Licensed under the Apache License, Version 2.0 (the "License"); |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
6 |
# you may not use this file except in compliance with the License. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
7 |
# You may obtain a copy of the License at |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
8 |
# |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
9 |
# http://www.apache.org/licenses/LICENSE-2.0 |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
10 |
# |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
11 |
# Unless required by applicable law or agreed to in writing, software |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
12 |
# distributed under the License is distributed on an "AS IS" BASIS, |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
13 |
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
14 |
# See the License for the specific language governing permissions and |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
15 |
# limitations under the License. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
16 |
# |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
17 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
18 |
"""Full text indexing and search, implemented in pure python. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
19 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
20 |
Defines a SearchableModel subclass of db.Model that supports full text |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
21 |
indexing and search, based on the datastore's existing indexes. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
22 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
23 |
Don't expect too much. First, there's no ranking, which is a killer drawback. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
24 |
There's also no exact phrase match, substring match, boolean operators, |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
25 |
stemming, or other common full text search features. Finally, support for stop |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
26 |
words (common words that are not indexed) is currently limited to English. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
27 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
28 |
To be indexed, entities must be created and saved as SearchableModel |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
29 |
instances, e.g.: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
30 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
31 |
class Article(search.SearchableModel): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
32 |
text = db.TextProperty() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
33 |
... |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
34 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
35 |
article = Article(text=...) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
36 |
article.save() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
37 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
38 |
To search the full text index, use the SearchableModel.all() method to get an |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
39 |
instance of SearchableModel.Query, which subclasses db.Query. Use its search() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
40 |
method to provide a search query, in addition to any other filters or sort |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
41 |
orders, e.g.: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
42 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
43 |
query = article.all().search('a search query').filter(...).order(...) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
44 |
for result in query: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
45 |
... |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
46 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
47 |
The full text index is stored in a property named __searchable_text_index. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
48 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
49 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
50 |
In general, if you just want to provide full text search, you *don't* need to |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
51 |
add any extra indexes to your index.yaml. However, if you want to use search() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
52 |
in a query *in addition to* an ancestor, filter, or sort order, you'll need to |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
53 |
create an index in index.yaml with the __searchable_text_index property. For |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
54 |
example: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
55 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
56 |
- kind: Article |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
57 |
properties: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
58 |
- name: __searchable_text_index |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
59 |
- name: date |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
60 |
direction: desc |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
61 |
... |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
62 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
63 |
Note that using SearchableModel will noticeable increase the latency of save() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
64 |
operations, since it writes an index row for each indexable word. This also |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
65 |
means that the latency of save() will increase roughly with the size of the |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
66 |
properties in a given entity. Caveat hacker! |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
67 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
68 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
69 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
70 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
71 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
72 |
import re |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
73 |
import string |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
74 |
import sys |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
75 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
76 |
from google.appengine.api import datastore |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
77 |
from google.appengine.api import datastore_errors |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
78 |
from google.appengine.api import datastore_types |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
79 |
from google.appengine.ext import db |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
80 |
from google.appengine.datastore import datastore_pb |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
81 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
82 |
class SearchableEntity(datastore.Entity): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
83 |
"""A subclass of datastore.Entity that supports full text indexing. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
84 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
85 |
Automatically indexes all string and Text properties, using the datastore's |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
86 |
built-in per-property indices. To search, use the SearchableQuery class and |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
87 |
its Search() method. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
88 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
89 |
_FULL_TEXT_INDEX_PROPERTY = '__searchable_text_index' |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
90 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
91 |
_FULL_TEXT_MIN_LENGTH = 3 |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
92 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
93 |
_FULL_TEXT_STOP_WORDS = frozenset([ |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
94 |
'a', 'about', 'according', 'accordingly', 'affected', 'affecting', 'after', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
95 |
'again', 'against', 'all', 'almost', 'already', 'also', 'although', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
96 |
'always', 'am', 'among', 'an', 'and', 'any', 'anyone', 'apparently', 'are', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
97 |
'arise', 'as', 'aside', 'at', 'away', 'be', 'became', 'because', 'become', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
98 |
'becomes', 'been', 'before', 'being', 'between', 'both', 'briefly', 'but', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
99 |
'by', 'came', 'can', 'cannot', 'certain', 'certainly', 'could', 'did', 'do', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
100 |
'does', 'done', 'during', 'each', 'either', 'else', 'etc', 'ever', 'every', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
101 |
'following', 'for', 'found', 'from', 'further', 'gave', 'gets', 'give', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
102 |
'given', 'giving', 'gone', 'got', 'had', 'hardly', 'has', 'have', 'having', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
103 |
'here', 'how', 'however', 'i', 'if', 'in', 'into', 'is', 'it', 'itself', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
104 |
'just', 'keep', 'kept', 'knowledge', 'largely', 'like', 'made', 'mainly', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
105 |
'make', 'many', 'might', 'more', 'most', 'mostly', 'much', 'must', 'nearly', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
106 |
'necessarily', 'neither', 'next', 'no', 'none', 'nor', 'normally', 'not', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
107 |
'noted', 'now', 'obtain', 'obtained', 'of', 'often', 'on', 'only', 'or', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
108 |
'other', 'our', 'out', 'owing', 'particularly', 'past', 'perhaps', 'please', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
109 |
'poorly', 'possible', 'possibly', 'potentially', 'predominantly', 'present', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
110 |
'previously', 'primarily', 'probably', 'prompt', 'promptly', 'put', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
111 |
'quickly', 'quite', 'rather', 'readily', 'really', 'recently', 'regarding', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
112 |
'regardless', 'relatively', 'respectively', 'resulted', 'resulting', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
113 |
'results', 'said', 'same', 'seem', 'seen', 'several', 'shall', 'should', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
114 |
'show', 'showed', 'shown', 'shows', 'significantly', 'similar', 'similarly', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
115 |
'since', 'slightly', 'so', 'some', 'sometime', 'somewhat', 'soon', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
116 |
'specifically', 'state', 'states', 'strongly', 'substantially', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
117 |
'successfully', 'such', 'sufficiently', 'than', 'that', 'the', 'their', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
118 |
'theirs', 'them', 'then', 'there', 'therefore', 'these', 'they', 'this', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
119 |
'those', 'though', 'through', 'throughout', 'to', 'too', 'toward', 'under', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
120 |
'unless', 'until', 'up', 'upon', 'use', 'used', 'usefully', 'usefulness', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
121 |
'using', 'usually', 'various', 'very', 'was', 'we', 'were', 'what', 'when', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
122 |
'where', 'whether', 'which', 'while', 'who', 'whose', 'why', 'widely', |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
123 |
'will', 'with', 'within', 'without', 'would', 'yet', 'you']) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
124 |
|
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
125 |
_word_delimiter_regex = re.compile('[' + re.escape(string.punctuation) + ']') |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
126 |
|
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
127 |
def __init__(self, kind_or_entity, word_delimiter_regex=None, *args, |
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
128 |
**kwargs): |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
129 |
"""Constructor. May be called as a copy constructor. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
130 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
131 |
If kind_or_entity is a datastore.Entity, copies it into this Entity. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
132 |
datastore.Get() and Query() returns instances of datastore.Entity, so this |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
133 |
is useful for converting them back to SearchableEntity so that they'll be |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
134 |
indexed when they're stored back in the datastore. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
135 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
136 |
Otherwise, passes through the positional and keyword args to the |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
137 |
datastore.Entity constructor. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
138 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
139 |
Args: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
140 |
kind_or_entity: string or datastore.Entity |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
141 |
word_delimiter_regex: a regex matching characters that delimit words |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
142 |
""" |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
143 |
self._word_delimiter_regex = word_delimiter_regex |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
144 |
if isinstance(kind_or_entity, datastore.Entity): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
145 |
self._Entity__key = kind_or_entity._Entity__key |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
146 |
self.update(kind_or_entity) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
147 |
else: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
148 |
super(SearchableEntity, self).__init__(kind_or_entity, *args, **kwargs) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
149 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
150 |
def _ToPb(self): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
151 |
"""Rebuilds the full text index, then delegates to the superclass. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
152 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
153 |
Returns: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
154 |
entity_pb.Entity |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
155 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
156 |
if SearchableEntity._FULL_TEXT_INDEX_PROPERTY in self: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
157 |
del self[SearchableEntity._FULL_TEXT_INDEX_PROPERTY] |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
158 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
159 |
index = set() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
160 |
for (name, values) in self.items(): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
161 |
if not isinstance(values, list): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
162 |
values = [values] |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
163 |
if (isinstance(values[0], basestring) and |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
164 |
not isinstance(values[0], datastore_types.Blob)): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
165 |
for value in values: |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
166 |
index.update(SearchableEntity._FullTextIndex( |
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
167 |
value, self._word_delimiter_regex)) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
168 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
169 |
index_list = list(index) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
170 |
if index_list: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
171 |
self[SearchableEntity._FULL_TEXT_INDEX_PROPERTY] = index_list |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
172 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
173 |
return super(SearchableEntity, self)._ToPb() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
174 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
175 |
@classmethod |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
176 |
def _FullTextIndex(cls, text, word_delimiter_regex=None): |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
177 |
"""Returns a set of keywords appropriate for full text indexing. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
178 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
179 |
See SearchableQuery.Search() for details. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
180 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
181 |
Args: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
182 |
text: string |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
183 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
184 |
Returns: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
185 |
set of strings |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
186 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
187 |
|
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
188 |
if word_delimiter_regex is None: |
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
189 |
word_delimiter_regex = cls._word_delimiter_regex |
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
190 |
|
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
191 |
if text: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
192 |
datastore_types.ValidateString(text, 'text', max_len=sys.maxint) |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
193 |
text = word_delimiter_regex.sub(' ', text) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
194 |
words = text.lower().split() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
195 |
|
149
f2e327a7c5de
Load ../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
109
diff
changeset
|
196 |
words = set(unicode(w) for w in words) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
197 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
198 |
words -= cls._FULL_TEXT_STOP_WORDS |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
199 |
for word in list(words): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
200 |
if len(word) < cls._FULL_TEXT_MIN_LENGTH: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
201 |
words.remove(word) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
202 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
203 |
else: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
204 |
words = set() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
205 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
206 |
return words |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
207 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
208 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
209 |
class SearchableQuery(datastore.Query): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
210 |
"""A subclass of datastore.Query that supports full text search. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
211 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
212 |
Only searches over entities that were created and stored using the |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
213 |
SearchableEntity or SearchableModel classes. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
214 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
215 |
|
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
216 |
def Search(self, search_query, word_delimiter_regex=None): |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
217 |
"""Add a search query. This may be combined with filters. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
218 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
219 |
Note that keywords in the search query will be silently dropped if they |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
220 |
are stop words or too short, ie if they wouldn't be indexed. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
221 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
222 |
Args: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
223 |
search_query: string |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
224 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
225 |
Returns: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
226 |
# this query |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
227 |
SearchableQuery |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
228 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
229 |
datastore_types.ValidateString(search_query, 'search query') |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
230 |
self._search_query = search_query |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
231 |
self._word_delimiter_regex = word_delimiter_regex |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
232 |
return self |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
233 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
234 |
def _ToPb(self, limit=None, offset=None): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
235 |
"""Adds filters for the search query, then delegates to the superclass. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
236 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
237 |
Raises BadFilterError if a filter on the index property already exists. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
238 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
239 |
Args: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
240 |
# an upper bound on the number of results returned by the query. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
241 |
limit: int |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
242 |
# number of results that match the query to skip. limit is applied |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
243 |
# after the offset is fulfilled. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
244 |
offset: int |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
245 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
246 |
Returns: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
247 |
datastore_pb.Query |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
248 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
249 |
if SearchableEntity._FULL_TEXT_INDEX_PROPERTY in self: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
250 |
raise datastore_errors.BadFilterError( |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
251 |
'%s is a reserved name.' % SearchableEntity._FULL_TEXT_INDEX_PROPERTY) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
252 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
253 |
pb = super(SearchableQuery, self)._ToPb(limit=limit, offset=offset) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
254 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
255 |
if hasattr(self, '_search_query'): |
297
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
256 |
keywords = SearchableEntity._FullTextIndex( |
35211afcd563
Load /Users/solydzajs/Downloads/google_appengine/ into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
149
diff
changeset
|
257 |
self._search_query, self._word_delimiter_regex) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
258 |
for keyword in keywords: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
259 |
filter = pb.add_filter() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
260 |
filter.set_op(datastore_pb.Query_Filter.EQUAL) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
261 |
prop = filter.add_property() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
262 |
prop.set_name(SearchableEntity._FULL_TEXT_INDEX_PROPERTY) |
1278
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
263 |
prop.set_multiple(len(keywords) > 1) |
149
f2e327a7c5de
Load ../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
109
diff
changeset
|
264 |
prop.mutable_value().set_stringvalue(unicode(keyword).encode('utf-8')) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
265 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
266 |
return pb |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
267 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
268 |
|
1278
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
269 |
class SearchableMultiQuery(datastore.MultiQuery): |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
270 |
"""A multiquery that supports Search() by searching subqueries.""" |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
271 |
|
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
272 |
def Search(self, *args, **kwargs): |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
273 |
"""Add a search query, by trying to add it to all subqueries. |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
274 |
|
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
275 |
Args: |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
276 |
args: Passed to Search on each subquery. |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
277 |
kwargs: Passed to Search on each subquery. |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
278 |
|
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
279 |
Returns: |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
280 |
self for consistency with SearchableQuery. |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
281 |
""" |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
282 |
for q in self: |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
283 |
q.Search(*args, **kwargs) |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
284 |
return self |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
285 |
|
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
286 |
|
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
287 |
class SearchableModel(db.Model): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
288 |
"""A subclass of db.Model that supports full text search and indexing. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
289 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
290 |
Automatically indexes all string-based properties. To search, use the all() |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
291 |
method to get a SearchableModel.Query, then use its search() method. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
292 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
293 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
294 |
class Query(db.Query): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
295 |
"""A subclass of db.Query that supports full text search.""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
296 |
_search_query = None |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
297 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
298 |
def search(self, search_query): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
299 |
"""Adds a full text search to this query. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
300 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
301 |
Args: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
302 |
search_query, a string containing the full text search query. |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
303 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
304 |
Returns: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
305 |
self |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
306 |
""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
307 |
self._search_query = search_query |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
308 |
return self |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
309 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
310 |
def _get_query(self): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
311 |
"""Wraps db.Query._get_query() and injects SearchableQuery.""" |
1278
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
312 |
query = db.Query._get_query(self, |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
313 |
_query_class=SearchableQuery, |
a7766286a7be
Load /Users/solydzajs/Downloads/google_appengine into
Pawel Solyga <Pawel.Solyga@gmail.com>
parents:
297
diff
changeset
|
314 |
_multi_query_class=SearchableMultiQuery) |
109
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
315 |
if self._search_query: |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
316 |
query.Search(self._search_query) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
317 |
return query |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
318 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
319 |
def _populate_internal_entity(self): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
320 |
"""Wraps db.Model._populate_internal_entity() and injects |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
321 |
SearchableEntity.""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
322 |
return db.Model._populate_internal_entity(self, |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
323 |
_entity_class=SearchableEntity) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
324 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
325 |
@classmethod |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
326 |
def from_entity(cls, entity): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
327 |
"""Wraps db.Model.from_entity() and injects SearchableEntity.""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
328 |
if not isinstance(entity, SearchableEntity): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
329 |
entity = SearchableEntity(entity) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
330 |
return super(SearchableModel, cls).from_entity(entity) |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
331 |
|
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
332 |
@classmethod |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
333 |
def all(cls): |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
334 |
"""Returns a SearchableModel.Query for this kind.""" |
620f9b141567
Load ../../google_appengine into trunk/thirdparty/google_appengine.
Todd Larsen <tlarsen@google.com>
parents:
diff
changeset
|
335 |
return SearchableModel.Query(cls) |