blob: d95fcf81fefedb4bf65e9b3048c67fe5621fd033 [file] [log] [blame]
Ian Maxon9c40a662018-02-09 12:42:56 -08001<!DOCTYPE html>
2<!--
3 | Generated by Apache Maven Doxia at 2018-02-09
4 | Rendered using Apache Maven Fluido Skin 1.3.0
5-->
6<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta charset="UTF-8" />
9 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
10 <meta name="Date-Revision-yyyymmdd" content="20180209" />
11 <meta http-equiv="Content-Language" content="en" />
12 <title>AsterixDB &#x2013; AsterixDB Support of Full-text search queries</title>
13 <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
14 <link rel="stylesheet" href="../css/site.css" />
15 <link rel="stylesheet" href="../css/print.css" media="print" />
16
17
18 <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
19
20
21
Ian Maxon9c40a662018-02-09 12:42:56 -080022
Ian Maxon9c40a662018-02-09 12:42:56 -080023
24 </head>
25 <body class="topBarDisabled">
26
27
28
29
30 <div class="container-fluid">
31 <div id="banner">
32 <div class="pull-left">
33 <a href=".././" id="bannerLeft">
34 <img src="../images/asterixlogo.png" alt="AsterixDB"/>
35 </a>
36 </div>
37 <div class="pull-right"> </div>
38 <div class="clear"><hr/></div>
39 </div>
40
41 <div id="breadcrumbs">
42 <ul class="breadcrumb">
43
44
45 <li id="publishDate">Last Published: 2018-02-09</li>
46
47
48
49 <li id="projectVersion" class="pull-right">Version: 0.9.3</li>
50
51 <li class="divider pull-right">|</li>
52
53 <li class="pull-right"> <a href="../index.html" title="Documentation Home">
54 Documentation Home</a>
55 </li>
56
57 </ul>
58 </div>
59
60
61 <div class="row-fluid">
62 <div id="leftColumn" class="span3">
63 <div class="well sidebar-nav">
64
65
66 <ul class="nav nav-list">
67 <li class="nav-header">Get Started - Installation</li>
68
69 <li>
70
71 <a href="../ncservice.html" title="Option 1: using NCService">
72 <i class="none"></i>
73 Option 1: using NCService</a>
74 </li>
75
76 <li>
77
78 <a href="../ansible.html" title="Option 2: using Ansible">
79 <i class="none"></i>
80 Option 2: using Ansible</a>
81 </li>
82
83 <li>
84
85 <a href="../aws.html" title="Option 3: using Amazon Web Services">
86 <i class="none"></i>
87 Option 3: using Amazon Web Services</a>
88 </li>
89
90 <li>
91
92 <a href="../yarn.html" title="Option 4: using YARN">
93 <i class="none"></i>
94 Option 4: using YARN</a>
95 </li>
96
97 <li>
98
99 <a href="../install.html" title="Option 5: using Managix (deprecated)">
100 <i class="none"></i>
101 Option 5: using Managix (deprecated)</a>
102 </li>
103 <li class="nav-header">AsterixDB Primer</li>
104
105 <li>
106
107 <a href="../sqlpp/primer-sqlpp.html" title="Option 1: using SQL++">
108 <i class="none"></i>
109 Option 1: using SQL++</a>
110 </li>
111
112 <li>
113
114 <a href="../aql/primer.html" title="Option 2: using AQL">
115 <i class="none"></i>
116 Option 2: using AQL</a>
117 </li>
118 <li class="nav-header">Data Model</li>
119
120 <li>
121
122 <a href="../datamodel.html" title="The Asterix Data Model">
123 <i class="none"></i>
124 The Asterix Data Model</a>
125 </li>
126 <li class="nav-header">Queries - SQL++</li>
127
128 <li>
129
130 <a href="../sqlpp/manual.html" title="The SQL++ Query Language">
131 <i class="none"></i>
132 The SQL++ Query Language</a>
133 </li>
134
135 <li>
136
137 <a href="../sqlpp/builtins.html" title="Builtin Functions">
138 <i class="none"></i>
139 Builtin Functions</a>
140 </li>
141 <li class="nav-header">Queries - AQL</li>
142
143 <li>
144
145 <a href="../aql/manual.html" title="The Asterix Query Language (AQL)">
146 <i class="none"></i>
147 The Asterix Query Language (AQL)</a>
148 </li>
149
150 <li>
151
152 <a href="../aql/builtins.html" title="Builtin Functions">
153 <i class="none"></i>
154 Builtin Functions</a>
155 </li>
156 <li class="nav-header">API/SDK</li>
157
158 <li>
159
160 <a href="../api.html" title="HTTP API">
161 <i class="none"></i>
162 HTTP API</a>
163 </li>
164
165 <li>
166
167 <a href="../csv.html" title="CSV Output">
168 <i class="none"></i>
169 CSV Output</a>
170 </li>
171 <li class="nav-header">Advanced Features</li>
172
173 <li class="active">
174
175 <a href="#"><i class="none"></i>Support of Full-text Queries</a>
176 </li>
177
178 <li>
179
180 <a href="../aql/externaldata.html" title="Accessing External Data">
181 <i class="none"></i>
182 Accessing External Data</a>
183 </li>
184
185 <li>
186
187 <a href="../feeds/tutorial.html" title="Support for Data Ingestion">
188 <i class="none"></i>
189 Support for Data Ingestion</a>
190 </li>
191
192 <li>
193
194 <a href="../udf.html" title="User Defined Functions">
195 <i class="none"></i>
196 User Defined Functions</a>
197 </li>
198
199 <li>
200
201 <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
202 <i class="none"></i>
203 Filter-Based LSM Index Acceleration</a>
204 </li>
205
206 <li>
207
208 <a href="../aql/similarity.html" title="Support of Similarity Queries">
209 <i class="none"></i>
210 Support of Similarity Queries</a>
211 </li>
212 </ul>
213
214
215
216 <hr class="divider" />
217
218 <div id="poweredBy">
219 <div class="clear"></div>
220 <div class="clear"></div>
221 <div class="clear"></div>
222 <a href=".././" title="AsterixDB" class="builtBy">
223 <img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" />
224 </a>
225 </div>
226 </div>
227 </div>
228
229
230 <div id="bodyColumn" class="span9" >
231
232 <!-- ! Licensed to the Apache Software Foundation (ASF) under one
233 ! or more contributor license agreements. See the NOTICE file
234 ! distributed with this work for additional information
235 ! regarding copyright ownership. The ASF licenses this file
236 ! to you under the Apache License, Version 2.0 (the
237 ! "License"); you may not use this file except in compliance
238 ! with the License. You may obtain a copy of the License at
239 !
240 ! http://www.apache.org/licenses/LICENSE-2.0
241 !
242 ! Unless required by applicable law or agreed to in writing,
243 ! software distributed under the License is distributed on an
244 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
245 ! KIND, either express or implied. See the License for the
246 ! specific language governing permissions and limitations
247 ! under the License.
248 ! --><h1>AsterixDB Support of Full-text search queries</h1>
249<div class="section">
250<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
251
252<ul>
253
254<li><a href="#Motivation">Motivation</a></li>
255
256<li><a href="#Syntax">Syntax</a></li>
257
258<li><a href="#FulltextIndex">Creating and utilizing a Full-text index</a></li>
259</ul></div>
260<div class="section">
261<h2><a name="Motivation_Back_to_TOC"></a><a name="Motivation" id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
262<p>Full-Text Search (FTS) queries are widely used in applications where users need to find records that satisfy an FTS predicate, i.e., where simple string-based matching is not sufficient. These queries are important when finding documents that contain a certain keyword is crucial. FTS queries are different from substring matching queries in that FTS queries find their query predicates as exact keywords in the given string, rather than treating a query predicate as a sequence of characters. For example, an FTS query that finds &#x201c;rain&#x201d; correctly returns a document when it contains &#x201c;rain&#x201d; as a word. However, a substring-matching query returns a document whenever it contains &#x201c;rain&#x201d; as a substring, for instance, a document with &#x201c;brain&#x201d; or &#x201c;training&#x201d; would be returned as well.</p></div>
263<div class="section">
264<h2><a name="Syntax_Back_to_TOC"></a><a name="Syntax" id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
265<p>The syntax of AsterixDB FTS follows a portion of the XQuery FullText Search syntax. Two basic forms are as follows:</p>
266
267<div class="source">
268<div class="source">
269<pre> ftcontains(Expression1, Expression2, {FullTextOption})
270 ftcontains(Expression1, Expression2)
271</pre></div></div>
272<p>For example, we can execute the following query to find tweet messages where the <tt>message-text</tt> field includes &#x201c;voice&#x201d; as a word. Please note that an FTS search is case-insensitive. Thus, &#x201c;Voice&#x201d; or &#x201c;voice&#x201d; will be evaluated as the same word.</p>
273
274<div class="source">
275<div class="source">
276<pre> use dataverse TinySocial;
277
278 for $msg in dataset TweetMessages
279 where ftcontains($msg.message-text, &quot;voice&quot;, {&quot;mode&quot;:&quot;any&quot;})
280 return {&quot;id&quot;: $msg.id}
281</pre></div></div>
282<p>The DDL and DML of TinySocial can be found in <a href="primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB">ADM: Modeling Semistructed Data in AsterixDB</a>.</p>
283<p>The same query can be also expressed in the SQL++.</p>
284
285<div class="source">
286<div class="source">
287<pre> use TinySocial;
288
289 select element {&quot;id&quot;:msg.id}
290 from TweetMessages as msg
291 where TinySocial.ftcontains(msg.`message-text`, &quot;voice&quot;, {&quot;mode&quot;:&quot;any&quot;})
292</pre></div></div>
293<p>The <tt>Expression1</tt> is an expression that should be evaluable as a string at runtime as in the above example where <tt>$msg.message-text</tt> is a string field. The <tt>Expression2</tt> can be a string, an (un)ordered list of string value(s), or an expression. In the last case, the given expression should be evaluable into one of the first two types, i.e., into a string value or an (un)ordered list of string value(s).</p>
294<p>The following examples are all valid expressions.</p>
295
296<div class="source">
297<div class="source">
298<pre> ... where ftcontains($msg.message-text, &quot;sound&quot;)
299 ... where ftcontains($msg.message-text, &quot;sound&quot;, {&quot;mode&quot;:&quot;any&quot;})
300 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
301 ... where ftcontains($msg.message-text, {{&quot;speed&quot;, &quot;stand&quot;, &quot;customization&quot;}}, {&quot;mode&quot;:&quot;all&quot;})
302 ... where ftcontains($msg.message-text, let $keyword_list := [&quot;voice&quot;, &quot;system&quot;] return $keyword_list, {&quot;mode&quot;:&quot;all&quot;})
303 ... where ftcontains($msg.message-text, $keyword_list, {&quot;mode&quot;:&quot;any&quot;})
304</pre></div></div>
305<p>In the last example above, <tt>$keyword_list</tt> should evaluate to a string or an (un)ordered list of string value(s).</p>
306<p>The last <tt>FullTextOption</tt> parameter clarifies the given FTS request. If you omit the <tt>FullTextOption</tt> parameter, then the default value will be set for each possible option. Currently, we only have one option named <tt>mode</tt>. And as we extend the FTS feature, more options will be added. Please note that the format of <tt>FullTextOption</tt> is a record, thus you need to put the option(s) in a record <tt>{}</tt>. The <tt>mode</tt> option indicates whether the given FTS query is a conjunctive (AND) or disjunctive (OR) search request. This option can be either <tt>&#x201c;any&#x201d;</tt> or <tt>&#x201c;all&#x201d;</tt>. The default value for <tt>mode</tt> is <tt>&#x201c;all&#x201d;</tt>. If one specifies <tt>&#x201c;any&#x201d;</tt>, a disjunctive search will be conducted. For example, the following query will find documents whose <tt>message-text</tt> field contains &#x201c;sound&#x201d; or &#x201c;system&#x201d;, so a document will be returned if it contains either &#x201c;sound&#x201d;, &#x201c;system&#x201d;, or both of the keywords.</p>
307
308<div class="source">
309<div class="source">
310<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
311</pre></div></div>
312<p>The other option parameter,<tt>&#x201c;all&#x201d;</tt>, specifies a conjunctive search. The following examples will find the documents whose <tt>message-text</tt> field contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;. If a document contains only &#x201c;sound&#x201d; or &#x201c;system&#x201d; but not both, it will not be returned.</p>
313
314<div class="source">
315<div class="source">
316<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
317 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
318</pre></div></div>
319<p>Currently AsterixDB doesn&#x2019;t (yet) support phrase searches, so the following query will not work.</p>
320
321<div class="source">
322<div class="source">
323<pre> ... where ftcontains($msg.message-text, &quot;sound system&quot;, {&quot;mode&quot;:&quot;any&quot;})
324</pre></div></div>
325<p>As a workaround solution, the following query can be used to achieve a roughly similar goal. The difference is that the following queries will find documents where <tt>$msg.message-text</tt> contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;, but the order and adjacency of &#x201c;sound&#x201d; and &#x201c;system&#x201d; are not checked, unlike in a phrase search. As a result, the query below would also return documents with &#x201c;sound system can be installed.&#x201d;, &#x201c;system sound is perfect.&#x201d;, or &#x201c;sound is not clear. You may need to install a new system.&#x201d;</p>
326
327<div class="source">
328<div class="source">
329<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
330 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
331</pre></div></div></div>
332<div class="section">
333<h2><a name="Creating_and_utilizing_a_Full-text_index_Back_to_TOC"></a><a name="FulltextIndex" id="FulltextIndex">Creating and utilizing a Full-text index</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
334<p>When there is a full-text index on the field that is being searched, rather than scanning all records, AsterixDB can utilize that index to expedite the execution of a FTS query. To create a full-text index, you need to specify the index type as <tt>fulltext</tt> in your DDL statement. For instance, the following DDL statement create a full-text index on the TweetMessages.message-text attribute.</p>
335
336<div class="source">
337<div class="source">
338<pre>create index messageFTSIdx on TweetMessages(message-text) type fulltext;
339</pre></div></div></div>
340 </div>
341 </div>
342 </div>
343
344 <hr/>
345
346 <footer>
347 <div class="container-fluid">
348 <div class="row span12">Copyright &copy; 2018
349 <a href="https://www.apache.org/">The Apache Software Foundation</a>.
350 All Rights Reserved.
351
352 </div>
353
354 <?xml version="1.0" encoding="UTF-8"?>
355<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
356 feather logo, and the Apache AsterixDB project logo are either
357 registered trademarks or trademarks of The Apache Software
358 Foundation in the United States and other countries.
359 All other marks mentioned may be trademarks or registered
360 trademarks of their respective owners.</div>
361
362
363 </div>
364 </footer>
365 </body>
366</html>