<!DOCTYPE html>
<!--
 | Generated by Apache Maven Doxia at 2017-01-25
 | Rendered using Apache Maven Fluido Skin 1.3.0
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
 <meta name="Date-Revision-yyyymmdd" content="20170125" />
 <meta http-equiv="Content-Language" content="en" />
 <title>AsterixDB &#x2013; AsterixDB Support of Full-text search queries</title>
 <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
 <link rel="stylesheet" href="../css/site.css" />
 <link rel="stylesheet" href="../css/print.css" media="print" />

 <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>


 </head>
 <body class="topBarDisabled">

 <div class="container-fluid">
 <div id="banner">
 <div class="pull-left">
 <a href=".././" id="bannerLeft">
 <img src="../images/asterixlogo.png" alt="AsterixDB"/>
 </a>
 </div>
 <div class="pull-right"> </div>
 <div class="clear"><hr/></div>
 </div>

 <div id="breadcrumbs">
 <ul class="breadcrumb">

 <li id="publishDate">Last Published: 2017-01-25</li>

 <li id="projectVersion" class="pull-right">Version: 0.9.0</li>

 <li class="divider pull-right">|</li>

 <li class="pull-right"> <a href="../index.html" title="Documentation Home">
 Documentation Home</a>
 </li>

 </ul>
 </div>

 <div class="row-fluid">

 <div id="bodyColumn" class="span9" >

 <!-- ! Licensed to the Apache Software Foundation (ASF) under one
 ! or more contributor license agreements. See the NOTICE file
 ! distributed with this work for additional information
 ! regarding copyright ownership. The ASF licenses this file
 ! to you under the Apache License, Version 2.0 (the
 ! "License"); you may not use this file except in compliance
 ! with the License. You may obtain a copy of the License at
 !
 ! http://www.apache.org/licenses/LICENSE-2.0
 !
 ! Unless required by applicable law or agreed to in writing,
 ! software distributed under the License is distributed on an
 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ! KIND, either express or implied. See the License for the
 ! specific language governing permissions and limitations
 ! under the License.
 ! --><h1>AsterixDB Support of Full-text search queries</h1>
<div class="section">
<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>

<ul>

<li><a href="#Motivation">Motivation</a></li>

<li><a href="#Syntax">Syntax</a></li>

<li><a href="#FulltextIndex">Creating and utilizing a Full-text index</a></li>
</ul></div>
<div class="section">
<h2><a name="Motivation_Back_to_TOC"></a><a name="Motivation" id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
<p>Full-Text Search (FTS) queries are widely used in applications where users need to find records that contain particular keywords and where simple string-based matching is not sufficient. FTS queries differ from substring-matching queries in that an FTS query matches its search terms as whole words in the given string, rather than as arbitrary sequences of characters. For example, an FTS query for &#x201c;rain&#x201d; returns a document only when the document contains &#x201c;rain&#x201d; as a word. A substring-matching query, by contrast, returns a document whenever it contains &#x201c;rain&#x201d; as a substring, so a document containing &#x201c;brain&#x201d; or &#x201c;training&#x201d; would be returned as well.</p></div>
<div class="section">
<h2><a name="Syntax_Back_to_TOC"></a><a name="Syntax" id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
<p>The syntax of AsterixDB FTS follows a portion of the XQuery Full-Text search syntax. The two basic forms are as follows:</p>

<div class="source">
<div class="source">
<pre> ftcontains(Expression1, Expression2, {FullTextOption})
 ftcontains(Expression1, Expression2)
</pre></div></div>
<p>For example, the following query finds tweet messages whose <tt>message-text</tt> field includes &#x201c;voice&#x201d; as a word. Note that FTS is case-insensitive, so &#x201c;Voice&#x201d; and &#x201c;voice&#x201d; are treated as the same word.</p>

<div class="source">
<div class="source">
<pre> use dataverse TinySocial;

 for $msg in dataset TweetMessages
 where ftcontains($msg.message-text, &quot;voice&quot;, {&quot;mode&quot;:&quot;any&quot;})
 return {&quot;id&quot;: $msg.id}
</pre></div></div>
<p>The DDL and DML of TinySocial can be found in <a href="primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB">ADM: Modeling Semistructured Data in AsterixDB</a>.</p>
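<p>To make the contrast with substring matching concrete, the sketch below runs the same search twice against the TinySocial data: once with <tt>ftcontains</tt> and once with the built-in string function <tt>contains</tt>, which tests for a substring. It assumes the <tt>TweetMessages</tt> dataset and the <tt>message-text</tt> field from the example above. The actual results depend on the loaded data, so none are shown here.</p>

<div class="source">
<div class="source">
<pre> use dataverse TinySocial;

 /* Word-level match: &quot;voice&quot; must appear as a word in message-text. */
 for $msg in dataset TweetMessages
 where ftcontains($msg.message-text, &quot;voice&quot;)
 return {&quot;id&quot;: $msg.id};

 /* Substring match: any message-text containing the characters &quot;voice&quot;
    qualifies, for instance inside a longer word such as &quot;voicemail&quot;. */
 for $msg in dataset TweetMessages
 where contains($msg.message-text, &quot;voice&quot;)
 return {&quot;id&quot;: $msg.id};
</pre></div></div>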
<p><tt>Expression1</tt> is an expression that must evaluate to a string at runtime, as in the example above, where <tt>$msg.message-text</tt> is a string field. <tt>Expression2</tt> can be a string, an ordered or unordered list of strings, or an expression. In the last case, the expression must evaluate to one of the first two types, i.e., to a string or to an ordered or unordered list of strings.</p>
<p>The following examples are all valid expressions.</p>

<div class="source">
<div class="source">
<pre> ... where ftcontains($msg.message-text, &quot;sound&quot;)
 ... where ftcontains($msg.message-text, &quot;sound&quot;, {&quot;mode&quot;:&quot;any&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
 ... where ftcontains($msg.message-text, {{&quot;speed&quot;, &quot;stand&quot;, &quot;customization&quot;}}, {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, let $keyword_list := [&quot;voice&quot;, &quot;system&quot;] return $keyword_list, {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, $keyword_list, {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>
<p>In the last example above, <tt>$keyword_list</tt> must evaluate to a string or to an ordered or unordered list of strings.</p>
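<p>As a sketch of how such a variable might be bound, the keyword list can be introduced with a <tt>let</tt> clause ahead of the <tt>for</tt> clause. The keywords are only illustrative, and the query again assumes the TinySocial dataverse used earlier.</p>

<div class="source">
<div class="source">
<pre> use dataverse TinySocial;

 /* Bind the keyword list once, then pass the variable as Expression2. */
 let $keyword_list := [&quot;voice&quot;, &quot;system&quot;]
 for $msg in dataset TweetMessages
 where ftcontains($msg.message-text, $keyword_list, {&quot;mode&quot;:&quot;any&quot;})
 return {&quot;id&quot;: $msg.id}
</pre></div></div>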
<p>The last parameter, <tt>FullTextOption</tt>, refines the given FTS request. If you omit it, each option takes its default value. Currently there is only one option, <tt>mode</tt>; more options will be added as the FTS feature is extended. Note that <tt>FullTextOption</tt> is a record, so the option(s) must be enclosed in a record <tt>{}</tt>.</p>
<p>The <tt>mode</tt> option indicates whether the FTS query is a conjunctive (AND) or disjunctive (OR) search request. It can be either <tt>&quot;any&quot;</tt> or <tt>&quot;all&quot;</tt>, and the default is <tt>&quot;all&quot;</tt>. If <tt>&quot;any&quot;</tt> is specified, a disjunctive search is conducted. For example, the following query finds documents whose <tt>message-text</tt> field contains &#x201c;sound&#x201d; or &#x201c;system&#x201d;, so a document is returned if it contains &#x201c;sound&#x201d;, &#x201c;system&#x201d;, or both.</p>

<div class="source">
<div class="source">
<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>
<p>The other value of the <tt>mode</tt> option, <tt>&quot;all&quot;</tt>, specifies a conjunctive search. The following examples find the documents whose <tt>message-text</tt> field contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;. If a document contains only one of the two words, it is not returned. The second example omits the option record and therefore relies on the default mode.</p>

<div class="source">
<div class="source">
<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
</pre></div></div>
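<p>Assuming the semantics described above, the two modes should behave like boolean combinations of single-keyword predicates. The following sketch spells this out. It is meant as an illustration of the semantics rather than a recommended way to write the query.</p>

<div class="source">
<div class="source">
<pre> /* Expected to match the same documents as [&quot;sound&quot;, &quot;system&quot;] with {&quot;mode&quot;:&quot;any&quot;} */
 ... where ftcontains($msg.message-text, &quot;sound&quot;) or ftcontains($msg.message-text, &quot;system&quot;)

 /* Expected to match the same documents as [&quot;sound&quot;, &quot;system&quot;] with {&quot;mode&quot;:&quot;all&quot;} */
 ... where ftcontains($msg.message-text, &quot;sound&quot;) and ftcontains($msg.message-text, &quot;system&quot;)
</pre></div></div>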
<p>AsterixDB does not yet support phrase searches, so the following query will not work.</p>

<div class="source">
<div class="source">
<pre> ... where ftcontains($msg.message-text, &quot;sound system&quot;, {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>
<p>As a workaround, the two equivalent queries below can be used to achieve a roughly similar goal. They find documents where <tt>$msg.message-text</tt> contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;, but, unlike a phrase search, they do not check the order or adjacency of the two words. As a result, they would also return documents containing &#x201c;sound system can be installed.&#x201d;, &#x201c;system sound is perfect.&#x201d;, or &#x201c;sound is not clear. You may need to install a new system.&#x201d;</p>

<div class="source">
<div class="source">
<pre> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
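
 /* Hypothetical refinement, not part of the FTS feature: additionally require
    the exact character sequence with the built-in contains function. Unlike a
    real phrase search, contains is a case-sensitive substring test, so this
    sketch would miss a document that spells the phrase &quot;Sound System&quot;. */
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
       and contains($msg.message-text, &quot;sound system&quot;)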
</pre></div></div></div>
 </div>
 </div>
 </div>

 <hr/>

 <footer>
 <div class="container-fluid">
 <div class="row span12">Copyright &copy; 2017
 <a href="https://www.apache.org/">The Apache Software Foundation</a>.
 All Rights Reserved.

 </div>

<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
 feather logo, and the Apache AsterixDB project logo are either
 registered trademarks or trademarks of The Apache Software
 Foundation in the United States and other countries.
 All other marks mentioned may be trademarks or registered
 trademarks of their respective owners.</div>

 </div>
 </footer>
 </body>
</html>