<!DOCTYPE html>
<!--
 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from src/site/markdown/aql/fulltext.md at 2019-03-07
 | Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
 <meta name="Date-Revision-yyyymmdd" content="20190307" />
 <meta http-equiv="Content-Language" content="en" />
 <title>AsterixDB &#x2013; AsterixDB Support of Full-text search queries</title>
 <link rel="stylesheet" href="../css/apache-maven-fluido-1.7.min.css" />
 <link rel="stylesheet" href="../css/site.css" />
 <link rel="stylesheet" href="../css/print.css" media="print" />
 <script type="text/javascript" src="../js/apache-maven-fluido-1.7.min.js"></script>

 </head>
 <body class="topBarDisabled">
 <div class="container-fluid">
 <div id="banner">
 <div class="pull-left"><a href=".././" id="bannerLeft"><img src="../images/asterixlogo.png" alt="AsterixDB"/></a></div>
 <div class="pull-right"></div>
 <div class="clear"><hr/></div>
 </div>

 <div id="breadcrumbs">
 <ul class="breadcrumb">
 <li id="publishDate">Last Published: 2019-03-07</li>
 <li id="projectVersion" class="pull-right">Version: 0.9.4</li>
 <li class="pull-right"><a href="../index.html" title="Documentation Home">Documentation Home</a></li>
 </ul>
 </div>
 <div class="row-fluid">
 <div id="leftColumn" class="span2">
 <div class="well sidebar-nav">
 <ul class="nav nav-list">
 <li class="nav-header">Get Started - Installation</li>
 <li><a href="../ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
 <li><a href="../ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
 <li><a href="../aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
 <li class="nav-header">AsterixDB Primer</li>
 <li><a href="../sqlpp/primer-sqlpp.html" title="Option 1: using SQL++"><span class="none"></span>Option 1: using SQL++</a></li>
 <li><a href="../aql/primer.html" title="Option 2: using AQL"><span class="none"></span>Option 2: using AQL</a></li>
 <li class="nav-header">Data Model</li>
 <li><a href="../datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
 <li class="nav-header">Queries - SQL++</li>
 <li><a href="../sqlpp/manual.html" title="The SQL++ Query Language"><span class="none"></span>The SQL++ Query Language</a></li>
 <li><a href="../sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
 <li class="nav-header">Queries - AQL</li>
 <li><a href="../aql/manual.html" title="The Asterix Query Language (AQL)"><span class="none"></span>The Asterix Query Language (AQL)</a></li>
 <li><a href="../aql/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
 <li class="nav-header">API/SDK</li>
 <li><a href="../api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
 <li><a href="../csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
 <li class="nav-header">Advanced Features</li>
 <li class="active"><a href="#"><span class="none"></span>Support of Full-text Queries</a></li>
 <li><a href="../aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
 <li><a href="../feeds/tutorial.html" title="Support for Data Ingestion"><span class="none"></span>Support for Data Ingestion</a></li>
 <li><a href="../udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
 <li><a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
 <li><a href="../aql/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
</ul>
 <hr />
 <div id="poweredBy">
 <div class="clear"></div>
 <div class="clear"></div>
 <div class="clear"></div>
 <div class="clear"></div>
<a href=".././" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" /></a>
 </div>
 </div>
 </div>
 <div id="bodyColumn" class="span10" >
<!--
 ! Licensed to the Apache Software Foundation (ASF) under one
 ! or more contributor license agreements. See the NOTICE file
 ! distributed with this work for additional information
 ! regarding copyright ownership. The ASF licenses this file
 ! to you under the Apache License, Version 2.0 (the
 ! "License"); you may not use this file except in compliance
 ! with the License. You may obtain a copy of the License at
 !
 ! http://www.apache.org/licenses/LICENSE-2.0
 !
 ! Unless required by applicable law or agreed to in writing,
 ! software distributed under the License is distributed on an
 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ! KIND, either express or implied. See the License for the
 ! specific language governing permissions and limitations
 ! under the License.
 !-->
<h1>AsterixDB Support of Full-text search queries</h1>
<div class="section">
<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
<ul>

<li><a href="#Motivation">Motivation</a></li>
<li><a href="#Syntax">Syntax</a></li>
<li><a href="#FulltextIndex">Creating and utilizing a Full-text index</a></li>
</ul></div>
<div class="section">
<h2><a name="Motivation_.5BBack_to_TOC.5D"></a><a name="Motivation" id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
<p>Full-Text Search (FTS) queries are widely used in applications where users need to find records that satisfy an FTS predicate, i.e., where simple string-based matching is not sufficient. They matter most when the goal is to find documents that contain a certain keyword. FTS queries differ from substring-matching queries in that an FTS query looks for its query predicate as an exact keyword in the given string, rather than treating the predicate as a sequence of characters. For example, an FTS query for &#x201c;rain&#x201d; returns a document only when it contains &#x201c;rain&#x201d; as a word. A substring-matching query, by contrast, returns a document whenever it contains &#x201c;rain&#x201d; as a substring, so a document with &#x201c;brain&#x201d; or &#x201c;training&#x201d; would be returned as well.</p></div>
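<p>The difference between word matching and substring matching can be illustrated with a minimal Python sketch. This is not AsterixDB code: the helper names <tt>tokenize</tt>, <tt>contains_word</tt>, and <tt>contains_substring</tt> are hypothetical, and the tokenizer (lowercase, split on non-alphanumeric characters) is a simplifying assumption rather than AsterixDB's actual tokenization rules.</p>

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of letters/digits.
    # AsterixDB's real tokenizer may behave differently.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_word(text, keyword):
    # Full-text style: the keyword must appear as a whole word.
    return keyword.lower() in tokenize(text)


def contains_substring(text, keyword):
    # Substring style: the keyword may appear anywhere, even inside a word.
    return keyword.lower() in text.lower()


docs = ["Heavy rain is expected", "Brain training apps"]
word_hits = [d for d in docs if contains_word(d, "rain")]
substring_hits = [d for d in docs if contains_substring(d, "rain")]
```

<p>In this sketch, only the first document is a word match for &#x201c;rain&#x201d;, while both documents are substring matches.</p>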
<div class="section">
<h2><a name="Syntax_.5BBack_to_TOC.5D"></a><a name="Syntax" id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
<p>The syntax of AsterixDB FTS follows a portion of the XQuery FullText Search syntax. The two basic forms are as follows:</p>

<div>
<div>
<pre class="source"> ftcontains(Expression1, Expression2, {FullTextOption})
 ftcontains(Expression1, Expression2)
</pre></div></div>

<p>For example, we can execute the following query to find tweet messages where the <tt>message-text</tt> field includes &#x201c;voice&#x201d; as a word. Note that an FTS query is case-insensitive, so &#x201c;Voice&#x201d; and &#x201c;voice&#x201d; are evaluated as the same word.</p>

<div>
<div>
<pre class="source"> use dataverse TinySocial;

 for $msg in dataset TweetMessages
 where ftcontains($msg.message-text, &quot;voice&quot;, {&quot;mode&quot;:&quot;any&quot;})
 return {&quot;id&quot;: $msg.id}
</pre></div></div>

<p>The DDL and DML of TinySocial can be found in <a href="primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB">ADM: Modeling Semistructured Data in AsterixDB</a>.</p>
<p>The same query can also be expressed in SQL++.</p>

<div>
<div>
<pre class="source"> use TinySocial;

 select element {&quot;id&quot;:msg.id}
 from TweetMessages as msg
 where TinySocial.ftcontains(msg.`message-text`, &quot;voice&quot;, {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>

<p><tt>Expression1</tt> is an expression that must evaluate to a string at runtime, as in the example above, where <tt>$msg.message-text</tt> is a string field. <tt>Expression2</tt> can be a string, an ordered or unordered list of strings, or an expression. In the last case, the expression must evaluate to one of the first two types, i.e., a string or an ordered or unordered list of strings.</p>
<p>The following examples are all valid expressions.</p>

<div>
<div>
<pre class="source"> ... where ftcontains($msg.message-text, &quot;sound&quot;)
 ... where ftcontains($msg.message-text, &quot;sound&quot;, {&quot;mode&quot;:&quot;any&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
 ... where ftcontains($msg.message-text, {{&quot;speed&quot;, &quot;stand&quot;, &quot;customization&quot;}}, {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, let $keyword_list := [&quot;voice&quot;, &quot;system&quot;] return $keyword_list, {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, $keyword_list, {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>

<p>In the last example above, <tt>$keyword_list</tt> should evaluate to a string or an ordered or unordered list of strings.</p>
<p>The last parameter, <tt>FullTextOption</tt>, refines the given FTS request. If you omit it, each option takes its default value. Currently, <tt>mode</tt> is the only option; more options will be added as the FTS feature is extended. Note that <tt>FullTextOption</tt> is a record, so the option(s) must be enclosed in a record <tt>{}</tt>.</p>
<p>The <tt>mode</tt> option indicates whether the given FTS query is a conjunctive (AND) or disjunctive (OR) search request. Its value can be either <tt>&quot;any&quot;</tt> or <tt>&quot;all&quot;</tt>, and the default is <tt>&quot;all&quot;</tt>. If you specify <tt>&quot;any&quot;</tt>, a disjunctive search is conducted. For example, the following query finds documents whose <tt>message-text</tt> field contains &#x201c;sound&#x201d; or &#x201c;system&#x201d;; that is, a document is returned if it contains &#x201c;sound&#x201d;, &#x201c;system&#x201d;, or both.</p>

<div>
<div>
<pre class="source"> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>

<p>The other <tt>mode</tt> value, <tt>&quot;all&quot;</tt>, specifies a conjunctive search. The following examples find documents whose <tt>message-text</tt> field contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;. A document that contains only one of the two keywords is not returned.</p>

<div>
<div>
<pre class="source"> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
</pre></div></div>
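<p>The <tt>mode</tt> semantics described above can be sketched in Python as follows. This only models the documented behavior and is not how AsterixDB evaluates <tt>ftcontains</tt>: the function name <tt>ftcontains_sketch</tt> and the tokenizer are assumptions made for illustration.</p>

```python
import re


def tokenize(text):
    # Assumed tokenizer: case-insensitive, words are runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ftcontains_sketch(text, keywords, mode="all"):
    # Hypothetical stand-in for ftcontains(Expression1, Expression2, {"mode": ...}).
    # A single string is treated like a one-element keyword list.
    if isinstance(keywords, str):
        keywords = [keywords]
    words = tokenize(text)
    hits = [k.lower() in words for k in keywords]
    if mode == "any":
        return any(hits)   # disjunctive (OR) search
    if mode == "all":
        return all(hits)   # conjunctive (AND) search, the documented default
    raise ValueError("mode must be 'any' or 'all'")


text = "The sound is clear"
any_result = ftcontains_sketch(text, ["sound", "system"], "any")
all_result = ftcontains_sketch(text, ["sound", "system"], "all")
default_result = ftcontains_sketch("Sound System", ["sound", "system"])
```

<p>Here the first text satisfies the <tt>&quot;any&quot;</tt> request but not the <tt>&quot;all&quot;</tt> request, because it contains &#x201c;sound&#x201d; but not &#x201c;system&#x201d;. Omitting the mode behaves like <tt>&quot;all&quot;</tt>.</p>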

<p>AsterixDB does not yet support phrase searches, so the following query will not work.</p>

<div>
<div>
<pre class="source"> ... where ftcontains($msg.message-text, &quot;sound system&quot;, {&quot;mode&quot;:&quot;any&quot;})
</pre></div></div>

<p>As a workaround, the following queries can be used to achieve a roughly similar goal. The difference is that they find documents where <tt>$msg.message-text</tt> contains both &#x201c;sound&#x201d; and &#x201c;system&#x201d;, but, unlike a phrase search, they do not check the order or adjacency of the two words. As a result, the queries below would also return documents such as &#x201c;sound system can be installed.&#x201d;, &#x201c;system sound is perfect.&#x201d;, or &#x201c;sound is not clear. You may need to install a new system.&#x201d;</p>

<div>
<div>
<pre class="source"> ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;], {&quot;mode&quot;:&quot;all&quot;})
 ... where ftcontains($msg.message-text, [&quot;sound&quot;, &quot;system&quot;])
</pre></div></div>
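<p>To make the gap between the workaround and a true phrase search concrete, the Python sketch below compares a conjunctive keyword check with an order-and-adjacency check on the three example documents. Both helpers (<tt>contains_all</tt>, <tt>contains_phrase</tt>) and the tokenizer are hypothetical; <tt>contains_phrase</tt> shows what a phrase search would additionally require and is not a feature of AsterixDB.</p>

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase words in their original order.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all(text, keywords):
    # Models the workaround: every keyword present, order ignored.
    words = set(tokenize(text))
    return all(k.lower() in words for k in keywords)


def contains_phrase(text, keywords):
    # Models a phrase search: keywords must appear adjacent and in order.
    words = tokenize(text)
    target = [k.lower() for k in keywords]
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


docs = [
    "sound system can be installed.",
    "system sound is perfect.",
    "sound is not clear. You may need to install a new system.",
]
all_hits = [contains_all(d, ["sound", "system"]) for d in docs]
phrase_hits = [contains_phrase(d, ["sound", "system"]) for d in docs]
```

<p>All three documents pass the conjunctive check, but only the first would pass the adjacency check. An application that needs phrase semantics would have to apply such a check itself on the returned documents.</p>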
</div>
<div class="section">
<h2><a name="Creating_and_utilizing_a_Full-text_index_.5BBack_to_TOC.5D"></a><a name="FulltextIndex" id="FulltextIndex">Creating and utilizing a Full-text index</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
<p>When there is a full-text index on the field that is being searched, AsterixDB can use that index to expedite the execution of an FTS query rather than scanning all records. To create a full-text index, specify the index type as <tt>fulltext</tt> in your DDL statement. For instance, the following DDL statement creates a full-text index on the <tt>TweetMessages.message-text</tt> attribute.</p>

<div>
<div>
<pre class="source">create index messageFTSIdx on TweetMessages(message-text) type fulltext;
</pre></div></div></div>
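<p>Why an index helps can be sketched with a generic inverted index in Python. This page does not describe the internal structure of AsterixDB's full-text index, so the sketch assumes a plain word-to-record-ids map; the names <tt>build_index</tt> and <tt>lookup</tt> and the sample records are hypothetical.</p>

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: distinct lowercase words of the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    # Map each word to the set of record ids whose text contains it.
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index


def lookup(index, keywords, mode="all"):
    # Answer a keyword query from the index alone, without scanning records.
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    combine = set.union if mode == "any" else set.intersection
    return combine(*postings)


records = {
    1: "The sound system is loud",
    2: "Voice sound check",
    3: "System update",
}
index = build_index(records)
both = lookup(index, ["sound", "system"], "all")
either = lookup(index, ["sound", "system"], "any")
```

<p>The index is built once, after which each query touches only the entries for its keywords instead of every record.</p>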
 </div>
 </div>
 </div>
 <hr/>
 <footer>
 <div class="container-fluid">
 <div class="row-fluid">
<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
 feather logo, and the Apache AsterixDB project logo are either
 registered trademarks or trademarks of The Apache Software
 Foundation in the United States and other countries.
 All other marks mentioned may be trademarks or registered
 trademarks of their respective owners.
 </div>
 </div>
 </div>
 </footer>
 </body>
</html>