blob: dabfd1da735036cd022c031930de0bdca3a85ed5 [file] [log] [blame]
Ian Maxon3355d4c2021-12-13 12:38:15 -08001<!DOCTYPE html>
2<!--
3 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from src/site/markdown/sqlpp/arrayindex.md at 2021-12-13
4 | Rendered using Apache Maven Fluido Skin 1.7
5-->
6<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta charset="UTF-8" />
9 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
10 <meta name="Date-Revision-yyyymmdd" content="20211213" />
11 <meta http-equiv="Content-Language" content="en" />
12 <title>AsterixDB &#x2013; AsterixDB Support of Array Indexes</title>
13 <link rel="stylesheet" href="../css/apache-maven-fluido-1.7.min.css" />
14 <link rel="stylesheet" href="../css/site.css" />
15 <link rel="stylesheet" href="../css/print.css" media="print" />
16 <script type="text/javascript" src="../js/apache-maven-fluido-1.7.min.js"></script>
17
18 </head>
19 <body class="topBarDisabled">
20 <div class="container-fluid">
21 <div id="banner">
22 <div class="pull-left"><a href=".././" id="bannerLeft"><img src="../images/asterixlogo.png" alt="AsterixDB"/></a></div>
23 <div class="pull-right"></div>
24 <div class="clear"><hr/></div>
25 </div>
26
27 <div id="breadcrumbs">
28 <ul class="breadcrumb">
29 <li id="publishDate">Last Published: 2021-12-13</li>
30 <li id="projectVersion" class="pull-right">Version: 0.9.7.1</li>
31 <li class="pull-right"><a href="../index.html" title="Documentation Home">Documentation Home</a></li>
32 </ul>
33 </div>
34 <div class="row-fluid">
35 <div id="leftColumn" class="span2">
36 <div class="well sidebar-nav">
37 <ul class="nav nav-list">
38 <li class="nav-header">Get Started - Installation</li>
39 <li><a href="../ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
40 <li><a href="../ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
41 <li><a href="../aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
42 <li class="nav-header">AsterixDB Primer</li>
43 <li><a href="../sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
44 <li class="nav-header">Data Model</li>
45 <li><a href="../datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
46 <li class="nav-header">Queries</li>
47 <li><a href="../sqlpp/manual.html" title="The SQL++ Query Language"><span class="none"></span>The SQL++ Query Language</a></li>
48 <li><a href="../SQLPP.html" title="Raw SQL++ Grammar"><span class="none"></span>Raw SQL++ Grammar</a></li>
49 <li><a href="../sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
50 <li class="nav-header">API/SDK</li>
51 <li><a href="../api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
52 <li><a href="../csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
53 <li class="nav-header">Advanced Features</li>
54 <li><a href="../aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
55 <li><a href="../feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
56 <li><a href="../udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
57 <li><a href="../sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
58 <li><a href="../sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
59 <li><a href="../sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
60 <li><a href="../interval_join.html" title="Support of Interval Joins"><span class="none"></span>Support of Interval Joins</a></li>
61 <li class="active"><a href="#"><span class="none"></span>Support of Array Indexes</a></li>
62 <li class="nav-header">Deprecated</li>
63 <li><a href="../aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
64 <li><a href="../aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
65 <li><a href="../aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
66</ul>
67 <hr />
68 <div id="poweredBy">
69 <div class="clear"></div>
70 <div class="clear"></div>
71 <div class="clear"></div>
72 <div class="clear"></div>
73<a href=".././" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" /></a>
74 </div>
75 </div>
76 </div>
77 <div id="bodyColumn" class="span10" >
78<!--
79 ! Licensed to the Apache Software Foundation (ASF) under one
80 ! or more contributor license agreements. See the NOTICE file
81 ! distributed with this work for additional information
82 ! regarding copyright ownership. The ASF licenses this file
83 ! to you under the Apache License, Version 2.0 (the
84 ! "License"); you may not use this file except in compliance
85 ! with the License. You may obtain a copy of the License at
86 !
87 ! http://www.apache.org/licenses/LICENSE-2.0
88 !
89 ! Unless required by applicable law or agreed to in writing,
90 ! software distributed under the License is distributed on an
91 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
92 ! KIND, either express or implied. See the License for the
93 ! specific language governing permissions and limitations
94 ! under the License.
95 !-->
96<h1>AsterixDB Support of Array Indexes</h1>
97<div class="section">
98<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
99<ul>
100
101<li><a href="#Overview">Overview</a></li>
102<li><a href="#QuantificationQueries">Quantification Queries</a></li>
103<li><a href="#ExplicitUnnestQueries">Explicit Unnesting Queries</a></li>
104<li><a href="#JoinQueries">Join Queries</a></li>
105<li><a href="#ComplexIndexingExamples">Complex Indexing Examples</a></li>
106</ul></div>
107<div class="section">
108<h2><a name="Overview_.5BBack_to_TOC.5D"></a><a name="Overview" id="Overview">Overview</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
109<p>Array indexes are used in applications where users want to accelerate a query that involves some array-valued or multiset-valued field. This enables fast evaluation of predicates in queries involving arrays or multisets in datasets. For brevity, all further mentions of array-valued fields are also applicable to multiset-valued fields.</p>
110<p>Array-valued fields are a natural data modeling concept for documents. In the traditional inventory management example, it is natural for the line items of an order to exist as a part of the order itself. Previously if an AsterixDB user wanted to optimize a query involving a predicate on the line items of an order, they would a) have to undertake some form of schema migration to separate the line items from the orders into different datasets, b) create an index on the new dataset for line items, and finally c) modify their query to join orders and line items. With the introduction of array indexes in AsterixDB, users can keep their arrays intact and still reap the performance benefits of an index.</p>
111<p>It should be noted that in AsterixDB, array indexes are <i>not</i> meant to serve as covering indexes. In fact due to AsterixDB&#x2019;s record-level locking, index-only plans involving multi-valued fields (i.e. array indexes and inverted indexes) are not currently possible. Instead, array indexes are simply meant to accelerate queries involving multi-valued fields.</p></div>
112<div class="section">
113<h2><a name="Quantification_Queries_.5BBack_to_TOC.5D"></a><a name="QuantificationQueries" id="QuantificationQueries">Quantification Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
114<p>A common use-case for array indexes involves quantifying some or all elements within an array. Quantification queries have two variants: existential and universal. Existential queries ask if <i>any</i> element in some array satisfies a given predicate. Membership queries are a specific type of existential query, asking if any element in some array is equal to a particular value. Universal queries ask if <i>all</i> elements in some array satisfy a particular predicate. Empty arrays are not stored in an array index, meaning that a user must additionally specify that the array is non-empty to tell AsterixDB that it is possible to use an array index as an access method for the given query.</p>
115<p>All query examples here will use the orders and products datasets below.</p>
116
117<div>
118<div>
119<pre class="source">CREATE TYPE ordersType AS {
120 orderno: int,
121 custid: string,
122 items: [{ itemno: int, productno: int, qty: int, price: float }]
123};
124CREATE DATASET orders (ordersType) PRIMARY KEY orderno;
125
126CREATE TYPE productsType AS {
127 productno: int,
128 categories: {{ string }}
129};
130CREATE DATASET products (productsType) PRIMARY KEY productno;
131</pre></div></div>
132
133<p>Let us now create an index on the <tt>categories</tt> multiset of the <tt>products</tt> dataset.</p>
134
135<div>
136<div>
137<pre class="source">CREATE INDEX pCategoriesIdx ON products (UNNEST categories);
138</pre></div></div>
139
140<p>Suppose we now want to find all products that have the category &#x201c;Food&#x201d;. The following membership query will utilize the index we just created.</p>
141
142<div>
143<div>
144<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
145
146SELECT p
147FROM products p
148WHERE &quot;Food&quot; IN p.categories;
149</pre></div></div>
150
151<p>We can also rewrite the query above as an explicit existential quantification query with an equality predicate and the index will be utilized.</p>
152
153<div>
154<div>
155<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
156
157SELECT p
158FROM products p
159WHERE SOME c IN p.categories SATISFIES c = &quot;Food&quot;;
160</pre></div></div>
161
162<p>Let us now create an index on the <tt>qty</tt> and <tt>price</tt> fields in the <tt>items</tt> array of the <tt>orders</tt> dataset.</p>
163
164<div>
165<div>
166<pre class="source">CREATE INDEX oItemsQtyPriceIdx ON orders (UNNEST items SELECT qty, price);
167</pre></div></div>
168
169<p>Now suppose we want to find all orders that only have items with large quantities and low prices, not counting orders without any items. The following universal quantification query will utilize the index we just created.</p>
170
171<div>
172<div>
173<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
174
175SELECT o
176FROM orders o
177WHERE LEN(o.items) &gt; 0 AND
178 (EVERY i IN o.items SATISFIES i.qty &gt; 100 AND i.price &lt; 5.00);
179</pre></div></div>
180
181<p>Take note of the <tt>LEN(o.items) &gt; 0</tt> conjunct. Array indexes cannot be used for queries with potentially empty arrays.</p></div>
182<div class="section">
183<h2><a name="Explicit_Unnesting_Queries_.5BBack_to_TOC.5D"></a><a name="ExplicitUnnestQueries" id="ExplicitUnnestQueries">Explicit Unnesting Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
184<p>Array indexes can also be used to accelerate queries that involve the explicit unnesting of array fields. We can express the same membership / existential example above using an explicit <tt>UNNEST</tt> query. (To keep the same cardinality as the query above (i.e. to undo the <tt>UNNEST</tt>), we add a <tt>DISTINCT</tt> clause, though the index would be utilized either way.)</p>
185
186<div>
187<div>
188<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
189
190SELECT DISTINCT p
191FROM products p, p.categories c
192WHERE c = &quot;Food&quot;;
193</pre></div></div>
194
195<p>As another example, suppose that we want to find all orders that have <i>some</i> item with a large quantity. The following query will utilize the <tt>oItemsQtyPriceIdx</tt> we created, using only the first field in the index <tt>qty</tt>.</p>
196
197<div>
198<div>
199<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
200
201SELECT DISTINCT o
202FROM orders o, o.items i
203WHERE i.qty &gt; 100;
204</pre></div></div>
205</div>
206<div class="section">
207<h2><a name="Join_Queries_.5BBack_to_TOC.5D"></a><a name="JoinQueries" id="JoinQueries">Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
208<p>Finally, array indexes can also be used for index nested-loop joins if the field being joined is located within an array. Let us create another index on the <tt>items</tt> array of the <tt>orders</tt> dataset, this time on the <tt>productno</tt> field.</p>
209
210<div>
211<div>
212<pre class="source">CREATE INDEX oProductIDIdx ON orders (UNNEST items SELECT productno);
213</pre></div></div>
214
215<p>Now suppose we want to find all products located in a specific order. We can accomplish this with the join query below. Note that we must specify the <tt>indexnl</tt> join hint to tell AsterixDB that we want to optimize this specific join, as hash joins are the default join method otherwise.</p>
216
217<div>
218<div>
219<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
220
221SELECT DISTINCT p
222FROM products p, orders o, o.items i
223WHERE i.productno /*+ indexnl */ = p.productno
224 AND o.custid = &quot;C41&quot;;
225</pre></div></div>
226</div>
227<div class="section">
228<h2><a name="Complex_Indexing_Examples_.5BBack_to_TOC.5D"></a><a name="ComplexIndexingExamples" id="ComplexIndexingExamples">Complex Indexing Examples</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
229<div class="section">
230<h3><a name="Open_Indexes"></a>Open Indexes</h3>
231<p>Similar to atomic indexes, array indexes are not limited to closed fields. The following DDLs illustrate how we could express <tt>CREATE INDEX</tt> statements comparable to those above if the to-be-indexed fields were not included in the their dataset&#x2019;s type definitions.</p>
232
233<div>
234<div>
235<pre class="source">CREATE INDEX pCategoriesIdx ON products (UNNEST categories : string);
236CREATE INDEX oItemsQtyPriceIdx ON orders (UNNEST items SELECT qty : int, price : int);
237CREATE INDEX oProductIDIdx ON orders (UNNEST items SELECT productno : int);
238</pre></div></div>
239</div>
240<div class="section">
241<h3><a name="Arrays_in_Arrays"></a>Arrays in Arrays</h3>
242<p>Array indexes are not just limited to arrays of depth = 1. We can generalize this to arrays of arbitrary depth, as long as an object encapsulates each array. The following DDLs describe indexing the <tt>qty</tt> field in an <tt>items</tt> array at various depths.</p>
243
244<div>
245<div>
246<pre class="source">// { orderno: ..., items0: [ { items1: [ { qty: int, ... } ] } ] }
247CREATE INDEX oItemItemQtyIdx ON orders (UNNEST items0 UNNEST items1 SELECT qty);
248
249// { orderno: ..., items0: [ { items1: [ { items2: [ { qty: int, ... } ] } ] } ] }
250CREATE INDEX oItemItemItemQtyIdx ON orders (UNNEST items0 UNNEST items1 UNNEST items2 SELECT qty);
251</pre></div></div>
252
253<p>The queries below will utilize the indexes above. The first query utilizes the <tt>oItemItemQtyIdx</tt> index through nested existential quantification. The second query utilizes the <tt>oItemItemItemQtyIdx</tt> index with three unnesting clauses.</p>
254
255<div>
256<div>
257<pre class="source">SET `compiler.arrayindex` &quot;true&quot;;
258
259SELECT o
260FROM orders o
261WHERE SOME o0 IN o.items0 SATISFIES (
262 SOME o1 IN o0.items1 SATISFIES o1.qty = 100
263);
264
265SELECT DISTINCT o
266FROM orders o, o.items0 o0, o0.items1 o1, o1.items2 o2
267WHERE o2.qty = 100;
268</pre></div></div></div></div>
269 </div>
270 </div>
271 </div>
272 <hr/>
273 <footer>
274 <div class="container-fluid">
275 <div class="row-fluid">
276<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
277 feather logo, and the Apache AsterixDB project logo are either
278 registered trademarks or trademarks of The Apache Software
279 Foundation in the United States and other countries.
280 All other marks mentioned may be trademarks or registered
281 trademarks of their respective owners.
282 </div>
283 </div>
284 </div>
285 </footer>
286 </body>
287</html>