blob: b6d4ae9e45b810bb48e232e29b7d55f9e97fc153 [file] [log] [blame]
Ian Maxonbf8620b2024-04-01 16:09:18 -07001<!DOCTYPE html>
2<!--
3 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from src/site/markdown/sqlpp/arrayindex.md at 2024-04-01
4 | Rendered using Apache Maven Fluido Skin 1.7
5-->
6<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta charset="UTF-8" />
9 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
10 <meta name="Date-Revision-yyyymmdd" content="20240401" />
11 <meta http-equiv="Content-Language" content="en" />
12 <title>AsterixDB &#x2013; AsterixDB Support of Array Indexes</title>
13 <link rel="stylesheet" href="../css/apache-maven-fluido-1.7.min.css" />
14 <link rel="stylesheet" href="../css/site.css" />
15 <link rel="stylesheet" href="../css/print.css" media="print" />
16 <script type="text/javascript" src="../js/apache-maven-fluido-1.7.min.js"></script>
17
18 </head>
19 <body class="topBarDisabled">
20 <div class="container-fluid">
21 <div id="banner">
22 <div class="pull-left"><a href=".././" id="bannerLeft"><img src="../images/asterixlogo.png" alt="AsterixDB"/></a></div>
23 <div class="pull-right"></div>
24 <div class="clear"><hr/></div>
25 </div>
26
27 <div id="breadcrumbs">
28 <ul class="breadcrumb">
29 <li id="publishDate">Last Published: 2024-04-01</li>
30 <li id="projectVersion" class="pull-right">Version: 0.9.9</li>
31 <li class="pull-right"><a href="../index.html" title="Documentation Home">Documentation Home</a></li>
32 </ul>
33 </div>
34 <div class="row-fluid">
35 <div id="leftColumn" class="span2">
36 <div class="well sidebar-nav">
37 <ul class="nav nav-list">
38 <li class="nav-header">Get Started - Installation</li>
39 <li><a href="../ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
40 <li><a href="../ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
41 <li><a href="../aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
42 <li class="nav-header">AsterixDB Primer</li>
43 <li><a href="../sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
44 <li class="nav-header">Data Model</li>
45 <li><a href="../datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
46 <li class="nav-header">Queries</li>
47 <li><a href="../sqlpp/manual.html" title="The SQL++ Query Language"><span class="none"></span>The SQL++ Query Language</a></li>
48 <li><a href="../SQLPP.html" title="Raw SQL++ Grammar"><span class="none"></span>Raw SQL++ Grammar</a></li>
49 <li><a href="../sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
50 <li class="nav-header">API/SDK</li>
51 <li><a href="../api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
52 <li><a href="../csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
53 <li class="nav-header">Advanced Features</li>
54 <li><a href="../aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
55 <li><a href="../feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
56 <li><a href="../udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
57 <li><a href="../sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
58 <li><a href="../sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
59 <li><a href="../sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
60 <li><a href="../geo/quickstart.html" title="GIS Support Overview"><span class="none"></span>GIS Support Overview</a></li>
61 <li><a href="../geo/functions.html" title="GIS Functions"><span class="none"></span>GIS Functions</a></li>
62 <li><a href="../interval_join.html" title="Support of Interval Joins"><span class="none"></span>Support of Interval Joins</a></li>
63 <li><a href="../spatial_join.html" title="Support of Spatial Joins"><span class="none"></span>Support of Spatial Joins</a></li>
64 <li class="active"><a href="#"><span class="none"></span>Support of Array Indexes</a></li>
65 <li class="nav-header">Deprecated</li>
66 <li><a href="../aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
67 <li><a href="../aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
68 <li><a href="../aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
69</ul>
70 <hr />
71 <div id="poweredBy">
72 <div class="clear"></div>
73 <div class="clear"></div>
74 <div class="clear"></div>
75 <div class="clear"></div>
76<a href=".././" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" /></a>
77 </div>
78 </div>
79 </div>
80 <div id="bodyColumn" class="span10" >
81<!--
82 ! Licensed to the Apache Software Foundation (ASF) under one
83 ! or more contributor license agreements. See the NOTICE file
84 ! distributed with this work for additional information
85 ! regarding copyright ownership. The ASF licenses this file
86 ! to you under the Apache License, Version 2.0 (the
87 ! "License"); you may not use this file except in compliance
88 ! with the License. You may obtain a copy of the License at
89 !
90 ! http://www.apache.org/licenses/LICENSE-2.0
91 !
92 ! Unless required by applicable law or agreed to in writing,
93 ! software distributed under the License is distributed on an
94 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
95 ! KIND, either express or implied. See the License for the
96 ! specific language governing permissions and limitations
97 ! under the License.
98 !-->
99<h1>AsterixDB Support of Array Indexes</h1>
100<div class="section">
101<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
102<ul>
103
104<li><a href="#Overview">Overview</a></li>
105<li><a href="#QuantificationQueries">Quantification Queries</a></li>
106<li><a href="#ExplicitUnnestQueries">Explicit Unnesting Queries</a></li>
107<li><a href="#JoinQueries">Join Queries</a></li>
108<li><a href="#ComplexIndexingExamples">Complex Indexing Examples</a></li>
109</ul></div>
110<div class="section">
111<h2><a name="Overview_.5BBack_to_TOC.5D"></a><a name="Overview" id="Overview">Overview</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
112<p>Array indexes are used in applications where users want to accelerate a query that involves some array-valued or multiset-valued field. This enables fast evaluation of predicates in queries involving arrays or multisets in datasets. For brevity, all further mentions of array-valued fields are also applicable to multiset-valued fields.</p>
113<p>Array-valued fields are a natural data modeling concept for documents. In the traditional inventory management example, it is natural for the line items of an order to exist as a part of the order itself. Previously if an AsterixDB user wanted to optimize a query involving a predicate on the line items of an order, they would a) have to undertake some form of schema migration to separate the line items from the orders into different datasets, b) create an index on the new dataset for line items, and finally c) modify their query to join orders and line items. With the introduction of array indexes in AsterixDB, users can keep their arrays intact and still reap the performance benefits of an index.</p>
114<p>It should be noted that in AsterixDB, array indexes are <i>not</i> meant to serve as covering indexes. In fact due to AsterixDB&#x2019;s record-level locking, index-only plans involving multi-valued fields (i.e. array indexes and inverted indexes) are not currently possible. Instead, array indexes are simply meant to accelerate queries involving multi-valued fields.</p></div>
115<div class="section">
116<h2><a name="Quantification_Queries_.5BBack_to_TOC.5D"></a><a name="QuantificationQueries" id="QuantificationQueries">Quantification Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
117<p>A common use-case for array indexes involves quantifying some or all elements within an array. Quantification queries have two variants: existential and universal. Existential queries ask if <i>any</i> element in some array satisfies a given predicate. Membership queries are a specific type of existential query, asking if any element in some array is equal to a particular value. Universal queries ask if <i>all</i> elements in some array satisfy a particular predicate. Empty arrays are not stored in an array index, meaning that a user must additionally specify that the array is non-empty to tell AsterixDB that it is possible to use an array index as an access method for the given query.</p>
118<p>All query examples here will use the orders and products datasets below.</p>
119
120<div>
121<div>
122<pre class="source">CREATE TYPE ordersType AS {
123 orderno: int,
124 custid: string,
125 items: [{ itemno: int, productno: int, qty: int, price: float }]
126};
127CREATE DATASET orders (ordersType) PRIMARY KEY orderno;
128
129CREATE TYPE productsType AS {
130 productno: int,
131 categories: {{ string }}
132};
133CREATE DATASET products (productsType) PRIMARY KEY productno;
134</pre></div></div>
135
136<p>Let us now create an index on the <tt>categories</tt> multiset of the <tt>products</tt> dataset.</p>
137
138<div>
139<div>
140<pre class="source">CREATE INDEX pCategoriesIdx ON products (UNNEST categories) EXCLUDE UNKNOWN KEY;
141</pre></div></div>
142
143<p>Suppose we now want to find all products that have the category &#x201c;Food&#x201d;. The following membership query will utilize the index we just created.</p>
144
145<div>
146<div>
147<pre class="source">SELECT p
148FROM products p
149WHERE &quot;Food&quot; IN p.categories;
150</pre></div></div>
151
152<p>We can also rewrite the query above as an explicit existential quantification query with an equality predicate and the index will be utilized.</p>
153
154<div>
155<div>
156<pre class="source">SELECT p
157FROM products p
158WHERE SOME c IN p.categories SATISFIES c = &quot;Food&quot;;
159</pre></div></div>
160
161<p>Let us now create an index on the <tt>qty</tt> and <tt>price</tt> fields in the <tt>items</tt> array of the <tt>orders</tt> dataset.</p>
162
163<div>
164<div>
165<pre class="source">CREATE INDEX oItemsQtyPriceIdx ON orders (UNNEST items SELECT qty, price) EXCLUDE UNKNOWN KEY;
166</pre></div></div>
167
168<p>Now suppose we want to find all orders that only have items with large quantities and low prices, not counting orders without any items. The following universal quantification query will utilize the index we just created.</p>
169
170<div>
171<div>
172<pre class="source">SELECT o
173FROM orders o
174WHERE SOME AND EVERY i IN o.items SATISFIES i.qty &gt; 100 AND i.price &lt; 5.00;
175</pre></div></div>
176
177<p>Take note of the <tt>SOME AND EVERY</tt> quantifier instead of the <tt>EVERY</tt> quantifier. Array indexes cannot be used for queries with potentially empty arrays.</p></div>
178<div class="section">
179<h2><a name="Explicit_Unnesting_Queries_.5BBack_to_TOC.5D"></a><a name="ExplicitUnnestQueries" id="ExplicitUnnestQueries">Explicit Unnesting Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
180<p>Array indexes can also be used to accelerate queries that involve the explicit unnesting of array fields. We can express the same membership / existential example above using an explicit <tt>UNNEST</tt> query. (To keep the same cardinality as the query above (i.e. to undo the <tt>UNNEST</tt>), we add a <tt>DISTINCT</tt> clause, though the index would be utilized either way.)</p>
181
182<div>
183<div>
184<pre class="source">SELECT DISTINCT p
185FROM products p, p.categories c
186WHERE c = &quot;Food&quot;;
187</pre></div></div>
188
189<p>As another example, suppose that we want to find all orders that have <i>some</i> item with a large quantity. The following query will utilize the <tt>oItemsQtyPriceIdx</tt> we created, using only the first field in the index <tt>qty</tt>.</p>
190
191<div>
192<div>
193<pre class="source">SELECT DISTINCT o
194FROM orders o, o.items i
195WHERE i.qty &gt; 100;
196</pre></div></div>
197</div>
198<div class="section">
199<h2><a name="Join_Queries_.5BBack_to_TOC.5D"></a><a name="JoinQueries" id="JoinQueries">Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
200<p>Finally, array indexes can also be used for index nested-loop joins if the field being joined is located within an array. Let us create another index on the <tt>items</tt> array of the <tt>orders</tt> dataset, this time on the <tt>productno</tt> field.</p>
201
202<div>
203<div>
204<pre class="source">CREATE INDEX oProductIDIdx ON orders (UNNEST items SELECT productno) EXCLUDE UNKNOWN KEY;
205</pre></div></div>
206
207<p>Now suppose we want to find all products located in a specific order. We can accomplish this with the join query below. Note that we must specify the <tt>indexnl</tt> join hint to tell AsterixDB that we want to optimize this specific join, as hash joins are the default join method otherwise.</p>
208
209<div>
210<div>
211<pre class="source">SELECT DISTINCT p
212FROM products p, orders o
213WHERE o.custid = &quot;C41&quot; AND
214 SOME i IN o.items SATISFIES i.productno /*+ indexnl */ = p.productno;
215</pre></div></div>
216</div>
217<div class="section">
218<h2><a name="Complex_Indexing_Examples_.5BBack_to_TOC.5D"></a><a name="ComplexIndexingExamples" id="ComplexIndexingExamples">Complex Indexing Examples</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
219<div class="section">
220<h3><a name="Open_Indexes"></a>Open Indexes</h3>
221<p>Similar to atomic indexes, array indexes are not limited to closed fields. The following DDLs illustrate how we could express <tt>CREATE INDEX</tt> statements comparable to those above if the to-be-indexed fields were not included in the their dataset&#x2019;s type definitions.</p>
222
223<div>
224<div>
225<pre class="source">CREATE INDEX pCategoriesIdx ON products (UNNEST categories : string) EXCLUDE UNKNOWN KEY;
226CREATE INDEX oItemsQtyPriceIdx ON orders (UNNEST items SELECT qty : int, price : int) EXCLUDE UNKNOWN KEY;
227CREATE INDEX oProductIDIdx ON orders (UNNEST items SELECT productno : int) EXCLUDE UNKNOWN KEY;
228</pre></div></div>
229</div>
230<div class="section">
231<h3><a name="Composite_Atomic-Array_Indexes"></a>Composite Atomic-Array Indexes</h3>
232<p>Indexed elements within array indexes are also not limited to fields within arrays. The following DDLs demonstrate indexing fields that are within an array and fields that are outside any array.</p>
233
234<div>
235<div>
236<pre class="source">CREATE INDEX oOrderNoItemPriceIdx ON orders (orderno, ( UNNEST items SELECT price )) EXCLUDE UNKNOWN KEY;
237CREATE INDEX oOrderItemPriceNoIdx ON orders (( UNNEST items SELECT price ), orderno) EXCLUDE UNKNOWN KEY;
238</pre></div></div>
239</div>
240<div class="section">
241<h3><a name="Arrays_in_Arrays"></a>Arrays in Arrays</h3>
242<p>Array indexes are not just limited to arrays of depth = 1. We can generalize this to arrays of arbitrary depth, as long as an object encapsulates each array. The following DDLs describe indexing the <tt>qty</tt> field in an <tt>items</tt> array at various depths.</p>
243
244<div>
245<div>
246<pre class="source">// { orderno: ..., items0: [ { items1: [ { qty: int, ... } ] } ] }
247CREATE INDEX oItemItemQtyIdx ON orders (UNNEST items0 UNNEST items1 SELECT qty) EXCLUDE UNKNOWN KEY;
248
249// { orderno: ..., items0: [ { items1: [ { items2: [ { qty: int, ... } ] } ] } ] }
250CREATE INDEX oItemItemItemQtyIdx ON orders (UNNEST items0 UNNEST items1 UNNEST items2 SELECT qty) EXCLUDE UNKNOWN KEY;
251</pre></div></div>
252
253<p>The queries below will utilize the indexes above. The first query utilizes the <tt>oItemItemQtyIdx</tt> index through nested existential quantification. The second query utilizes the <tt>oItemItemItemQtyIdx</tt> index with three unnesting clauses.</p>
254
255<div>
256<div>
257<pre class="source">SELECT o
258FROM orders o
259WHERE SOME o0 IN o.items0 SATISFIES (
260 SOME o1 IN o0.items1 SATISFIES o1.qty = 100
261);
262
263SELECT DISTINCT o
264FROM orders o, o.items0 o0, o0.items1 o1, o1.items2 o2
265WHERE o2.qty = 100;
266</pre></div></div></div></div>
267 </div>
268 </div>
269 </div>
270 <hr/>
271 <footer>
272 <div class="container-fluid">
273 <div class="row-fluid">
274<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
275 feather logo, and the Apache AsterixDB project logo are either
276 registered trademarks or trademarks of The Apache Software
277 Foundation in the United States and other countries.
278 All other marks mentioned may be trademarks or registered
279 trademarks of their respective owners.
280 </div>
281 </div>
282 </div>
283 </footer>
284 </body>
285</html>