<!DOCTYPE html>
<!--
 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from target/generated-site/markdown/sqlpp/manual.md at 2020-08-07
 | Rendered using Apache Maven Fluido Skin 1.7
-->
6<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta charset="UTF-8" />
9 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
10 <meta name="Date-Revision-yyyymmdd" content="20200807" />
11 <meta http-equiv="Content-Language" content="en" />
12 <title>AsterixDB &#x2013; The Query Language</title>
13 <link rel="stylesheet" href="../css/apache-maven-fluido-1.7.min.css" />
14 <link rel="stylesheet" href="../css/site.css" />
15 <link rel="stylesheet" href="../css/print.css" media="print" />
16 <script type="text/javascript" src="../js/apache-maven-fluido-1.7.min.js"></script>
17
18 </head>
19 <body class="topBarDisabled">
20 <div class="container-fluid">
21 <div id="banner">
22 <div class="pull-left"><a href=".././" id="bannerLeft"><img src="../images/asterixlogo.png" alt="AsterixDB"/></a></div>
23 <div class="pull-right"></div>
24 <div class="clear"><hr/></div>
25 </div>
26
27 <div id="breadcrumbs">
28 <ul class="breadcrumb">
29 <li id="publishDate">Last Published: 2020-08-07</li>
30 <li id="projectVersion" class="pull-right">Version: 0.9.5</li>
31 <li class="pull-right"><a href="../index.html" title="Documentation Home">Documentation Home</a></li>
32 </ul>
33 </div>
34 <div class="row-fluid">
35 <div id="leftColumn" class="span2">
36 <div class="well sidebar-nav">
37 <ul class="nav nav-list">
38 <li class="nav-header">Get Started - Installation</li>
39 <li><a href="../ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
40 <li><a href="../ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
41 <li><a href="../aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
42 <li class="nav-header">AsterixDB Primer</li>
43 <li><a href="../sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
44 <li class="nav-header">Data Model</li>
45 <li><a href="../datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
46 <li class="nav-header">Queries</li>
47 <li class="active"><a href="#"><span class="none"></span>The SQL++ Query Language</a></li>
48 <li><a href="../sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
49 <li class="nav-header">API/SDK</li>
50 <li><a href="../api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
51 <li><a href="../csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
52 <li class="nav-header">Advanced Features</li>
53 <li><a href="../aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
54 <li><a href="../feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
55 <li><a href="../udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
56 <li><a href="../sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
57 <li><a href="../sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
58 <li><a href="../sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
59 <li class="nav-header">Deprecated</li>
60 <li><a href="../aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
61 <li><a href="../aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
62 <li><a href="../aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
63</ul>
64 <hr />
65 <div id="poweredBy">
66 <div class="clear"></div>
67 <div class="clear"></div>
68 <div class="clear"></div>
69 <div class="clear"></div>
70<a href=".././" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" /></a>
71 </div>
72 </div>
73 </div>
74 <div id="bodyColumn" class="span10" >
75<!--
76 ! Licensed to the Apache Software Foundation (ASF) under one
77 ! or more contributor license agreements. See the NOTICE file
78 ! distributed with this work for additional information
79 ! regarding copyright ownership. The ASF licenses this file
80 ! to you under the Apache License, Version 2.0 (the
81 ! "License"); you may not use this file except in compliance
82 ! with the License. You may obtain a copy of the License at
83 !
84 ! http://www.apache.org/licenses/LICENSE-2.0
85 !
86 ! Unless required by applicable law or agreed to in writing,
87 ! software distributed under the License is distributed on an
88 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
89 ! KIND, either express or implied. See the License for the
90 ! specific language governing permissions and limitations
91 ! under the License.
92 !-->
93<h1>The Query Language</h1>
94<ul>
95
96<li><a href="#Introduction">1. Introduction</a></li>
97<li><a href="#Expressions">2. Expressions</a>
98<ul>
99
100<li><a href="#Operator_expressions">Operator Expressions</a>
101<ul>
102
103<li><a href="#Arithmetic_operators">Arithmetic Operators</a></li>
104<li><a href="#Collection_operators">Collection Operators</a></li>
105<li><a href="#Comparison_operators">Comparison Operators</a></li>
106<li><a href="#Logical_operators">Logical Operators</a></li>
107</ul>
108</li>
109<li><a href="#Quantified_expressions">Quantified Expressions</a></li>
110<li><a href="#Path_expressions">Path Expressions</a></li>
111<li><a href="#Primary_expressions">Primary Expressions</a>
112<ul>
113
114<li><a href="#Literals">Literals</a></li>
115<li><a href="#Variable_references">Variable References</a></li>
116<li><a href="#Parenthesized_expressions">Parenthesized Expressions</a></li>
117<li><a href="#Function_call_expressions">Function call Expressions</a></li>
118<li><a href="#Case_expressions">Case Expressions</a></li>
119<li><a href="#Constructors">Constructors</a></li>
120</ul>
121</li>
122</ul>
123</li>
124<li><a href="#Queries">3. Queries</a>
125<ul>
126
127<li><a href="#Declarations">Declarations</a></li>
128<li><a href="#SELECT_statements">SELECT Statements</a></li>
129<li><a href="#Select_clauses">SELECT Clauses</a>
130<ul>
131
132<li><a href="#Select_element">Select Element/Value/Raw</a></li>
133<li><a href="#SQL_select">SQL-style Select</a></li>
134<li><a href="#Select_star">Select *</a></li>
135<li><a href="#Select_distinct">Select Distinct</a></li>
136<li><a href="#Unnamed_projections">Unnamed Projections</a></li>
137<li><a href="#Abbreviated_field_access_expressions">Abbreviated Field Access Expressions</a></li>
138</ul>
139</li>
140<li><a href="#Unnest_clauses">UNNEST Clauses</a>
141<ul>
142
143<li><a href="#Inner_unnests">Inner Unnests</a></li>
144<li><a href="#Left_outer_unnests">Left Outer Unnests</a></li>
145<li><a href="#Expressing_joins_using_unnests">Expressing Joins Using Unnests</a></li>
146</ul>
147</li>
148<li><a href="#From_clauses">FROM clauses</a>
149<ul>
150
151<li><a href="#Binding_expressions">Binding Expressions</a></li>
152<li><a href="#Multiple_from_terms">Multiple From Terms</a></li>
153<li><a href="#Expressing_joins_using_from_terms">Expressing Joins Using From Terms</a></li>
154<li><a href="#Implicit_binding_variables">Implicit Binding Variables</a></li>
155</ul>
156</li>
157<li><a href="#Join_clauses">JOIN Clauses</a>
158<ul>
159
160<li><a href="#Inner_joins">Inner Joins</a></li>
161<li><a href="#Left_outer_joins">Left Outer Joins</a></li>
162</ul>
163</li>
164<li><a href="#Group_By_clauses">GROUP BY Clauses</a>
165<ul>
166
167<li><a href="#Group_variables">Group Variables</a></li>
168<li><a href="#Implicit_group_key_variables">Implicit Group Key Variables</a></li>
169<li><a href="#Implicit_group_variables">Implicit Group Variables</a></li>
170<li><a href="#Aggregation_functions">Aggregation Functions</a></li>
171<li><a href="#SQL-92_aggregation_functions">SQL-92 Aggregation Functions</a></li>
172<li><a href="#SQL-92_compliant_gby">SQL-92 Compliant GROUP BY Aggregations</a></li>
173<li><a href="#Column_aliases">Column Aliases</a></li>
174</ul>
175</li>
176<li><a href="#Where_having_clauses">WHERE Clauses and HAVING Clauses</a></li>
177<li><a href="#Order_By_clauses">ORDER BY Clauses</a></li>
178<li><a href="#Limit_clauses">LIMIT Clauses</a></li>
179<li><a href="#With_clauses">WITH Clauses</a></li>
180<li><a href="#Let_clauses">LET Clauses</a></li>
181<li><a href="#Union_all">UNION ALL</a></li>
182<li><a href="#Over_clauses">OVER Clauses</a>
183<ul>
184
185<li><a href="#Window_function_call">Window Function Call</a></li>
186<li><a href="#Window_function_options">Window Function Options</a></li>
187<li><a href="#Window_frame_variable">Window Frame Variable</a></li>
188<li><a href="#Window_definition">Window Definition</a></li>
189</ul>
190</li>
191<li><a href="#Vs_SQL-92">Differences from SQL-92</a></li>
192</ul>
193</li>
194<li><a href="#Errors">4. Errors</a>
195<ul>
196
197<li><a href="#Syntax_errors">Syntax Errors</a></li>
198<li><a href="#Identifier_resolution_errors">Identifier Resolution Errors</a></li>
199<li><a href="#Type_errors">Type Errors</a></li>
200<li><a href="#Resource_errors">Resource Errors</a></li>
201</ul>
202</li>
203<li><a href="#DDL_and_DML_statements">5. DDL and DML Statements</a>
204<ul>
205
206<li><a href="#Lifecycle_management_statements">Lifecycle Management Statements</a>
207<ul>
208
209<li><a href="#Dataverses">Dataverses</a></li>
210<li><a href="#Types">Types</a></li>
211<li><a href="#Datasets">Datasets</a></li>
212<li><a href="#Indices">Indices</a></li>
213<li><a href="#Functions">Functions</a></li>
214<li><a href="#Synonyms">Synonyms</a></li>
215<li><a href="#Removal">Removal</a></li>
216<li><a href="#Load_statement">Load Statement</a></li>
217</ul>
218</li>
219<li><a href="#Modification_statements">Modification Statements</a>
220<ul>
221
222<li><a href="#Inserts">Inserts</a></li>
223<li><a href="#Upserts">Upserts</a></li>
224<li><a href="#Deletes">Deletes</a></li>
225</ul>
226</li>
227</ul>
228</li>
229<li><a href="#Reserved_keywords">Appendix 1. Reserved Keywords</a></li>
230<li><a href="#Performance_tuning">Appendix 2. Performance Tuning</a>
231<ul>
232
233<li><a href="#Parallelism_parameter">Parallelism Parameter</a></li>
234<li><a href="#Memory_parameters">Memory Parameters</a></li>
235</ul>
236</li>
237<li><a href="#Variable_bindings_and_name_resolution">Appendix 3. Variable Bindings and Name Resolution</a></li>
</ul>
256
257<h1><a name="Introduction" id="Introduction">1. Introduction</a><font size="3" /></h1>
<p>This document is intended as a reference guide to the full syntax and semantics of AsterixDB&#x2019;s query language, a SQL-based language for working with semistructured data. The language is a derivative of SQL++, a declarative query language for JSON data which is largely backwards compatible with SQL. SQL++ originated from research in the FORWARD project at UC San Diego, and it has much in common with SQL; some differences exist due to the different data models that the two languages were designed to serve. SQL was designed for interacting with the flat, fixed-schema world of relational databases, while SQL++ generalizes SQL to also handle nested data formats (like JSON) and the schema-optional (or even schema-less) data models of modern NoSQL and Big Data systems.</p>
259<p>In the context of Apache AsterixDB, the query language is intended for working with the Asterix Data Model (<a href="../datamodel.html">ADM</a>), a data model based on a superset of JSON with an enriched and flexible type system. New AsterixDB users are encouraged to read and work through the (much friendlier) guide &#x201c;<a href="primer-sqlpp.html">AsterixDB 101: An ADM and SQL++ Primer</a>&#x201d; before attempting to make use of this document. In addition, readers are advised to read through the <a href="../datamodel.html">Asterix Data Model (ADM) reference guide</a> first as well, as an understanding of the data model is a prerequisite to understanding the query language.</p>
<p>In what follows, we detail the features of the query language in a grammar-guided manner. We list and briefly explain each of the productions in the query grammar, offering examples (and results) for clarity.</p>
278
<h1><a name="Expressions" id="Expressions">2. Expressions</a></h1>
297
<p>The query language is a highly composable expression language. Each expression in the query language returns zero or more data model instances. There are two major kinds of expressions. At the topmost level, an expression can be an OperatorExpression (similar to a mathematical expression) or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full grammar of the language.</p>
299
<div>
<div>
<pre class="source">Expression ::= OperatorExpression | QuantifiedExpression
</pre></div></div>
304
305<p>Note that in the following text, words enclosed in angle brackets denote keywords that are not case-sensitive.</p>
306<div class="section">
307<h2><a name="Operator_Expressions"></a><a name="Operator_expressions" id="Operator_expressions">Operator Expressions</a></h2>
308<p>Operators perform a specific operation on the input values or expressions. The syntax of an operator expression is as follows:</p>
309
<div>
<div>
<pre class="source">OperatorExpression ::= PathExpression
                       | Operator OperatorExpression
                       | OperatorExpression Operator (OperatorExpression)?
                       | OperatorExpression &lt;BETWEEN&gt; OperatorExpression &lt;AND&gt; OperatorExpression
</pre></div></div>
317
318<p>The language provides a full set of operators that you can use within its statements. Here are the categories of operators:</p>
319<ul>
320
321<li><a href="#Arithmetic_operators">Arithmetic Operators</a>, to perform basic mathematical operations;</li>
322<li><a href="#Collection_operators">Collection Operators</a>, to evaluate expressions on collections or objects;</li>
323<li><a href="#Comparison_operators">Comparison Operators</a>, to compare two expressions;</li>
<li><a href="#Logical_operators">Logical Operators</a>, to combine expressions using Boolean logic.</li>
325</ul>
326<p>The following table summarizes the precedence order (from higher to lower) of the major unary and binary operators:</p>
327<table border="0" class="table table-striped">
328<thead>
329
330<tr class="a">
331<th> Operator </th>
332<th> Operation </th></tr>
333</thead><tbody>
334
335<tr class="b">
336<td> EXISTS, NOT EXISTS </td>
337<td> Collection emptiness testing </td></tr>
338<tr class="a">
339<td> ^ </td>
340<td> Exponentiation </td></tr>
341<tr class="b">
342<td> *, /, DIV, MOD (%) </td>
343<td> Multiplication, division, modulo </td></tr>
344<tr class="a">
345<td> +, - </td>
346<td> Addition, subtraction </td></tr>
347<tr class="b">
348<td> || </td>
349<td> String concatenation </td></tr>
350<tr class="a">
351<td> IS NULL, IS NOT NULL, IS MISSING, IS NOT MISSING, <br />IS UNKNOWN, IS NOT UNKNOWN, IS VALUED, IS NOT VALUED </td>
352<td> Unknown value comparison </td></tr>
353<tr class="b">
354<td> BETWEEN, NOT BETWEEN </td>
355<td> Range comparison (inclusive on both sides) </td></tr>
356<tr class="a">
357<td> =, !=, &lt;&gt;, &lt;, &gt;, &lt;=, &gt;=, LIKE, NOT LIKE, IN, NOT IN </td>
358<td> Comparison </td></tr>
359<tr class="b">
360<td> NOT </td>
361<td> Logical negation </td></tr>
362<tr class="a">
363<td> AND </td>
364<td> Conjunction </td></tr>
365<tr class="b">
366<td> OR </td>
367<td> Disjunction </td></tr>
368</tbody>
369</table>
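<p>As a brief illustration of the precedence table above, the following sketch shows how two unparenthesized expressions would be grouped. The literal values are arbitrary and chosen only for illustration.</p>

<div>
<div>
<pre class="source">-- Exponentiation binds tighter than multiplication, which binds tighter than addition,
-- so this is grouped as 2 + (3 * (4 ^ 2)).
SELECT VALUE 2 + 3 * 4 ^ 2;

-- Comparison binds tighter than NOT, and NOT binds tighter than AND,
-- so this is grouped as (NOT (1 = 2)) AND (3 &lt; 4).
SELECT VALUE NOT 1 = 2 AND 3 &lt; 4;
</pre></div></div>

<p>When in doubt, explicit parentheses make the intended grouping clear to readers as well as to the parser.</p>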
<p>In general, if any operand evaluates to a <tt>MISSING</tt> value, the enclosing operator will return <tt>MISSING</tt>; if no operand evaluates to a <tt>MISSING</tt> value but at least one operand evaluates to a <tt>NULL</tt> value, the enclosing operator will return <tt>NULL</tt>. However, there are a few exceptions listed in <a href="#Comparison_operators">comparison operators</a> and <a href="#Logical_operators">logical operators</a>.</p>
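<p>A minimal sketch of the general rule for <tt>MISSING</tt> and <tt>NULL</tt> operands, using the addition operator as an arbitrary example; the comments restate what the rule predicts rather than showing captured output.</p>

<div>
<div>
<pre class="source">-- One operand is NULL and none is MISSING: the expression evaluates to NULL.
SELECT VALUE 1 + NULL;

-- One operand is MISSING: the expression evaluates to MISSING.
SELECT VALUE 1 + MISSING;

-- Both kinds of unknown operand are present: MISSING takes priority over NULL.
SELECT VALUE NULL + MISSING;
</pre></div></div>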
371<div class="section">
372<h3><a name="Arithmetic_Operators"></a><a name="Arithmetic_operators" id="Arithmetic_operators">Arithmetic Operators</a></h3>
373<p>Arithmetic operators are used to exponentiate, add, subtract, multiply, and divide numeric values, or concatenate string values.</p>
374<table border="0" class="table table-striped">
375<thead>
376
377<tr class="a">
378<th> Operator </th>
379<th> Purpose </th>
380<th> Example </th></tr>
381</thead><tbody>
382
383<tr class="b">
384<td> +, - </td>
385<td> As unary operators, they denote a <br />positive or negative expression </td>
386<td> SELECT VALUE -1; </td></tr>
387<tr class="a">
388<td> +, - </td>
389<td> As binary operators, they add or subtract </td>
390<td> SELECT VALUE 1 + 2; </td></tr>
391<tr class="b">
392<td> * </td>
393<td> Multiply </td>
394<td> SELECT VALUE 4 * 2; </td></tr>
395<tr class="a">
396<td> / </td>
397<td> Divide (returns a value of type <tt>double</tt> if both operands are integers)</td>
398<td> SELECT VALUE 5 / 2; </td></tr>
399<tr class="b">
400<td> DIV </td>
401<td> Divide (returns an integer value if both operands are integers) </td>
402<td> SELECT VALUE 5 DIV 2; </td></tr>
403<tr class="a">
404<td> MOD (%) </td>
405<td> Modulo </td>
406<td> SELECT VALUE 5 % 2; </td></tr>
407<tr class="b">
408<td> ^ </td>
409<td> Exponentiation </td>
410<td> SELECT VALUE 2^3; </td></tr>
411<tr class="a">
412<td> || </td>
413<td> String concatenation </td>
<td> SELECT VALUE &quot;ab&quot;||&quot;c&quot;||&quot;d&quot;; </td></tr>
415</tbody>
416</table></div>
417<div class="section">
418<h3><a name="Collection_Operators"></a><a name="Collection_operators" id="Collection_operators">Collection Operators</a></h3>
419<p>Collection operators are used for membership tests (IN, NOT IN) or empty collection tests (EXISTS, NOT EXISTS).</p>
420<table border="0" class="table table-striped">
421<thead>
422
423<tr class="a">
424<th> Operator </th>
425<th> Purpose </th>
426<th> Example </th></tr>
427</thead><tbody>
428
429<tr class="b">
430<td> IN </td>
431<td> Membership test </td>
<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.lang IN [&quot;en&quot;, &quot;de&quot;]; </td></tr>
433<tr class="a">
434<td> NOT IN </td>
435<td> Non-membership test </td>
<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.lang NOT IN [&quot;en&quot;]; </td></tr>
437<tr class="b">
438<td> EXISTS </td>
439<td> Check whether a collection is not empty </td>
440<td> SELECT * FROM ChirpMessages cm <br />WHERE EXISTS cm.referredTopics; </td></tr>
441<tr class="a">
442<td> NOT EXISTS </td>
443<td> Check whether a collection is empty </td>
444<td> SELECT * FROM ChirpMessages cm <br />WHERE NOT EXISTS cm.referredTopics; </td></tr>
445</tbody>
446</table></div>
447<div class="section">
448<h3><a name="Comparison_Operators"></a><a name="Comparison_operators" id="Comparison_operators">Comparison Operators</a></h3>
<p>Comparison operators are used to compare values. They fall into two sub-categories: missing value comparisons and regular value comparisons. The query language (and JSON) has two ways of representing missing information in an object: the presence of the field with a NULL for its value (as in SQL), and the absence of the field (which JSON permits). For example, the first of the following objects represents Jack, whose friend is Jill. In the other examples, Jake is friendless &#xe0; la SQL, with a friend field that is NULL, while Joe is friendless in a more natural (for JSON) way, i.e., by not having a friend field.</p>
450<div class="section">
451<div class="section">
452<h5><a name="Examples"></a>Examples</h5>
<p>{&quot;name&quot;: &quot;Jack&quot;, &quot;friend&quot;: &quot;Jill&quot;}</p>
<p>{&quot;name&quot;: &quot;Jake&quot;, &quot;friend&quot;: NULL}</p>
<p>{&quot;name&quot;: &quot;Joe&quot;}</p>
456<p>The following table enumerates all of the query language&#x2019;s comparison operators.</p>
457<table border="0" class="table table-striped">
458<thead>
459
460<tr class="a">
461<th> Operator </th>
462<th> Purpose </th>
463<th> Example </th></tr>
464</thead><tbody>
465
466<tr class="b">
467<td> IS NULL </td>
468<td> Test if a value is NULL </td>
469<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NULL; </td></tr>
470<tr class="a">
471<td> IS NOT NULL </td>
472<td> Test if a value is not NULL </td>
473<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT NULL; </td></tr>
474<tr class="b">
475<td> IS MISSING </td>
476<td> Test if a value is MISSING </td>
477<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS MISSING; </td></tr>
478<tr class="a">
479<td> IS NOT MISSING </td>
480<td> Test if a value is not MISSING </td>
481<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT MISSING;</td></tr>
482<tr class="b">
483<td> IS UNKNOWN </td>
484<td> Test if a value is NULL or MISSING </td>
485<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS UNKNOWN; </td></tr>
486<tr class="a">
487<td> IS NOT UNKNOWN </td>
488<td> Test if a value is neither NULL nor MISSING </td>
489<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT UNKNOWN;</td></tr>
490<tr class="b">
491<td> IS KNOWN (IS VALUED) </td>
492<td> Test if a value is neither NULL nor MISSING </td>
493<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS KNOWN; </td></tr>
494<tr class="a">
495<td> IS NOT KNOWN (IS NOT VALUED) </td>
496<td> Test if a value is NULL or MISSING </td>
497<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT KNOWN; </td></tr>
498<tr class="b">
499<td> BETWEEN </td>
<td> Test if a value is between a start value and <br />an end value. The comparison is inclusive <br />of both the start and end values. </td>
501<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId BETWEEN 10 AND 20;</td></tr>
502<tr class="a">
503<td> = </td>
504<td> Equality test </td>
505<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId=10; </td></tr>
506<tr class="b">
507<td> != </td>
508<td> Inequality test </td>
509<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId!=10;</td></tr>
510<tr class="a">
511<td> &lt;&gt; </td>
512<td> Inequality test </td>
513<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;&gt;10;</td></tr>
514<tr class="b">
515<td> &lt; </td>
516<td> Less than </td>
517<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;10; </td></tr>
518<tr class="a">
519<td> &gt; </td>
520<td> Greater than </td>
521<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&gt;10; </td></tr>
522<tr class="b">
523<td> &lt;= </td>
524<td> Less than or equal to </td>
525<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;=10; </td></tr>
526<tr class="a">
527<td> &gt;= </td>
528<td> Greater than or equal to </td>
529<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&gt;=10; </td></tr>
530<tr class="b">
531<td> LIKE </td>
<td> Test if the left side matches a<br /> pattern defined on the right<br /> side; in the pattern, &#x201c;%&#x201d; matches <br />any string while &#x201c;_&#x201d; matches <br /> any single character. </td>
<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name LIKE &quot;%Giesen%&quot;;</td></tr>
534<tr class="a">
535<td> NOT LIKE </td>
<td> Test if the left side does not <br />match a pattern defined on the right<br /> side; in the pattern, &#x201c;%&#x201d; matches <br />any string while &#x201c;_&#x201d; matches <br /> any single character. </td>
<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name NOT LIKE &quot;%Giesen%&quot;;</td></tr>
538</tbody>
539</table>
540<p>The following table summarizes how the missing value comparison operators work.</p>
541<table border="0" class="table table-striped">
542<thead>
543
544<tr class="a">
545<th> Operator </th>
546<th> Non-NULL/Non-MISSING value </th>
547<th> NULL </th>
548<th> MISSING </th></tr>
549</thead><tbody>
550
551<tr class="b">
552<td> IS NULL </td>
553<td> FALSE </td>
554<td> TRUE </td>
555<td> MISSING </td></tr>
556<tr class="a">
557<td> IS NOT NULL </td>
558<td> TRUE </td>
559<td> FALSE </td>
560<td> MISSING </td></tr>
561<tr class="b">
562<td> IS MISSING </td>
563<td> FALSE </td>
564<td> FALSE </td>
565<td> TRUE </td></tr>
566<tr class="a">
567<td> IS NOT MISSING </td>
568<td> TRUE </td>
569<td> TRUE </td>
570<td> FALSE </td></tr>
571<tr class="b">
572<td> IS UNKNOWN </td>
573<td> FALSE </td>
574<td> TRUE </td>
575<td> TRUE </td></tr>
576<tr class="a">
577<td> IS NOT UNKNOWN </td>
578<td> TRUE </td>
579<td> FALSE </td>
580<td> FALSE</td></tr>
581<tr class="b">
582<td> IS KNOWN (IS VALUED) </td>
583<td> TRUE </td>
584<td> FALSE </td>
585<td> FALSE </td></tr>
586<tr class="a">
587<td> IS NOT KNOWN (IS NOT VALUED) </td>
588<td> FALSE </td>
589<td> TRUE </td>
590<td> TRUE </td></tr>
591</tbody>
592</table></div></div></div>
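<p>To connect the table above with the Jack, Jake, and Joe objects shown earlier, the following sketch applies three of the missing value comparison operators to each object. The inline array in the <tt>FROM</tt> clause and the output field names (<tt>is_null</tt>, <tt>is_missing</tt>, <tt>is_unknown</tt>) are assumptions made for illustration only.</p>

<div>
<div>
<pre class="source">SELECT p.name,
       p.friend IS NULL    AS is_null,
       p.friend IS MISSING AS is_missing,
       p.friend IS UNKNOWN AS is_unknown
FROM [
  {&quot;name&quot;: &quot;Jack&quot;, &quot;friend&quot;: &quot;Jill&quot;},
  {&quot;name&quot;: &quot;Jake&quot;, &quot;friend&quot;: NULL},
  {&quot;name&quot;: &quot;Joe&quot;}
] AS p;
</pre></div></div>

<p>Reading the table row by row, all three tests should be <tt>FALSE</tt> for Jack; for Jake, <tt>IS NULL</tt> and <tt>IS UNKNOWN</tt> should be <tt>TRUE</tt> while <tt>IS MISSING</tt> is <tt>FALSE</tt>; for Joe, <tt>IS MISSING</tt> and <tt>IS UNKNOWN</tt> should be <tt>TRUE</tt>, while <tt>IS NULL</tt> itself evaluates to <tt>MISSING</tt>.</p>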
593<div class="section">
594<h3><a name="Logical_Operators"></a><a name="Logical_operators" id="Logical_operators">Logical Operators</a></h3>
595<p>Logical operators perform logical <tt>NOT</tt>, <tt>AND</tt>, and <tt>OR</tt> operations over Boolean values (<tt>TRUE</tt> and <tt>FALSE</tt>) plus <tt>NULL</tt> and <tt>MISSING</tt>.</p>
596<table border="0" class="table table-striped">
597<thead>
598
599<tr class="a">
600<th> Operator </th>
601<th> Purpose </th>
602<th> Example </th></tr>
603</thead><tbody>
604
605<tr class="b">
606<td> NOT </td>
607<td> Returns true if the following condition is false, otherwise returns false </td>
608<td> SELECT VALUE NOT TRUE; </td></tr>
609<tr class="a">
610<td> AND </td>
611<td> Returns true if both branches are true, otherwise returns false </td>
612<td> SELECT VALUE TRUE AND FALSE; </td></tr>
613<tr class="b">
614<td> OR </td>
615<td> Returns true if one branch is true, otherwise returns false </td>
616<td> SELECT VALUE FALSE OR FALSE; </td></tr>
617</tbody>
618</table>
619<p>The following table is the truth table for <tt>AND</tt> and <tt>OR</tt>.</p>
620<table border="0" class="table table-striped">
621<thead>
622
623<tr class="a">
624<th> A </th>
625<th> B </th>
626<th> A AND B </th>
627<th> A OR B </th></tr>
628</thead><tbody>
629
630<tr class="b">
631<td> TRUE </td>
632<td> TRUE </td>
633<td> TRUE </td>
634<td> TRUE </td></tr>
635<tr class="a">
636<td> TRUE </td>
637<td> FALSE </td>
638<td> FALSE </td>
639<td> TRUE </td></tr>
640<tr class="b">
641<td> TRUE </td>
642<td> NULL </td>
643<td> NULL </td>
644<td> TRUE </td></tr>
645<tr class="a">
646<td> TRUE </td>
647<td> MISSING </td>
648<td> MISSING </td>
649<td> TRUE </td></tr>
650<tr class="b">
651<td> FALSE </td>
652<td> FALSE </td>
653<td> FALSE </td>
654<td> FALSE </td></tr>
655<tr class="a">
656<td> FALSE </td>
657<td> NULL </td>
658<td> FALSE </td>
659<td> NULL </td></tr>
660<tr class="b">
661<td> FALSE </td>
662<td> MISSING </td>
663<td> FALSE </td>
664<td> MISSING </td></tr>
665<tr class="a">
666<td> NULL </td>
667<td> NULL </td>
668<td> NULL </td>
669<td> NULL </td></tr>
670<tr class="b">
671<td> NULL </td>
672<td> MISSING </td>
673<td> MISSING </td>
674<td> NULL </td></tr>
675<tr class="a">
676<td> MISSING </td>
677<td> MISSING </td>
678<td> MISSING </td>
679<td> MISSING </td></tr>
680</tbody>
681</table>
682<p>The following table demonstrates the results of <tt>NOT</tt> on all possible inputs.</p>
683<table border="0" class="table table-striped">
684<thead>
685
686<tr class="a">
687<th> A </th>
688<th> NOT A </th></tr>
689</thead><tbody>
690
691<tr class="b">
692<td> TRUE </td>
693<td> FALSE </td></tr>
694<tr class="a">
695<td> FALSE </td>
696<td> TRUE </td></tr>
697<tr class="b">
698<td> NULL </td>
699<td> NULL </td></tr>
700<tr class="a">
701<td> MISSING </td>
702<td> MISSING </td></tr>
703</tbody>
704</table></div></div>
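<p>The truth tables above contain the exceptions to the general <tt>MISSING</tt>/<tt>NULL</tt> propagation rule. The following sketch picks a few rows; each comment restates the corresponding table entry.</p>

<div>
<div>
<pre class="source">-- A FALSE operand decides AND, and a TRUE operand decides OR,
-- even when the other operand is NULL or MISSING.
SELECT VALUE FALSE AND MISSING;   -- FALSE
SELECT VALUE TRUE OR NULL;        -- TRUE

-- Otherwise the unknown operand carries through.
SELECT VALUE TRUE AND NULL;       -- NULL
SELECT VALUE NULL AND MISSING;    -- MISSING
SELECT VALUE NULL OR MISSING;     -- NULL
</pre></div></div>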
705<div class="section">
706<h2><a name="Quantified_Expressions"></a><a name="Quantified_expressions" id="Quantified_expressions">Quantified Expressions</a></h2>
707
<div>
<div>
<pre class="source">QuantifiedExpression ::= ( (&lt;ANY&gt;|&lt;SOME&gt;) | &lt;EVERY&gt; ) Variable &lt;IN&gt; Expression ( &quot;,&quot; Variable &lt;IN&gt; Expression )*
                         &lt;SATISFIES&gt; Expression (&lt;END&gt;)?
</pre></div></div>
713
714<p>Quantified expressions are used for expressing existential or universal predicates involving the elements of a collection.</p>
<p>The following pair of examples illustrates the use of a quantified expression to test that every (or some) element in the set [1, 2, 3] of integers is less than three. The first example yields <tt>FALSE</tt> and the second example yields <tt>TRUE</tt>.</p>
716<p>It is useful to note that if the set were instead the empty set, the first expression would yield <tt>TRUE</tt> (&#x201c;every&#x201d; value in an empty set satisfies the condition) while the second expression would yield <tt>FALSE</tt> (since there isn&#x2019;t &#x201c;some&#x201d; value, as there are no values in the set, that satisfies the condition).</p>
717<p>A quantified expression returns <tt>NULL</tt> (or <tt>MISSING</tt>) if the first expression in it, that is, the collection expression following <tt>IN</tt>, evaluates to <tt>NULL</tt> (or <tt>MISSING</tt>). A type error will be raised if that expression does not return a collection.</p>
718<div class="section">
719<div class="section">
720<div class="section">
721<h5><a name="Examples"></a>Examples</h5>
722
723<div>
724<div>
725<pre class="source">EVERY x IN [ 1, 2, 3 ] SATISFIES x &lt; 3
726SOME x IN [ 1, 2, 3 ] SATISFIES x &lt; 3
727</pre></div></div>
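<p>As a further sketch, the first two expressions below restate the empty-collection case described above (expected to yield <tt>TRUE</tt> and <tt>FALSE</tt>, respectively), and the query shows a quantified expression used as a <tt>WHERE</tt> predicate. The query assumes the <tt>GleambookUsers</tt> sample collection shown in the section on <tt>SELECT</tt> statements; on that data it should select the users whose <tt>friendIds</tt> array contains 1.</p>

<div>
<div>
<pre class="source">EVERY x IN [ ] SATISFIES x &lt; 3
SOME x IN [ ] SATISFIES x &lt; 3

SELECT VALUE u.name
FROM GleambookUsers u
WHERE SOME f IN u.friendIds SATISFIES f = 1;
</pre></div></div>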
728</div></div></div></div>
729<div class="section">
730<h2><a name="Path_Expressions"></a><a name="Path_expressions" id="Path_expressions">Path Expressions</a></h2>
731
732<div>
733<div>
734<pre class="source">PathExpression ::= PrimaryExpression ( Field | Index )*
735Field ::= &quot;.&quot; Identifier
736Index ::= &quot;[&quot; Expression (&quot;:&quot; ( Expression )? )? &quot;]&quot;
737</pre></div></div>
738
739<p>Components of complex types in the data model are accessed via path expressions. Path access can be applied to the result of a query expression that yields an instance of a complex type, for example, an object or an array instance.</p>
740<p>For objects, path access is based on field names, and it accesses the field whose name was specified.<br /> For arrays, path access is based on zero-based, array-style indexing. An array index can retrieve either a single element or a subset of an array. A single element is accessed by providing one index argument (the zero-based element position). A subset is obtained by providing the <tt>start</tt> and <tt>end</tt> (zero-based) index positions; the returned subset runs from position <tt>start</tt> to position <tt>end - 1</tt>, and the <tt>end</tt> position argument is optional. If a position argument is negative, the element position is counted from the end of the array (<tt>-1</tt> addresses the last element, <tt>-2</tt> the next to last, and so on). Multisets have similar behavior to arrays, except for retrieving arbitrary items as the order of items is not fixed in multisets.</p>
741<p>Attempts to access non-existent fields or out-of-bound array elements produce the special value <tt>MISSING</tt>. Type errors will be raised for inappropriate use of a path expression, such as applying a field accessor to a numeric value.</p>
742<p>The following examples illustrate field access for an object, index-based element access or subset retrieval of an array, and also a composition thereof.</p>
743<div class="section">
744<div class="section">
745<div class="section">
746<h5><a name="Examples"></a>Examples</h5>
747
748<div>
749<div>
750<pre class="source">({&quot;name&quot;: &quot;MyABCs&quot;, &quot;array&quot;: [ &quot;a&quot;, &quot;b&quot;, &quot;c&quot;]}).array
751
752([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[2]
753
754([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[-1]
755
756({&quot;name&quot;: &quot;MyABCs&quot;, &quot;array&quot;: [ &quot;a&quot;, &quot;b&quot;, &quot;c&quot;]}).array[2]
757
758([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[0:2]
759
760([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[0:]
761
762([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[-2:-1]
763</pre></div></div>
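<p>The <tt>MISSING</tt> rule described above can be observed with the <tt>IS MISSING</tt> operator. In this sketch (the literal values are illustrative only), the first expression indexes past the end of a three-element array and the second accesses a field that the object does not have, so both are expected to evaluate to <tt>TRUE</tt>.</p>

<div>
<div>
<pre class="source">([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[3] IS MISSING

({&quot;name&quot;: &quot;MyABCs&quot;}).nickname IS MISSING
</pre></div></div>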
764</div></div></div></div>
765<div class="section">
766<h2><a name="Primary_Expressions"></a><a name="Primary_expressions" id="Primary_expressions">Primary Expressions</a></h2>
767
768<div>
769<div>
770<pre class="source">PrimaryExpression ::= Literal
771 | VariableReference
772 | ParameterReference
773 | ParenthesizedExpression
774 | FunctionCallExpression
775 | CaseExpression
776 | Constructor
777</pre></div></div>
778
779<p>The most basic building block for any expression in the query language is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a reference to a statement parameter, a parenthesized expression, a function call, a case expression, or a newly constructed instance of the data model (such as a newly constructed object, array, or multiset of data model instances).</p></div>
780<div class="section">
781<h2><a name="Literals" id="Literals">Literals</a></h2>
782
783<div>
784<div>
785<pre class="source">Literal ::= StringLiteral
786 | IntegerLiteral
787 | FloatLiteral
788 | DoubleLiteral
789 | &lt;NULL&gt;
790 | &lt;MISSING&gt;
791 | &lt;TRUE&gt;
792 | &lt;FALSE&gt;
793StringLiteral ::= &quot;\&quot;&quot; (
794 &lt;EscapeQuot&gt;
795 | &lt;EscapeBslash&gt;
796 | &lt;EscapeSlash&gt;
797 | &lt;EscapeBspace&gt;
798 | &lt;EscapeFormf&gt;
799 | &lt;EscapeNl&gt;
800 | &lt;EscapeCr&gt;
801 | &lt;EscapeTab&gt;
802 | ~[&quot;\&quot;&quot;,&quot;\\&quot;])*
803 &quot;\&quot;&quot;
804 | &quot;\'&quot;(
805 &lt;EscapeApos&gt;
806 | &lt;EscapeBslash&gt;
807 | &lt;EscapeSlash&gt;
808 | &lt;EscapeBspace&gt;
809 | &lt;EscapeFormf&gt;
810 | &lt;EscapeNl&gt;
811 | &lt;EscapeCr&gt;
812 | &lt;EscapeTab&gt;
813 | ~[&quot;\'&quot;,&quot;\\&quot;])*
814 &quot;\'&quot;
815&lt;EscapeApos&gt; ::= &quot;\\\'&quot;
816&lt;EscapeQuot&gt; ::= &quot;\\\&quot;&quot;
817&lt;EscapeBslash&gt; ::= &quot;\\\\&quot;
818&lt;EscapeSlash&gt; ::= &quot;\\/&quot;
819&lt;EscapeBspace&gt; ::= &quot;\\b&quot;
820&lt;EscapeFormf&gt; ::= &quot;\\f&quot;
821&lt;EscapeNl&gt; ::= &quot;\\n&quot;
822&lt;EscapeCr&gt; ::= &quot;\\r&quot;
823&lt;EscapeTab&gt; ::= &quot;\\t&quot;
824
825IntegerLiteral ::= &lt;DIGITS&gt;
826&lt;DIGITS&gt; ::= [&quot;0&quot; - &quot;9&quot;]+
827FloatLiteral ::= &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; )
828 | &lt;DIGITS&gt; ( &quot;.&quot; &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; ) )?
829 | &quot;.&quot; &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; )
830DoubleLiteral ::= &lt;DIGITS&gt; &quot;.&quot; &lt;DIGITS&gt;
831 | &quot;.&quot; &lt;DIGITS&gt;
832</pre></div></div>
833
834<p>Literals (constants) in a query can be strings, integers, floating point values, double values, boolean constants, or special constant values like <tt>NULL</tt> and <tt>MISSING</tt>. The <tt>NULL</tt> value is like a <tt>NULL</tt> in SQL; it is used to represent an unknown field value. The special value <tt>MISSING</tt> is only meaningful in the context of field accesses; it occurs when the accessed field simply does not exist at all in an object being accessed.</p>
835<p>The following are some simple examples of literals.</p>
836<div class="section">
837<div class="section">
838<div class="section">
839<h5><a name="Examples"></a>Examples</h5>
840
841<div>
842<div>
843<pre class="source">'a string'
844&quot;test string&quot;
84542
846</pre></div></div>
847
848<p>Unlike in standard SQL, double quotes play the same role as single quotes and may also be used for string literals in queries.</p></div></div></div>
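<p>The grammar above admits several further literal forms. The following sketch lists, in order, a double-quoted string containing an escaped double quote, a single-quoted string containing an escaped apostrophe, a double literal, a float literal, and the boolean and special constants. The specific values are illustrative only.</p>

<div>
<div>
<pre class="source">&quot;He said \&quot;hi\&quot;&quot;
'it\'s'
3.14
3.14f
TRUE
NULL
MISSING
</pre></div></div>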
849<div class="section">
850<h3><a name="Variable_References"></a><a name="Variable_references" id="Variable_references">Variable References</a></h3>
851
852<div>
853<div>
854<pre class="source">VariableReference ::= &lt;IDENTIFIER&gt; | &lt;DelimitedIdentifier&gt;
855&lt;IDENTIFIER&gt; ::= (&lt;LETTER&gt; | &quot;_&quot;) (&lt;LETTER&gt; | &lt;DIGIT&gt; | &quot;_&quot; | &quot;$&quot;)*
856&lt;LETTER&gt; ::= [&quot;A&quot; - &quot;Z&quot;, &quot;a&quot; - &quot;z&quot;]
857DelimitedIdentifier ::= &quot;`&quot; (&lt;EscapeQuot&gt;
858 | &lt;EscapeBslash&gt;
859 | &lt;EscapeSlash&gt;
860 | &lt;EscapeBspace&gt;
861 | &lt;EscapeFormf&gt;
862 | &lt;EscapeNl&gt;
863 | &lt;EscapeCr&gt;
864 | &lt;EscapeTab&gt;
865 | ~[&quot;`&quot;,&quot;\\&quot;])*
866 &quot;`&quot;
867</pre></div></div>
868
869<p>A variable in a query can be bound to any legal data model value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the <tt>FROM</tt>, <tt>WITH</tt> or <tt>LET</tt> clauses of a <tt>SELECT</tt> statement or from an input parameter in the context of a function body.) Backticks, for example, `id`, are used for delimited identifiers. Delimiting is needed when a variable&#x2019;s desired name clashes with a keyword or includes characters not allowed in regular identifiers. More information on exactly how variable references are resolved can be found in the appendix section on Variable Resolution.</p>
870<div class="section">
871<div class="section">
872<h5><a name="Examples"></a>Examples</h5>
873
874<div>
875<div>
876<pre class="source">tweet
877id
878`SELECT`
879`my-function`
880</pre></div></div>
881</div></div></div>
882<div class="section">
883<h3><a name="Parameter_References"></a><a name="Parameter_references" id="Parameter_references">Parameter References</a></h3>
884
885<div>
886<div>
887<pre class="source">ParameterReference ::= NamedParameterReference | PositionalParameterReference
888NamedParameterReference ::= &quot;$&quot; (&lt;IDENTIFIER&gt; | &lt;DelimitedIdentifier&gt;)
889PositionalParameterReference ::= (&quot;$&quot; &lt;DIGITS&gt;) | &quot;?&quot;
890</pre></div></div>
891
892<p>A statement parameter is an external variable whose value is provided through the <a href="../api.html#queryservice">statement execution API</a>. An error will be raised if the parameter is not bound at query execution time. Positional parameter numbering starts at 1. &#x201c;?&#x201d; parameters are interpreted as $1, ..., $N in the order in which they appear in the statement.</p>
893<div class="section">
894<div class="section">
895<h5><a name="Examples"></a>Examples</h5>
896
897<div>
898<div>
899<pre class="source">$id
900$1
901?
902</pre></div></div>
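<p>As an illustrative sketch, the queries below use a named parameter and a positional parameter in a <tt>WHERE</tt> clause. They assume the <tt>GleambookUsers</tt> sample collection, and they assume that values for <tt>$id</tt> and for the first positional parameter are supplied with the request through the statement execution API; how those values are passed is not shown here.</p>

<div>
<div>
<pre class="source">SELECT VALUE u.name
FROM GleambookUsers u
WHERE u.id = $id;

SELECT VALUE u.name
FROM GleambookUsers u
WHERE u.id = ?;
</pre></div></div>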
903</div></div></div>
904<div class="section">
905<h3><a name="Parenthesized_Expressions"></a><a name="Parenthesized_expressions" id="Parenthesized_expressions">Parenthesized Expressions</a></h3>
906
907<div>
908<div>
909<pre class="source">ParenthesizedExpression ::= &quot;(&quot; Expression &quot;)&quot; | Subquery
910</pre></div></div>
911
912<p>An expression can be parenthesized to control the precedence order or otherwise clarify a query. For composability, a subquery is also a parenthesized expression.</p>
913<p>The following expression evaluates to the value 2.</p>
914<div class="section">
915<div class="section">
916<h5><a name="Example"></a>Example</h5>
917
918<div>
919<div>
920<pre class="source">( 1 + 1 )
921</pre></div></div>
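<p>To illustrate the effect on precedence, the first expression below evaluates to 6, whereas the second, written without parentheses, evaluates to 4 because multiplication binds more tightly than addition.</p>

<div>
<div>
<pre class="source">( 1 + 1 ) * 3

1 + 1 * 3
</pre></div></div>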
922</div></div></div>
923<div class="section">
924<h3><a name="Function_Call_Expressions"></a><a name="Function_call_expressions" id="Function_call_expressions">Function Call Expressions</a></h3>
925
926<div>
927<div>
928<pre class="source">FunctionCallExpression ::= ( FunctionName &quot;(&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;)&quot; ) | WindowFunctionCall
929</pre></div></div>
930
931<p>Functions are included in the query language, like most languages, as a way to package useful functionality or to componentize complicated or reusable computations. A function call is a legal query expression that represents the value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any expressions in the query language.</p>
932<p>Note that window functions, and aggregate functions used as window functions, have a more complex syntax. Window function calls are described in the section on <a href="#Over_clauses">OVER Clauses</a>.</p>
933<p>The following example is a (built-in) function call expression whose value is 8.</p>
934<div class="section">
935<div class="section">
936<h5><a name="Example"></a>Example</h5>
937
938<div>
939<div>
940<pre class="source">length('a string')
941</pre></div></div>
942</div></div></div></div>
943<div class="section">
944<h2><a name="Case_Expressions"></a><a name="Case_expressions" id="Case_expressions">Case Expressions</a></h2>
945
946<div>
947<div>
948<pre class="source">CaseExpression ::= SimpleCaseExpression | SearchedCaseExpression
949SimpleCaseExpression ::= &lt;CASE&gt; Expression ( &lt;WHEN&gt; Expression &lt;THEN&gt; Expression )+ ( &lt;ELSE&gt; Expression )? &lt;END&gt;
950SearchedCaseExpression ::= &lt;CASE&gt; ( &lt;WHEN&gt; Expression &lt;THEN&gt; Expression )+ ( &lt;ELSE&gt; Expression )? &lt;END&gt;
951</pre></div></div>
952
953<p>In a simple <tt>CASE</tt> expression, the query evaluator searches for the first <tt>WHEN</tt> &#x2026; <tt>THEN</tt> pair in which the <tt>WHEN</tt> expression is equal to the expression following <tt>CASE</tt> and returns the expression following <tt>THEN</tt>. If none of the <tt>WHEN</tt> &#x2026; <tt>THEN</tt> pairs meet this condition, and an <tt>ELSE</tt> branch exists, it returns the <tt>ELSE</tt> expression. Otherwise, <tt>NULL</tt> is returned.</p>
954<p>In a searched <tt>CASE</tt> expression, the query evaluator searches from left to right until it finds a <tt>WHEN</tt> expression that evaluates to <tt>TRUE</tt>, and then returns its corresponding <tt>THEN</tt> expression. If no condition is found to be <tt>TRUE</tt>, and an <tt>ELSE</tt> branch exists, it returns the <tt>ELSE</tt> expression. Otherwise, it returns <tt>NULL</tt>.</p>
955<p>The following example illustrates the form of a case expression.</p>
956<div class="section">
957<div class="section">
958<div class="section">
959<h5><a name="Example"></a>Example</h5>
960
961<div>
962<div>
963<pre class="source">CASE (2 &lt; 3) WHEN true THEN &quot;yes&quot; ELSE &quot;no&quot; END
964</pre></div></div>
965</div></div></div>
966<div class="section">
967<h3><a name="Constructors" id="Constructors">Constructors</a></h3>
968
969<div>
970<div>
971<pre class="source">Constructor ::= ArrayConstructor | MultisetConstructor | ObjectConstructor
972ArrayConstructor ::= &quot;[&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;]&quot;
973MultisetConstructor ::= &quot;{{&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;}}&quot;
974ObjectConstructor ::= &quot;{&quot; ( FieldBinding ( &quot;,&quot; FieldBinding )* )? &quot;}&quot;
975FieldBinding ::= Expression ( &quot;:&quot; Expression )?
976</pre></div></div>
977
978<p>A major feature of the query language is its ability to construct new data model instances. This is accomplished using its constructors for each of the model&#x2019;s complex object structures, namely arrays, multisets, and objects. Arrays are like JSON arrays, while multisets have bag semantics. Objects are built from fields that are field-name/field-value pairs, again like JSON.</p>
979<p>The following examples illustrate how to construct two new arrays with 4 items each and a new object with 2 fields. Array elements can be homogeneous (as in the first example), which is the common case, or they may be heterogeneous (as in the second example). The data values and field name values used to construct arrays, multisets, and objects in constructors are all simply query expressions. Thus, the collection elements, field names, and field values used in constructors can be simple literals or they can come from query variable references or even arbitrarily complex query expressions (subqueries). Type errors will be raised if the field names in an object are not strings, and duplicate field errors will be raised if they are not distinct.</p>
980<div class="section">
981<div class="section">
982<h5><a name="Examples"></a>Examples</h5>
983
984<div>
985<div>
986<pre class="source">[ 'a', 'b', 'c', 'c' ]
987
988[ 42, &quot;forty-two!&quot;, { &quot;rank&quot; : &quot;Captain&quot;, &quot;name&quot;: &quot;America&quot; }, 3.14159 ]
989
990{
991 'project name': 'Hyracks',
992 'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras', 'tillw' ]
993}
994</pre></div></div>
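<p>The examples above do not show a multiset constructor or computed values. As a sketch with illustrative values only, the first expression below constructs a multiset, and the second constructs an object whose field values are computed by expressions.</p>

<div>
<div>
<pre class="source">{{ 'a', 'b', 'b' }}

{ 'sum': 1 + 1, 'size': length('a string') }
</pre></div></div>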
995
996<p>If only one expression is specified instead of a field-name/field-value pair in an object constructor, that expression provides the field value. The field name is then generated automatically based on the kind of the value expression:</p>
997<ul>
998
999<li>If it is a variable reference expression, then the generated field name is the name of that variable.</li>
1000<li>If it is a field access expression, then the generated field name is the last identifier in that expression.</li>
1001<li>For all other cases, a compilation error will be raised.</li>
1002</ul></div>
1003<div class="section">
1004<h5><a name="Example"></a>Example</h5>
1005
1006<div>
1007<div>
1008<pre class="source">SELECT VALUE { user.alias, user.userSince }
1009FROM GleambookUsers user
1010WHERE user.id = 1;
1011</pre></div></div>
1012
1013<p>This query outputs:</p>
1014
1015<div>
1016<div>
1017<pre class="source">[ {
1018 &quot;alias&quot;: &quot;Margarita&quot;,
1019 &quot;userSince&quot;: &quot;2012-08-20T10:10:00&quot;
1020} ]
1021</pre></div></div>
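<p>By the naming rules above, the abbreviated constructor in this example should be equivalent to the following query, which spells out the field names explicitly.</p>

<div>
<div>
<pre class="source">SELECT VALUE { &quot;alias&quot;: user.alias, &quot;userSince&quot;: user.userSince }
FROM GleambookUsers user
WHERE user.id = 1;
</pre></div></div>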
1022<!--
1023 ! Licensed to the Apache Software Foundation (ASF) under one
1024 ! or more contributor license agreements. See the NOTICE file
1025 ! distributed with this work for additional information
1026 ! regarding copyright ownership. The ASF licenses this file
1027 ! to you under the Apache License, Version 2.0 (the
1028 ! "License"); you may not use this file except in compliance
1029 ! with the License. You may obtain a copy of the License at
1030 !
1031 ! http://www.apache.org/licenses/LICENSE-2.0
1032 !
1033 ! Unless required by applicable law or agreed to in writing,
1034 ! software distributed under the License is distributed on an
1035 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
1036 ! KIND, either express or implied. See the License for the
1037 ! specific language governing permissions and limitations
1038 ! under the License.
1039 !-->
1040
1041<h1><a name="Queries" id="Queries">3. Queries</a></h1>
1042<p>A query can be any legal expression or <tt>SELECT</tt> statement. A query always ends with a semicolon.</p>
1043
1044<div>
1045<div>
1046<pre class="source">Query ::= (Expression | SelectStatement) &quot;;&quot;
1047</pre></div></div>
1066</div></div></div></div>
1067<div class="section">
1068<h2><a name="Declarations" id="Declarations">Declarations</a></h2>
1069
1070<div>
1071<div>
1072<pre class="source">DatabaseDeclaration ::= &quot;USE&quot; Identifier
1073</pre></div></div>
1074
1075<p>At the uppermost level, the world of data is organized into data namespaces called <b>dataverses</b>. To set the default dataverse for statements, the USE statement is provided.</p>
1076<p>As an example, the following statement sets the default dataverse to be &#x201c;TinySocial&#x201d;.</p>
1077<div class="section">
1078<div class="section">
1079<div class="section">
1080<h5><a name="Example"></a>Example</h5>
1081
1082<div>
1083<div>
1084<pre class="source">USE TinySocial;
1085</pre></div></div>
1104
1105<p>When writing a complex query, it can sometimes be helpful to define one or more auxiliary functions that each address a sub-piece of the overall query. The <tt>DECLARE FUNCTION</tt> statement supports the creation of such helper functions. In general, the function body (expression) can be any legal query expression.</p>
1106
1107<div>
1108<div>
1109<pre class="source">FunctionDeclaration ::= &quot;DECLARE&quot; &quot;FUNCTION&quot; Identifier ParameterList &quot;{&quot; Expression &quot;}&quot;
1110ParameterList ::= &quot;(&quot; ( &lt;VARIABLE&gt; ( &quot;,&quot; &lt;VARIABLE&gt; )* )? &quot;)&quot;
1111</pre></div></div>
1112
1113<p>The following is a simple example of a temporary function definition and its use.</p></div>
1114<div class="section">
1115<h5><a name="Example"></a>Example</h5>
1116
1117<div>
1118<div>
1119<pre class="source">DECLARE FUNCTION friendInfo(userId) {
1120 (SELECT u.id, u.name, len(u.friendIds) AS friendCount
1121 FROM GleambookUsers u
1122 WHERE u.id = userId)[0]
1123 };
1124
1125SELECT VALUE friendInfo(2);
1126</pre></div></div>
1127
1128<p>For our sample data set, this returns:</p>
1129
1130<div>
1131<div>
1132<pre class="source">[
1133 { &quot;id&quot;: 2, &quot;name&quot;: &quot;IsbelDull&quot;, &quot;friendCount&quot;: 2 }
1134]
1135</pre></div></div>
1154</div></div></div></div>
1155<div class="section">
1156<h2><a name="SELECT_Statements"></a><a name="SELECT_statements" id="SELECT_statements">SELECT Statements</a></h2>
1157<p>The following shows the (rich) grammar for the <tt>SELECT</tt> statement in the query language.</p>
1158
1159<div>
1160<div>
1161<pre class="source">SelectStatement ::= ( WithClause )?
1162 SelectSetOperation (OrderbyClause )? ( LimitClause )?
1163SelectSetOperation ::= SelectBlock (&lt;UNION&gt; &lt;ALL&gt; ( SelectBlock | Subquery ) )*
1164Subquery ::= &quot;(&quot; SelectStatement &quot;)&quot;
1165
1166SelectBlock ::= SelectClause
1167 ( FromClause ( LetClause )?)?
1168 ( WhereClause )?
1169 ( GroupbyClause ( LetClause )? ( HavingClause )? )?
1170 |
1171 FromClause ( LetClause )?
1172 ( WhereClause )?
1173 ( GroupbyClause ( LetClause )? ( HavingClause )? )?
1174 SelectClause
1175
1176SelectClause ::= &lt;SELECT&gt; ( &lt;ALL&gt; | &lt;DISTINCT&gt; )? ( SelectRegular | SelectValue )
1177SelectRegular ::= Projection ( &quot;,&quot; Projection )*
1178SelectValue ::= ( &lt;VALUE&gt; | &lt;ELEMENT&gt; | &lt;RAW&gt; ) Expression
1179Projection ::= ( Expression ( &lt;AS&gt; )? Identifier | &quot;*&quot; | Identifier &quot;.&quot; &quot;*&quot; )
1180
1181FromClause ::= &lt;FROM&gt; FromTerm ( &quot;,&quot; FromTerm )*
1182FromTerm ::= Expression (( &lt;AS&gt; )? Variable)?
1183 ( ( JoinType )? ( JoinClause | UnnestClause ) )*
1184
1185JoinClause ::= &lt;JOIN&gt; Expression (( &lt;AS&gt; )? Variable)? &lt;ON&gt; Expression
1186UnnestClause ::= ( &lt;UNNEST&gt; ) Expression
1187 ( &lt;AS&gt; )? Variable ( &lt;AT&gt; Variable )?
1188JoinType ::= ( &lt;INNER&gt; | &lt;LEFT&gt; ( &lt;OUTER&gt; )? )
1189
1190WithClause ::= &lt;WITH&gt; WithElement ( &quot;,&quot; WithElement )*
1191LetClause ::= (&lt;LET&gt; | &lt;LETTING&gt;) LetElement ( &quot;,&quot; LetElement )*
1192LetElement ::= Variable &quot;=&quot; Expression
1193WithElement ::= Variable &lt;AS&gt; Expression
1194
1195WhereClause ::= &lt;WHERE&gt; Expression
1196
1197GroupbyClause ::= &lt;GROUP&gt; &lt;BY&gt; Expression ( ( (&lt;AS&gt;)? Variable )?
1198 ( &quot;,&quot; Expression ( (&lt;AS&gt;)? Variable )? )* )
1199 ( &lt;GROUP&gt; &lt;AS&gt; Variable
1200 (&quot;(&quot; VariableReference &lt;AS&gt; Identifier
1201 (&quot;,&quot; VariableReference &lt;AS&gt; Identifier )* &quot;)&quot;)?
1202 )?
1203HavingClause ::= &lt;HAVING&gt; Expression
1204
1205OrderbyClause ::= &lt;ORDER&gt; &lt;BY&gt; Expression ( &lt;ASC&gt; | &lt;DESC&gt; )?
1206 ( &quot;,&quot; Expression ( &lt;ASC&gt; | &lt;DESC&gt; )? )*
1207LimitClause ::= &lt;LIMIT&gt; Expression ( &lt;OFFSET&gt; Expression )?
1208</pre></div></div>
1209
1210<p>In this section, we will make use of two stored collections of objects (datasets), <tt>GleambookUsers</tt> and <tt>GleambookMessages</tt>, in a series of running examples to explain <tt>SELECT</tt> queries. The contents of the example collections are as follows:</p>
1211<p><tt>GleambookUsers</tt> collection (or, dataset):</p>
1212
1213<div>
1214<div>
1215<pre class="source">[ {
1216 &quot;id&quot;:1,
1217 &quot;alias&quot;:&quot;Margarita&quot;,
1218 &quot;name&quot;:&quot;MargaritaStoddard&quot;,
1219 &quot;nickname&quot;:&quot;Mags&quot;,
1220 &quot;userSince&quot;:&quot;2012-08-20T10:10:00&quot;,
1221 &quot;friendIds&quot;:[2,3,6,10],
1222 &quot;employment&quot;:[{
1223 &quot;organizationName&quot;:&quot;Codetechno&quot;,
1224 &quot;start-date&quot;:&quot;2006-08-06&quot;
1225 },
1226 {
1227 &quot;organizationName&quot;:&quot;geomedia&quot;,
1228 &quot;start-date&quot;:&quot;2010-06-17&quot;,
1229 &quot;end-date&quot;:&quot;2010-01-26&quot;
1230 }],
1231 &quot;gender&quot;:&quot;F&quot;
1232},
1233{
1234 &quot;id&quot;:2,
1235 &quot;alias&quot;:&quot;Isbel&quot;,
1236 &quot;name&quot;:&quot;IsbelDull&quot;,
1237 &quot;nickname&quot;:&quot;Izzy&quot;,
1238 &quot;userSince&quot;:&quot;2011-01-22T10:10:00&quot;,
1239 &quot;friendIds&quot;:[1,4],
1240 &quot;employment&quot;:[{
1241 &quot;organizationName&quot;:&quot;Hexviafind&quot;,
1242 &quot;startDate&quot;:&quot;2010-04-27&quot;
1243 }]
1244},
1245{
1246 &quot;id&quot;:3,
1247 &quot;alias&quot;:&quot;Emory&quot;,
1248 &quot;name&quot;:&quot;EmoryUnk&quot;,
1249 &quot;userSince&quot;:&quot;2012-07-10T10:10:00&quot;,
1250 &quot;friendIds&quot;:[1,5,8,9],
1251 &quot;employment&quot;:[{
1252 &quot;organizationName&quot;:&quot;geomedia&quot;,
1253 &quot;startDate&quot;:&quot;2010-06-17&quot;,
1254 &quot;endDate&quot;:&quot;2010-01-26&quot;
1255 }]
1256} ]
1257</pre></div></div>
1258
1259<p><tt>GleambookMessages</tt> collection (or, dataset):</p>
1260
1261<div>
1262<div>
1263<pre class="source">[ {
1264 &quot;messageId&quot;:2,
1265 &quot;authorId&quot;:1,
1266 &quot;inResponseTo&quot;:4,
1267 &quot;senderLocation&quot;:[41.66,80.87],
1268 &quot;message&quot;:&quot; dislike x-phone its touch-screen is horrible&quot;
1269},
1270{
1271 &quot;messageId&quot;:3,
1272 &quot;authorId&quot;:2,
1273 &quot;inResponseTo&quot;:4,
1274 &quot;senderLocation&quot;:[48.09,81.01],
1275 &quot;message&quot;:&quot; like product-y the plan is amazing&quot;
1276},
1277{
1278 &quot;messageId&quot;:4,
1279 &quot;authorId&quot;:1,
1280 &quot;inResponseTo&quot;:2,
1281 &quot;senderLocation&quot;:[37.73,97.04],
1282 &quot;message&quot;:&quot; can't stand acast the network is horrible:(&quot;
1283},
1284{
1285 &quot;messageId&quot;:6,
1286 &quot;authorId&quot;:2,
1287 &quot;inResponseTo&quot;:1,
1288 &quot;senderLocation&quot;:[31.5,75.56],
1289 &quot;message&quot;:&quot; like product-z its platform is mind-blowing&quot;
1290},
1291{
1292 &quot;messageId&quot;:8,
1293 &quot;authorId&quot;:1,
1294 &quot;inResponseTo&quot;:11,
1295 &quot;senderLocation&quot;:[40.33,80.87],
1296 &quot;message&quot;:&quot; like ccast the 3G is awesome:)&quot;
1297},
1298{
1299 &quot;messageId&quot;:10,
1300 &quot;authorId&quot;:1,
1301 &quot;inResponseTo&quot;:12,
1302 &quot;senderLocation&quot;:[42.5,70.01],
1303 &quot;message&quot;:&quot; can't stand product-w the touch-screen is terrible&quot;
1304},
1305{
1306 &quot;messageId&quot;:11,
1307 &quot;authorId&quot;:1,
1308 &quot;inResponseTo&quot;:1,
1309 &quot;senderLocation&quot;:[38.97,77.49],
1310 &quot;message&quot;:&quot; can't stand acast its plan is terrible&quot;
1311} ]
1312</pre></div></div>
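<p>The <tt>SelectBlock</tt> rule in the grammar above also admits a form in which the <tt>SELECT</tt> clause comes last. As a sketch against the sample data, the following two queries should be equivalent, each returning the name of the user whose <tt>id</tt> is 1.</p>

<div>
<div>
<pre class="source">SELECT VALUE user.name
FROM GleambookUsers user
WHERE user.id = 1;

FROM GleambookUsers user
WHERE user.id = 1
SELECT VALUE user.name;
</pre></div></div>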
1313</div>
1314<div class="section">
1315<h2><a name="SELECT_Clause"></a><a name="Select_clauses" id="Select_clauses">SELECT Clause</a></h2>
1316<p>The <tt>SELECT</tt> clause always returns a collection value as its result (even if the result is empty or a singleton).</p>
1317<div class="section">
1318<h3><a name="Select_Element.2FValue.2FRaw"></a><a name="Select_element" id="Select_element">Select Element/Value/Raw</a></h3>
1319<p>The <tt>SELECT VALUE</tt> clause returns an array or multiset that contains the results of evaluating the <tt>VALUE</tt> expression, with one evaluation being performed per &#x201c;binding tuple&#x201d; (i.e., per <tt>FROM</tt> clause item) satisfying the statement&#x2019;s selection criteria. For historical reasons the query language also allows the keywords <tt>ELEMENT</tt> or <tt>RAW</tt> to be used in place of <tt>VALUE</tt> (not recommended).</p>
1320<p>If there is no FROM clause, the expression after <tt>VALUE</tt> is evaluated once with no binding tuples (except those inherited from an outer environment).</p>
1321<div class="section">
1322<div class="section">
1323<h5><a name="Example"></a>Example</h5>
1324
1325<div>
1326<div>
1327<pre class="source">SELECT VALUE 1;
1328</pre></div></div>
1329
1330<p>This query returns:</p>
1331
1332<div>
1333<div>
1334<pre class="source">[
1335 1
1336]
1337</pre></div></div>
1338
1339<p>The following example shows a query that selects one user from the GleambookUsers collection.</p></div>
1340<div class="section">
1341<h5><a name="Example"></a>Example</h5>
1342
1343<div>
1344<div>
1345<pre class="source">SELECT VALUE user
1346FROM GleambookUsers user
1347WHERE user.id = 1;
1348</pre></div></div>
1349
1350<p>This query returns:</p>
1351
1352<div>
1353<div>
1354<pre class="source">[{
1355 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
1356 &quot;friendIds&quot;: [
1357 2,
1358 3,
1359 6,
1360 10
1361 ],
1362 &quot;gender&quot;: &quot;F&quot;,
1363 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1364 &quot;nickname&quot;: &quot;Mags&quot;,
1365 &quot;alias&quot;: &quot;Margarita&quot;,
1366 &quot;id&quot;: 1,
1367 &quot;employment&quot;: [
1368 {
1369 &quot;organizationName&quot;: &quot;Codetechno&quot;,
1370 &quot;start-date&quot;: &quot;2006-08-06&quot;
1371 },
1372 {
1373 &quot;end-date&quot;: &quot;2010-01-26&quot;,
1374 &quot;organizationName&quot;: &quot;geomedia&quot;,
1375 &quot;start-date&quot;: &quot;2010-06-17&quot;
1376 }
1377 ]
1378} ]
1379</pre></div></div>
1380</div></div></div>
1381<div class="section">
1382<h3><a name="SQL-style_SELECT"></a><a name="SQL_select" id="SQL_select">SQL-style SELECT</a></h3>
1383<p>The traditional SQL-style <tt>SELECT</tt> syntax is also supported in the query language. This syntax can also be reformulated in a <tt>SELECT VALUE</tt> based manner. (E.g., <tt>SELECT expA AS fldA, expB AS fldB</tt> is syntactic sugar for <tt>SELECT VALUE { 'fldA': expA, 'fldB': expB }</tt>.) Unlike in SQL, the result of a query does not preserve the order of expressions in the <tt>SELECT</tt> clause.</p>
1384<div class="section">
1385<div class="section">
1386<h5><a name="Example"></a>Example</h5>
1387
1388<div>
1389<div>
1390<pre class="source">SELECT user.alias user_alias, user.name user_name
1391FROM GleambookUsers user
1392WHERE user.id = 1;
1393</pre></div></div>
1394
1395<p>Returns:</p>
1396
1397<div>
1398<div>
1399<pre class="source">[ {
1400 &quot;user_name&quot;: &quot;MargaritaStoddard&quot;,
1401 &quot;user_alias&quot;: &quot;Margarita&quot;
1402} ]
1403</pre></div></div>
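
<p>Following the desugaring rule stated above, the same query could be sketched in <tt>SELECT VALUE</tt> form as shown here. The field names are taken from the aliases in the example, and the query should produce the same result object:</p>

<div>
<div>
<pre class="source">SELECT VALUE { 'user_alias': user.alias, 'user_name': user.name }
FROM GleambookUsers user
WHERE user.id = 1;
</pre></div></div>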
1404</div></div></div>
1405<div class="section">
1406<h3><a name="SELECT_.2A"></a><a name="Select_star" id="Select_star">SELECT *</a></h3>
1407<p><tt>SELECT *</tt> returns an object with a nested field for each input tuple. Each field has as its field name the name of a binding variable generated by either the <tt>FROM</tt> clause or <tt>GROUP BY</tt> clause in the current enclosing <tt>SELECT</tt> statement, and its field value is the value of that binding variable.</p>
1408<p>Note that the result of <tt>SELECT *</tt> is different from the result of a query that selects all the fields of an object.</p>
1409<div class="section">
1410<div class="section">
1411<h5><a name="Example"></a>Example</h5>
1412
1413<div>
1414<div>
1415<pre class="source">SELECT *
1416FROM GleambookUsers user;
1417</pre></div></div>
1418
1419<p>Since <tt>user</tt> is the only binding variable generated in the <tt>FROM</tt> clause, this query returns:</p>
1420
1421<div>
1422<div>
1423<pre class="source">[ {
1424 &quot;user&quot;: {
1425 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
1426 &quot;friendIds&quot;: [
1427 2,
1428 3,
1429 6,
1430 10
1431 ],
1432 &quot;gender&quot;: &quot;F&quot;,
1433 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1434 &quot;nickname&quot;: &quot;Mags&quot;,
1435 &quot;alias&quot;: &quot;Margarita&quot;,
1436 &quot;id&quot;: 1,
1437 &quot;employment&quot;: [
1438 {
1439 &quot;organizationName&quot;: &quot;Codetechno&quot;,
1440 &quot;start-date&quot;: &quot;2006-08-06&quot;
1441 },
1442 {
1443 &quot;end-date&quot;: &quot;2010-01-26&quot;,
1444 &quot;organizationName&quot;: &quot;geomedia&quot;,
1445 &quot;start-date&quot;: &quot;2010-06-17&quot;
1446 }
1447 ]
1448 }
1449}, {
1450 &quot;user&quot;: {
1451 &quot;userSince&quot;: &quot;2011-01-22T10:10:00.000Z&quot;,
1452 &quot;friendIds&quot;: [
1453 1,
1454 4
1455 ],
1456 &quot;name&quot;: &quot;IsbelDull&quot;,
1457 &quot;nickname&quot;: &quot;Izzy&quot;,
1458 &quot;alias&quot;: &quot;Isbel&quot;,
1459 &quot;id&quot;: 2,
1460 &quot;employment&quot;: [
1461 {
1462 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1463 &quot;startDate&quot;: &quot;2010-04-27&quot;
1464 }
1465 ]
1466 }
1467}, {
1468 &quot;user&quot;: {
1469 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
1470 &quot;friendIds&quot;: [
1471 1,
1472 5,
1473 8,
1474 9
1475 ],
1476 &quot;name&quot;: &quot;EmoryUnk&quot;,
1477 &quot;alias&quot;: &quot;Emory&quot;,
1478 &quot;id&quot;: 3,
1479 &quot;employment&quot;: [
1480 {
1481 &quot;organizationName&quot;: &quot;geomedia&quot;,
1482 &quot;endDate&quot;: &quot;2010-01-26&quot;,
1483 &quot;startDate&quot;: &quot;2010-06-17&quot;
1484 }
1485 ]
1486 }
1487} ]
1488</pre></div></div>
1489</div>
1490<div class="section">
1491<h5><a name="Example"></a>Example</h5>
1492
1493<div>
1494<div>
1495<pre class="source">SELECT *
1496FROM GleambookUsers u, GleambookMessages m
1497WHERE m.authorId = u.id and u.id = 2;
1498</pre></div></div>
1499
1500<p>This query does an inner join that we will discuss in <a href="#Multiple_from_terms">multiple from terms</a>. Since both <tt>u</tt> and <tt>m</tt> are binding variables generated in the <tt>FROM</tt> clause, this query returns:</p>
1501
1502<div>
1503<div>
1504<pre class="source">[ {
1505 &quot;u&quot;: {
1506 &quot;userSince&quot;: &quot;2011-01-22T10:10:00&quot;,
1507 &quot;friendIds&quot;: [
1508 1,
1509 4
1510 ],
1511 &quot;name&quot;: &quot;IsbelDull&quot;,
1512 &quot;nickname&quot;: &quot;Izzy&quot;,
1513 &quot;alias&quot;: &quot;Isbel&quot;,
1514 &quot;id&quot;: 2,
1515 &quot;employment&quot;: [
1516 {
1517 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1518 &quot;startDate&quot;: &quot;2010-04-27&quot;
1519 }
1520 ]
1521 },
1522 &quot;m&quot;: {
1523 &quot;senderLocation&quot;: [
1524 31.5,
1525 75.56
1526 ],
1527 &quot;inResponseTo&quot;: 1,
1528 &quot;messageId&quot;: 6,
1529 &quot;authorId&quot;: 2,
1530 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1531 }
1532}, {
1533 &quot;u&quot;: {
1534 &quot;userSince&quot;: &quot;2011-01-22T10:10:00&quot;,
1535 &quot;friendIds&quot;: [
1536 1,
1537 4
1538 ],
1539 &quot;name&quot;: &quot;IsbelDull&quot;,
1540 &quot;nickname&quot;: &quot;Izzy&quot;,
1541 &quot;alias&quot;: &quot;Isbel&quot;,
1542 &quot;id&quot;: 2,
1543 &quot;employment&quot;: [
1544 {
1545 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1546 &quot;startDate&quot;: &quot;2010-04-27&quot;
1547 }
1548 ]
1549 },
1550 &quot;m&quot;: {
1551 &quot;senderLocation&quot;: [
1552 48.09,
1553 81.01
1554 ],
1555 &quot;inResponseTo&quot;: 4,
1556 &quot;messageId&quot;: 3,
1557 &quot;authorId&quot;: 2,
1558 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1559 }
1560} ]
1561</pre></div></div>
1562</div></div></div>
1563<div class="section">
1564<h3><a name="SELECT_variable..2A"></a><a name="Select_variable_star" id="Select_variable_star">SELECT <i>variable</i>.*</a></h3>
1565<p>Whereas <tt>SELECT *</tt> returns all the fields bound to all the variables which are currently defined, the notation <tt>SELECT c.*</tt> returns all the fields of the object bound to variable <tt>c</tt>. The variable <tt>c</tt> must be bound to an object for this to work.</p>
1566<div class="section">
1567<div class="section">
1568<h5><a name="Example"></a>Example</h5>
1569
1570<div>
1571<div>
1572<pre class="source">SELECT user.*
1573FROM GleambookUsers user;
1574</pre></div></div>
1575
1576<p>Compare this query with the first example given under <a href="#Select_star">SELECT *</a>. This query returns all users from the <tt>GleambookUsers</tt> dataset, but the <tt>user</tt> variable name is omitted from the results:</p>
1577
1578<div>
1579<div>
1580<pre class="source">[
1581 {
1582 &quot;id&quot;: 1,
1583 &quot;alias&quot;: &quot;Margarita&quot;,
1584 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1585 &quot;nickname&quot;: &quot;Mags&quot;,
1586 &quot;userSince&quot;: &quot;2012-08-20T10:10:00&quot;,
1587 &quot;friendIds&quot;: [
1588 2,
1589 3,
1590 6,
1591 10
1592 ],
1593 &quot;employment&quot;: [
1594 {
1595 &quot;organizationName&quot;: &quot;Codetechno&quot;,
1596 &quot;start-date&quot;: &quot;2006-08-06&quot;
1597 },
1598 {
1599 &quot;organizationName&quot;: &quot;geomedia&quot;,
1600 &quot;start-date&quot;: &quot;2010-06-17&quot;,
1601 &quot;end-date&quot;: &quot;2010-01-26&quot;
1602 }
1603 ],
1604 &quot;gender&quot;: &quot;F&quot;
1605 },
1606 {
1607 &quot;id&quot;: 2,
1608 &quot;alias&quot;: &quot;Isbel&quot;,
1609 &quot;name&quot;: &quot;IsbelDull&quot;,
1610 &quot;nickname&quot;: &quot;Izzy&quot;,
1611 &quot;userSince&quot;: &quot;2011-01-22T10:10:00&quot;,
1612 &quot;friendIds&quot;: [
1613 1,
1614 4
1615 ],
1616 &quot;employment&quot;: [
1617 {
1618 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1619 &quot;startDate&quot;: &quot;2010-04-27&quot;
1620 }
1621 ]
1622 },
1623 {
1624 &quot;id&quot;: 3,
1625 &quot;alias&quot;: &quot;Emory&quot;,
1626 &quot;name&quot;: &quot;EmoryUnk&quot;,
1627 &quot;userSince&quot;: &quot;2012-07-10T10:10:00&quot;,
1628 &quot;friendIds&quot;: [
1629 1,
1630 5,
1631 8,
1632 9
1633 ],
1634 &quot;employment&quot;: [
1635 {
1636 &quot;organizationName&quot;: &quot;geomedia&quot;,
1637 &quot;startDate&quot;: &quot;2010-06-17&quot;,
1638 &quot;endDate&quot;: &quot;2010-01-26&quot;
1639 }
1640 ]
1641 }
1642]
1643</pre></div></div>
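
<p>As a point of comparison, and assuming every item in <tt>GleambookUsers</tt> is an object, a <tt>SELECT VALUE</tt> query that returns the bound object directly should yield the same unwrapped items. This is only a sketch of the relationship between the two forms:</p>

<div>
<div>
<pre class="source">SELECT VALUE user
FROM GleambookUsers user;
</pre></div></div>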
1644</div></div></div>
1645<div class="section">
1646<h3><a name="SELECT_DISTINCT"></a><a name="Select_distinct" id="Select_distinct">SELECT DISTINCT</a></h3>
1647<p>The <tt>DISTINCT</tt> keyword is used to eliminate duplicate items in results. The following example shows how it works.</p>
1648<div class="section">
1649<div class="section">
1650<h5><a name="Example"></a>Example</h5>
1651
1652<div>
1653<div>
1654<pre class="source">SELECT DISTINCT * FROM [1, 2, 2, 3] AS foo;
1655</pre></div></div>
1656
1657<p>This query returns:</p>
1658
1659<div>
1660<div>
1661<pre class="source">[ {
1662 &quot;foo&quot;: 1
1663}, {
1664 &quot;foo&quot;: 2
1665}, {
1666 &quot;foo&quot;: 3
1667} ]
1668</pre></div></div>
1669</div>
1670<div class="section">
1671<h5><a name="Example"></a>Example</h5>
1672
1673<div>
1674<div>
1675<pre class="source">SELECT DISTINCT VALUE foo FROM [1, 2, 2, 3] AS foo;
1676</pre></div></div>
1677
1678<p>This version of the query returns:</p>
1679
1680<div>
1681<div>
1682<pre class="source">[ 1
1683, 2
1684, 3
1685 ]
1686</pre></div></div>
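
<p>The same keyword can be applied to values drawn from a stored collection. As a minimal sketch, assuming the <tt>GleambookMessages</tt> dataset used elsewhere in this manual, the following query should return each <tt>authorId</tt> value only once, however many messages that author has written:</p>

<div>
<div>
<pre class="source">SELECT DISTINCT VALUE m.authorId
FROM GleambookMessages m;
</pre></div></div>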
1687</div></div></div>
1688<div class="section">
1689<h3><a name="Unnamed_Projections"></a><a name="Unnamed_projections" id="Unnamed_projections">Unnamed Projections</a></h3>
1690<p>Similar to standard SQL, the query language supports unnamed projections (a.k.a. unnamed <tt>SELECT</tt> clause items), for which names are generated. Name generation has three cases:</p>
1691<ul>
1692
1693<li>If a projection expression is a variable reference expression, its generated name is the name of the variable.</li>
1694<li>If a projection expression is a field access expression, its generated name is the last identifier in the expression.</li>
1695<li>For all other cases, the query processor will generate a unique name.</li>
1696</ul>
1697<div class="section">
1698<div class="section">
1699<h5><a name="Example"></a>Example</h5>
1700
1701<div>
1702<div>
1703<pre class="source">SELECT substr(user.name, 10), user.alias
1704FROM GleambookUsers user
1705WHERE user.id = 1;
1706</pre></div></div>
1707
1708<p>This query outputs:</p>
1709
1710<div>
1711<div>
1712<pre class="source">[ {
1713 &quot;alias&quot;: &quot;Margarita&quot;,
1714 &quot;$1&quot;: &quot;Stoddard&quot;
1715} ]
1716</pre></div></div>
1717
1718<p>In the result, <tt>$1</tt> is the generated name for <tt>substr(user.name, 10)</tt>, while <tt>alias</tt> is the generated name for <tt>user.alias</tt>.</p></div></div></div>
1719<div class="section">
1720<h3><a name="Abbreviated_Field_Access_Expressions"></a><a name="Abbreviated_field_access_expressions" id="Abbreviated_field_access_expressions">Abbreviated Field Access Expressions</a></h3>
1721<p>As in standard SQL, field access expressions can be abbreviated (not recommended!) when there is no ambiguity. In the next example, the variable <tt>user</tt> is the only possible variable reference for fields <tt>id</tt>, <tt>name</tt> and <tt>alias</tt> and thus could be omitted in the query. More information on abbreviated field access can be found in the appendix section on Variable Resolution.</p>
1722<div class="section">
1723<div class="section">
1724<h5><a name="Example"></a>Example</h5>
1725
1726<div>
1727<div>
1728<pre class="source">SELECT substr(name, 10) AS lname, alias
1729FROM GleambookUsers user
1730WHERE id = 1;
1731</pre></div></div>
1732
1733<p>Outputs:</p>
1734
1735<div>
1736<div>
1737<pre class="source">[ {
1738 &quot;lname&quot;: &quot;Stoddard&quot;,
1739 &quot;alias&quot;: &quot;Margarita&quot;
1740} ]
1741</pre></div></div>
1742</div></div></div></div>
1743<div class="section">
1744<h2><a name="UNNEST_Clause"></a><a name="Unnest_clauses" id="Unnest_clauses">UNNEST Clause</a></h2>
1745<p>For each of its input tuples, the <tt>UNNEST</tt> clause flattens a collection-valued expression into individual items, producing multiple tuples, each of which is one of the expression&#x2019;s original input tuples augmented with a flattened item from its collection.</p>
1746<div class="section">
1747<h3><a name="Inner_UNNEST"></a><a name="Inner_unnests" id="Inner_unnests">Inner UNNEST</a></h3>
1748<p>The following example is a query that retrieves the names of the organizations that a selected user has worked for. It uses the <tt>UNNEST</tt> clause to unnest the nested collection <tt>employment</tt> in the user&#x2019;s object.</p>
1749<div class="section">
1750<div class="section">
1751<h5><a name="Example"></a>Example</h5>
1752
1753<div>
1754<div>
1755<pre class="source">SELECT u.id AS userId, e.organizationName AS orgName
1756FROM GleambookUsers u
1757UNNEST u.employment e
1758WHERE u.id = 1;
1759</pre></div></div>
1760
1761<p>This query returns:</p>
1762
1763<div>
1764<div>
1765<pre class="source">[ {
1766 &quot;orgName&quot;: &quot;Codetechno&quot;,
1767 &quot;userId&quot;: 1
1768}, {
1769 &quot;orgName&quot;: &quot;geomedia&quot;,
1770 &quot;userId&quot;: 1
1771} ]
1772</pre></div></div>
1773
1774<p>Note that <tt>UNNEST</tt> has SQL&#x2019;s inner join semantics &#x2014; that is, if a user has no employment history, no tuple corresponding to that user will be emitted in the result.</p></div></div></div>
1775<div class="section">
1776<h3><a name="Left_Outer_UNNEST"></a><a name="Left_outer_unnests" id="Left_outer_unnests">Left Outer UNNEST</a></h3>
1777<p>As an alternative, the <tt>LEFT OUTER UNNEST</tt> clause offers SQL&#x2019;s left outer join semantics. For example, no collection-valued field named <tt>hobbies</tt> exists in the object for the user whose id is 1, but the following query&#x2019;s result still includes user 1.</p>
1778<div class="section">
1779<div class="section">
1780<h5><a name="Example"></a>Example</h5>
1781
1782<div>
1783<div>
1784<pre class="source">SELECT u.id AS userId, h.hobbyName AS hobby
1785FROM GleambookUsers u
1786LEFT OUTER UNNEST u.hobbies h
1787WHERE u.id = 1;
1788</pre></div></div>
1789
1790<p>Returns:</p>
1791
1792<div>
1793<div>
1794<pre class="source">[ {
1795 &quot;userId&quot;: 1
1796} ]
1797</pre></div></div>
1798
1799<p>Note that if <tt>u.hobbies</tt> is an empty collection or leads to a <tt>MISSING</tt> (as above) or <tt>NULL</tt> value for a given input tuple, there is no corresponding binding value for variable <tt>h</tt> for that input tuple. In that case, a <tt>MISSING</tt> value is generated for <tt>h</tt> so that the input tuple can still be propagated.</p></div></div></div>
1800<div class="section">
1801<h3><a name="Expressing_Joins_Using_UNNEST"></a><a name="Expressing_joins_using_unnests" id="Expressing_joins_using_unnests">Expressing Joins Using UNNEST</a></h3>
1802<p>The <tt>UNNEST</tt> clause is similar to SQL&#x2019;s <tt>JOIN</tt> clause except that it allows its right argument to be correlated to its left argument, as in the examples above &#x2014; i.e., think &#x201c;correlated cross-product&#x201d;. The next example shows this via a query that joins two data sets, GleambookUsers and GleambookMessages, returning user/message pairs. The results contain one object per pair, with result objects containing the user&#x2019;s name and an entire message. The query can be thought of as saying &#x201c;for each Gleambook user, unnest the <tt>GleambookMessages</tt> collection and filter the output with the condition <tt>message.authorId = user.id</tt>&#x201d;.</p>
1803<div class="section">
1804<div class="section">
1805<h5><a name="Example"></a>Example</h5>
1806
1807<div>
1808<div>
1809<pre class="source">SELECT u.name AS uname, m.message AS message
1810FROM GleambookUsers u
1811UNNEST GleambookMessages m
1812WHERE m.authorId = u.id;
1813</pre></div></div>
1814
1815<p>This returns:</p>
1816
1817<div>
1818<div>
1819<pre class="source">[ {
1820 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1821 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1822}, {
1823 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1824 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1825}, {
1826 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1827 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1828}, {
1829 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1830 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1831}, {
1832 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1833 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1834}, {
1835 &quot;uname&quot;: &quot;IsbelDull&quot;,
1836 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1837}, {
1838 &quot;uname&quot;: &quot;IsbelDull&quot;,
1839 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1840} ]
1841</pre></div></div>
1842
1843<p>Similarly, the above query can also be expressed as the <tt>UNNEST</tt>ing of a correlated subquery:</p></div>
1844<div class="section">
1845<h5><a name="Example"></a>Example</h5>
1846
1847<div>
1848<div>
1849<pre class="source">SELECT u.name AS uname, m.message AS message
1850FROM GleambookUsers u
1851UNNEST (
1852 SELECT VALUE msg
1853 FROM GleambookMessages msg
1854 WHERE msg.authorId = u.id
1855) AS m;
1856</pre></div></div>
1857</div></div></div></div>
1858<div class="section">
1859<h2><a name="FROM_clauses"></a><a name="From_clauses" id="From_clauses">FROM clauses</a></h2>
1860<p>A <tt>FROM</tt> clause is used for enumerating (i.e., conceptually iterating over) the contents of collections, as in SQL.</p>
1861<div class="section">
1862<h3><a name="Binding_expressions" id="Binding_expressions">Binding expressions</a></h3>
1863<p>In addition to stored collections, a <tt>FROM</tt> clause can iterate over any intermediate collection returned by a valid query expression. In the tuple stream generated by a <tt>FROM</tt> clause, the ordering of the input tuples is not guaranteed to be preserved.</p>
1864<div class="section">
1865<div class="section">
1866<h5><a name="Example"></a>Example</h5>
1867
1868<div>
1869<div>
1870<pre class="source">SELECT VALUE foo
1871FROM [1, 2, 2, 3] AS foo
1872WHERE foo &gt; 2;
1873</pre></div></div>
1874
1875<p>Returns:</p>
1876
1877<div>
1878<div>
1879<pre class="source">[
1880 3
1881]
1882</pre></div></div>
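
<p>The binding expression does not have to be a literal array. As a sketch of iterating over the collection returned by a subquery, the following query binds a variable (here given the illustrative name <tt>uname</tt>) to each value that the inner <tt>SELECT VALUE</tt> produces:</p>

<div>
<div>
<pre class="source">SELECT VALUE uname
FROM (
    SELECT VALUE u.name
    FROM GleambookUsers u
    WHERE u.id = 1
) AS uname;
</pre></div></div>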
1883</div></div></div>
1884<div class="section">
1885<h3><a name="Multiple_FROM_Terms"></a><a name="Multiple_from_terms" id="Multiple_from_terms">Multiple FROM Terms</a></h3>
1886<p>The query language permits correlations among <tt>FROM</tt> terms. Specifically, a <tt>FROM</tt> binding expression can refer to variables defined to its left in the given <tt>FROM</tt> clause. Thus, the first unnesting example above could also be expressed as follows:</p>
1887<div class="section">
1888<div class="section">
1889<h5><a name="Example"></a>Example</h5>
1890
1891<div>
1892<div>
1893<pre class="source">SELECT u.id AS userId, e.organizationName AS orgName
1894FROM GleambookUsers u, u.employment e
1895WHERE u.id = 1;
1896</pre></div></div>
1897</div></div></div>
1898<div class="section">
1899<h3><a name="Expressing_Joins_Using_FROM_Terms"></a><a name="Expressing_joins_using_from_terms" id="Expressing_joins_using_from_terms">Expressing Joins Using FROM Terms</a></h3>
1900<p>Similarly, the join intentions of the other <tt>UNNEST</tt>-based join examples above could be expressed as:</p>
1901<div class="section">
1902<div class="section">
1903<h5><a name="Example"></a>Example</h5>
1904
1905<div>
1906<div>
1907<pre class="source">SELECT u.name AS uname, m.message AS message
1908FROM GleambookUsers u, GleambookMessages m
1909WHERE m.authorId = u.id;
1910</pre></div></div>
1911</div>
1912<div class="section">
1913<h5><a name="Example"></a>Example</h5>
1914
1915<div>
1916<div>
1917<pre class="source">SELECT u.name AS uname, m.message AS message
1918FROM GleambookUsers u,
1919 (
1920 SELECT VALUE msg
1921 FROM GleambookMessages msg
1922 WHERE msg.authorId = u.id
1923 ) AS m;
1924</pre></div></div>
1925
1926<p>Note that the first alternative is one of the SQL-92 approaches to expressing a join.</p></div></div></div>
1927<div class="section">
1928<h3><a name="Implicit_Binding_Variables"></a><a name="Implicit_binding_variables" id="Implicit_binding_variables">Implicit Binding Variables</a></h3>
1929<p>Similar to standard SQL, the query language supports implicit <tt>FROM</tt> binding variables (i.e., aliases), for which a binding variable is generated. Variable generation falls into three cases:</p>
1930<ul>
1931
1932<li>If the binding expression is a variable reference expression, the generated variable&#x2019;s name will be the name of the referenced variable itself.</li>
1933<li>If the binding expression is a field access expression (or a fully qualified name for a dataset), the generated variable&#x2019;s name will be the last identifier (or the dataset name) in the expression.</li>
1934<li>For all other cases, a compilation error will be raised.</li>
1935</ul>
1936<p>The next two examples show queries that do not provide binding variables in their <tt>FROM</tt> clauses.</p>
1937<div class="section">
1938<div class="section">
1939<h5><a name="Example"></a>Example</h5>
1940
1941<div>
1942<div>
1943<pre class="source">SELECT GleambookUsers.name, GleambookMessages.message
1944FROM GleambookUsers, GleambookMessages
1945WHERE GleambookMessages.authorId = GleambookUsers.id;
1946</pre></div></div>
1947
1948<p>Returns:</p>
1949
1950<div>
1951<div>
1952<pre class="source">[ {
1953 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1954 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1955}, {
1956 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1957 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1958}, {
1959 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1960 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1961}, {
1962 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1963 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1964}, {
1965 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1966 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1967}, {
1968 &quot;name&quot;: &quot;IsbelDull&quot;,
1969 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1970}, {
1971 &quot;name&quot;: &quot;IsbelDull&quot;,
1972 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1973} ]
1974</pre></div></div>
1975</div>
1976<div class="section">
1977<h5><a name="Example"></a>Example</h5>
1978
1979<div>
1980<div>
1981<pre class="source">SELECT GleambookUsers.name, GleambookMessages.message
1982FROM GleambookUsers,
1983 (
1984 SELECT VALUE GleambookMessages
1985 FROM GleambookMessages
1986 WHERE GleambookMessages.authorId = GleambookUsers.id
1987 );
1988</pre></div></div>
1989
1990<p>Returns:</p>
1991
1992<div>
1993<div>
1994<pre class="source">Error: &quot;Syntax error: Need an alias for the enclosed expression:\n(select element GleambookMessages\n from GleambookMessages as GleambookMessages\n where (GleambookMessages.authorId = GleambookUsers.id)\n )&quot;,
1995 &quot;query_from_user&quot;: &quot;use TinySocial;\n\nSELECT GleambookUsers.name, GleambookMessages.message\n FROM GleambookUsers,\n (\n SELECT VALUE GleambookMessages\n FROM GleambookMessages\n WHERE GleambookMessages.authorId = GleambookUsers.id\n );&quot;
1996</pre></div></div>
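
<p>The error falls under the third case listed above: a subquery is neither a variable reference nor a field access expression, so no binding variable can be generated for it. One possible repair, sketched here with the illustrative alias <tt>m</tt>, is to name the subquery explicitly and refer to that name in the <tt>SELECT</tt> clause:</p>

<div>
<div>
<pre class="source">SELECT GleambookUsers.name, m.message
FROM GleambookUsers,
  (
    SELECT VALUE GleambookMessages
    FROM GleambookMessages
    WHERE GleambookMessages.authorId = GleambookUsers.id
  ) AS m;
</pre></div></div>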
1997
1998<p>More information on implicit binding variables can be found in the appendix section on Variable Resolution.</p></div></div></div></div>
1999<div class="section">
2000<h2><a name="JOIN_Clauses"></a><a name="Join_clauses" id="Join_clauses">JOIN Clauses</a></h2>
2001<p>The join clause in the query language supports both inner joins and left outer joins from standard SQL.</p>
2002<div class="section">
2003<h3><a name="Inner_joins" id="Inner_joins">Inner joins</a></h3>
2004<p>Using a <tt>JOIN</tt> clause, the inner join intent from the preceding examples can also be expressed as follows:</p>
2005<div class="section">
2006<div class="section">
2007<h5><a name="Example"></a>Example</h5>
2008
2009<div>
2010<div>
2011<pre class="source">SELECT u.name AS uname, m.message AS message
2012FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
2013</pre></div></div>
2014</div></div></div>
2015<div class="section">
2016<h3><a name="Left_Outer_Joins"></a><a name="Left_outer_joins" id="Left_outer_joins">Left Outer Joins</a></h3>
2017<p>The query language supports SQL&#x2019;s notion of left outer join. The following query is an example:</p>
2018
2019<div>
2020<div>
2021<pre class="source">SELECT u.name AS uname, m.message AS message
2022FROM GleambookUsers u LEFT OUTER JOIN GleambookMessages m ON m.authorId = u.id;
2023</pre></div></div>
2024
2025<p>Returns:</p>
2026
2027<div>
2028<div>
2029<pre class="source">[ {
2030 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2031 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2032}, {
2033 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2034 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
2035}, {
2036 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2037 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
2038}, {
2039 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2040 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2041}, {
2042 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2043 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
2044}, {
2045 &quot;uname&quot;: &quot;IsbelDull&quot;,
2046 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2047}, {
2048 &quot;uname&quot;: &quot;IsbelDull&quot;,
2049 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2050}, {
2051 &quot;uname&quot;: &quot;EmoryUnk&quot;
2052} ]
2053</pre></div></div>
2054
2055<p>For non-matching left-side tuples, the query language produces <tt>MISSING</tt> values for the right-side binding variables; that is why the last object in the above result doesn&#x2019;t have a <tt>message</tt> field. Note that this is slightly different from standard SQL, which instead would fill in <tt>NULL</tt> values for the right-side fields. The reason for this difference is that, for non-matches in its join results, the query language views fields from the right-side as being &#x201c;not there&#x201d; (a.k.a. <tt>MISSING</tt>) instead of as being &#x201c;there but unknown&#x201d; (i.e., <tt>NULL</tt>).</p>
2056<p>The left-outer join query can also be expressed using <tt>LEFT OUTER UNNEST</tt>:</p>
2057
2058<div>
2059<div>
2060<pre class="source">SELECT u.name AS uname, m.message AS message
2061FROM GleambookUsers u
2062LEFT OUTER UNNEST (
2063 SELECT VALUE message
2064 FROM GleambookMessages message
2065 WHERE message.authorId = u.id
2066 ) m;
2067</pre></div></div>
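
<p>Because the right-side variable is <tt>MISSING</tt> for non-matching left-side tuples, the left outer join can also be used to find users who have no messages at all. The following is a minimal sketch of that pattern; with the sample data shown above, it would be expected to return only <tt>EmoryUnk</tt>:</p>

<div>
<div>
<pre class="source">SELECT u.name AS uname
FROM GleambookUsers u LEFT OUTER JOIN GleambookMessages m ON m.authorId = u.id
WHERE m IS MISSING;
</pre></div></div>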
2068
2069<p>In general, SQL-style join queries can also be expressed by <tt>UNNEST</tt> clauses and left outer join queries can be expressed by <tt>LEFT OUTER UNNEST</tt> clauses.</p></div>
2070<div class="section">
2071<h3><a name="Variable_scope_in_JOIN_clauses"></a><a name="Join_variable_scope" id="Join_variable_scope">Variable scope in JOIN clauses</a></h3>
2072<p>Variables defined by <tt>JOIN</tt> subclauses are not visible to other subclauses in the same <tt>FROM</tt> clause. This also applies to the <tt>FROM</tt> variable that starts the <tt>JOIN</tt> subclause.</p>
2073<div class="section">
2074<div class="section">
2075<h5><a name="Example"></a>Example</h5>
2076
2077<div>
2078<div>
2079<pre class="source">SELECT * FROM GleambookUsers u
2080JOIN (SELECT VALUE m
2081 FROM GleambookMessages m
2082 WHERE m.authorId = u.id) m
2083ON u.id = m.authorId;
2084</pre></div></div>
2085
2086<p>The variable <tt>u</tt> defined by the <tt>FROM</tt> clause is not visible inside the <tt>JOIN</tt> subclause, so this query returns no results.</p></div></div></div></div>
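
<p>Assuming the intent of the query above was an ordinary inner join between users and their messages, one way to stay within this scoping rule is to remove the reference to <tt>u</tt> from the subquery and leave the correlation to the <tt>ON</tt> condition. This is a sketch of that rewrite, not the only possible one:</p>

<div>
<div>
<pre class="source">SELECT * FROM GleambookUsers u
JOIN (SELECT VALUE m
      FROM GleambookMessages m) m
ON u.id = m.authorId;
</pre></div></div>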
2087<div class="section">
2088<h2><a name="GROUP_BY_Clauses"></a><a name="Group_By_clauses" id="Group_By_clauses">GROUP BY Clauses</a></h2>
2089<p>The <tt>GROUP BY</tt> clause generalizes standard SQL&#x2019;s grouping and aggregation semantics, but it also retains backward compatibility with the standard (relational) SQL <tt>GROUP BY</tt> and aggregation features.</p>
2090<div class="section">
2091<h3><a name="Group_variables" id="Group_variables">Group variables</a></h3>
2092<p>In a <tt>GROUP BY</tt> clause, in addition to the binding variable(s) defined for the grouping key(s), the query language allows a user to define a <i>group variable</i> by using the clause&#x2019;s <tt>GROUP AS</tt> extension to denote the resulting group. After grouping, the query&#x2019;s in-scope variables include the grouping key&#x2019;s binding variables as well as this group variable, which will be bound to one collection value for each group. This per-group collection (i.e., multiset) value will consist of nested objects in which each field is the result of a renamed variable defined in parentheses following the group variable&#x2019;s name. The <tt>GROUP AS</tt> syntax is as follows:</p>
2093
2094<div>
2095<div>
2096<pre class="source">&lt;GROUP&gt; &lt;AS&gt; Variable (&quot;(&quot; VariableReference &lt;AS&gt; Identifier (&quot;,&quot; VariableReference &lt;AS&gt; Identifier )* &quot;)&quot;)?
2097</pre></div></div>
2098
2099<div class="section">
2100<div class="section">
2101<h5><a name="Example"></a>Example</h5>
2102
2103<div>
2104<div>
2105<pre class="source">SELECT *
2106FROM GleambookMessages message
2107GROUP BY message.authorId AS uid GROUP AS msgs(message AS msg);
2108</pre></div></div>
2109
2110<p>This first example query returns:</p>
2111
2112<div>
2113<div>
2114<pre class="source">[ {
2115 &quot;msgs&quot;: [
2116 {
2117 &quot;msg&quot;: {
2118 &quot;senderLocation&quot;: [
2119 38.97,
2120 77.49
2121 ],
2122 &quot;inResponseTo&quot;: 1,
2123 &quot;messageId&quot;: 11,
2124 &quot;authorId&quot;: 1,
2125 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
2126 }
2127 },
2128 {
2129 &quot;msg&quot;: {
2130 &quot;senderLocation&quot;: [
2131 41.66,
2132 80.87
2133 ],
2134 &quot;inResponseTo&quot;: 4,
2135 &quot;messageId&quot;: 2,
2136 &quot;authorId&quot;: 1,
2137 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2138 }
2139 },
2140 {
2141 &quot;msg&quot;: {
2142 &quot;senderLocation&quot;: [
2143 37.73,
2144 97.04
2145 ],
2146 &quot;inResponseTo&quot;: 2,
2147 &quot;messageId&quot;: 4,
2148 &quot;authorId&quot;: 1,
2149 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
2150 }
2151 },
2152 {
2153 &quot;msg&quot;: {
2154 &quot;senderLocation&quot;: [
2155 40.33,
2156 80.87
2157 ],
2158 &quot;inResponseTo&quot;: 11,
2159 &quot;messageId&quot;: 8,
2160 &quot;authorId&quot;: 1,
2161 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2162 }
2163 },
2164 {
2165 &quot;msg&quot;: {
2166 &quot;senderLocation&quot;: [
2167 42.5,
2168 70.01
2169 ],
2170 &quot;inResponseTo&quot;: 12,
2171 &quot;messageId&quot;: 10,
2172 &quot;authorId&quot;: 1,
2173 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
2174 }
2175 }
2176 ],
2177 &quot;uid&quot;: 1
2178}, {
2179 &quot;msgs&quot;: [
2180 {
2181 &quot;msg&quot;: {
2182 &quot;senderLocation&quot;: [
2183 31.5,
2184 75.56
2185 ],
2186 &quot;inResponseTo&quot;: 1,
2187 &quot;messageId&quot;: 6,
2188 &quot;authorId&quot;: 2,
2189 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2190 }
2191 },
2192 {
2193 &quot;msg&quot;: {
2194 &quot;senderLocation&quot;: [
2195 48.09,
2196 81.01
2197 ],
2198 &quot;inResponseTo&quot;: 4,
2199 &quot;messageId&quot;: 3,
2200 &quot;authorId&quot;: 2,
2201 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2202 }
2203 }
2204 ],
2205 &quot;uid&quot;: 2
2206} ]
2207</pre></div></div>
2208
2209<p>As we can see from the above query result, each group in the example query&#x2019;s output has an associated group variable value called <tt>msgs</tt> that appears in the <tt>SELECT *</tt>&#x2019;s result. This variable contains a collection of objects associated with the group; each of the group&#x2019;s <tt>message</tt> values appears in the <tt>msg</tt> field of the objects in the <tt>msgs</tt> collection.</p>
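
<p>Since the group variable is bound to an ordinary collection, it can be passed to collection-valued functions. As a minimal sketch, assuming the built-in <tt>array_count</tt> function and using the illustrative output name <tt>msgCount</tt>, the following variant reports how many messages each group holds. With the sample data above, it would be expected to report 5 for <tt>uid</tt> 1 and 2 for <tt>uid</tt> 2:</p>

<div>
<div>
<pre class="source">SELECT uid, array_count(msgs) AS msgCount
FROM GleambookMessages message
GROUP BY message.authorId AS uid GROUP AS msgs(message AS msg);
</pre></div></div>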
2210<p>The group variable in the query language makes more complex, composable, nested subqueries over a group possible, which is important given the language&#x2019;s more complex data model (relative to SQL). As a simple example, since we really just want the messages associated with each user, we might wish to avoid the &#x201c;extra wrapping&#x201d; of each message as the <tt>msg</tt> field of an object. (That wrapping is useful in more complex cases, but is essentially just in the way here.) We can use a subquery in the <tt>SELECT</tt> clause to tunnel through the extra nesting and produce the desired result.</p></div>
2211<div class="section">
2212<h5><a name="Example"></a>Example</h5>
2213
2214<div>
2215<div>
2216<pre class="source">SELECT uid, (SELECT VALUE g.msg FROM g) AS msgs
2217FROM GleambookMessages gbm
2218GROUP BY gbm.authorId AS uid
2219GROUP AS g(gbm as msg);
2220</pre></div></div>
2221
2222<p>This variant of the example query returns:</p>
2223
2224<div>
2225<div>
2226<pre class="source"> [ {
2227 &quot;msgs&quot;: [
2228 {
2229 &quot;senderLocation&quot;: [
2230 38.97,
2231 77.49
2232 ],
2233 &quot;inResponseTo&quot;: 1,
2234 &quot;messageId&quot;: 11,
2235 &quot;authorId&quot;: 1,
2236 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
2237 },
2238 {
2239 &quot;senderLocation&quot;: [
2240 41.66,
2241 80.87
2242 ],
2243 &quot;inResponseTo&quot;: 4,
2244 &quot;messageId&quot;: 2,
2245 &quot;authorId&quot;: 1,
2246 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2247 },
2248 {
2249 &quot;senderLocation&quot;: [
2250 37.73,
2251 97.04
2252 ],
2253 &quot;inResponseTo&quot;: 2,
2254 &quot;messageId&quot;: 4,
2255 &quot;authorId&quot;: 1,
2256 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
2257 },
2258 {
2259 &quot;senderLocation&quot;: [
2260 40.33,
2261 80.87
2262 ],
2263 &quot;inResponseTo&quot;: 11,
2264 &quot;messageId&quot;: 8,
2265 &quot;authorId&quot;: 1,
2266 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2267 },
2268 {
2269 &quot;senderLocation&quot;: [
2270 42.5,
2271 70.01
2272 ],
2273 &quot;inResponseTo&quot;: 12,
2274 &quot;messageId&quot;: 10,
2275 &quot;authorId&quot;: 1,
2276 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
2277 }
2278 ],
2279 &quot;uid&quot;: 1
2280 }, {
2281 &quot;msgs&quot;: [
2282 {
2283 &quot;senderLocation&quot;: [
2284 31.5,
2285 75.56
2286 ],
2287 &quot;inResponseTo&quot;: 1,
2288 &quot;messageId&quot;: 6,
2289 &quot;authorId&quot;: 2,
2290 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2291 },
2292 {
2293 &quot;senderLocation&quot;: [
2294 48.09,
2295 81.01
2296 ],
2297 &quot;inResponseTo&quot;: 4,
2298 &quot;messageId&quot;: 3,
2299 &quot;authorId&quot;: 2,
2300 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2301 }
2302 ],
2303 &quot;uid&quot;: 2
2304 } ]
2305</pre></div></div>
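<p>The difference between the wrapped and the unwrapped group contents can be modelled outside the database. The following is a minimal Python sketch of the semantics, not AsterixDB code: the helper <tt>group_as</tt> and the trimmed-down message objects are assumptions made for illustration, and the group variable is assumed to be named <tt>msgs</tt> in the wrapped case.</p>

```python
from collections import defaultdict

# A few (authorId, messageId) pairs taken from the sample output; the
# other message fields are omitted to keep the sketch short.
messages = [
    {"authorId": 1, "messageId": 11},
    {"authorId": 2, "messageId": 6},
    {"authorId": 1, "messageId": 2},
    {"authorId": 2, "messageId": 3},
]

def group_as(rows, key, field):
    """Hypothetical model of GROUP BY row[key] GROUP AS g(row AS field)."""
    groups = defaultdict(list)
    for row in rows:
        # Each group member is an object with one field per renamed variable.
        groups[row[key]].append({field: row})
    return groups

groups = group_as(messages, "authorId", "msg")

# Returning the group variable as-is keeps the {"msg": ...} wrappers.
wrapped = [{"uid": uid, "msgs": g} for uid, g in groups.items()]

# SELECT VALUE g.msg FROM g: the subquery strips the wrapper objects.
unwrapped = [{"uid": uid, "msgs": [m["msg"] for m in g]}
             for uid, g in groups.items()]

print(wrapped[0]["msgs"][0])    # {'msg': {'authorId': 1, 'messageId': 11}}
print(unwrapped[0]["msgs"][0])  # {'authorId': 1, 'messageId': 11}
```

<p>In this model the subquery is simply a projection over the members of one group, which is why it can be placed anywhere an expression is allowed.</p>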
2306
2307<p>The next example shows a more interesting use of a subquery in the <tt>SELECT</tt> list, where the subquery further processes each group. Because the declaration of the group variable <tt>g</tt> does no renaming, <tt>g</tt> has a single field, <tt>gbm</tt>, named after the binding variable in the <tt>FROM</tt> clause.</p></div>
2308<div class="section">
2309<h5><a name="Example"></a>Example</h5>
2310
2311<div>
2312<div>
2313<pre class="source">SELECT uid,
2314 (SELECT VALUE g.gbm
2315 FROM g
2316 WHERE g.gbm.message LIKE '% like%'
2317 ORDER BY g.gbm.messageId
2318 LIMIT 2) AS msgs
2319FROM GleambookMessages gbm
2320GROUP BY gbm.authorId AS uid
2321GROUP AS g;
2322</pre></div></div>
2323
2324<p>This example query returns:</p>
2325
2326<div>
2327<div>
2328<pre class="source">[ {
2329 &quot;msgs&quot;: [
2330 {
2331 &quot;senderLocation&quot;: [
2332 40.33,
2333 80.87
2334 ],
2335 &quot;inResponseTo&quot;: 11,
2336 &quot;messageId&quot;: 8,
2337 &quot;authorId&quot;: 1,
2338 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2339 }
2340 ],
2341 &quot;uid&quot;: 1
2342}, {
2343 &quot;msgs&quot;: [
2344 {
2345 &quot;senderLocation&quot;: [
2346 48.09,
2347 81.01
2348 ],
2349 &quot;inResponseTo&quot;: 4,
2350 &quot;messageId&quot;: 3,
2351 &quot;authorId&quot;: 2,
2352 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2353 },
2354 {
2355 &quot;senderLocation&quot;: [
2356 31.5,
2357 75.56
2358 ],
2359 &quot;inResponseTo&quot;: 1,
2360 &quot;messageId&quot;: 6,
2361 &quot;authorId&quot;: 2,
2362 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2363 }
2364 ],
2365 &quot;uid&quot;: 2
2366} ]
2367</pre></div></div>
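<p>The per-group subquery above is a filter, a sort, and a truncation applied to each group in turn. The Python sketch below models that pipeline under stated assumptions: the groups are written out by hand with only the fields the subquery touches, the function name <tt>top_likes</tt> is hypothetical, and <tt>LIKE '% like%'</tt> is modelled as a case-sensitive substring test.</p>

```python
# Messages already grouped by authorId, as GROUP AS g would provide them;
# only the fields the subquery touches are kept.
groups = {
    1: [
        {"gbm": {"messageId": 11, "message": " can't stand acast its plan is terrible"}},
        {"gbm": {"messageId": 2, "message": " dislike x-phone its touch-screen is horrible"}},
        {"gbm": {"messageId": 8, "message": " like ccast the 3G is awesome:)"}},
    ],
    2: [
        {"gbm": {"messageId": 6, "message": " like product-z its platform is mind-blowing"}},
        {"gbm": {"messageId": 3, "message": " like product-y the plan is amazing"}},
    ],
}

def top_likes(g, limit=2):
    # WHERE g.gbm.message LIKE '% like%'  -> substring test for " like"
    liked = [item["gbm"] for item in g if " like" in item["gbm"]["message"]]
    # ORDER BY g.gbm.messageId
    liked.sort(key=lambda m: m["messageId"])
    # LIMIT 2
    return liked[:limit]

result = [{"uid": uid, "msgs": top_likes(g)} for uid, g in groups.items()]
for row in result:
    print(row["uid"], [m["messageId"] for m in row["msgs"]])
# 1 [8]
# 2 [3, 6]
```

<p>Note that <tt>dislike</tt> does not match, because the pattern requires a space immediately before <tt>like</tt>.</p>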
2368</div></div></div>
2369<div class="section">
2370<h3><a name="Implicit_Grouping_Key_Variables"></a><a name="Implicit_group_key_variables" id="Implicit_group_key_variables">Implicit Grouping Key Variables</a></h3>
2371<p>In the query language syntax, providing named binding variables for <tt>GROUP BY</tt> key expressions is optional. If a grouping key is missing a user-provided binding variable, the underlying compiler will generate one. Automatic grouping key variable naming falls into three cases, much like the treatment of unnamed projections:</p>
2372<ul>
2373
2374<li>If the grouping key expression is a variable reference expression, the generated variable gets the same name as the referred variable;</li>
2375<li>If the grouping key expression is a field access expression, the generated variable gets the same name as the last identifier in the expression;</li>
2376<li>For all other cases, the compiler generates a unique variable (but the user query is unable to refer to this generated variable).</li>
2377</ul>
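<p>The three naming cases can be sketched as a small Python function. This is an illustration only: the actual compiler works on a parsed expression tree, whereas the hypothetical helper <tt>implicit_key_name</tt> below uses regular expressions on the expression text and returns <tt>None</tt> for the case where the generated variable cannot be referenced.</p>

```python
import re

# Hypothetical helper: a regex stands in for the real expression parser.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

def implicit_key_name(expr):
    """Return the variable name generated for an unnamed grouping key,
    or None when the generated name cannot be referenced by the query."""
    expr = expr.strip()
    # Case 1: a plain variable reference keeps its own name.
    if re.fullmatch(_IDENT, expr):
        return expr
    # Case 2: a field access takes the last identifier in the path.
    if re.fullmatch(rf"{_IDENT}(\.{_IDENT})+", expr):
        return expr.rsplit(".", 1)[1]
    # Case 3: anything else gets a unique, inaccessible variable.
    return None

print(implicit_key_name("uid"))           # uid
print(implicit_key_name("gbm.authorId"))  # authorId
print(implicit_key_name("a.x + 1"))       # None
```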
2378<p>The next example illustrates a query that doesn&#x2019;t provide binding variables for its grouping key expressions.</p>
2379<div class="section">
2380<div class="section">
2381<h5><a name="Example"></a>Example</h5>
2382
2383<div>
2384<div>
2385<pre class="source">SELECT authorId,
2386 (SELECT VALUE g.gbm
2387 FROM g
2388 WHERE g.gbm.message LIKE '% like%'
2389 ORDER BY g.gbm.messageId
2390 LIMIT 2) AS msgs
2391FROM GleambookMessages gbm
2392GROUP BY gbm.authorId
2393GROUP AS g;
2394</pre></div></div>
2395
2396<p>This query returns:</p>
2397
2398<div>
2399<div>
2400<pre class="source"> [ {
2401 &quot;msgs&quot;: [
2402 {
2403 &quot;senderLocation&quot;: [
2404 40.33,
2405 80.87
2406 ],
2407 &quot;inResponseTo&quot;: 11,
2408 &quot;messageId&quot;: 8,
2409 &quot;authorId&quot;: 1,
2410 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2411 }
2412 ],
2413 &quot;authorId&quot;: 1
2414}, {
2415 &quot;msgs&quot;: [
2416 {
2417 &quot;senderLocation&quot;: [
2418 48.09,
2419 81.01
2420 ],
2421 &quot;inResponseTo&quot;: 4,
2422 &quot;messageId&quot;: 3,
2423 &quot;authorId&quot;: 2,
2424 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2425 },
2426 {
2427 &quot;senderLocation&quot;: [
2428 31.5,
2429 75.56
2430 ],
2431 &quot;inResponseTo&quot;: 1,
2432 &quot;messageId&quot;: 6,
2433 &quot;authorId&quot;: 2,
2434 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2435 }
2436 ],
2437 &quot;authorId&quot;: 2
2438} ]
2439</pre></div></div>
2440
2441<p>By the second of the three variable generation rules, the generated variable for the grouping key expression <tt>gbm.authorId</tt> is <tt>authorId</tt> (which is how it is referred to in the example&#x2019;s <tt>SELECT</tt> clause).</p></div></div></div>
2442<div class="section">
2443<h3><a name="Implicit_Group_Variables"></a><a name="Implicit_group_variables" id="Implicit_group_variables">Implicit Group Variables</a></h3>
2444<p>The group variable itself is also optional in the <tt>GROUP BY</tt> syntax. If a user&#x2019;s query does not declare the name and structure of the group variable using <tt>GROUP AS</tt>, the query compiler will generate a unique group variable whose fields include all of the binding variables defined in the <tt>FROM</tt> clause of the current enclosing <tt>SELECT</tt> statement. In this case the user&#x2019;s query cannot refer to the generated group variable, but it can still call SQL-92 aggregation functions as in SQL-92.</p></div>
2445<div class="section">
2446<h3><a name="Aggregation_Functions"></a><a name="Aggregation_functions" id="Aggregation_functions">Aggregation Functions</a></h3>
2447<p>In traditional SQL, which doesn&#x2019;t support nested data, grouping always involves aggregation to compute properties of the groups (for example, the average number of messages per user rather than the actual set of messages per user). Each aggregation function in the query language takes a collection (for example, the group of messages) as its input and produces a scalar value as its output. These aggregation functions, being truly functional in nature (unlike in SQL), can be used anywhere in a query where an expression is allowed. The following table catalogs the built-in aggregation functions of the query language and also indicates how each one handles <tt>NULL</tt>/<tt>MISSING</tt> values in the input collection or a completely empty input collection:</p>
2448<table border="0" class="table table-striped">
2449<thead>
2450
2451<tr class="a">
2452<th> Function </th>
2453<th> NULL </th>
2454<th> MISSING </th>
2455<th> Empty Collection </th></tr>
2456</thead><tbody>
2457
2458<tr class="b">
2459<td> STRICT_COUNT </td>
2460<td> counted </td>
2461<td> counted </td>
2462<td> 0 </td></tr>
2463<tr class="a">
2464<td> STRICT_SUM </td>
2465<td> returns NULL </td>
2466<td> returns NULL </td>
2467<td> returns NULL </td></tr>
2468<tr class="b">
2469<td> STRICT_MAX </td>
2470<td> returns NULL </td>
2471<td> returns NULL </td>
2472<td> returns NULL </td></tr>
2473<tr class="a">
2474<td> STRICT_MIN </td>
2475<td> returns NULL </td>
2476<td> returns NULL </td>
2477<td> returns NULL </td></tr>
2478<tr class="b">
2479<td> STRICT_AVG </td>
2480<td> returns NULL </td>
2481<td> returns NULL </td>
2482<td> returns NULL </td></tr>
2483<tr class="a">
2484<td> STRICT_STDDEV_SAMP </td>
2485<td> returns NULL </td>
2486<td> returns NULL </td>
2487<td> returns NULL </td></tr>
2488<tr class="b">
2489<td> STRICT_STDDEV_POP </td>
2490<td> returns NULL </td>
2491<td> returns NULL </td>
2492<td> returns NULL </td></tr>
2493<tr class="a">
2494<td> STRICT_VAR_SAMP </td>
2495<td> returns NULL </td>
2496<td> returns NULL </td>
2497<td> returns NULL </td></tr>
2498<tr class="b">
2499<td> STRICT_VAR_POP </td>
2500<td> returns NULL </td>
2501<td> returns NULL </td>
2502<td> returns NULL </td></tr>
2503<tr class="a">
2504<td> STRICT_SKEWNESS </td>
2505<td> returns NULL </td>
2506<td> returns NULL </td>
2507<td> returns NULL </td></tr>
2508<tr class="b">
2509<td> STRICT_KURTOSIS </td>
2510<td> returns NULL </td>
2511<td> returns NULL </td>
2512<td> returns NULL </td></tr>
2513<tr class="a">
2514<td> ARRAY_COUNT </td>
2515<td> not counted </td>
2516<td> not counted </td>
2517<td> 0 </td></tr>
2518<tr class="b">
2519<td> ARRAY_SUM </td>
2520<td> ignores NULL </td>
2521<td> ignores MISSING </td>
2522<td> returns NULL </td></tr>
2523<tr class="a">
2524<td> ARRAY_MAX </td>
2525<td> ignores NULL </td>
2526<td> ignores MISSING </td>
2527<td> returns NULL </td></tr>
2528<tr class="b">
2529<td> ARRAY_MIN </td>
2530<td> ignores NULL </td>
2531<td> ignores MISSING </td>
2532<td> returns NULL </td></tr>
2533<tr class="a">
2534<td> ARRAY_AVG </td>
2535<td> ignores NULL </td>
2536<td> ignores MISSING </td>
2537<td> returns NULL </td></tr>
2538<tr class="b">
2539<td> ARRAY_STDDEV_SAMP </td>
2540<td> ignores NULL </td>
2541<td> ignores MISSING </td>
2542<td> returns NULL </td></tr>
2543<tr class="a">
2544<td> ARRAY_STDDEV_POP </td>
2545<td> ignores NULL </td>
2546<td> ignores MISSING </td>
2547<td> returns NULL </td></tr>
2548<tr class="b">
2549<td> ARRAY_VAR_SAMP </td>
2550<td> ignores NULL </td>
2551<td> ignores MISSING </td>
2552<td> returns NULL </td></tr>
2553<tr class="a">
2554<td> ARRAY_VAR_POP </td>
2555<td> ignores NULL </td>
2556<td> ignores MISSING </td>
2557<td> returns NULL </td></tr>
2558<tr class="b">
2559<td> ARRAY_SKEWNESS </td>
2560<td> ignores NULL </td>
2561<td> ignores MISSING </td>
2562<td> returns NULL </td></tr>
2563<tr class="a">
2564<td> ARRAY_KURTOSIS </td>
2565<td> ignores NULL </td>
2566<td> ignores MISSING </td>
2567<td> returns NULL </td></tr>
2568</tbody>
2569</table>
2570<p>Notice that the query language offers two versions of each of the aggregate functions listed above. For each function, the STRICT version handles unknown (<tt>NULL</tt> or <tt>MISSING</tt>) values in a semantically strict fashion, where unknown values in the input result in unknown values in the output; and the ARRAY version handles them in the ad hoc &#x201c;just ignore the unknown values&#x201d; fashion that the SQL standard chose to adopt.</p>
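<p>The contrast between the two families can be modelled in a few lines of Python. This is a sketch of the behaviour described in the table, not the AsterixDB implementation: <tt>NULL</tt> is represented by <tt>None</tt>, a sentinel object named <tt>MISSING</tt> stands in for missing values, and the result of <tt>array_sum</tt> on a collection containing only unknown values is an assumption, since the table covers only the completely empty collection.</p>

```python
# NULL is modelled as None; MISSING has no Python counterpart, so a
# sentinel object stands in for it (an assumption of this sketch).
MISSING = object()

def is_unknown(v):
    return v is None or v is MISSING

def strict_count(items):
    # STRICT_COUNT counts every item, unknown or not.
    return len(items)

def array_count(items):
    # ARRAY_COUNT skips NULL and MISSING items.
    return sum(1 for v in items if not is_unknown(v))

def strict_sum(items):
    # STRICT_SUM: an empty input or any unknown item yields NULL.
    if not items or any(is_unknown(v) for v in items):
        return None
    return sum(items)

def array_sum(items):
    # ARRAY_SUM: unknown items are ignored; an empty input yields NULL.
    known = [v for v in items if not is_unknown(v)]
    # Returning NULL when no known item remains is an assumption.
    return sum(known) if known else None

data = [1, None, 3, MISSING]
print(strict_count(data), array_count(data))  # 4 2
print(strict_sum(data), array_sum(data))      # None 4
print(strict_sum([]), array_sum([]))          # None None
```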
2571<div class="section">
2572<div class="section">
2573<h5><a name="Example"></a>Example</h5>
2574
2575<div>
2576<div>
2577<pre class="source">ARRAY_AVG(
2578 (
2579 SELECT VALUE ARRAY_COUNT(friendIds) FROM GleambookUsers
2580 )
2581);
2582</pre></div></div>
2583
2584<p>This example returns:</p>
2585
2586<div>
2587<div>
2588<pre class="source">3.3333333333333335
2589</pre></div></div>
2590</div>
2591<div class="section">
2592<h5><a name="Example"></a>Example</h5>
2593
2594<div>
2595<div>
2596<pre class="source">SELECT uid AS uid, ARRAY_COUNT(grp) AS msgCnt
2597FROM GleambookMessages message
2598GROUP BY message.authorId AS uid
2599GROUP AS grp(message AS msg);
2600</pre></div></div>
2601
2602<p>This query returns:</p>
2603
2604<div>
2605<div>
2606<pre class="source">[ {
2607 &quot;uid&quot;: 1,
2608 &quot;msgCnt&quot;: 5
2609}, {
2610 &quot;uid&quot;: 2,
2611 &quot;msgCnt&quot;: 2
2612} ]
2613</pre></div></div>
2614
2615<p>Notice how the query forms groups where each group involves a message author and their messages. (SQL cannot do this because the grouped intermediate result is non-1NF in nature.) The query then uses the collection aggregate function ARRAY_COUNT to get the cardinality of each group of messages.</p>
2616<p>Each aggregation function in the query language supports the DISTINCT modifier that removes duplicate values from the input collection.</p></div>
2617<div class="section">
2618<h5><a name="Example"></a>Example</h5>
2619
2620<div>
2621<div>
2622<pre class="source">ARRAY_SUM(DISTINCT [1, 1, 2, 2, 3])
2623</pre></div></div>
2624
2625<p>This query returns:</p>
2626
2627<div>
2628<div>
2629<pre class="source">6
2630</pre></div></div>
2631</div></div></div>
2632<div class="section">
2633<h3><a name="SQL-92_Aggregation_Functions"></a><a name="SQL-92_aggregation_functions" id="SQL-92_aggregation_functions">SQL-92 Aggregation Functions</a></h3>
2634<p>For compatibility with the traditional SQL aggregation functions, the query language also offers SQL-92&#x2019;s aggregation function symbols (<tt>COUNT</tt>, <tt>SUM</tt>, <tt>MAX</tt>, <tt>MIN</tt>, <tt>AVG</tt>, <tt>ARRAY_AGG</tt>, <tt>STDDEV_SAMP</tt>, <tt>STDDEV_POP</tt>, <tt>VAR_SAMP</tt>, <tt>VAR_POP</tt>) as supported syntactic sugar. The query compiler rewrites queries that utilize these function symbols into queries that only use the collection aggregate functions of the query language. The following example uses the SQL-92 syntax approach to compute a result that is identical to that of the more explicit example above:</p>
2635<div class="section">
2636<div class="section">
2637<h5><a name="Example"></a>Example</h5>
2638
2639<div>
2640<div>
2641<pre class="source">SELECT uid, COUNT(*) AS msgCnt
2642FROM GleambookMessages msg
2643GROUP BY msg.authorId AS uid;
2644</pre></div></div>
2645
2646<p>It is important to realize that <tt>COUNT</tt> is actually <b>not</b> a built-in aggregation function. Rather, the <tt>COUNT</tt> query above is using a special &#x201c;sugared&#x201d; function symbol that the query compiler will rewrite as follows:</p>
2647
2648<div>
2649<div>
2650<pre class="source">SELECT uid AS uid, ARRAY_COUNT( (SELECT VALUE 1 FROM `$1` AS g) ) AS msgCnt
2651FROM GleambookMessages msg
2652GROUP BY msg.authorId AS uid
2653GROUP AS `$1`(msg AS msg);
2654</pre></div></div>
2655
2656<p>The same sort of rewritings apply to the function symbols <tt>SUM</tt>, <tt>MAX</tt>, <tt>MIN</tt>, <tt>AVG</tt>, <tt>ARRAY_AGG</tt>, <tt>STDDEV_SAMP</tt>, <tt>STDDEV_POP</tt>, <tt>VAR_SAMP</tt>, and <tt>VAR_POP</tt>. In contrast to the collection aggregate functions of the query language, these special SQL-92 function symbols can only be used in the same way they are in standard SQL (i.e., with the same restrictions).</p>
2657<p>The DISTINCT modifier is also supported for these aggregate functions.</p>
2658<p>The following table shows the SQL-92 functions supported by the query language, their aliases where available, and their corresponding built-in functions.</p>
2659<table border="0" class="table table-striped">
2660<thead>
2661
2662<tr class="a">
2663<th> SQL-92 Function </th>
2664<th> Aliases </th>
2665<th> Corresponding Built-in Function </th></tr>
2666</thead><tbody>
2667
2668<tr class="b">
2669<td> COUNT </td>
2670<td> </td>
2671<td> ARRAY_COUNT </td></tr>
2672<tr class="a">
2673<td> SUM </td>
2674<td> </td>
2675<td> ARRAY_SUM </td></tr>
2676<tr class="b">
2677<td> MAX </td>
2678<td> </td>
2679<td> ARRAY_MAX </td></tr>
2680<tr class="a">
2681<td> MIN </td>
2682<td> </td>
2683<td> ARRAY_MIN </td></tr>
2684<tr class="b">
2685<td> AVG </td>
2686<td> </td>
2687<td> ARRAY_AVG </td></tr>
2688<tr class="a">
2689<td> ARRAY_AGG </td>
2690<td> </td>
2691<td> (none) </td></tr>
2692<tr class="b">
2693<td> STDDEV_SAMP </td>
2694<td> STDDEV </td>
2695<td> ARRAY_STDDEV_SAMP </td></tr>
2696<tr class="a">
2697<td> STDDEV_POP </td>
2698<td> </td>
2699<td> ARRAY_STDDEV_POP </td></tr>
2700<tr class="b">
2701<td> VAR_SAMP </td>
2702<td> VARIANCE, VARIANCE_SAMP </td>
2703<td> ARRAY_VAR_SAMP </td></tr>
2704<tr class="a">
2705<td> VAR_POP </td>
2706<td> VARIANCE_POP </td>
2707<td> ARRAY_VAR_POP </td></tr>
2708</tbody>
2709</table>
2710<p>Note that the <tt>ARRAY_AGG</tt> function symbol is rewritten simply to return the result of the generated subquery, without applying any built-in function.</p>
2711<p>SQL aggregate function calls optionally support a FILTER subclause.</p></div>
2712<div class="section">
2713<h5><a name="Example"></a>Example</h5>
2714
2715<div>
2716<div>
2717<pre class="source">SELECT uid, COUNT(*) FILTER (WHERE msg.message LIKE &quot;%awesome%&quot;) AS msgCnt
2718FROM GleambookMessages msg
2719GROUP BY msg.authorId AS uid;
2720</pre></div></div>
2721
2722<p>The query compiler rewrites this query to use the built-in aggregate as follows:</p>
2723
2724<div>
2725<div>
2726<pre class="source">SELECT uid AS uid, ARRAY_COUNT( (SELECT VALUE 1 FROM `$1` AS g WHERE g.msg.message LIKE &quot;%awesome%&quot;) ) AS msgCnt
2727FROM GleambookMessages msg
2728GROUP BY msg.authorId AS uid
2729GROUP AS `$1`(msg AS msg);
2730</pre></div></div>
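<p>Both rewritings, with and without <tt>FILTER</tt>, follow the same pattern: build a collection with one item per qualifying group member, then apply the built-in aggregate. The Python sketch below models this; the helper names <tt>count_star</tt> and <tt>is_awesome</tt> are hypothetical, and the groups are a hand-written subset of the sample data.</p>

```python
# Group contents as GROUP AS `$1`(msg AS msg) would expose them; only the
# message text is kept, and the data is a subset of the sample dataset.
groups = {
    1: [
        {"msg": {"message": " like ccast the 3G is awesome:)"}},
        {"msg": {"message": " can't stand acast its plan is terrible"}},
    ],
    2: [
        {"msg": {"message": " like product-y the plan is amazing"}},
    ],
}

def array_count(items):
    # Model of the built-in aggregate: number of non-NULL items.
    return sum(1 for v in items if v is not None)

def count_star(group, where=None):
    """COUNT(*) [FILTER (WHERE ...)] modelled as ARRAY_COUNT over a subquery."""
    # SELECT VALUE 1 FROM `$1` AS g [WHERE <condition on g>]
    ones = [1 for g in group if where is None or where(g)]
    return array_count(ones)

def is_awesome(g):
    # g.msg.message LIKE "%awesome%" modelled as a substring test
    return "awesome" in g["msg"]["message"]

for uid, group in groups.items():
    print(uid, count_star(group), count_star(group, where=is_awesome))
# 1 2 1
# 2 1 0
```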
2731
2732<p>Note that the FILTER subclause is not supported for built-in aggregate function calls.</p></div></div></div>
2733<div class="section">
2734<h3><a name="SQL-92_Compliant_GROUP_BY_Aggregations"></a><a name="SQL-92_compliant_gby" id="SQL-92_compliant_gby">SQL-92 Compliant GROUP BY Aggregations</a></h3>
2735<p>The query language provides full support for SQL-92 <tt>GROUP BY</tt> aggregation queries. The following query is such an example:</p>
2736<div class="section">
2737<div class="section">
2738<h5><a name="Example"></a>Example</h5>
2739
2740<div>
2741<div>
2742<pre class="source">SELECT msg.authorId, COUNT(*)
2743FROM GleambookMessages msg
2744GROUP BY msg.authorId;
2745</pre></div></div>
2746
2747<p>This query outputs:</p>
2748
2749<div>
2750<div>
2751<pre class="source">[ {
2752 &quot;authorId&quot;: 1,
2753 &quot;$1&quot;: 5
2754}, {
2755 &quot;authorId&quot;: 2,
2756 &quot;$1&quot;: 2
2757} ]
2758</pre></div></div>
2759
2760<p>In principle, a <tt>msg</tt> reference in the query&#x2019;s <tt>SELECT</tt> clause would be &#x201c;sugarized&#x201d; as a collection (as described in <a href="#Implicit_group_variables">Implicit Group Variables</a>). However, since the SELECT expression <tt>msg.authorId</tt> is syntactically identical to a GROUP BY key expression, it will be internally replaced by the generated group key variable. The following is the equivalent rewritten query that will be generated by the compiler for the query above:</p>
2761
2762<div>
2763<div>
2764<pre class="source">SELECT authorId AS authorId, ARRAY_COUNT( (SELECT g.msg FROM `$1` AS g) )
2765FROM GleambookMessages msg
2766GROUP BY msg.authorId AS authorId
2767GROUP AS `$1`(msg AS msg);
2768</pre></div></div>
2769</div></div></div>
2770<div class="section">
2771<h3><a name="Column_Aliases"></a><a name="Column_aliases" id="Column_aliases">Column Aliases</a></h3>
2772<p>The query language also allows column aliases to be used as <tt>ORDER BY</tt> keys.</p>
2773<div class="section">
2774<div class="section">
2775<h5><a name="Example"></a>Example</h5>
2776
2777<div>
2778<div>
2779<pre class="source">SELECT msg.authorId AS aid, COUNT(*)
2780FROM GleambookMessages msg
2781GROUP BY msg.authorId;
2782ORDER BY aid;
2783</pre></div></div>
2784
2785<p>This query returns:</p>
2786
2787<div>
2788<div>
2789<pre class="source">[ {
2790 &quot;$1&quot;: 5,
2791 &quot;aid&quot;: 1
2792}, {
2793 &quot;$1&quot;: 2,
2794 &quot;aid&quot;: 2
2795} ]
2796</pre></div></div>
2797</div></div></div></div>
2798<div class="section">
2799<h2><a name="WHERE_Clauses_and_HAVING_Clauses"></a><a name="Where_having_clauses" id="Where_having_clauses">WHERE Clauses and HAVING Clauses</a></h2>
2800<p>Both <tt>WHERE</tt> clauses and <tt>HAVING</tt> clauses are used to filter input data based on a condition expression. Only tuples for which the condition expression evaluates to <tt>TRUE</tt> are propagated. Note that if the condition expression evaluates to <tt>NULL</tt> or <tt>MISSING</tt> the input tuple will be discarded.</p></div>
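<p>The filtering rule can be illustrated with a small Python model of three-valued logic. The data and the predicate <tt>age_over_18</tt> are hypothetical, <tt>NULL</tt> is represented by <tt>None</tt>, and <tt>MISSING</tt> by an absent dictionary key; none of this reflects AsterixDB internals.</p>

```python
# NULL is modelled as None and MISSING as an absent dictionary key
# (assumptions of this sketch).
users = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},  # NULL age
    {"id": 3},               # MISSING age
    {"id": 4, "age": 17},
]

def age_over_18(user):
    """Three-valued comparison: returns True, False, or None (unknown)."""
    age = user.get("age")  # None for both NULL and MISSING
    if age is None:
        return None
    return age > 18

def where(rows, condition):
    # Keep a row only when the condition is exactly TRUE.
    return [row for row in rows if condition(row) is True]

print([u["id"] for u in where(users, age_over_18)])  # [1]
```

<p>Users 2 and 3 are dropped even though the condition is not false for them: an unknown result is treated the same as <tt>FALSE</tt> for filtering purposes.</p>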
2801<div class="section">
2802<h2><a name="ORDER_BY_Clauses"></a><a name="Order_By_clauses" id="Order_By_clauses">ORDER BY Clauses</a></h2>
2803<p>The <tt>ORDER BY</tt> clause is used to globally sort data in either ascending order (i.e., <tt>ASC</tt>) or descending order (i.e., <tt>DESC</tt>). During ordering, <tt>MISSING</tt> and <tt>NULL</tt> are treated as being smaller than any other value if they are encountered in the ordering key(s). <tt>MISSING</tt> is treated as smaller than <tt>NULL</tt> if both occur in the data being sorted. The ordering of values of a given type is consistent with its type&#x2019;s &lt;= ordering; the ordering of values across types is implementation-defined but stable. The following example returns all <tt>GleambookUsers</tt> in descending order by their number of friends.</p>
2804<div class="section">
2805<div class="section">
2806<div class="section">
2807<h5><a name="Example"></a>Example</h5>
2808
2809<div>
2810<div>
2811<pre class="source"> SELECT VALUE user
2812 FROM GleambookUsers AS user
2813 ORDER BY ARRAY_COUNT(user.friendIds) DESC;
2814</pre></div></div>
2815
2816<p>This query returns:</p>
2817
2818<div>
2819<div>
2820<pre class="source"> [ {
2821 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2822 &quot;friendIds&quot;: [
2823 2,
2824 3,
2825 6,
2826 10
2827 ],
2828 &quot;gender&quot;: &quot;F&quot;,
2829 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2830 &quot;nickname&quot;: &quot;Mags&quot;,
2831 &quot;alias&quot;: &quot;Margarita&quot;,
2832 &quot;id&quot;: 1,
2833 &quot;employment&quot;: [
2834 {
2835 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2836 &quot;start-date&quot;: &quot;2006-08-06&quot;
2837 },
2838 {
2839 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2840 &quot;organizationName&quot;: &quot;geomedia&quot;,
2841 &quot;start-date&quot;: &quot;2010-06-17&quot;
2842 }
2843 ]
2844 }, {
2845 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
2846 &quot;friendIds&quot;: [
2847 1,
2848 5,
2849 8,
2850 9
2851 ],
2852 &quot;name&quot;: &quot;EmoryUnk&quot;,
2853 &quot;alias&quot;: &quot;Emory&quot;,
2854 &quot;id&quot;: 3,
2855 &quot;employment&quot;: [
2856 {
2857 &quot;organizationName&quot;: &quot;geomedia&quot;,
2858 &quot;endDate&quot;: &quot;2010-01-26&quot;,
2859 &quot;startDate&quot;: &quot;2010-06-17&quot;
2860 }
2861 ]
2862 }, {
2863 &quot;userSince&quot;: &quot;2011-01-22T10:10:00.000Z&quot;,
2864 &quot;friendIds&quot;: [
2865 1,
2866 4
2867 ],
2868 &quot;name&quot;: &quot;IsbelDull&quot;,
2869 &quot;nickname&quot;: &quot;Izzy&quot;,
2870 &quot;alias&quot;: &quot;Isbel&quot;,
2871 &quot;id&quot;: 2,
2872 &quot;employment&quot;: [
2873 {
2874 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
2875 &quot;startDate&quot;: &quot;2010-04-27&quot;
2876 }
2877 ]
2878 } ]
2879</pre></div></div>
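<p>The treatment of <tt>MISSING</tt> and <tt>NULL</tt> during ordering can be modelled with a composite sort key. The Python sketch below uses hypothetical rows and a sentinel for <tt>MISSING</tt>; it assumes that descending order is the exact reverse of ascending order, so that unknown values come last under <tt>DESC</tt>, and it does not model ordering across different types.</p>

```python
MISSING = object()  # sentinel standing in for MISSING (assumption)

def order_key(value):
    # Rank 0: MISSING, rank 1: NULL, rank 2: every other value.
    if value is MISSING:
        return (0, 0)
    if value is None:
        return (1, 0)
    return (2, value)

def order_by(rows, key, desc=False):
    return sorted(rows, key=lambda r: order_key(key(r)), reverse=desc)

# Hypothetical friend counts; users 4 and 5 have unknown counts.
rows = [
    {"id": 2, "friends": 2},
    {"id": 4, "friends": None},
    {"id": 1, "friends": 4},
    {"id": 5, "friends": MISSING},
]

def friends(row):
    return row["friends"]

print([r["id"] for r in order_by(rows, friends)])             # [5, 4, 2, 1]
print([r["id"] for r in order_by(rows, friends, desc=True)])  # [1, 2, 4, 5]
```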
2880</div></div></div></div>
2881<div class="section">
2882<h2><a name="LIMIT_Clauses"></a><a name="Limit_clauses" id="Limit_clauses">LIMIT Clauses</a></h2>
2883<p>The <tt>LIMIT</tt> clause is used to limit the result set to a specified constant size. The use of the <tt>LIMIT</tt> clause is illustrated in the next example.</p>
2884<div class="section">
2885<div class="section">
2886<div class="section">
2887<h5><a name="Example"></a>Example</h5>
2888
2889<div>
2890<div>
2891<pre class="source"> SELECT VALUE user
2892 FROM GleambookUsers AS user
2893 ORDER BY len(user.friendIds) DESC
2894 LIMIT 1;
2895</pre></div></div>
2896
2897<p>This query returns:</p>
2898
2899<div>
2900<div>
2901<pre class="source"> [ {
2902 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2903 &quot;friendIds&quot;: [
2904 2,
2905 3,
2906 6,
2907 10
2908 ],
2909 &quot;gender&quot;: &quot;F&quot;,
2910 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2911 &quot;nickname&quot;: &quot;Mags&quot;,
2912 &quot;alias&quot;: &quot;Margarita&quot;,
2913 &quot;id&quot;: 1,
2914 &quot;employment&quot;: [
2915 {
2916 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2917 &quot;start-date&quot;: &quot;2006-08-06&quot;
2918 },
2919 {
2920 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2921 &quot;organizationName&quot;: &quot;geomedia&quot;,
2922 &quot;start-date&quot;: &quot;2010-06-17&quot;
2923 }
2924 ]
2925 } ]
2926</pre></div></div>
2927</div></div></div></div>
2928<div class="section">
2929<h2><a name="WITH_Clauses"></a><a name="With_clauses" id="With_clauses">WITH Clauses</a></h2>
2930<p>As in standard SQL, <tt>WITH</tt> clauses are available to improve the modularity of a query. The next query shows an example.</p>
2931<div class="section">
2932<div class="section">
2933<div class="section">
2934<h5><a name="Example"></a>Example</h5>
2935
2936<div>
2937<div>
2938<pre class="source">WITH avgFriendCount AS (
2939 SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
2940 FROM GleambookUsers AS user
2941)[0]
2942SELECT VALUE user
2943FROM GleambookUsers user
2944WHERE ARRAY_COUNT(user.friendIds) &gt; avgFriendCount;
2945</pre></div></div>
2946
2947<p>This query returns:</p>
2948
2949<div>
2950<div>
2951<pre class="source">[ {
2952 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2953 &quot;friendIds&quot;: [
2954 2,
2955 3,
2956 6,
2957 10
2958 ],
2959 &quot;gender&quot;: &quot;F&quot;,
2960 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2961 &quot;nickname&quot;: &quot;Mags&quot;,
2962 &quot;alias&quot;: &quot;Margarita&quot;,
2963 &quot;id&quot;: 1,
2964 &quot;employment&quot;: [
2965 {
2966 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2967 &quot;start-date&quot;: &quot;2006-08-06&quot;
2968 },
2969 {
2970 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2971 &quot;organizationName&quot;: &quot;geomedia&quot;,
2972 &quot;start-date&quot;: &quot;2010-06-17&quot;
2973 }
2974 ]
2975}, {
2976 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
2977 &quot;friendIds&quot;: [
2978 1,
2979 5,
2980 8,
2981 9
2982 ],
2983 &quot;name&quot;: &quot;EmoryUnk&quot;,
2984 &quot;alias&quot;: &quot;Emory&quot;,
2985 &quot;id&quot;: 3,
2986 &quot;employment&quot;: [
2987 {
2988 &quot;organizationName&quot;: &quot;geomedia&quot;,
2989 &quot;endDate&quot;: &quot;2010-01-26&quot;,
2990 &quot;startDate&quot;: &quot;2010-06-17&quot;
2991 }
2992 ]
2993} ]
2994</pre></div></div>
2995
2996<p>The query is equivalent to the following, more complex, inlined form of the query:</p>
2997
2998<div>
2999<div>
3000<pre class="source">SELECT VALUE user
3001FROM GleambookUsers user
3002WHERE ARRAY_COUNT(user.friendIds) &gt;
3003 ( SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
3004 FROM GleambookUsers AS user
3005 ) [0];
3006</pre></div></div>
3007
3008<p>WITH can be particularly useful when a value needs to be used several times in a query.</p>
3009<p>Before proceeding further, notice that both the WITH query and its equivalent inlined variant include the syntax &#x201c;[0]&#x201d; &#x2013; this is due to a noteworthy difference between the query language and SQL-92. In SQL-92, whenever a scalar value is expected and it is being produced by a query expression, the SQL-92 query processor will evaluate the expression, check that there is only one row and column in the result at runtime, and then coerce the one-row/one-column tabular result into a scalar value. A JSON query language, being designed to deal with nested data and schema-less data, should not do this. Collection-valued data is perfectly legal in most contexts, and its data is schema-less, so the query processor rarely knows exactly what to expect where and such automatic conversion would often not be desirable. Thus, in the queries above, the use of &#x201c;[0]&#x201d; extracts the first (i.e., 0th) element of an array-valued query expression&#x2019;s result; this is needed above, even though the result is an array of one element, to extract the only element in the singleton array and obtain the desired scalar for the comparison.</p></div></div></div></div>
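<p>The role of &#x201c;[0]&#x201d; can be mimicked in Python, where a query result is naturally a list. The sketch below uses the friend lists of the three sample users; the function <tt>avg_friend_count_query</tt> is a hypothetical stand-in for the subquery, and the <tt>TypeError</tt> is Python behaviour, not a claim about how AsterixDB reacts to such a comparison.</p>

```python
# Friend lists of the three sample users.
users = [
    {"id": 1, "friendIds": [2, 3, 6, 10]},
    {"id": 2, "friendIds": [1, 4]},
    {"id": 3, "friendIds": [1, 5, 8, 9]},
]

def avg_friend_count_query(rows):
    # A subquery always yields a collection, here with a single element.
    counts = [len(u["friendIds"]) for u in rows]
    return [sum(counts) / len(counts)]

subquery_result = avg_friend_count_query(users)

# Python refuses to order a number against a list, which makes the need
# for an explicit extraction step visible.
try:
    4 > subquery_result
    compared = True
except TypeError:
    compared = False
print(compared)  # False

avg_friend_count = subquery_result[0]  # explicit [0] extracts the scalar
selected = [u["id"] for u in users if len(u["friendIds"]) > avg_friend_count]
print(selected)  # [1, 3]
```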
3010<div class="section">
3011<h2><a name="LET_Clauses"></a><a name="Let_clauses" id="Let_clauses">LET Clauses</a></h2>
3012<p>Similar to <tt>WITH</tt> clauses, <tt>LET</tt> clauses can be useful when a (complex) expression is used several times within a query, allowing it to be written once to make the query more concise. The next query shows an example.</p>
3013<div class="section">
3014<div class="section">
3015<div class="section">
3016<h5><a name="Example"></a>Example</h5>
3017
3018<div>
3019<div>
3020<pre class="source">SELECT u.name AS uname, messages AS messages
3021FROM GleambookUsers u
3022LET messages = (SELECT VALUE m
3023 FROM GleambookMessages m
3024 WHERE m.authorId = u.id)
3025WHERE EXISTS messages;
3026</pre></div></div>
3027
3028<p>This query lists <tt>GleambookUsers</tt> that have posted <tt>GleambookMessages</tt> and shows all authored messages for each listed user. It returns:</p>
3029
3030<div>
3031<div>
3032<pre class="source">[ {
3033 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
3034 &quot;messages&quot;: [
3035 {
3036 &quot;senderLocation&quot;: [
3037 38.97,
3038 77.49
3039 ],
3040 &quot;inResponseTo&quot;: 1,
3041 &quot;messageId&quot;: 11,
3042 &quot;authorId&quot;: 1,
3043 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
3044 },
3045 {
3046 &quot;senderLocation&quot;: [
3047 41.66,
3048 80.87
3049 ],
3050 &quot;inResponseTo&quot;: 4,
3051 &quot;messageId&quot;: 2,
3052 &quot;authorId&quot;: 1,
3053 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
3054 },
3055 {
3056 &quot;senderLocation&quot;: [
3057 37.73,
3058 97.04
3059 ],
3060 &quot;inResponseTo&quot;: 2,
3061 &quot;messageId&quot;: 4,
3062 &quot;authorId&quot;: 1,
3063 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
3064 },
3065 {
3066 &quot;senderLocation&quot;: [
3067 40.33,
3068 80.87
3069 ],
3070 &quot;inResponseTo&quot;: 11,
3071 &quot;messageId&quot;: 8,
3072 &quot;authorId&quot;: 1,
3073 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
3074 },
3075 {
3076 &quot;senderLocation&quot;: [
3077 42.5,
3078 70.01
3079 ],
3080 &quot;inResponseTo&quot;: 12,
3081 &quot;messageId&quot;: 10,
3082 &quot;authorId&quot;: 1,
3083 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
3084 }
3085 ]
3086}, {
3087 &quot;uname&quot;: &quot;IsbelDull&quot;,
3088 &quot;messages&quot;: [
3089 {
3090 &quot;senderLocation&quot;: [
3091 31.5,
3092 75.56
3093 ],
3094 &quot;inResponseTo&quot;: 1,
3095 &quot;messageId&quot;: 6,
3096 &quot;authorId&quot;: 2,
3097 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
3098 },
3099 {
3100 &quot;senderLocation&quot;: [
3101 48.09,
3102 81.01
3103 ],
3104 &quot;inResponseTo&quot;: 4,
3105 &quot;messageId&quot;: 3,
3106 &quot;authorId&quot;: 2,
3107 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
3108 }
3109 ]
3110} ]
3111</pre></div></div>
3112
3113<p>This query is equivalent to the following query that does not use the <tt>LET</tt> clause:</p>
3114
3115<div>
3116<div>
3117<pre class="source">SELECT u.name AS uname, ( SELECT VALUE m
3118 FROM GleambookMessages m
3119 WHERE m.authorId = u.id
3120 ) AS messages
3121FROM GleambookUsers u
3122WHERE EXISTS ( SELECT VALUE m
3123 FROM GleambookMessages m
3124 WHERE m.authorId = u.id
3125 );
3126</pre></div></div>
3127</div></div></div></div>
3128<div class="section">
3129<h2><a name="UNION_ALL"></a><a name="Union_all" id="Union_all">UNION ALL</a></h2>
3130<p>UNION ALL can be used to combine two input arrays or multisets into one. As in SQL, there is no ordering guarantee on the contents of the output stream. However, unlike SQL, the query language does not constrain what the data looks like on the input streams; in particular, it allows heterogeneity on the input and output streams. A type error will be raised if one of the inputs is not a collection. The following odd but legal query is an example:</p>
3131<div class="section">
3132<div class="section">
3133<div class="section">
3134<h5><a name="Example"></a>Example</h5>
3135
3136<div>
3137<div>
3138<pre class="source">SELECT u.name AS uname
3139FROM GleambookUsers u
3140WHERE u.id = 2
3141 UNION ALL
3142SELECT VALUE m.message
3143FROM GleambookMessages m
3144WHERE authorId=2;
3145</pre></div></div>
3146
3147<p>This query returns:</p>
3148
3149<div>
3150<div>
3151<pre class="source">[
3152 &quot; like product-z its platform is mind-blowing&quot;
3153 , {
3154 &quot;uname&quot;: &quot;IsbelDull&quot;
3155}, &quot; like product-y the plan is amazing&quot;
3156 ]
3157</pre></div></div>
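
<p>When a homogeneous result is preferred, both branches can use <tt>SELECT VALUE</tt> so that every output item is a plain string. The following variant is a sketch against the same sample data; it differs from the query above only in its first branch and in qualifying <tt>authorId</tt> explicitly:</p>

<div>
<div>
<pre class="source">-- Both branches emit strings, so the combined collection is homogeneous.
SELECT VALUE u.name
FROM GleambookUsers u
WHERE u.id = 2
  UNION ALL
SELECT VALUE m.message
FROM GleambookMessages m
WHERE m.authorId = 2;
</pre></div></div>

<p>As before, the order of the items in the result is not guaranteed.</p>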
3158</div></div></div></div>
3159<div class="section">
3160<h2><a name="OVER_Clauses"></a><a name="Over_clauses" id="Over_clauses">OVER Clauses</a></h2>
3161<p>All window functions must have an OVER clause to define the window partitions, the order of tuples within those partitions, and the extent of the window frame. Some window functions take additional window options, which are specified by modifiers before the OVER clause.</p>
3162<p>The query language has a dedicated set of window functions. Aggregate functions can also be used as window functions, when they are used with an OVER clause.</p>
3163<div class="section">
3164<h3><a name="Window_Function_Call"></a><a name="Window_function_call" id="Window_function_call">Window Function Call</a></h3>
3165
3166<div>
3167<div>
3168<pre class="source">WindowFunctionCall ::= WindowFunctionType &quot;(&quot; WindowFunctionArguments &quot;)&quot;
3169(WindowFunctionOptions)? &lt;OVER&gt; (Variable &lt;AS&gt;)? &quot;(&quot; WindowDefinition &quot;)&quot;
3170</pre></div></div>
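
<p>As an illustration of this grammar, the following sketch numbers each author&#x2019;s messages in <tt>messageId</tt> order. It assumes the <tt>GleambookMessages</tt> sample dataset used elsewhere in this manual; the output name <tt>seq</tt> is only an illustrative choice.</p>

<div>
<div>
<pre class="source">-- ROW_NUMBER() is the window function type; it takes no arguments.
-- The window definition partitions by author and orders by messageId.
SELECT m.authorId, m.messageId,
       ROW_NUMBER() OVER (PARTITION BY m.authorId ORDER BY m.messageId) AS seq
FROM GleambookMessages m;
</pre></div></div>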
3171
3172<div class="section">
3173<h4><a name="Window_Function_Type"></a><a name="Window_function_type" id="Window_function_type">Window Function Type</a></h4>
3174
3175<div>
3176<div>
3177<pre class="source">WindowFunctionType ::= AggregateFunction | WindowFunction
3178</pre></div></div>
3179
3180<p>Refer to the <a href="builtins.html#AggregateFunctions">Aggregate Functions</a> section for a list of aggregate functions.</p>
3181<p>Refer to the <a href="builtins.html#WindowFunctions">Window Functions</a> section for a list of window functions.</p></div>
3182<div class="section">
3183<h4><a name="Window_Function_Arguments"></a><a name="Window_function_arguments" id="Window_function_arguments">Window Function Arguments</a></h4>
3184
3185<div>
3186<div>
3187<pre class="source">WindowFunctionArguments ::= ( (&lt;DISTINCT&gt;)? Expression |
3188(Expression (&quot;,&quot; Expression (&quot;,&quot; Expression)? )? )? )
3189</pre></div></div>
3190
3191<p>Refer to the <a href="builtins.html#AggregateFunctions">Aggregate Functions</a> section or the <a href="builtins.html#WindowFunctions">Window Functions</a> section for details of the arguments for individual functions.</p></div></div>
3192<div class="section">
3193<h3><a name="Window_Function_Options"></a><a name="Window_function_options" id="Window_function_options">Window Function Options</a></h3>
3194
3195<div>
3196<div>
3197<pre class="source">WindowFunctionOptions ::= (NthValFrom)? (NullsTreatment)?
3198</pre></div></div>
3199
3200<p>Window function options cannot be used with <a href="builtins.html#AggregateFunctions">aggregate functions</a>.</p>
3201<p>Window function options can only be used with some <a href="builtins.html#WindowFunctions">window functions</a>, as described below.</p>
3202<div class="section">
3203<h4><a name="Nth_Val_From"></a><a name="Nth_val_from" id="Nth_val_from">Nth Val From</a></h4>
3204
3205<div>
3206<div>
3207<pre class="source">NthValFrom ::= &lt;FROM&gt; ( &lt;FIRST&gt; | &lt;LAST&gt; )
3208</pre></div></div>
3209
3210<p>The <b>nth val from</b> modifier determines whether the computation begins at the first or last tuple in the window.</p>
3211<p>This modifier can only be used with the <tt>nth_value()</tt> function.</p>
3212<p>This modifier is optional. If omitted, the default setting is <tt>FROM FIRST</tt>.</p></div>
3213<div class="section">
3214<h4><a name="Nulls_Treatment"></a><a name="Nulls_treatment" id="Nulls_treatment">Nulls Treatment</a></h4>
3215
3216<div>
3217<div>
3218<pre class="source">NullsTreatment ::= ( &lt;RESPECT&gt; | &lt;IGNORE&gt; ) &lt;NULLS&gt;
3219</pre></div></div>
3220
3221<p>The <b>nulls treatment</b> modifier determines whether NULL values are included in the computation, or ignored. MISSING values are treated the same way as NULL values.</p>
3222<p>This modifier can only be used with the <tt>first_value()</tt>, <tt>last_value()</tt>, <tt>nth_value()</tt>, <tt>lag()</tt>, and <tt>lead()</tt> functions.</p>
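
<p>The two modifiers can be combined, in the order given by the <tt>WindowFunctionOptions</tt> production. The following is a minimal sketch, assuming the <tt>GleambookMessages</tt> sample dataset; the output name <tt>second_to_last</tt> is illustrative. For each message, it asks for the second <tt>messageId</tt> counting from the end of the author&#x2019;s partition, skipping NULL and MISSING values:</p>

<div>
<div>
<pre class="source">-- FROM LAST counts from the end of the frame; IGNORE NULLS skips NULL/MISSING values.
-- The explicit frame makes the whole partition visible to every tuple.
SELECT m.authorId, m.messageId,
       NTH_VALUE(m.messageId, 2) FROM LAST IGNORE NULLS OVER (
         PARTITION BY m.authorId
         ORDER BY m.messageId
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS second_to_last
FROM GleambookMessages m;
</pre></div></div>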
3223<p>This modifier is optional. If omitted, the default setting is <tt>RESPECT NULLS</tt>.</p></div></div>
3224<div class="section">
3225<h3><a name="Window_Frame_Variable"></a><a name="Window_frame_variable" id="Window_frame_variable">Window Frame Variable</a></h3>
3226<p>The AS keyword enables you to specify an alias for the window frame contents. It introduces a variable which will be bound to the contents of the frame. When using a built-in <a href="builtins.html#AggregateFunctions">aggregate function</a> as a window function, the function&#x2019;s argument must be a subquery which refers to this alias, for example:</p>
3227
3228<div>
3229<div>
3230<pre class="source">SELECT ARRAY_COUNT(DISTINCT (FROM alias SELECT VALUE alias.src.field))
3231OVER alias AS (PARTITION BY &#x2026; ORDER BY &#x2026;)
3232FROM source AS src
3233</pre></div></div>
3234
3235<p>The alias is not necessary when using a <a href="builtins.html#WindowFunctions">window function</a>, or when using a standard SQL aggregate function with the OVER clause.</p>
3236<div class="section">
3237<h4><a name="Standard_SQL_Aggregate_Functions_with_the_OVER_Clause"></a><a name="SQL-92_over_clause" id="SQL-92_over_clause">Standard SQL Aggregate Functions with the OVER Clause</a></h4>
3238<p>A standard SQL aggregate function with an OVER clause is rewritten by the query compiler using a built-in aggregate function over a frame variable. For example, the following query with the <tt>sum()</tt> function:</p>
3239
3240<div>
3241<div>
3242<pre class="source">SELECT SUM(field) OVER (PARTITION BY &#x2026; ORDER BY &#x2026;)
3243FROM source AS src
3244</pre></div></div>
3245
<p>is rewritten as the following query using the <tt>array_sum()</tt> function:</p>
3247
3248<div>
3249<div>
3250<pre class="source">SELECT ARRAY_SUM( (SELECT VALUE alias.src.field FROM alias) )
3251 OVER alias AS (PARTITION BY &#x2026; ORDER BY &#x2026;)
3252FROM source AS src
3253</pre></div></div>
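
<p>As a concrete instance of the same pattern, the following sketch counts each author&#x2019;s messages with the standard SQL <tt>COUNT</tt> function, and then spells out the form that the rewriting would be expected to produce. It assumes the <tt>GleambookMessages</tt> sample dataset; the frame variable <tt>w</tt> and the output name <tt>author_msg_count</tt> are illustrative names, and the second query is written by analogy with the template above rather than taken from compiler output.</p>

<div>
<div>
<pre class="source">-- Standard SQL aggregate used as a window function; no frame variable is needed.
SELECT m.messageId, m.authorId,
       COUNT(m.messageId) OVER (PARTITION BY m.authorId) AS author_msg_count
FROM GleambookMessages m;

-- Equivalent form with the built-in aggregate and an explicit frame variable w.
SELECT m.messageId, m.authorId,
       ARRAY_COUNT( (SELECT VALUE w.m.messageId FROM w) )
         OVER w AS (PARTITION BY m.authorId) AS author_msg_count
FROM GleambookMessages m;
</pre></div></div>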
3254
3255<p>This is similar to the way that standard SQL aggregate functions are rewritten as built-in aggregate functions in the presence of the GROUP BY clause.</p></div></div>
3256<div class="section">
3257<h3><a name="Window_Definition"></a><a name="Window_definition" id="Window_definition">Window Definition</a></h3>
3258
3259<div>
3260<div>
3261<pre class="source">WindowDefinition ::= (WindowPartitionClause)? (WindowOrderClause
3262(WindowFrameClause (WindowFrameExclusion)? )? )?
3263</pre></div></div>
3264
3265<p>The <b>window definition</b> specifies the partitioning, ordering, and framing for window functions.</p>
3266<div class="section">
3267<h4><a name="Window_Partition_Clause"></a><a name="Window_partition_clause" id="Window_partition_clause">Window Partition Clause</a></h4>
3268
3269<div>
3270<div>
3271<pre class="source">WindowPartitionClause ::= &lt;PARTITION&gt; &lt;BY&gt; Expression (&quot;,&quot; Expression)*
3272</pre></div></div>
3273
3274<p>The <b>window partition clause</b> divides the tuples into logical partitions using one or more expressions.</p>
3275<p>This clause may be used with any <a href="builtins.html#WindowFunctions">window function</a>, or any <a href="builtins.html#AggregateFunctions">aggregate function</a> used as a window function.</p>
3276<p>This clause is optional. If omitted, all tuples are united in a single partition.</p></div>
3277<div class="section">
3278<h4><a name="Window_Order_Clause"></a><a name="Window_order_clause" id="Window_order_clause">Window Order Clause</a></h4>
3279
3280<div>
3281<div>
3282<pre class="source">WindowOrderClause ::= &lt;ORDER&gt; &lt;BY&gt; OrderingTerm (&quot;,&quot; OrderingTerm)*
3283</pre></div></div>
3284
3285<p>The <b>window order clause</b> determines how tuples are ordered within each partition. The window function works on tuples in the order specified by this clause.</p>
3286<p>This clause may be used with any <a href="builtins.html#WindowFunctions">window function</a>, or any <a href="builtins.html#AggregateFunctions">aggregate function</a> used as a window function.</p>
3287<p>This clause is optional. If omitted, all tuples are considered peers, i.e. their order is tied. When tuples in the window partition are tied, each window function behaves differently.</p>
3288<ul>
3289
3290<li>
3291
3292<p>The <tt>row_number()</tt> function returns a distinct number for each tuple. If tuples are tied, the results may be unpredictable.</p>
3293</li>
3294<li>
3295
3296<p>The <tt>rank()</tt>, <tt>dense_rank()</tt>, <tt>percent_rank()</tt>, and <tt>cume_dist()</tt> functions return the same result for each tuple.</p>
3297</li>
3298<li>
3299
<p>For other functions, if the <a href="#Window_frame_clause">window frame</a> is defined by <tt>ROWS</tt>, the results may be unpredictable. If the window frame is defined by <tt>RANGE</tt> or <tt>GROUPS</tt>, the results are the same for each tuple.</p>
3301</li>
3302</ul>
3303<p>This clause may have multiple <a href="#Ordering_term">ordering terms</a>. To reduce the number of ties, add additional <a href="#Ordering_term">ordering terms</a>.</p>
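
<p>The following sketch shows a second ordering term used as a tie-breaker. It assumes the <tt>GleambookMessages</tt> sample dataset and that <tt>messageId</tt> is unique, so that no two tuples are peers; the output name <tt>seq</tt> is illustrative. The query-level <tt>ORDER BY</tt> is added separately, because the window order clause alone does not fix the order of the final results.</p>

<div>
<div>
<pre class="source">-- Ordering by authorId alone would leave each author's messages tied;
-- adding messageId makes the numbering deterministic.
SELECT m.authorId, m.messageId,
       ROW_NUMBER() OVER (ORDER BY m.authorId, m.messageId) AS seq
FROM GleambookMessages m
ORDER BY m.authorId, m.messageId;
</pre></div></div>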
3304<div class="section">
3305<h5><a name="Note"></a>Note</h5>
3306<p>This clause does not guarantee the overall order of the query results. To guarantee the order of the final results, use the query ORDER BY clause.</p></div></div>
3307<div class="section">
3308<h4><a name="Ordering_Term"></a><a name="Ordering_term" id="Ordering_term">Ordering Term</a></h4>
3309
3310<div>
3311<div>
3312<pre class="source">OrderingTerm ::= Expression ( &lt;ASC&gt; | &lt;DESC&gt; )?
3313</pre></div></div>
3314
3315<p>The <b>ordering term</b> specifies an ordering expression and collation.</p>
3316<p>This clause has the same syntax and semantics as the ordering term for queries. Refer to the <a href="#Order_By_clauses">ORDER BY Clauses</a> section for details.</p></div>
3317<div class="section">
3318<h4><a name="Window_Frame_Clause"></a><a name="Window_frame_clause" id="Window_frame_clause">Window Frame Clause</a></h4>
3319
3320<div>
3321<div>
3322<pre class="source">WindowFrameClause ::= ( &lt;ROWS&gt; | &lt;RANGE&gt; | &lt;GROUPS&gt; ) WindowFrameExtent
3323</pre></div></div>
3324
3325<p>The <b>window frame clause</b> defines the window frame.</p>
3326<p>This clause can be used with all <a href="builtins.html#AggregateFunctions">aggregate functions</a> and some <a href="builtins.html#WindowFunctions">window functions</a> &#x2014; refer to the descriptions of individual functions for more details.</p>
3327<p>This clause is allowed only when the <a href="#Window_order_clause">window order clause</a> is present.</p>
3328<p>This clause is optional.</p>
3329<ul>
3330
3331<li>
3332
3333<p>If this clause is omitted and there is no <a href="#Window_order_clause">window order clause</a>, the window frame is the entire partition.</p>
3334</li>
3335<li>
3336
3337<p>If this clause is omitted but there is a <a href="#Window_order_clause">window order clause</a>, the window frame becomes all tuples in the partition preceding the current tuple and its peers &#x2014; the same as <tt>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</tt>.</p>
3338</li>
3339</ul>
3340<p>The window frame can be defined in the following ways:</p>
3341<ul>
3342
3343<li>
3344
3345<p><tt>ROWS</tt>: Counts the exact number of tuples within the frame. If window ordering doesn&#x2019;t result in unique ordering, the function may produce unpredictable results. You can add a unique expression or more window ordering expressions to produce unique ordering.</p>
3346</li>
3347<li>
3348
3349<p><tt>RANGE</tt>: Looks for a value offset within the frame. The function produces deterministic results.</p>
3350</li>
3351<li>
3352
3353<p><tt>GROUPS</tt>: Counts all groups of tied rows within the frame. The function produces deterministic results.</p>
3354</li>
3355</ul>
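
<p>The difference between <tt>ROWS</tt> and <tt>RANGE</tt> can be seen by applying the same extent in both modes. The following is a sketch only, assuming the <tt>GleambookMessages</tt> sample dataset with a numeric <tt>messageId</tt>; the output names <tt>by_rows</tt> and <tt>by_range</tt> are illustrative.</p>

<div>
<div>
<pre class="source">SELECT m.authorId, m.messageId,
       -- ROWS: the current tuple plus at most two tuples physically before it.
       COUNT(m.messageId) OVER (PARTITION BY m.authorId ORDER BY m.messageId
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS by_rows,
       -- RANGE: tuples whose messageId lies between (current messageId - 2)
       -- and the current messageId, however many tuples that is.
       COUNT(m.messageId) OVER (PARTITION BY m.authorId ORDER BY m.messageId
                                RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS by_range
FROM GleambookMessages m;
</pre></div></div>

<p>The two counts differ whenever an author&#x2019;s <tt>messageId</tt> values are not consecutive.</p>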
3356<div class="section">
3357<h5><a name="Note"></a>Note</h5>
3358<p>If this clause uses <tt>RANGE</tt> with either <tt>Expression PRECEDING</tt> or <tt>Expression FOLLOWING</tt>, the <a href="#Window_order_clause">window order clause</a> must have only a single ordering term.</p>
3359<p>The ordering term expression must evaluate to a number.</p><!--
3360The ordering term expression must evaluate to a number, a date, a time, or a
3361datetime.
3362If the ordering term expression evaluates to a date, a time, or a datetime, the
3363expression in `Expression PRECEDING` or `Expression FOLLOWING` must evaluate to
3364a duration.
3365-->
3366
3367<p>If these conditions are not met, the window frame will be empty, which means the window function will return its default value: in most cases this is NULL, except for <tt>strict_count()</tt> or <tt>array_count()</tt>, whose default value is 0.</p>
3368<p>This restriction does not apply when the window frame uses <tt>ROWS</tt> or <tt>GROUPS</tt>.</p></div>
3369<div class="section">
3370<h5><a name="Tip"></a>Tip</h5>
3371<p>The <tt>RANGE</tt> window frame is commonly used to define window frames based on date or time.</p>
3372<p>If you want to use <tt>RANGE</tt> with either <tt>Expression PRECEDING</tt> or <tt>Expression FOLLOWING</tt>, and you want to use an ordering expression based on date or time, the expression in <tt>Expression PRECEDING</tt> or <tt>Expression FOLLOWING</tt> must use a data type that can be added to the ordering expression.</p></div></div>
3373<div class="section">
3374<h4><a name="Window_Frame_Extent"></a><a name="Window_frame_extent" id="Window_frame_extent">Window Frame Extent</a></h4>
3375
3376<div>
3377<div>
3378<pre class="source">WindowFrameExtent ::= ( ( &lt;UNBOUNDED&gt; | Expression ) &lt;PRECEDING&gt; | &lt;CURRENT&gt; &lt;ROW&gt; ) |
3379&lt;BETWEEN&gt;
3380 ( &lt;UNBOUNDED&gt; &lt;PRECEDING&gt; | &lt;CURRENT&gt; &lt;ROW&gt; | Expression ( &lt;PRECEDING&gt; | &lt;FOLLOWING&gt; ) )
3381&lt;AND&gt;
3382 ( &lt;UNBOUNDED&gt; &lt;FOLLOWING&gt; | &lt;CURRENT&gt; &lt;ROW&gt; | Expression ( &lt;PRECEDING&gt; | &lt;FOLLOWING&gt; ) )
3383</pre></div></div>
3384
3385<p>The <b>window frame extent clause</b> specifies the start point and end point of the window frame. The expression before <tt>AND</tt> is the start point and the expression after <tt>AND</tt> is the end point. If <tt>BETWEEN</tt> is omitted, you can only specify the start point; the end point becomes <tt>CURRENT ROW</tt>.</p>
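
<p>The short form and the <tt>BETWEEN</tt> form can be compared side by side. In the following sketch, which assumes the <tt>GleambookMessages</tt> sample dataset and uses the illustrative output names <tt>short_form</tt> and <tt>long_form</tt>, both columns are expected to hold the same values, since the omitted end point defaults to <tt>CURRENT ROW</tt>.</p>

<div>
<div>
<pre class="source">SELECT m.messageId,
       -- Start point only; the end point defaults to CURRENT ROW.
       COUNT(m.messageId) OVER (ORDER BY m.messageId
                                ROWS 1 PRECEDING) AS short_form,
       -- The same frame with both points written out.
       COUNT(m.messageId) OVER (ORDER BY m.messageId
                                ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS long_form
FROM GleambookMessages m;
</pre></div></div>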
3386<p>The window frame end point can&#x2019;t be before the start point. If this clause violates this restriction explicitly, an error will result. If it violates this restriction implicitly, the window frame will be empty, which means the window function will return its default value: in most cases this is NULL, except for <tt>strict_count()</tt> or <tt>array_count()</tt>, whose default value is 0.</p>
3387<p>Window frame extents that result in an explicit violation are:</p>
3388<ul>
3389
3390<li>
3391
3392<p><tt>BETWEEN CURRENT ROW AND Expression PRECEDING</tt></p>
3393</li>
3394<li>
3395
3396<p><tt>BETWEEN Expression FOLLOWING AND Expression PRECEDING</tt></p>
3397</li>
3398<li>
3399
3400<p><tt>BETWEEN Expression FOLLOWING AND CURRENT ROW</tt></p>
3401</li>
3402</ul>
3403<p>Window frame extents that result in an implicit violation are:</p>
3404<ul>
3405
3406<li>
3407
3408<p><tt>BETWEEN UNBOUNDED PRECEDING AND Expression PRECEDING</tt> &#x2014; if <tt>Expression</tt> is too high, some tuples may generate an empty window frame.</p>
3409</li>
3410<li>
3411
3412<p><tt>BETWEEN Expression PRECEDING AND Expression PRECEDING</tt> &#x2014; if the second <tt>Expression</tt> is greater than or equal to the first <tt>Expression</tt>, all result sets will generate an empty window frame.</p>
3413</li>
3414<li>
3415
3416<p><tt>BETWEEN Expression FOLLOWING AND Expression FOLLOWING</tt> &#x2014; if the first <tt>Expression</tt> is greater than or equal to the second <tt>Expression</tt>, all result sets will generate an empty window frame.</p>
3417</li>
3418<li>
3419
3420<p><tt>BETWEEN Expression FOLLOWING AND UNBOUNDED FOLLOWING</tt> &#x2014; if <tt>Expression</tt> is too high, some tuples may generate an empty window frame.</p>
3421</li>
3422<li>
3423
<p>If the <a href="#Window_frame_exclusion">window frame exclusion clause</a> is present, any window frame specification may result in an empty window frame.</p>
3425</li>
3426</ul>
3427<p>The <tt>Expression</tt> must be a positive constant or an expression that evaluates as a positive number. For <tt>ROWS</tt> or <tt>GROUPS</tt>, the <tt>Expression</tt> must be an integer.</p></div>
3428<div class="section">
3429<h4><a name="Window_Frame_Exclusion"></a><a name="Window_frame_exclusion" id="Window_frame_exclusion">Window Frame Exclusion</a></h4>
3430
3431<div>
3432<div>
3433<pre class="source">WindowFrameExclusion ::= &lt;EXCLUDE&gt; ( &lt;CURRENT&gt; &lt;ROW&gt; | &lt;GROUP&gt; | &lt;TIES&gt; |
3434&lt;NO&gt; &lt;OTHERS&gt; )
3435</pre></div></div>
3436
3437<p>The <b>window frame exclusion clause</b> enables you to exclude specified tuples from the window frame.</p>
3438<p>This clause can be used with all <a href="builtins.html#AggregateFunctions">aggregate functions</a> and some <a href="builtins.html#WindowFunctions">window functions</a> &#x2014; refer to the descriptions of individual functions for more details.</p>
3439<p>This clause is allowed only when the <a href="#Window_frame_clause">window frame clause</a> is present.</p>
3440<p>This clause is optional. If this clause is omitted, the default is no exclusion &#x2014; the same as <tt>EXCLUDE NO OTHERS</tt>.</p>
3441<ul>
3442
3443<li>
3444
3445<p><tt>EXCLUDE CURRENT ROW</tt>: If the current tuple is still part of the window frame, it is removed from the window frame.</p>
3446</li>
3447<li>
3448
3449<p><tt>EXCLUDE GROUP</tt>: The current tuple and any peers of the current tuple are removed from the window frame.</p>
3450</li>
3451<li>
3452
3453<p><tt>EXCLUDE TIES</tt>: Any peers of the current tuple, but not the current tuple itself, are removed from the window frame.</p>
3454</li>
3455<li>
3456
3457<p><tt>EXCLUDE NO OTHERS</tt>: No additional tuples are removed from the window frame.</p>
3458</li>
3459</ul>
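
<p>As a sketch of how an exclusion combines with a frame, the following query counts, for each message, the other messages by the same author. It assumes the <tt>GleambookMessages</tt> sample dataset; the output name <tt>other_msgs</tt> is illustrative.</p>

<div>
<div>
<pre class="source">-- The frame spans the whole partition; EXCLUDE CURRENT ROW then removes
-- the message being processed, leaving only the author's other messages.
SELECT m.authorId, m.messageId,
       COUNT(m.messageId) OVER (
         PARTITION BY m.authorId
         ORDER BY m.messageId
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         EXCLUDE CURRENT ROW
       ) AS other_msgs
FROM GleambookMessages m;
</pre></div></div>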
3460<p>If the current tuple is already removed from the window frame, then it remains removed from the window frame.</p></div></div></div>
3461<div class="section">
3462<h2><a name="Subqueries" id="Subqueries">Subqueries</a></h2>
<p>In the query language, an arbitrary subquery can appear anywhere that an expression can appear. Unlike in SQL-92, subqueries in a SELECT list or a boolean predicate need not return singleton, single-column relations; they may return arbitrary collections. For example, the following query is a variant of the prior group-by query examples; it retrieves an array of up to two &#x201c;dislike&#x201d; messages per user.</p>
3464<div class="section">
3465<div class="section">
3466<div class="section">
3467<h5><a name="Example"></a>Example</h5>
3468
3469<div>
3470<div>
3471<pre class="source">SELECT uid,
3472 (SELECT VALUE m.msg
3473 FROM msgs m
3474 WHERE m.msg.message LIKE '%dislike%'
3475 ORDER BY m.msg.messageId
3476 LIMIT 2) AS msgs
3477FROM GleambookMessages message
3478GROUP BY message.authorId AS uid GROUP AS msgs(message AS msg);
3479</pre></div></div>
3480
3481<p>For our sample data set, this query returns:</p>
3482
3483<div>
3484<div>
3485<pre class="source">[ {
3486 &quot;msgs&quot;: [
3487 {
3488 &quot;senderLocation&quot;: [
3489 41.66,
3490 80.87
3491 ],
3492 &quot;inResponseTo&quot;: 4,
3493 &quot;messageId&quot;: 2,
3494 &quot;authorId&quot;: 1,
3495 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
3496 }
3497 ],
3498 &quot;uid&quot;: 1
3499}, {
3500 &quot;msgs&quot;: [
3501
3502 ],
3503 &quot;uid&quot;: 2
3504} ]
3505</pre></div></div>
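
<p>The result above contains an empty array for user 2 rather than a null value. When a single value is wanted instead of a collection, it has to be extracted explicitly, for example with an index expression. The following is a minimal sketch, assuming the same sample datasets; the output name <tt>firstMessage</tt> is illustrative. For a user without messages, the index expression is expected to evaluate to <tt>MISSING</tt>, so the field would simply be absent from that user&#x2019;s object.</p>

<div>
<div>
<pre class="source">-- The subquery yields an array with at most one element;
-- [0] picks that element out of the array.
SELECT u.name AS uname,
       (SELECT VALUE m.message
        FROM GleambookMessages m
        WHERE m.authorId = u.id
        ORDER BY m.messageId
        LIMIT 1)[0] AS firstMessage
FROM GleambookUsers u;
</pre></div></div>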
3506
<p>Note that a subquery, like a top-level <tt>SELECT</tt> statement, always returns a collection &#x2013; regardless of where within a query the subquery occurs &#x2013; and again, its result is never automatically cast into a scalar.</p></div></div></div></div>
3508<div class="section">
3509<h2><a name="Differences_from_SQL-92"></a><a name="Vs_SQL-92" id="Vs_SQL-92">Differences from SQL-92</a></h2>
3510<p>The query language offers the following additional features beyond SQL-92:</p>
3511<ul>
3512
3513<li>Fully composable and functional: A subquery can iterate over any intermediate collection and can appear anywhere in a query.</li>
3514<li>Schema-free: The query language does not assume the existence of a static schema for any data that it processes.</li>
3515<li>Correlated FROM terms: A right-side FROM term expression can refer to variables defined by FROM terms on its left.</li>
3516<li>Powerful GROUP BY: In addition to a set of aggregate functions as in standard SQL, the groups created by the <tt>GROUP BY</tt> clause are directly usable in nested queries and/or to obtain nested results.</li>
3517<li>Generalized SELECT clause: A SELECT clause can return any type of collection, while in SQL-92, a <tt>SELECT</tt> clause has to return a (homogeneous) collection of objects.</li>
3518</ul>
3519<p>The following matrix is a quick &#x201c;SQL-92 compatibility cheat sheet&#x201d; for the query language.</p>
3520<table border="0" class="table table-striped">
3521<thead>
3522
3523<tr class="a">
3524<th> Feature </th>
3525<th> The query language </th>
3526<th> SQL-92 </th>
3527<th> Why different? </th></tr>
3528</thead><tbody>
3529
3530<tr class="b">
3531<td> SELECT * </td>
3532<td> Returns nested objects </td>
3533<td> Returns flattened concatenated objects </td>
3534<td> Nested collections are 1st class citizens </td></tr>
3535<tr class="a">
3536<td> SELECT list </td>
3537<td> order not preserved </td>
3538<td> order preserved </td>
3539<td> Fields in a JSON object are not ordered </td></tr>
3540<tr class="b">
3541<td> Subquery </td>
3542<td> Returns a collection </td>
3543<td> The returned collection is cast into a scalar value if the subquery appears in a SELECT list or on one side of a comparison or as input to a function </td>
3544<td> Nested collections are 1st class citizens </td></tr>
3545<tr class="a">
3546<td> LEFT OUTER JOIN </td>
3547<td> Fills in <tt>MISSING</tt>(s) for non-matches </td>
3548<td> Fills in <tt>NULL</tt>(s) for non-matches </td>
3549<td> &#x201c;Absence&#x201d; is more appropriate than &#x201c;unknown&#x201d; here </td></tr>
3550<tr class="b">
3551<td> UNION ALL </td>
3552<td> Allows heterogeneous inputs and output </td>
3553<td> Input streams must be UNION-compatible and output field names are drawn from the first input stream </td>
<td> Heterogeneity and nested collections are common </td></tr>
3555<tr class="a">
3556<td> IN constant_expr </td>
3557<td> The constant expression has to be an array or multiset, i.e., [..,..,&#x2026;] </td>
3558<td> The constant collection can be represented as comma-separated items in a paren pair </td>
3559<td> Nested collections are 1st class citizens </td></tr>
3560<tr class="b">
3561<td> String literal </td>
3562<td> Double quotes or single quotes </td>
3563<td> Single quotes only </td>
3564<td> Double quoted strings are pervasive </td></tr>
3565<tr class="a">
3566<td> Delimited identifiers </td>
3567<td> Backticks </td>
3568<td> Double quotes </td>
3569<td> Double quoted strings are pervasive </td></tr>
3570</tbody>
3571</table>
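
<p>A few rows of the matrix can be illustrated with short queries. These are sketches against the <tt>GleambookUsers</tt> sample dataset; the output name <tt>uname</tt> is illustrative.</p>

<div>
<div>
<pre class="source">-- IN takes an array (or multiset), not a parenthesized list as in SQL-92.
SELECT VALUE u.name
FROM GleambookUsers u
WHERE u.id IN [1, 2];

-- String literals may use double or single quotes;
-- delimited identifiers use backticks.
SELECT u.`name` AS uname
FROM GleambookUsers u
WHERE u.name = &quot;IsbelDull&quot; OR u.name = 'MargaritaStoddard';
</pre></div></div>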
3572<p>The following SQL-92 features are not implemented yet. However, the query language does not conflict with these features:</p>
3573<ul>
3574
3575<li>CROSS JOIN, NATURAL JOIN, UNION JOIN</li>
3576<li>RIGHT and FULL OUTER JOIN</li>
3577<li>INTERSECT, EXCEPT, UNION with set semantics</li>
3578<li>CAST expression</li>
3579<li>COALESCE expression</li>
3580<li>ALL and SOME predicates for linking to subqueries</li>
3581<li>UNIQUE predicate (tests a collection for duplicates)</li>
3582<li>MATCH predicate (tests for referential integrity)</li>
3583<li>Row and Table constructors</li>
3584<li>Preserved order for expressions in a SELECT list</li>
3585</ul><!--
3586 ! Licensed to the Apache Software Foundation (ASF) under one
3587 ! or more contributor license agreements. See the NOTICE file
3588 ! distributed with this work for additional information
3589 ! regarding copyright ownership. The ASF licenses this file
3590 ! to you under the Apache License, Version 2.0 (the
3591 ! "License"); you may not use this file except in compliance
3592 ! with the License. You may obtain a copy of the License at
3593 !
3594 ! http://www.apache.org/licenses/LICENSE-2.0
3595 !
3596 ! Unless required by applicable law or agreed to in writing,
3597 ! software distributed under the License is distributed on an
3598 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3599 ! KIND, either express or implied. See the License for the
3600 ! specific language governing permissions and limitations
3601 ! under the License.
3602 !-->
3603
<h1><a name="Errors" id="Errors">4. Errors</a></h1>
3622
3623<p>A query can potentially result in one of the following errors:</p>
3624<ul>
3625
3626<li>syntax error,</li>
3627<li>identifier resolution error,</li>
3628<li>type error,</li>
3629<li>resource error.</li>
3630</ul>
3631<p>If the query processor runs into any error, it will terminate the ongoing processing of the query and immediately return an error message to the client.</p></div>
3632<div class="section">
3633<h2><a name="Syntax_Errors"></a><a name="Syntax_errors" id="Syntax_errors">Syntax Errors</a></h2>
3634<p>A valid query must satisfy the grammar rules of the query language. Otherwise, a syntax error will be raised.</p>
3635<div class="section">
3636<div class="section">
3637<div class="section">
3638<h5><a name="Example"></a>Example</h5>
3639
3640<div>
3641<div>
3642<pre class="source">SELECT *
3643GleambookUsers user
3644</pre></div></div>
3645
<p>Since the query is missing the <tt>FROM</tt> keyword before the dataset <tt>GleambookUsers</tt>, we will get a syntax error as follows:</p>
3647
3648<div>
3649<div>
3650<pre class="source">Syntax error: In line 2 &gt;&gt;GleambookUsers user;&lt;&lt; Encountered &lt;IDENTIFIER&gt; \&quot;GleambookUsers\&quot; at column 1.
3651</pre></div></div>
3652</div>
3653<div class="section">
3654<h5><a name="Example"></a>Example</h5>
3655
3656<div>
3657<div>
3658<pre class="source">SELECT *
3659FROM GleambookUsers user
3660WHERE type=&quot;advertiser&quot;;
3661</pre></div></div>
3662
3663<p>Since &#x201c;type&#x201d; is a reserved keyword in the query parser, we will get a syntax error as follows:</p>
3664
3665<div>
3666<div>
3667<pre class="source">Error: Syntax error: In line 3 &gt;&gt;WHERE type=&quot;advertiser&quot;;&lt;&lt; Encountered 'type' &quot;type&quot; at column 7.
3668==&gt; WHERE type=&quot;advertiser&quot;;
3669</pre></div></div>
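
<p>Both queries in this section can be repaired with small changes. The following sketch restores the missing <tt>FROM</tt> keyword in the first query and, in the second, writes the reserved word as a backtick-delimited identifier qualified by the variable <tt>user</tt>. It assumes that a delimited identifier is accepted as a field name in this position.</p>

<div>
<div>
<pre class="source">-- First example, with the FROM keyword restored.
SELECT *
FROM GleambookUsers user;

-- Second example, with the reserved word delimited by backticks.
SELECT *
FROM GleambookUsers user
WHERE user.`type` = &quot;advertiser&quot;;
</pre></div></div>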
3670</div></div></div></div>
3671<div class="section">
3672<h2><a name="Identifier_Resolution_Errors"></a><a name="Identifier_resolution_errors" id="Identifier_resolution_errors">Identifier Resolution Errors</a></h2>
3673<p>Referring to an undefined identifier can cause an error if the identifier cannot be successfully resolved as a valid field access.</p>
3674<div class="section">
3675<div class="section">
3676<div class="section">
3677<h5><a name="Example"></a>Example</h5>
3678
3679<div>
3680<div>
3681<pre class="source">SELECT *
3682FROM GleambookUser user;
3683</pre></div></div>
3684
<p>If, as above, the dataset name &#x201c;GleambookUsers&#x201d; is misspelled without its final &#x201c;s&#x201d;, we will get an identifier resolution error as follows:</p>
3686
3687<div>
3688<div>
3689<pre class="source">Error: Cannot find dataset GleambookUser in dataverse Default nor an alias with name GleambookUser!
3690</pre></div></div>
3691</div>
3692<div class="section">
3693<h5><a name="Example"></a>Example</h5>
3694
3695<div>
3696<div>
3697<pre class="source">SELECT name, message
3698FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
3699</pre></div></div>
3700
3701<p>If the compiler cannot figure out how to resolve an unqualified field name, which will occur if there is more than one variable in scope (e.g., <tt>GleambookUsers u</tt> and <tt>GleambookMessages m</tt> as above), we will get an identifier resolution error as follows:</p>
3702
3703<div>
3704<div>
3705<pre class="source">Error: Cannot resolve ambiguous alias reference for undefined identifier name
3706</pre></div></div>
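
<p>Qualifying each field name with the variable it belongs to removes the ambiguity. A sketch of the same query with qualified names, assuming that <tt>name</tt> is a field of users and <tt>message</tt> is a field of messages, as in the sample data:</p>

<div>
<div>
<pre class="source">-- Each field is qualified, so the compiler no longer has to guess
-- whether it belongs to u or to m.
SELECT u.name, m.message
FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
</pre></div></div>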
3707</div></div></div></div>
3708<div class="section">
3709<h2><a name="Type_Errors"></a><a name="Type_errors" id="Type_errors">Type Errors</a></h2>
<p>The query compiler performs type checks based on the type information available to it. In addition, the query runtime reports type errors if a data model instance it processes does not satisfy the type requirement.</p>
3711<div class="section">
3712<div class="section">
3713<div class="section">
3714<h5><a name="Example"></a>Example</h5>
3715
3716<div>
3717<div>
3718<pre class="source">abs(&quot;123&quot;);
3719</pre></div></div>
3720
3721<p>Since function <tt>abs</tt> can only process numeric input values, we will get a type error as follows:</p>
3722
3723<div>
3724<div>
3725<pre class="source">Error: Type mismatch: function abs expects its 1st input parameter to be of type tinyint, smallint, integer, bigint, float or double, but the actual input type is string
3726</pre></div></div>
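
<p>One way to avoid this error is to convert the value to a number before passing it to <tt>abs</tt>. This is only a sketch: it assumes the string holds a numeric literal and that the <tt>to_number</tt> conversion function is available in the deployed version.</p>

<div>
<div>
<pre class="source">-- Assumes to_number is available; converts the string to a number first.
abs(to_number(&quot;123&quot;));
</pre></div></div>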
3727</div></div></div></div>
3728<div class="section">
3729<h2><a name="Resource_Errors"></a><a name="Resource_errors" id="Resource_errors">Resource Errors</a></h2>
<p>A query can potentially exhaust system resources, such as the number of open files or the available disk space. For instance, the following two resource errors could potentially be seen when running the system:</p>
3731
3732<div>
3733<div>
3734<pre class="source">Error: no space left on device
3735Error: too many open files
3736</pre></div></div>
3737
<p>The &#x201c;no space left on device&#x201d; issue can usually be fixed by cleaning up disk space and reserving more disk space for the system. The &#x201c;too many open files&#x201d; issue can usually be fixed by a system administrator, following the instructions <a class="externalLink" href="https://easyengine.io/tutorials/linux/increase-open-files-limit/">here</a>.</p><!--
3739 ! Licensed to the Apache Software Foundation (ASF) under one
3740 ! or more contributor license agreements. See the NOTICE file
3741 ! distributed with this work for additional information
3742 ! regarding copyright ownership. The ASF licenses this file
3743 ! to you under the Apache License, Version 2.0 (the
3744 ! "License"); you may not use this file except in compliance
3745 ! with the License. You may obtain a copy of the License at
3746 !
3747 ! http://www.apache.org/licenses/LICENSE-2.0
3748 !
3749 ! Unless required by applicable law or agreed to in writing,
3750 ! software distributed under the License is distributed on an
3751 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3752 ! KIND, either express or implied. See the License for the
3753 ! specific language governing permissions and limitations
3754 ! under the License.
3755 !-->
3756
3757<h1><a name="DDL_and_DML_statements" id="DDL_and_DML_statements">5. DDL and DML statements</a></h1>
3758
3759<div>
3760<div>
3761<pre class="source">Statement ::= ( ( SingleStatement )? ( &quot;;&quot; )+ )* &lt;EOF&gt;
3762SingleStatement ::= DatabaseDeclaration
3763 | FunctionDeclaration
3764 | CreateStatement
3765 | DropStatement
3766 | LoadStatement
3767 | SetStatement
3768 | InsertStatement
3769 | DeleteStatement
3770 | Query
3771</pre></div></div>
3772
<p>In addition to queries, an implementation of the query language needs to support statements for data definition and manipulation purposes as well as controlling the context to be used in evaluating query expressions. This section details the DDL and DML statements supported in the query language as realized today in Apache AsterixDB.</p>
3791</div>
3792<div class="section">
3793<h2><a name="Lifecycle_Management_Statements"></a><a name="Lifecycle_management_statements" id="Lifecycle_management_statements">Lifecycle Management Statements</a></h2>
3794
3795<div>
3796<div>
3797<pre class="source">CreateStatement ::= &quot;CREATE&quot; ( DatabaseSpecification
3798 | TypeSpecification
3799 | DatasetSpecification
3800 | IndexSpecification
3801 | SynonymSpecification
3802 | FunctionSpecification )
3803
3804QualifiedName ::= Identifier ( &quot;.&quot; Identifier )?
3805DoubleQualifiedName ::= Identifier &quot;.&quot; Identifier ( &quot;.&quot; Identifier )?
3806</pre></div></div>
3807
<p>The CREATE statement is used for creating dataverses as well as other persistent artifacts in a dataverse. It can be used to create new dataverses, datatypes, datasets, indexes, synonyms, and user-defined query functions.</p>
3809<div class="section">
3810<h3><a name="Dataverses" id="Dataverses"> Dataverses</a></h3>
3811
3812<div>
3813<div>
3814<pre class="source">DatabaseSpecification ::= &quot;DATAVERSE&quot; Identifier IfNotExists
3815</pre></div></div>
3816
3817<p>The CREATE DATAVERSE statement is used to create new dataverses. To ease the authoring of reusable query scripts, an optional IF NOT EXISTS clause is included to allow creation to be requested either unconditionally or only if the dataverse does not already exist. If this clause is absent, an error is returned if a dataverse with the indicated name already exists.</p>
3818<p>The following example creates a new dataverse named TinySocial if one does not already exist.</p>
3819<div class="section">
3820<div class="section">
3821<h5><a name="Example"></a>Example</h5>
3822
3823<div>
3824<div>
3825<pre class="source">CREATE DATAVERSE TinySocial IF NOT EXISTS;
3826</pre></div></div>
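
<p>As a small sketch of how the clause helps in a reusable script, the two statements below can be run any number of times. The first is a no-op once the dataverse exists, and the <tt>USE</tt> statement then makes TinySocial the current dataverse for the statements that follow.</p>

<div>
<div>
<pre class="source">-- Safe to re-run: creates the dataverse only if it is missing.
CREATE DATAVERSE TinySocial IF NOT EXISTS;

-- Subsequent unqualified names are resolved in TinySocial.
USE TinySocial;
</pre></div></div>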
3827</div></div></div>
3828<div class="section">
3829<h3><a name="Types" id="Types"> Types</a></h3>
3830
3831<div>
3832<div>
3833<pre class="source">TypeSpecification ::= &quot;TYPE&quot; FunctionOrTypeName IfNotExists &quot;AS&quot; ObjectTypeDef
3834FunctionOrTypeName ::= QualifiedName
3835IfNotExists ::= ( &lt;IF&gt; &lt;NOT&gt; &lt;EXISTS&gt; )?
3836TypeExpr ::= ObjectTypeDef | TypeReference | ArrayTypeDef | MultisetTypeDef
3837ObjectTypeDef ::= ( &lt;CLOSED&gt; | &lt;OPEN&gt; )? &quot;{&quot; ( ObjectField ( &quot;,&quot; ObjectField )* )? &quot;}&quot;
3838ObjectField ::= Identifier &quot;:&quot; ( TypeExpr ) ( &quot;?&quot; )?
3839NestedField ::= Identifier ( &quot;.&quot; Identifier )*
3840IndexField ::= NestedField ( &quot;:&quot; TypeReference )?
3841TypeReference ::= Identifier
3842ArrayTypeDef ::= &quot;[&quot; ( TypeExpr ) &quot;]&quot;
3843MultisetTypeDef ::= &quot;{{&quot; ( TypeExpr ) &quot;}}&quot;
3844</pre></div></div>
3845
<p>The CREATE TYPE statement is used to create a new named datatype. This type can then be used to create stored collections or utilized when defining one or more other datatypes. Much more information about the data model is available in the <a href="../datamodel.html">data model reference guide</a>. A new type can be an object type, a renaming of another type, an array type, or a multiset type. An object type can be defined as being either open or closed. Instances of a closed object type are not permitted to contain fields other than those specified in the create type statement. Instances of an open object type may carry additional fields, and open is the default for new types if neither option is specified.</p>
<p>The following example creates a new object type called GleambookUserType. Since it is defined as (defaulting to) being an open type, instances will be permitted to contain more than what is specified in the type definition. The first four fields are essentially traditional typed name/value pairs (much like SQL fields). The friendIds field is a multiset of integers. The employment field is an array of instances of another named object type, EmploymentType.</p>
3848<div class="section">
3849<div class="section">
3850<h5><a name="Example"></a>Example</h5>
3851
3852<div>
3853<div>
3854<pre class="source">CREATE TYPE GleambookUserType AS {
3855 id: int,
3856 alias: string,
3857 name: string,
3858 userSince: datetime,
3859 friendIds: {{ int }},
3860 employment: [ EmploymentType ]
3861};
3862</pre></div></div>
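
<p>The definition above refers to EmploymentType, which would need to be created before GleambookUserType. Its definition is not shown in this section; the sketch below is one plausible shape, and its field names are assumptions made purely for illustration.</p>

<div>
<div>
<pre class="source">-- Hypothetical definition; the field names are illustrative assumptions.
CREATE TYPE EmploymentType AS {
    organizationName: string,
    startDate: date,
    endDate: date?
};
</pre></div></div>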
3863
3864<p>The next example creates a new object type, closed this time, called MyUserTupleType. Instances of this closed type will not be permitted to have extra fields, although the alias field is marked as optional and may thus be NULL or MISSING in legal instances of the type. Note that the type of the id field in the example is UUID. This field type can be used if you want to have this field be an autogenerated-PK field. (Refer to the Datasets section later for more details on such fields.)</p></div>
3865<div class="section">
3866<h5><a name="Example"></a>Example</h5>
3867
3868<div>
3869<div>
3870<pre class="source">CREATE TYPE MyUserTupleType AS CLOSED {
3871 id: uuid,
3872 alias: string?,
3873 name: string
3874};
3875</pre></div></div>
3876</div></div></div>
3877<div class="section">
3878<h3><a name="Datasets" id="Datasets"> Datasets</a></h3>
3879
3880<div>
3881<div>
3882<pre class="source">DatasetSpecification ::= ( &lt;INTERNAL&gt; )? &lt;DATASET&gt; QualifiedName &quot;(&quot; QualifiedName &quot;)&quot; IfNotExists
3883 PrimaryKey ( &lt;ON&gt; Identifier )? ( &lt;HINTS&gt; Properties )?
3884 ( &quot;USING&quot; &quot;COMPACTION&quot; &quot;POLICY&quot; CompactionPolicy ( Configuration )? )?
3885 ( &lt;WITH&gt; &lt;FILTER&gt; &lt;ON&gt; Identifier )?
3886 |
3887 &lt;EXTERNAL&gt; &lt;DATASET&gt; QualifiedName &quot;(&quot; QualifiedName &quot;)&quot; IfNotExists &lt;USING&gt; AdapterName
3888 Configuration ( &lt;HINTS&gt; Properties )?
3889 ( &lt;USING&gt; &lt;COMPACTION&gt; &lt;POLICY&gt; CompactionPolicy ( Configuration )? )?
3890AdapterName ::= Identifier
3891Configuration ::= &quot;(&quot; ( KeyValuePair ( &quot;,&quot; KeyValuePair )* )? &quot;)&quot;
3892KeyValuePair ::= &quot;(&quot; StringLiteral &quot;=&quot; StringLiteral &quot;)&quot;
3893Properties ::= ( &quot;(&quot; Property ( &quot;,&quot; Property )* &quot;)&quot; )?
3894Property ::= Identifier &quot;=&quot; ( StringLiteral | IntegerLiteral )
3895FunctionSignature ::= FunctionOrTypeName &quot;@&quot; IntegerLiteral
3896PrimaryKey ::= &lt;PRIMARY&gt; &lt;KEY&gt; NestedField ( &quot;,&quot; NestedField )* ( &lt;AUTOGENERATED&gt; )?
3897CompactionPolicy ::= Identifier
3898</pre></div></div>
3899
<p>The CREATE DATASET statement is used to create a new dataset. Datasets are named multisets of object type instances; they are where data lives persistently and are the usual targets for queries. Datasets are typed, and the system ensures that their contents conform to their type definitions. An Internal dataset (the default kind) is a dataset whose content lives within and is managed by the system. It is required to have a specified primary key that uniquely identifies the contained objects. (The primary key is also used in secondary indexes to identify the indexed primary data objects.)</p>
3901<p>Internal datasets contain several advanced options that can be specified when appropriate. One such option is that random primary key (UUID) values can be auto-generated by declaring the field to be UUID and putting &#x201c;AUTOGENERATED&#x201d; after the &#x201c;PRIMARY KEY&#x201d; identifier. In this case, unlike other non-optional fields, a value for the auto-generated PK field should not be provided at insertion time by the user since each object&#x2019;s primary key field value will be auto-generated by the system.</p>
<p>Another advanced option, when creating an Internal dataset, is to specify the merge policy that controls which of the underlying LSM storage components are merged. (The system supports Log-Structured Merge tree based physical storage for Internal datasets.) Currently the system supports four different component merging policies that can be chosen per dataset: no-merge, constant, prefix, and correlated-prefix. The no-merge policy simply never merges disk components. The constant policy merges disk components when the number of components reaches a constant number k that can be configured by the user. The prefix policy relies on both component sizes and the number of components to decide which components to merge. It works by first trying to identify the smallest ordered (oldest to newest) sequence of components such that the sequence does not contain a single component that exceeds some threshold size M and that either the sum of the components&#x2019; sizes exceeds M or the number of components in the sequence exceeds another threshold C. If such a sequence exists, the components in the sequence are merged together to form a single component. Finally, the correlated-prefix policy is similar to the prefix policy, but it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index. When the correlated-prefix policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), it will issue successive merge requests on behalf of all other indexes associated with the same dataset. The system&#x2019;s default policy is the prefix policy, except when there is a filter on a dataset, in which case the preferred policy is correlated-prefix.</p>
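
<p>Following the grammar above, a merge policy could be attached to a dataset roughly as sketched below. The dataset name and the GleambookMessageType type are hypothetical, and the <tt>num-components</tt> parameter name (standing for the constant k) is an assumption that should be checked against the deployed version.</p>

<div>
<div>
<pre class="source">-- Hypothetical dataset and type names; &quot;num-components&quot; is an assumed parameter name.
-- Requests the constant policy with k = 3.
CREATE DATASET MessagesConstantMerge(GleambookMessageType)
PRIMARY KEY messageId
USING COMPACTION POLICY constant ((&quot;num-components&quot;=&quot;3&quot;));
</pre></div></div>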
<p>Another advanced option shown in the syntax above, also related to performance, is that a <b>filter</b> can optionally be created on a field to further optimize range queries with predicates on the filter&#x2019;s field. Filters allow some range queries to avoid searching all LSM components when the query conditions match the filter. (Refer to <a href="../filters.html">Filter-Based LSM Index Acceleration</a> for more information about filters.)</p>
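
<p>As an illustration of the filter option, the sketch below declares a filter on a timestamp field and then issues a range query on that field. All type, dataset, and field names here are hypothetical, and whether any given query actually benefits from the filter depends on the data and on the optimizer.</p>

<div>
<div>
<pre class="source">-- Hypothetical type and dataset used only for this illustration.
CREATE TYPE SensorReadingType AS {
    readingId: int,
    readingTime: datetime,
    temperature: double
};

CREATE DATASET SensorReadings(SensorReadingType)
PRIMARY KEY readingId
WITH FILTER ON readingTime;

-- A range predicate on the filter field; LSM components whose filter
-- range does not overlap the predicate may be skipped.
SELECT VALUE r
FROM SensorReadings r
WHERE r.readingTime &gt;= datetime(&quot;2020-01-01T00:00:00&quot;)
  AND r.readingTime &lt; datetime(&quot;2020-02-01T00:00:00&quot;);
</pre></div></div>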
3904<p>An External dataset, in contrast to an Internal dataset, has data stored outside of the system&#x2019;s control. Files living in HDFS or in the local filesystem(s) of a cluster&#x2019;s nodes are currently supported. External dataset support allows queries to treat foreign data as though it were stored in the system, making it possible to query &#x201c;legacy&#x201d; file data (for example, Hive data) without having to physically import it. When defining an External dataset, an appropriate adapter type must be selected for the desired external data. (See the <a href="../externaldata.html">Guide to External Data</a> for more information on the available adapters.)</p>
<p>The following example creates an Internal dataset for storing GleambookUserType objects. It specifies that their id field is their primary key.</p>
3906<div class="section">
3907<h4><a name="Example"></a>Example</h4>
3908
3909<div>
3910<div>
3911<pre class="source">CREATE INTERNAL DATASET GleambookUsers(GleambookUserType) PRIMARY KEY id;
3912</pre></div></div>
3913
3914<p>The next example creates another Internal dataset (the default kind when no dataset kind is specified) for storing MyUserTupleType objects. It specifies that the id field should be used as the primary key for the dataset. It also specifies that the id field is an auto-generated field, meaning that a randomly generated UUID value should be assigned to each incoming object by the system. (A user should therefore not attempt to provide a value for this field.) Note that the id field&#x2019;s declared type must be UUID in this case.</p></div>
3915<div class="section">
3916<h4><a name="Example"></a>Example</h4>
3917
3918<div>
3919<div>
3920<pre class="source">CREATE DATASET MyUsers(MyUserTupleType) PRIMARY KEY id AUTOGENERATED;
3921</pre></div></div>
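
<p>To illustrate the effect of AUTOGENERATED, the sketch below inserts two objects into MyUsers without supplying an id, leaving the system to assign a UUID to each. The field values are made up for illustration; the second object also omits the optional alias field.</p>

<div>
<div>
<pre class="source">-- No id is supplied; the system generates a UUID for each object.
INSERT INTO MyUsers ([
    { &quot;alias&quot;: &quot;wanda&quot;, &quot;name&quot;: &quot;Wanda Example&quot; },
    { &quot;name&quot;: &quot;Sam Example&quot; }
]);

-- The generated keys can then be read back like any other field.
SELECT u.id, u.name
FROM MyUsers u;
</pre></div></div>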
3922
3923<p>The next example creates an External dataset for querying LineItemType objects. The choice of the <tt>hdfs</tt> adapter means that this dataset&#x2019;s data actually resides in HDFS. The example CREATE statement also provides parameters used by the hdfs adapter: the URL and path needed to locate the data in HDFS and a description of the data format.</p></div>
3924<div class="section">
3925<h4><a name="Example"></a>Example</h4>
3926
3927<div>
3928<div>
3929<pre class="source">CREATE EXTERNAL DATASET LineItem(LineItemType) USING hdfs (
3930 (&quot;hdfs&quot;=&quot;hdfs://HOST:PORT&quot;),
3931 (&quot;path&quot;=&quot;HDFS_PATH&quot;),
3932 (&quot;input-format&quot;=&quot;text-input-format&quot;),
3933 (&quot;format&quot;=&quot;delimited-text&quot;),
3934 (&quot;delimiter&quot;=&quot;|&quot;));
3935</pre></div></div>
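
<p>For data that lives in the local filesystem of a cluster node rather than in HDFS, a similar definition could use the <tt>localfs</tt> adapter. The sketch below is an assumption-laden variant of the example above: the dataset name is hypothetical and the path is a placeholder to be replaced with a real node address and file location.</p>

<div>
<div>
<pre class="source">-- Hypothetical dataset name; the path is a placeholder, not a real location.
CREATE EXTERNAL DATASET LocalLineItem(LineItemType) USING localfs (
    (&quot;path&quot;=&quot;127.0.0.1:///PATH/TO/lineitem.tbl&quot;),
    (&quot;format&quot;=&quot;delimited-text&quot;),
    (&quot;delimiter&quot;=&quot;|&quot;));
</pre></div></div>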
3936</div></div>
3937<div class="section">
3938<h3><a name="Indices" id="Indices">Indices</a></h3>
3939
3940<div>
3941<div>
3942<pre class="source">IndexSpecification ::= (&lt;INDEX&gt; Identifier IfNotExists &lt;ON&gt; QualifiedName
3943 &quot;(&quot; ( IndexField ) ( &quot;,&quot; IndexField )* &quot;)&quot; (&lt;TYPE&gt; IndexType)? (&lt;ENFORCED&gt;)?)
3944 |
3945 &lt;PRIMARY&gt; &lt;INDEX&gt; Identifier? IfNotExists &lt;ON&gt; QualifiedName (&lt;TYPE&gt; &lt;BTREE&gt;)?
3946IndexType ::= &lt;BTREE&gt; | &lt;RTREE&gt; | &lt;KEYWORD&gt; | &lt;NGRAM&gt; &quot;(&quot; IntegerLiteral &quot;)&quot;
3947</pre></div></div>
3948
3949<p>The CREATE INDEX statement creates a secondary index on one or more fields of a specified dataset. Supported index types include <tt>BTREE</tt> for totally ordered datatypes, <tt>RTREE</tt> for spatial data, and <tt>KEYWORD</tt> and <tt>NGRAM</tt> for textual (string) data. An index can be created on a nested field (or fields) by providing a valid path expression as an index field identifier.</p>
3950<p>An indexed field is not required to be part of the datatype associated with a dataset if the dataset&#x2019;s datatype is declared as open <b>and</b> if the field&#x2019;s type is provided along with its name and if the <tt>ENFORCED</tt> keyword is specified at the end of the index definition. <tt>ENFORCING</tt> an open field introduces a check that makes sure that the actual type of the indexed field (if the optional field exists in the object) always matches this specified (open) field type.</p>
3951<p>The following example creates a btree index called gbAuthorIdx on the authorId field of the GleambookMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the author-id field.</p>
3952<div class="section">
3953<h4><a name="Example"></a>Example</h4>
3954
3955<div>
3956<div>
3957<pre class="source">CREATE INDEX gbAuthorIdx ON GleambookMessages(authorId) TYPE BTREE;
3958</pre></div></div>
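
<p>The queries below sketch the three kinds of predicates mentioned above that such an index may help with. The literal values are illustrative, and whether the optimizer actually chooses the index depends on the query and the system version.</p>

<div>
<div>
<pre class="source">-- Exact-match lookup on the indexed field.
SELECT VALUE m
FROM GleambookMessages m
WHERE m.authorId = 1;

-- Range search on the indexed field.
SELECT VALUE m
FROM GleambookMessages m
WHERE m.authorId &gt;= 1 AND m.authorId &lt; 5;

-- Join on the indexed field.
SELECT u.name, m.message
FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
</pre></div></div>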
3959
3960<p>The following example creates an open btree index called gbSendTimeIdx on the (non-declared) <tt>sendTime</tt> field of the GleambookMessages dataset having datetime type. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the <tt>sendTime</tt> field. The index is enforced so that records that do not have the <tt>sendTime</tt> field or have a mismatched type on the field cannot be inserted into the dataset.</p></div>
3961<div class="section">
3962<h4><a name="Example"></a>Example</h4>
3963
3964<div>
3965<div>
3966<pre class="source">CREATE INDEX gbSendTimeIdx ON GleambookMessages(sendTime: datetime?) TYPE BTREE ENFORCED;
3967</pre></div></div>
3968
3969<p>The following example creates an open btree index called gbReadTimeIdx on the (non-declared) <tt>readTime</tt> field of the GleambookMessages dataset having datetime type. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the <tt>readTime</tt> field. The index is not enforced so that records that do not have the <tt>readTime</tt> field or have a mismatched type on the field can still be inserted into the dataset.</p></div>
3970<div class="section">
3971<h4><a name="Example"></a>Example</h4>
3972
3973<div>
3974<div>
3975<pre class="source">CREATE INDEX gbReadTimeIdx ON GleambookMessages(readTime: datetime?);
3976</pre></div></div>
3977
3978<p>The following example creates a btree index called crpUserScrNameIdx on screenName, a nested field residing within a object-valued user field in the ChirpMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the nested screenName field. Such nested fields must be singular, i.e., one cannot index through (or on) an array-valued field.</p></div>
3979<div class="section">
3980<h4><a name="Example"></a>Example</h4>
3981
3982<div>
3983<div>
3984<pre class="source">CREATE INDEX crpUserScrNameIdx ON ChirpMessages(user.screenName) TYPE BTREE;
3985</pre></div></div>
3986
<p>The following example creates an rtree index called gbSenderLocIndex on the sender-location field of the GleambookMessages dataset. This index can be useful for accelerating queries that use the <a href="functions.html#spatial-intersect"><tt>spatial-intersect</tt> function</a> in a predicate involving the sender-location field.</p></div>
3988<div class="section">
3989<h4><a name="Example"></a>Example</h4>
3990
3991<div>
3992<div>
3993<pre class="source">CREATE INDEX gbSenderLocIndex ON GleambookMessages(&quot;sender-location&quot;) TYPE RTREE;
3994</pre></div></div>
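
<p>A query of the kind this index is meant to help might look like the sketch below, which selects messages sent from within a circle. The coordinates and radius are illustrative, and the underscore spellings of the function names (<tt>spatial_intersect</tt>, <tt>create_circle</tt>, <tt>create_point</tt>) are assumed to be the ones accepted by the deployed version.</p>

<div>
<div>
<pre class="source">-- Messages whose sender-location falls inside a circle of radius 5.0
-- centered at (40.0, 70.0); the values are illustrative.
SELECT VALUE m
FROM GleambookMessages m
WHERE spatial_intersect(m.`sender-location`,
                        create_circle(create_point(40.0, 70.0), 5.0));
</pre></div></div>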
3995
<p>The following example creates a 3-gram index called fbUserIdx on the name field of the GleambookUsers dataset. This index can be used to accelerate some similarity or substring matching queries on the name field. For details refer to the document on <a href="similarity.html#NGram_Index">similarity queries</a>.</p></div>
3997<div class="section">
3998<h4><a name="Example"></a>Example</h4>
3999
4000<div>
4001<div>
4002<pre class="source">CREATE INDEX fbUserIdx ON GleambookUsers(name) TYPE NGRAM(3);
4003</pre></div></div>
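
<p>As a sketch of a substring-matching query that an n-gram index on name may be able to accelerate, the following uses the <tt>contains</tt> function; the search string is illustrative.</p>

<div>
<div>
<pre class="source">-- Users whose name contains the substring &quot;Mar&quot;.
SELECT VALUE u
FROM GleambookUsers u
WHERE contains(u.name, &quot;Mar&quot;);
</pre></div></div>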
4004
4005<p>The following example creates a keyword index called fbMessageIdx on the message field of the GleambookMessages dataset. This keyword index can be used to optimize queries with token-based similarity predicates on the message field. For details refer to the document on <a href="similarity.html#Keyword_Index">similarity queries</a>.</p></div>
4006<div class="section">
4007<h4><a name="Example"></a>Example</h4>
4008
4009<div>
4010<div>
4011<pre class="source">CREATE INDEX fbMessageIdx ON GleambookMessages(message) TYPE KEYWORD;
4012</pre></div></div>
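
<p>A token-based similarity predicate of the kind mentioned above could be written as in the sketch below, which compares the word tokens of each message with those of a search phrase using Jaccard similarity. The phrase and the 0.5 threshold are illustrative assumptions.</p>

<div>
<div>
<pre class="source">-- Messages whose word tokens are at least 50% Jaccard-similar to the phrase.
SELECT VALUE m
FROM GleambookMessages m
WHERE similarity_jaccard(word_tokens(m.message),
                         word_tokens(&quot;love this phone&quot;)) &gt;= 0.5;
</pre></div></div>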
4013
4014<p>The following example creates a special secondary index which holds only the primary keys. This index is useful for speeding up aggregation queries which involve only primary keys. The name of the index is optional. If the name is not specified, the system will generate one. When the user would like to drop this index, the metadata can be queried to find the system-generated name.</p></div>
4015<div class="section">
4016<h4><a name="Example"></a>Example</h4>
4017
4018<div>
4019<div>
4020<pre class="source">CREATE PRIMARY INDEX gb_pk_idx ON GleambookMessages;
4021</pre></div></div>
4022
4023<p>An example query that can be accelerated using the primary-key index:</p>
4024
4025<div>
4026<div>
4027<pre class="source">SELECT COUNT(*) FROM GleambookMessages;
4028</pre></div></div>
4029
<p>To look up the above primary-key index, issue the following query:</p>
4031
4032<div>
4033<div>
4034<pre class="source">SELECT VALUE i
4035FROM Metadata.`Index` i
4036WHERE i.DataverseName = &quot;TinySocial&quot; AND i.DatasetName = &quot;GleambookMessages&quot;;
4037</pre></div></div>
4038
4039<p>The query returns:</p>
4040
4041<div>
4042<div>
4043<pre class="source">[ { &quot;DataverseName&quot;: &quot;TinySocial&quot;, &quot;DatasetName&quot;: &quot;GleambookMessages&quot;, &quot;IndexName&quot;: &quot;GleambookMessages&quot;, &quot;IndexStructure&quot;: &quot;BTREE&quot;, &quot;SearchKey&quot;: [ [ &quot;messageId&quot; ] ], &quot;IsPrimary&quot;: true, &quot;Timestamp&quot;: &quot;Wed Nov 07 17:25:11 PST 2018&quot;, &quot;PendingOp&quot;: 0 }
4044, { &quot;DataverseName&quot;: &quot;TinySocial&quot;, &quot;DatasetName&quot;: &quot;GleambookMessages&quot;, &quot;IndexName&quot;: &quot;gb_pk_idx&quot;, &quot;IndexStructure&quot;: &quot;BTREE&quot;, &quot;SearchKey&quot;: [ ], &quot;IsPrimary&quot;: false, &quot;Timestamp&quot;: &quot;Wed Nov 07 17:25:11 PST 2018&quot;, &quot;PendingOp&quot;: 0 }
4045 ]
4046</pre></div></div>
4047
<p>Remember that <tt>CREATE PRIMARY INDEX</tt> creates a secondary index. That is the reason the <tt>IsPrimary</tt> field is false. The primary-key index can be identified by the fact that the <tt>SearchKey</tt> field is empty since it only contains primary key fields.</p></div></div>
4066<div class="section">
4067<h3><a name="Functions" id="Functions"> Functions</a></h3>
4068<p>The CREATE FUNCTION statement creates a <b>named</b> function that can then be used and reused in queries. The body of a function can be any query expression involving the function&#x2019;s parameters.</p>
4069
4070<div>
4071<div>
4072<pre class="source">FunctionSpecification ::= &quot;FUNCTION&quot; FunctionOrTypeName IfNotExists ParameterList &quot;{&quot; Expression &quot;}&quot;
4073</pre></div></div>
4074
4075<p>The following is an example of a CREATE FUNCTION statement which is similar to our earlier DECLARE FUNCTION example. It differs from that example in that it results in a function that is persistently registered by name in the specified dataverse (the current dataverse being used, if not otherwise specified).</p>
4076<div class="section">
4077<div class="section">
4078<h5><a name="Example"></a>Example</h5>
4079
4080<div>
4081<div>
4082<pre class="source">CREATE FUNCTION friendInfo(userId) {
4083 (SELECT u.id, u.name, len(u.friendIds) AS friendCount
4084 FROM GleambookUsers u
4085 WHERE u.id = userId)[0]
4086 };
4087</pre></div></div>
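
<p>Once created, the function can be called like a built-in function. The sketch below shows it used on its own and inside a query; the argument value 1 is illustrative.</p>

<div>
<div>
<pre class="source">-- Call the stored function directly with an illustrative user id.
friendInfo(1);

-- Call it once per message, passing each message&#x2019;s author id.
SELECT VALUE friendInfo(m.authorId)
FROM GleambookMessages m;
</pre></div></div>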
4088</div></div></div>
4089<div class="section">
4090<h3><a name="Synonyms" id="Synonyms"> Synonyms</a></h3>
4091
4092<div>
4093<div>
4094<pre class="source">SynonymSpecification ::= &quot;SYNONYM&quot; QualifiedName &quot;FOR&quot; QualifiedName IfNotExists
4095</pre></div></div>
4096
<p>The CREATE SYNONYM statement creates a synonym for a given dataset. This synonym may be used instead of the dataset name in SELECT, INSERT, UPSERT, DELETE, and LOAD statements. The target dataset does not need to exist when the synonym is created.</p>
4098<div class="section">
4099<div class="section">
4100<h5><a name="Example"></a>Example</h5>
4101
4102<div>
4103<div>
4104<pre class="source">CREATE DATASET GleambookUsers(GleambookUserType) PRIMARY KEY id;
4105
4106CREATE SYNONYM GleambookUsersSynonym FOR GleambookUsers;
4107
4108SELECT * FROM GleambookUsersSynonym;
4109</pre></div></div>
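
<p>Because the target does not need to exist when the synonym is created, the statements can also appear in the opposite order. The sketch below uses hypothetical names (ArchivedUsers and ArchivedUsersSynonym) and then inserts through the synonym.</p>

<div>
<div>
<pre class="source">-- Hypothetical names; the synonym is declared before its target exists.
CREATE SYNONYM ArchivedUsersSynonym FOR ArchivedUsers;

CREATE DATASET ArchivedUsers(GleambookUserType) PRIMARY KEY id;

-- The synonym stands in for the dataset name in the INSERT.
INSERT INTO ArchivedUsersSynonym (SELECT VALUE u FROM GleambookUsers u);
</pre></div></div>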
4110
4111<p>More information on how synonyms are resolved can be found in the appendix section on Variable Resolution.</p></div></div></div>
4112<div class="section">
4113<h3><a name="Removal" id="Removal"> Removal</a></h3>
4114
4115<div>
4116<div>
4117<pre class="source">DropStatement ::= &quot;DROP&quot; ( &quot;DATAVERSE&quot; Identifier IfExists
4118 | &quot;TYPE&quot; FunctionOrTypeName IfExists
4119 | &quot;DATASET&quot; QualifiedName IfExists
4120 | &quot;INDEX&quot; DoubleQualifiedName IfExists
4121 | &quot;SYNONYM&quot; QualifiedName IfExists
4122 | &quot;FUNCTION&quot; FunctionSignature IfExists )
4123IfExists ::= ( &quot;IF&quot; &quot;EXISTS&quot; )?
4124</pre></div></div>
4125
4126<p>The DROP statement is the inverse of the CREATE statement. It can be used to drop dataverses, datatypes, datasets, indexes, functions, and synonyms.</p>
4127<p>The following examples illustrate some uses of the DROP statement.</p>
4128<div class="section">
4129<div class="section">
4130<h5><a name="Example"></a>Example</h5>
4131
4132<div>
4133<div>
4134<pre class="source">DROP DATASET GleambookUsers IF EXISTS;
4135
4136DROP INDEX GleambookMessages.gbSenderLocIndex;
4137
4138DROP TYPE TinySocial2.GleambookUserType;
4139
4140DROP FUNCTION friendInfo@1;
4141
4142DROP SYNONYM GleambookUsersSynonym;
4143
4144DROP DATAVERSE TinySocial;
4145</pre></div></div>
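
<p>The grammar also permits IF EXISTS on every kind of DROP, as well as a fully qualified (dataverse.dataset.index) index name. The sketch below, which assumes the gbAuthorIdx index created earlier, shows forms that are convenient in cleanup scripts.</p>

<div>
<div>
<pre class="source">-- Index qualified by its dataset, in the current dataverse.
DROP INDEX GleambookMessages.gbAuthorIdx IF EXISTS;

-- The same index, fully qualified by dataverse and dataset.
DROP INDEX TinySocial.GleambookMessages.gbAuthorIdx IF EXISTS;

-- The 1-argument friendInfo function, only if it is present.
DROP FUNCTION friendInfo@1 IF EXISTS;

DROP DATAVERSE TinySocial IF EXISTS;
</pre></div></div>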
4146
<p>When an artifact is dropped, it will be dropped from the current dataverse if none is specified (see the DROP DATASET example above) or from the specified dataverse (see the DROP TYPE example above) if one is specified by fully qualifying the artifact name in the DROP statement. When specifying an index to drop, the index name must be qualified by the dataset that it indexes. When specifying a function to drop, since the query language allows functions to be overloaded by their number of arguments, the identifying name of the function to be dropped must explicitly include that information. (<tt>friendInfo@1</tt> above denotes the 1-argument function named friendInfo in the current dataverse.)</p></div></div></div>
4148<div class="section">
4149<h3><a name="Load_Statement"></a><a name="Load_statement" id="Load_statement">Load Statement</a></h3>
4150
4151<div>
4152<div>
4153<pre class="source">LoadStatement ::= &lt;LOAD&gt; &lt;DATASET&gt; QualifiedName &lt;USING&gt; AdapterName Configuration ( &lt;PRE-SORTED&gt; )?
4154</pre></div></div>
4155
4156<p>The LOAD statement is used to initially populate a dataset via bulk loading of data from an external file. An appropriate adapter must be selected to handle the nature of the desired external data. The LOAD statement accepts the same adapters and the same parameters as discussed earlier for External datasets. (See the <a href="externaldata.html">guide to external data</a> for more information on the available adapters.) If a dataset has an auto-generated primary key field, the file to be imported should not include that field in it.</p>
<p>The target dataset name may be a synonym introduced by a CREATE SYNONYM statement.</p>
4158<p>The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM (Asterix Data Model) format.</p>
4159<div class="section">
4160<div class="section">
4161<h5><a name="Example"></a>Example</h5>
4162
4163<div>
4164<div>
4165<pre class="source"> LOAD DATASET GleambookUsers USING localfs
4166 ((&quot;path&quot;=&quot;127.0.0.1:///Users/bignosqlfan/tinysocialnew/gbu.adm&quot;),(&quot;format&quot;=&quot;adm&quot;));
4167</pre></div></div>
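
<p>Since LOAD accepts the same adapter parameters as External datasets, delimited text can presumably be loaded in the same way. The sketch below uses a hypothetical Orders dataset and a placeholder path; the optional PRE-SORTED keyword from the grammar is included on the assumption that the file is already ordered by the dataset&#x2019;s primary key.</p>

<div>
<div>
<pre class="source">-- Hypothetical dataset; the path is a placeholder, not a real location.
-- PRE-SORTED assumes the file is already ordered by primary key.
LOAD DATASET Orders USING localfs
    ((&quot;path&quot;=&quot;127.0.0.1:///PATH/TO/orders.tbl&quot;),
     (&quot;format&quot;=&quot;delimited-text&quot;),
     (&quot;delimiter&quot;=&quot;|&quot;)) PRE-SORTED;
</pre></div></div>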
4186</div></div></div></div>
4187<div class="section">
4188<h2><a name="Modification_statements" id="Modification_statements">Modification statements</a></h2>
4189<div class="section">
4190<h3><a name="INSERTs"></a><a name="Inserts" id="Inserts">INSERTs</a></h3>
4191
4192<div>
4193<div>
4194<pre class="source">InsertStatement ::= &lt;INSERT&gt; &lt;INTO&gt; QualifiedName Query
4195</pre></div></div>
4196
<p>The INSERT statement is used to insert new data into a dataset. The data to be inserted comes from a query expression. This expression can be as simple as a constant expression, or in general it can be any legal query. If the dataset has an auto-generated primary key, the user may either supply the key field explicitly in the INSERT statement or omit it, in which case the system generates the key and adds it automatically. Note, however, that if a record with the user-provided key already exists in the dataset, the operation will fail. As a general rule, insertion will fail if the dataset already has data with the primary key value(s) being inserted.</p>
<p>Inserts are processed transactionally by the system. The transactional scope of each insert transaction is the insertion of a single object plus its affiliated secondary index entries (if any). If the query part of an insert returns a single object, then the INSERT statement will be a single, atomic transaction. If the query part returns multiple objects, each object being inserted will be treated as a separate transaction.</p>
<p>The target dataset name may be a synonym introduced by a CREATE SYNONYM statement.</p>
4200<p>The following example illustrates a query-based insertion.</p>
4201<div class="section">
4202<div class="section">
4203<h5><a name="Example"></a>Example</h5>
4204
4205<div>
4206<div>
4207<pre class="source">INSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
4208</pre></div></div>
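<p>As a complement to the query-based example, the following sketch inserts a single constant object, which the system would process as one atomic transaction. It assumes that <tt>UsersCopy</tt> has an open type with primary key <tt>id</tt> and that no object with <tt>id</tt> 200 exists yet; the field names and values are hypothetical.</p>

<div>
<div>
<pre class="source">-- Hypothetical object; only the primary key field is assumed to be required.
INSERT INTO UsersCopy (
    {&quot;id&quot;: 200, &quot;alias&quot;: &quot;Sam&quot;, &quot;name&quot;: &quot;SamSmith&quot;}
);
</pre></div></div>

<p>Under the rule stated above, running the same statement a second time would be expected to fail, because an object with that primary key value would then already exist.</p>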
4209</div></div></div>
4210<div class="section">
4211<h3><a name="UPSERTs"></a><a name="Upserts" id="Upserts">UPSERTs</a></h3>
4212
4213<div>
4214<div>
4215<pre class="source">UpsertStatement ::= &lt;UPSERT&gt; &lt;INTO&gt; QualifiedName Query
4216</pre></div></div>
4217
<p>The UPSERT statement syntactically mirrors the INSERT statement discussed above. The difference lies in its semantics, which for UPSERT are &#x201c;add or replace&#x201d; instead of the INSERT &#x201c;add if not present, else error&#x201d; semantics. Whereas an INSERT can fail if another object already exists with the specified key, the analogous UPSERT will replace the previous object&#x2019;s value with that of the new object in such cases. As with INSERT, the user may manually provide the key for a dataset whose primary key is auto-generated. In that case, the operation inserts the record if no record with that key exists; if a record with that key already exists, the operation is converted to a replace/update operation.</p>
<p>The target dataset name may be a synonym introduced by a CREATE SYNONYM statement.</p>
4220<p>The following example illustrates a query-based upsert operation.</p>
4221<div class="section">
4222<div class="section">
4223<h5><a name="Example"></a>Example</h5>
4224
4225<div>
4226<div>
4227<pre class="source">UPSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
4228</pre></div></div>
4229
<p>*Editor&#x2019;s note: Upserts currently work in AQL but are not yet enabled in the current query language.</p></div></div></div>
4231<div class="section">
4232<h3><a name="DELETEs"></a><a name="Deletes" id="Deletes">DELETEs</a></h3>
4233
4234<div>
4235<div>
4236<pre class="source">DeleteStatement ::= &lt;DELETE&gt; &lt;FROM&gt; QualifiedName ( ( &lt;AS&gt; )? Variable )? ( &lt;WHERE&gt; Expression )?
4237</pre></div></div>
4238
4239<p>The DELETE statement is used to delete data from a target dataset. The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the DELETE statement.</p>
4240<p>Deletes are processed transactionally by the system. The transactional scope of each delete transaction is the deletion of a single object plus its affiliated secondary index entries (if any). If the boolean expression for a delete identifies a single object, then the DELETE statement itself will be a single, atomic transaction. If the expression identifies multiple objects, then each object deleted will be handled as a separate transaction.</p>
<p>The target dataset name may be a synonym introduced by a CREATE SYNONYM statement.</p>
4242<p>The following examples illustrate single-object deletions.</p>
4243<div class="section">
4244<div class="section">
4245<h5><a name="Example"></a>Example</h5>
4246
4247<div>
4248<div>
4249<pre class="source">DELETE FROM GleambookUsers user WHERE user.id = 8;
4250</pre></div></div>
4251</div>
4252<div class="section">
4253<h5><a name="Example"></a>Example</h5>
4254
4255<div>
4256<div>
4257<pre class="source">DELETE FROM GleambookUsers WHERE id = 5;
4258</pre></div></div>
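<p>For contrast, here is a sketch of a deletion whose predicate may match many objects; per the transactional rules above, each matching object would then be deleted in its own transaction. The <tt>userSince</tt> field and the cutoff value are assumptions made for illustration.</p>

<div>
<div>
<pre class="source">-- Assumes each user object carries a datetime field named userSince.
DELETE FROM GleambookUsers AS user
WHERE user.userSince &lt; datetime(&quot;2012-01-01T00:00:00&quot;);
</pre></div></div>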
4277
<h1><a name="Reserved_keywords" id="Reserved_keywords">Appendix 1. Reserved keywords</a></h1>
4296
4297<p>All reserved keywords are listed in the following table:</p>
4298<table border="0" class="table table-striped">
4299<thead>
4300
4301<tr class="a">
4302<th> </th>
4303<th> </th>
4304<th> </th>
4305<th> </th>
4306<th> </th>
4307<th> </th></tr>
4308</thead><tbody>
4309
4310<tr class="b">
4311<td> AND </td>
4312<td> ANY </td>
4313<td> APPLY </td>
4314<td> AS </td>
4315<td> ASC </td>
4316<td> AT </td></tr>
4317<tr class="a">
4318<td> AUTOGENERATED </td>
4319<td> BETWEEN </td>
4320<td> BTREE </td>
4321<td> BY </td>
4322<td> CASE </td>
4323<td> CLOSED </td></tr>
4324<tr class="b">
4325<td> CREATE </td>
4326<td> COMPACTION </td>
4327<td> COMPACT </td>
4328<td> CONNECT </td>
4329<td> CORRELATE </td>
4330<td> DATASET </td></tr>
<tr class="a">
<td> COLLECTION </td>
<td> DATAVERSE </td>
<td> DECLARE </td>
<td> DEFINITION </td>
<td> DELETE </td>
<td> DESC </td></tr>
<tr class="b">
<td> DISCONNECT </td>
<td> DISTINCT </td>
<td> DROP </td>
<td> ELEMENT </td>
<td> EXPLAIN </td>
<td> ELSE </td></tr>
<tr class="a">
<td> ENFORCED </td>
<td> END </td>
<td> EVERY </td>
<td> EXCEPT </td>
<td> EXIST </td>
<td> EXTERNAL </td></tr>
<tr class="b">
<td> FEED </td>
<td> FILTER </td>
<td> FLATTEN </td>
<td> FOR </td>
<td> FROM </td>
<td> FULL </td></tr>
<tr class="a">
<td> FUNCTION </td>
<td> GROUP </td>
<td> HAVING </td>
<td> HINTS </td>
<td> IF </td>
<td> INTO </td></tr>
<tr class="b">
<td> IN </td>
<td> INDEX </td>
<td> INGESTION </td>
<td> INNER </td>
<td> INSERT </td>
<td> INTERNAL </td></tr>
<tr class="a">
<td> INTERSECT </td>
<td> IS </td>
<td> JOIN </td>
<td> KEYWORD </td>
<td> LEFT </td>
<td> LETTING </td></tr>
<tr class="b">
<td> LET </td>
<td> LIKE </td>
<td> LIMIT </td>
<td> LOAD </td>
<td> NODEGROUP </td>
<td> NGRAM </td></tr>
<tr class="a">
<td> NOT </td>
<td> OFFSET </td>
<td> ON </td>
<td> OPEN </td>
<td> OR </td>
<td> ORDER </td></tr>
<tr class="b">
<td> OUTER </td>
<td> OUTPUT </td>
<td> OVER </td>
<td> PATH </td>
<td> POLICY </td>
<td> PRE-SORTED </td></tr>
<tr class="a">
<td> PRIMARY </td>
<td> RAW </td>
<td> REFRESH </td>
<td> RETURN </td>
<td> RTREE </td>
<td> RUN </td></tr>
<tr class="b">
<td> SATISFIES </td>
<td> SECONDARY </td>
<td> SELECT </td>
<td> SET </td>
<td> SOME </td>
<td> TEMPORARY </td></tr>
<tr class="a">
<td> THEN </td>
<td> TYPE </td>
<td> UNKNOWN </td>
<td> UNNEST </td>
<td> UPDATE </td>
<td> USE </td></tr>
<tr class="b">
<td> USING </td>
<td> VALUE </td>
<td> WHEN </td>
<td> WHERE </td>
<td> WITH </td>
<td> WRITE </td></tr>
4436</tbody>
</table>
4455</div></div></div></div>
4456<div class="section">
<h2><a name="Appendix_2._Performance_Tuning"></a><a name="Performance_tuning" id="Performance_tuning">Appendix 2. Performance Tuning</a></h2>
4475
4476<p>The SET statement can be used to override some cluster-wide configuration parameters for a specific request:</p>
4477
4478<div>
4479<div>
4480<pre class="source">SET &lt;IDENTIFIER&gt; &lt;STRING_LITERAL&gt;
4481</pre></div></div>
4482
<p>As parameter identifiers are qualified names (containing a &#x2018;.&#x2019;), they have to be escaped using backticks (``). Note that changing query parameters does not affect query correctness; it only affects performance characteristics such as response time and throughput.</p></div>
4484<div class="section">
4485<h2><a name="Parallelism_Parameter"></a><a name="Parallelism_parameter" id="Parallelism_parameter">Parallelism Parameter</a></h2>
4486<p>The system can execute each request using multiple cores on multiple machines (a.k.a., partitioned parallelism) in a cluster. A user can manually specify the maximum execution parallelism for a request to scale it up and down using the following parameter:</p>
4487<ul>
4488
<li><b>compiler.parallelism</b>: the maximum number of CPU cores that can be used to process a query. There are three cases for the value <i>p</i> of compiler.parallelism:
4490<ul>
4491
4492<li>
4493
4494<p><i>p</i> &lt; 0 or <i>p</i> &gt; the total number of cores in a cluster: the system will use all available cores in the cluster;</p>
4495</li>
4496<li>
4497
4498<p><i>p</i> = 0 (the default): the system will use the storage parallelism (the number of partitions of stored datasets) as the maximum parallelism for query processing;</p>
4499</li>
4500<li>
4501
4502<p>all other cases: the system will use the user-specified number as the maximum number of CPU cores to use for executing the query.</p>
4503</li>
4504</ul>
4505</li>
4506</ul>
4507<div class="section">
4508<div class="section">
4509<div class="section">
4510<h5><a name="Example"></a>Example</h5>
4511
4512<div>
4513<div>
4514<pre class="source">SET `compiler.parallelism` &quot;16&quot;;
4515
4516SELECT u.name AS uname, m.message AS message
4517FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
4518</pre></div></div>
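<p>The remaining cases can be sketched in the same way. Following the rules listed above, a negative value asks the system to use all available cores in the cluster, while &quot;0&quot; (the default) falls back to the storage parallelism. The query itself is only a placeholder.</p>

<div>
<div>
<pre class="source">-- Negative value: use all available cores in the cluster.
-- Setting &quot;0&quot; instead would use the storage parallelism (the default).
SET `compiler.parallelism` &quot;-1&quot;;

SELECT COUNT(*) AS num_messages
FROM GleambookMessages;
</pre></div></div>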
4519</div></div></div></div>
4520<div class="section">
4521<h2><a name="Memory_Parameters"></a><a name="Memory_parameters" id="Memory_parameters">Memory Parameters</a></h2>
<p>In the system, each blocking runtime operator, such as join, group-by, and order-by, works within a fixed memory budget and can gracefully spill to disk if that budget is smaller than the amount of data it has to hold. A user can manually configure the memory budget of those operators within a query. The supported configurable memory parameters are:</p>
4523<ul>
4524
4525<li>
4526
4527<p><b>compiler.groupmemory</b>: the memory budget that each parallel group-by operator instance can use; 32MB is the default budget.</p>
4528</li>
4529<li>
4530
4531<p><b>compiler.sortmemory</b>: the memory budget that each parallel sort operator instance can use; 32MB is the default budget.</p>
4532</li>
4533<li>
4534
4535<p><b>compiler.joinmemory</b>: the memory budget that each parallel hash join operator instance can use; 32MB is the default budget.</p>
4536</li>
4537<li>
4538
4539<p><b>compiler.windowmemory</b>: the memory budget that each parallel window aggregate operator instance can use; 32MB is the default budget.</p>
4540</li>
4541</ul>
4542<p>For each memory budget value, you can use a 64-bit integer value with a 1024-based binary unit suffix (for example, B, KB, MB, GB). If there is no user-provided suffix, &#x201c;B&#x201d; is the default suffix. See the following examples.</p>
4543<div class="section">
4544<div class="section">
4545<div class="section">
4546<h5><a name="Example"></a>Example</h5>
4547
4548<div>
4549<div>
4550<pre class="source">SET `compiler.groupmemory` &quot;64MB&quot;;
4551
4552SELECT msg.authorId, COUNT(*)
4553FROM GleambookMessages msg
4554GROUP BY msg.authorId;
4555</pre></div></div>
4556</div>
4557<div class="section">
4558<h5><a name="Example"></a>Example</h5>
4559
4560<div>
4561<div>
4562<pre class="source">SET `compiler.sortmemory` &quot;67108864&quot;;
4563
4564SELECT VALUE user
4565FROM GleambookUsers AS user
4566ORDER BY ARRAY_LENGTH(user.friendIds) DESC;
4567</pre></div></div>
4568</div>
4569<div class="section">
4570<h5><a name="Example"></a>Example</h5>
4571
4572<div>
4573<div>
4574<pre class="source">SET `compiler.joinmemory` &quot;132000KB&quot;;
4575
4576SELECT u.name AS uname, m.message AS message
4577FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
4578</pre></div></div>
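<p>The fourth parameter, <tt>compiler.windowmemory</tt>, can be set in the same manner. The sketch below assumes that window functions such as <tt>ROW_NUMBER()</tt> are available in your version and that each message has a <tt>messageId</tt> field; both are assumptions made for illustration.</p>

<div>
<div>
<pre class="source">SET `compiler.windowmemory` &quot;64MB&quot;;

-- Numbers each author&#x2019;s messages; messageId is an assumed field name.
SELECT m.authorId, m.messageId,
       ROW_NUMBER() OVER (PARTITION BY m.authorId ORDER BY m.messageId) AS rn
FROM GleambookMessages m;
</pre></div></div>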
4597</div></div></div></div>
4598<div class="section">
4599<h2><a name="Parallel_Sort_Parameter"></a><a name="Parallel_sort_parameter" id="Parallel_sort_parameter">Parallel Sort Parameter</a></h2>
4600<p>The following parameter enables you to activate or deactivate full parallel sort for order-by operations.</p>
4601<p>When full parallel sort is inactive (<tt>false</tt>), each existing data partition is sorted (in parallel), and then all data partitions are merged into a single node.</p>
<p>When full parallel sort is active (<tt>true</tt>), the data is first sampled and then repartitioned so that each partition contains data that is greater than the data in the previous partition. The data in each partition is then sorted (in parallel), but the sorted partitions are not merged into a single node.</p>
4603<ul>
4604
4605<li><b>compiler.sort.parallel</b>: A boolean specifying whether full parallel sort is active (<tt>true</tt>) or inactive (<tt>false</tt>). The default value is <tt>true</tt>.</li>
4606</ul>
4607<div class="section">
4608<div class="section">
4609<div class="section">
4610<h5><a name="Example"></a>Example</h5>
4611
4612<div>
4613<div>
4614<pre class="source">SET `compiler.sort.parallel` &quot;true&quot;;
4615
4616SELECT VALUE user
4617FROM GleambookUsers AS user
4618ORDER BY ARRAY_LENGTH(user.friendIds) DESC;
4619</pre></div></div>
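<p>Conversely, the following sketch deactivates full parallel sort for a single request. With this setting, each existing partition would be sorted in parallel and the sorted partitions then merged into a single node, as described above.</p>

<div>
<div>
<pre class="source">SET `compiler.sort.parallel` &quot;false&quot;;

SELECT VALUE user
FROM GleambookUsers AS user
ORDER BY ARRAY_LENGTH(user.friendIds) DESC;
</pre></div></div>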
4638</div></div></div></div>
4639<div class="section">
4640<h2><a name="Controlling_Index-Only-Plan_Parameter"></a><a name="Index_Only" id="Index_Only">Controlling Index-Only-Plan Parameter</a></h2>
<p>By default, the system tries to build an index-only plan whenever a secondary index can be used. For example, if a SELECT or JOIN query can utilize an enforced B+Tree or R-Tree index on a field, the optimizer checks whether a secondary-index search alone can generate the result that the query asks for. It mainly checks two conditions: (1) the predicates in the WHERE clause use only the primary key field and/or the secondary key field, and (2) the result does not return any other fields. If these two conditions hold, it builds an index-only plan. Since an index-only plan searches only a secondary index to answer a query, it is faster than a non-index-only plan, which also needs to search the primary index. However, the index-only plan can be turned off per query by setting the following parameter.</p>
4642<ul>
4643
4644<li><b>compiler.indexonly</b>: if this is set to false, the index-only-plan will not be applied; the default value is true.</li>
4645</ul>
4646<div class="section">
4647<div class="section">
4648<div class="section">
4649<h5><a name="Example"></a>Example</h5>
4650
4651<div>
4652<div>
<pre class="source">SET `compiler.indexonly` &quot;false&quot;;

SELECT m.message AS message
FROM GleambookMessages m WHERE m.message = &quot; love product-b its shortcut-menu is awesome:)&quot;;
4657</pre></div></div>
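<p>To make the two conditions concrete, here is a sketch of a query that would be a candidate for an index-only plan when the parameter is left at its default. It assumes that <tt>messageId</tt> is the primary key of <tt>GleambookMessages</tt>; the index name <tt>gbAuthorIdx</tt> is hypothetical.</p>

<div>
<div>
<pre class="source">-- Hypothetical secondary index on authorId.
CREATE INDEX gbAuthorIdx ON GleambookMessages(authorId) TYPE BTREE;

-- The WHERE clause uses only the secondary key (authorId), and the result
-- returns only the primary key (messageId) and the secondary key.
SELECT m.messageId AS id, m.authorId AS author
FROM GleambookMessages m
WHERE m.authorId = 1;
</pre></div></div>

<p>Adding any other field to the SELECT clause, such as <tt>m.message</tt>, would violate the second condition, so the plan would also need to search the primary index.</p>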
4676</div></div></div></div>
4677<div class="section">
<h2><a name="Appendix_3._Variable_Bindings_and_Name_Resolution"></a><a name="Variable_bindings_and_name_resolution" id="Variable_bindings_and_name_resolution">Appendix 3. Variable Bindings and Name Resolution</a></h2>
4696
4697<p>In this Appendix, we&#x2019;ll look at how variables are bound and how names are resolved. Names can appear in every clause of a query. Sometimes a name consists of just a single identifier, e.g., <tt>region</tt> or <tt>revenue</tt>. More often a name will consist of two identifiers separated by a dot, e.g., <tt>customer.address</tt>. Occasionally a name may have more than two identifiers, e.g., <tt>policy.owner.address.zipcode</tt>. <i>Resolving</i> a name means determining exactly what the (possibly multi-part) name refers to. It is necessary to have well-defined rules for how to resolve a name in cases of ambiguity. (In the absence of schemas, such cases arise more commonly, and also differently, than they do in SQL.)</p>
<p>The basic job of each clause in a query block is to bind variables. Each clause sees the variables bound by previous clauses and may bind additional variables. Names are always resolved with respect to the variables that are bound (&#x201c;in scope&#x201d;) at the point where the name is used. It is possible that the name resolution process will fail, which may lead to an empty result or an error message.</p>
4699<p>One important bit of background: When the system is reading a query and resolving its names, it has a list of all the available dataverses and datasets. As a result, it knows whether <tt>a.b</tt> is a valid name for dataset <tt>b</tt> in dataverse <tt>a</tt>. However, the system does not in general have knowledge of the schemas of the data inside the datasets; remember that this is a much more open world. As a result, in general the system cannot know whether any object in a particular dataset will have a field named <tt>c</tt>. These assumptions affect how errors are handled. If you try to access dataset <tt>a.b</tt> and no dataset by that name exists, you will get an error and your query will not run. However, if you try to access a field <tt>c</tt> in a collection of objects, your query will run and return <tt>missing</tt> for each object that doesn&#x2019;t have a field named <tt>c</tt> &#x2013; this is because it&#x2019;s possible that some object (someday) could have such a field.</p></div>
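<p>The difference between the two failure modes described above can be sketched as follows. The field name <tt>nickname</tt> is hypothetical and is assumed not to be declared in the dataset&#x2019;s type. The query should still run, with <tt>u.nickname</tt> evaluating to <tt>missing</tt> for every object that lacks the field.</p>

<div>
<div>
<pre class="source">-- Runs even if no object has a nickname field; IS MISSING makes the
-- missing value visible as a boolean.
SELECT u.name AS name, u.nickname IS MISSING AS lacks_nickname
FROM GleambookUsers AS u;
</pre></div></div>

<p>By contrast, replacing <tt>GleambookUsers</tt> with the name of a dataset that does not exist would produce an error before the query runs.</p>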
4700<div class="section">
4701<h2><a name="Binding_Variables"></a><a name="Binding_variables" id="Binding_variables">Binding Variables</a></h2>
4702<p>Variables can be bound in the following ways:</p>
4703<ol style="list-style-type: decimal">
4704
4705<li>
4706
<p>WITH and LET clauses bind a variable to the result of an expression in a straightforward way.</p>
4708<p>Examples:</p>
4709<p><tt>WITH cheap_parts AS (SELECT partno FROM parts WHERE price &lt; 100)</tt> binds the variable <tt>cheap_parts</tt> to the result of the subquery.</p>
4710<p><tt>LET pay = salary + bonus</tt> binds the variable <tt>pay</tt> to the result of evaluating the expression <tt>salary + bonus</tt>.</p>
4711</li>
4712<li>
4713
<p>FROM, GROUP BY, and SELECT clauses have optional AS subclauses that contain an expression and a name (called an <i>iteration variable</i> in a FROM clause, or an alias in GROUP BY or SELECT).</p>
4715<p>Examples:</p>
4716<p><tt>FROM customer AS c, order AS o</tt></p>
4717<p><tt>GROUP BY salary + bonus AS total_pay</tt></p>
4718<p><tt>SELECT MAX(price) AS highest_price</tt></p>
<p>An AS subclause always binds the name (as a variable) to the result of the expression (or, in the case of a FROM clause, to the <i>individual members</i> of the collection identified by the expression).</p>
4720<p>It&#x2019;s always a good practice to use the keyword AS when defining an alias or iteration variable. However, as in SQL, the syntax allows the keyword AS to be omitted. For example, the FROM clause above could have been written like this:</p>
4721<p><tt>FROM customer c, order o</tt></p>
4722<p>Omitting the keyword AS does not affect the binding of variables. The FROM clause in this example binds variables c and o whether the keyword AS is used or not.</p>
4723<p>In certain cases, a variable is automatically bound even if no alias or variable-name is specified. Whenever an expression could have been followed by an AS subclause, if the expression consists of a simple name or a path expression, that expression binds a variable whose name is the same as the simple name or the last step in the path expression. Here are some examples:</p>
4724<p><tt>FROM customer, order</tt> binds iteration variables named <tt>customer</tt> and <tt>order</tt></p>
4725<p><tt>GROUP BY address.zipcode</tt> binds a variable named <tt>zipcode</tt></p>
4726<p><tt>SELECT item[0].price</tt> binds a variable named <tt>price</tt></p>
4727<p>Note that a FROM clause iterates over a collection (usually a dataset), binding a variable to each member of the collection in turn. The name of the collection remains in scope, but it is not a variable. For example, consider this FROM clause used in a self-join:</p>
4728<p><tt>FROM customer AS c1, customer AS c2</tt></p>
4729<p>This FROM clause joins the customer dataset to itself, binding the iteration variables c1 and c2 to objects in the left-hand-side and right-hand-side of the join, respectively. After the FROM clause, c1 and c2 are in scope as variables, and customer remains accessible as a dataset name but not as a variable.</p>
4730</li>
4731<li>
4732
4733<p>Special rules for GROUP BY:</p>
4734<ol style="list-style-type: decimal">
4735
4736<li>
4737
4738<p>If a GROUP BY clause specifies an expression that has no explicit alias, it binds a pseudo-variable that is lexicographically identical to the expression itself. For example:</p>
4739<p><tt>GROUP BY salary + bonus</tt> binds a pseudo-variable named <tt>salary + bonus</tt>.</p>
4740<p>This rule allows subsequent clauses to refer to the grouping expression (salary + bonus) even though its constituent variables (salary and bonus) are no longer in scope. For example, the following query is valid:</p>
4741
4742<div>
4743<div>
4744<pre class="source">FROM employee
4745GROUP BY salary + bonus
4746HAVING salary + bonus &gt; 1000
4747SELECT salary + bonus, COUNT(*) AS how_many
4748</pre></div></div>
4749
4750<p>While it might have been more elegant to explicitly require an alias in cases like this, the pseudo-variable rule is retained for SQL compatibility. Note that the expression <tt>salary + bonus</tt> is not <i>actually</i> evaluated in the HAVING and SELECT clauses (and could not be since <tt>salary</tt> and <tt>bonus</tt> are no longer individually in scope). Instead, the expression <tt>salary + bonus</tt> is treated as a reference to the pseudo-variable defined in the GROUP BY clause.</p>
4751</li>
4752<li>
4753
4754<p>A GROUP BY clause may be followed by a GROUP AS clause that binds a variable to the group. The purpose of this variable is to make the individual objects inside the group visible to subqueries that may need to iterate over them.</p>
4755<p>The GROUP AS variable is bound to a multiset of objects. Each object represents one of the members of the group. Since the group may have been formed from a join, each of the member-objects contains a nested object for each variable bound by the nearest FROM clause (and its LET subclause, if any). These nested objects, in turn, contain the actual fields of the group-member. To understand this process, consider the following query fragment:</p>
4756
4757<div>
4758<div>
4759<pre class="source">FROM parts AS p, suppliers AS s
4760WHERE p.suppno = s.suppno
4761GROUP BY p.color GROUP AS g
4762</pre></div></div>
4763
<p>Suppose that the objects in <tt>parts</tt> have fields <tt>partno</tt>, <tt>color</tt>, and <tt>suppno</tt>. Suppose that the objects in <tt>suppliers</tt> have fields <tt>suppno</tt> and <tt>location</tt>.</p>
<p>Then, for each group formed by the GROUP BY, the variable <tt>g</tt> will be bound to a multiset with the following structure:</p>
4766
4767<div>
4768<div>
4769<pre class="source">[ { &quot;p&quot;: { &quot;partno&quot;: &quot;p1&quot;, &quot;color&quot;: &quot;red&quot;, &quot;suppno&quot;: &quot;s1&quot; },
4770 &quot;s&quot;: { &quot;suppno&quot;: &quot;s1&quot;, &quot;location&quot;: &quot;Denver&quot; } },
4771 { &quot;p&quot;: { &quot;partno&quot;: &quot;p2&quot;, &quot;color&quot;: &quot;red&quot;, &quot;suppno&quot;: &quot;s2&quot; },
4772 &quot;s&quot;: { &quot;suppno&quot;: &quot;s2&quot;, &quot;location&quot;: &quot;Atlanta&quot; } },
4773 ...
4774]
4775</pre></div></div>
4776</li>
4777</ol>
4778</li>
4779</ol></div>
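<p>As a sketch of how the GROUP AS variable from the last rule might be used, the query below iterates over <tt>g</tt> in a subquery to collect the supplier locations for each color. It reuses the <tt>parts</tt> and <tt>suppliers</tt> fields assumed above; the output alias <tt>locations</tt> and the iteration variable <tt>m</tt> are hypothetical names.</p>

<div>
<div>
<pre class="source">FROM parts AS p, suppliers AS s
WHERE p.suppno = s.suppno
GROUP BY p.color GROUP AS g
-- color is the variable bound implicitly by GROUP BY p.color;
-- each member m of g has nested objects m.p and m.s.
SELECT color,
       (SELECT VALUE m.s.location FROM g AS m) AS locations
</pre></div></div>

<p>Note that the subquery reaches the supplier fields through <tt>m.s</tt>, because each member of the group nests one object per variable bound in the FROM clause.</p>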
4780<div class="section">
4781<h2><a name="Scoping" id="Scoping">Scoping</a></h2>
4782<p>In general, the variables that are in scope at a particular position are those variables that were bound earlier in the current query block, in outer (enclosing) query blocks, or in a WITH clause at the beginning of the query. More specific rules follow.</p>
4783<p>The clauses in a query block are conceptually processed in the following order:</p>
4784<ul>
4785
4786<li>FROM (followed by LET subclause, if any)</li>
4787<li>WHERE</li>
4788<li>GROUP BY (followed by LET subclause, if any)</li>
4789<li>HAVING</li>
4790<li>SELECT or SELECT VALUE</li>
4791<li>ORDER BY</li>
4792<li>OFFSET</li>
4793<li>LIMIT</li>
4794</ul>
4795<p>During processing of each clause, the variables that are in scope are those variables that are bound in the following places:</p>
4796<ol style="list-style-type: decimal">
4797
4798<li>
4799
4800<p>In earlier clauses of the same query block (as defined by the ordering given above).</p>
<p>Example: <tt>FROM orders AS o SELECT o.date</tt>. The variable <tt>o</tt> is bound in the FROM clause, in turn, to each object in the dataset <tt>orders</tt>, and is therefore in scope in the SELECT clause.</p>
4802</li>
4803<li>
4804
4805<p>In outer query blocks in which the current query block is nested. In case of duplication, the innermost binding wins.</p>
4806</li>
4807<li>
4808
4809<p>In the WITH clause (if any) at the beginning of the query.</p>
4810</li>
4811</ol>
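<p>The three rules above can be seen together in a sketch like the following. The datasets <tt>orders</tt> and <tt>customers</tt> and the fields <tt>total</tt>, <tt>custid</tt>, <tt>orderno</tt>, and <tt>name</tt> are assumptions made for illustration, not part of any particular schema.</p>

<div>
<div>
<pre class="source">WITH big_orders AS (FROM orders AS o WHERE o.total &gt; 1000 SELECT VALUE o)
FROM customers AS c
SELECT c.name,
       (FROM big_orders AS b
        WHERE b.custid = c.custid
        SELECT VALUE b.orderno) AS order_numbers
</pre></div></div>

<p>Inside the subquery, <tt>b</tt> is in scope because it is bound earlier in the same query block, <tt>c</tt> because it is bound in the enclosing query block, and <tt>big_orders</tt> because it is bound in the WITH clause.</p>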
4812<p>However, in a query block where a GROUP BY clause is present:</p>
4813<ol style="list-style-type: decimal">
4814
4815<li>
4816
4817<p>In clauses processed before GROUP BY, scoping rules are the same as though no GROUP BY were present.</p>
4818</li>
4819<li>
4820
4821<p>In clauses processed after GROUP BY, the variables bound in the nearest FROM-clause (and its LET subclause, if any) are removed from scope and replaced by the variables bound in the GROUP BY clause (and its LET subclause, if any). However, this replacement does not apply inside the arguments of the five SQL special aggregating functions (MIN, MAX, AVG, SUM, and COUNT). These functions still need to see the individual data items over which they are computing an aggregation. For example, after <tt>FROM employee AS e GROUP BY deptno</tt>, it would not be valid to reference <tt>e.salary</tt>, but <tt>AVG(e.salary)</tt> would be valid.</p>
4822</li>
4823</ol>
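<p>A minimal sketch of the grouping rule, assuming that the objects in <tt>employee</tt> have <tt>deptno</tt> and <tt>salary</tt> fields (the alias <tt>avg_salary</tt> is likewise an assumption):</p>

<div>
<div>
<pre class="source">FROM employee AS e
GROUP BY e.deptno AS deptno
SELECT deptno, AVG(e.salary) AS avg_salary
</pre></div></div>

<p>After the GROUP BY, <tt>deptno</tt> is in scope and <tt>e</tt> is not, except inside the argument of <tt>AVG</tt>. Replacing <tt>AVG(e.salary)</tt> with a bare <tt>e.salary</tt> in the SELECT clause would not be valid.</p>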
4824<p>Special case: In an expression inside a FROM clause, a variable is in scope if it was bound in an earlier expression in the same FROM clause. Example:</p>
4825
4826<div>
4827<div>
<pre class="source">FROM orders AS o, o.items AS i
</pre></div></div>
4830
4831<p>The reason for this special case is to support iteration over nested collections.</p>
4832<p>Note that, since the SELECT clause comes <i>after</i> the WHERE and GROUP BY clauses in conceptual processing order, any variables defined in SELECT are not visible in WHERE or GROUP BY. Therefore the following query will not return what might be the expected result (since in the WHERE clause, <tt>pay</tt> will be interpreted as a field in the <tt>emp</tt> object rather than as the computed value <tt>salary + bonus</tt>):</p>
4833
4834<div>
4835<div>
<pre class="source">SELECT name, salary + bonus AS pay
FROM emp
WHERE pay &gt; 1000
ORDER BY pay
</pre></div></div>
4841
4842<p>The likely intent of the query above can be accomplished as follows:</p>
4843
4844<div>
4845<div>
<pre class="source">FROM emp AS e
LET pay = e.salary + e.bonus
WHERE pay &gt; 1000
SELECT e.name, pay
ORDER BY pay
</pre></div></div>
4852
4853<p>Note that variables defined by <tt>JOIN</tt> subclauses are not visible to other subclauses in the same <tt>FROM</tt> clause. This also applies to the <tt>FROM</tt> variable that starts the <tt>JOIN</tt> subclause.</p></div>
4854<div class="section">
4855<h2><a name="Resolving_Names"></a><a name="Resolving_names" id="Resolving_names">Resolving Names</a></h2>
4856<p>The process of name resolution begins with the leftmost identifier in the name. The rules for resolving the leftmost identifier are:</p>
4857<ol style="list-style-type: decimal">
4858
4859<li>
4860
<p><i>In a FROM clause</i>: Names in a FROM clause identify the collections over which the query block will iterate. These collections may be stored datasets or may be the results of nested query blocks. A stored dataset may be in a named dataverse or in the default dataverse. Thus, if the two-part name <tt>a.b</tt> is in a FROM clause, <tt>a</tt> might represent a dataverse and <tt>b</tt> might represent a dataset in that dataverse. Another example of a two-part name in a FROM clause is <tt>FROM orders AS o, o.items AS i</tt>. In <tt>o.items</tt>, <tt>o</tt> represents an order object bound earlier in the FROM clause, and <tt>items</tt> represents the <tt>items</tt> field inside that order.</p>
4862<p>The rules for resolving the leftmost identifier in a FROM clause (including a JOIN subclause), or in the expression following IN in a quantified predicate, are as follows:</p>
4863<ol style="list-style-type: decimal">
4864
4865<li>
4866
4867<p>If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (Note that in the case of a subquery, an in-scope variable might have been bound in an outer query block; this is called a correlated subquery.)</p>
4868</li>
4869<li>
4870
<p>Otherwise, if the identifier is the first part of a two-part name like <tt>a.b</tt>, the name is treated as <tt>dataverse.dataset</tt>. If the identifier stands alone as a one-part name, it is treated as the name of a dataset in the default dataverse. If the designated dataset exists, the identifier resolves to that dataset. Otherwise, if a synonym with the given name exists, the identifier resolves to the target dataset of that synonym (recursively, if the synonym points to another synonym). An error results if neither a dataset nor a synonym with this name exists.</p>
4872<p>Datasets take precedence over synonyms, so if both a dataset and a synonym have the same name then the resolution is to the dataset.</p>
4873</li>
4874</ol>
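<p>Both rules can be seen in a single FROM clause. In the sketch below, <tt>sales</tt> is a hypothetical dataverse name, and the fields <tt>orderno</tt> and <tt>itemno</tt> are assumed for illustration.</p>

<div>
<div>
<pre class="source">FROM sales.orders AS o, o.items AS i
SELECT o.orderno, i.itemno
</pre></div></div>

<p>No variable named <tt>sales</tt> is in scope, so <tt>sales.orders</tt> is treated as <tt>dataverse.dataset</tt>. By contrast, <tt>o</tt> was bound earlier in the same FROM clause, so <tt>o.items</tt> resolves to the <tt>items</tt> field of the current order.</p>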
4875</li>
4876<li>
4877
4878<p><i>Elsewhere in a query block</i>: In clauses other than FROM, a name typically identifies a field of some object. For example, if the expression <tt>a.b</tt> is in a SELECT or WHERE clause, it&#x2019;s likely that <tt>a</tt> represents an object and <tt>b</tt> represents a field in that object.</p>
4879<p>The rules for resolving the leftmost identifier in clauses other than the ones listed in Rule 1 are:</p>
4880<ol style="list-style-type: decimal">
4881
4882<li>
4883
4884<p>If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (In the case of a correlated subquery, the in-scope variable might have been bound in an outer query block.)</p>
4885</li>
4886<li>
4887
<p>(The &#x201c;Single Variable Rule&#x201d;): Otherwise, if the FROM clause in the current query block binds exactly one variable, the identifier is treated as a field access on the object bound to that variable. For example, in the query <tt>FROM customer SELECT address</tt>, the identifier <tt>address</tt> is treated as a field in the object bound to the variable <tt>customer</tt>. At runtime, if the object bound to <tt>customer</tt> has no <tt>address</tt> field, the <tt>address</tt> expression will return <tt>missing</tt>. If the FROM clause in the current query block binds multiple variables, name resolution fails with an &#x201c;ambiguous name&#x201d; error. If there&#x2019;s no FROM clause in the current query block, name resolution fails with an &#x201c;undefined identifier&#x201d; error. Note that the Single Variable Rule searches for bound variables only in the current query block, not in outer (containing) blocks. The purpose of this rule is to permit the compiler to resolve field references unambiguously without relying on any schema information. Also note that variables defined by LET clauses do not participate in the resolution process performed by this rule.</p>
4889<p>Exception: In a query that has a GROUP BY clause, the Single Variable Rule does not apply in any clauses that occur after the GROUP BY because, in these clauses, the variables bound by the FROM clause are no longer in scope. In clauses after GROUP BY, only Rule 2.1 applies.</p>
4890</li>
4891</ol>
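<p>The following sketch contrasts a query that relies on the Single Variable Rule with one that cannot. The dataset <tt>orders</tt> and the fields <tt>name</tt>, <tt>custid</tt>, and <tt>orderno</tt> are assumptions made for illustration.</p>

<div>
<div>
<pre class="source">-- Exactly one FROM variable (customer), so name and address
-- resolve to customer.name and customer.address.
FROM customer
SELECT name, address;

-- Two FROM variables: a bare field name would be ambiguous,
-- so each field reference is qualified explicitly.
FROM customer AS c, orders AS o
WHERE c.custid = o.custid
SELECT c.name, o.orderno;
</pre></div></div>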
4892</li>
4893<li>
4894
<p><i>In an ORDER BY clause following a UNION ALL expression</i>:</p>
4896<p>The leftmost identifier is treated as a field-access on the objects that are generated by the UNION ALL. For example:</p>
4897
4898<div>
4899<div>
<pre class="source">query-block-1
UNION ALL
query-block-2
ORDER BY salary
</pre></div></div>
4905
<p>In the result of this query, objects that have a <tt>salary</tt> field will be ordered by the value of this field; objects that have no <tt>salary</tt> field will appear at the beginning of the query result (in ascending order) or at the end (in descending order).</p>
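<p>As a concrete sketch of this rule, the query below fills in the two query blocks. The datasets <tt>employees</tt> and <tt>contractors</tt> and their fields are hypothetical.</p>

<div>
<div>
<pre class="source">SELECT e.name, e.salary FROM employees AS e
UNION ALL
SELECT c.name FROM contractors AS c
ORDER BY salary
</pre></div></div>

<p>Here <tt>salary</tt> refers to a field of the objects produced by the UNION ALL, not to any variable bound in either query block. Under the rule above, the objects from <tt>contractors</tt>, which have no <tt>salary</tt> field, would be expected to appear first because the ordering is ascending by default.</p>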
4907</li>
4908<li>
4909
4910<p><i>In a standalone expression</i>: If a query consists of a standalone expression then identifiers inside that expression are resolved according to Rule 1. For example, if the whole query is <tt>ARRAY_COUNT(a.b)</tt> then <tt>a.b</tt> will be treated as dataset <tt>b</tt> contained in dataverse <tt>a</tt>. Note that this rule only applies to identifiers which are located directly inside a standalone expression. Identifiers inside SELECT statements in a standalone expression are still resolved according to Rules 1-3. For example, if the whole query is <tt>ARRAY_SUM( (FROM employee AS e SELECT VALUE salary) )</tt> then <tt>salary</tt> is resolved as <tt>e.salary</tt> following the &#x201c;Single Variable Rule&#x201d; (Rule 2.2).</p>
4911</li>
4912<li>
4913
4914<p>Once the leftmost identifier has been resolved, the following dots and identifiers in the name (if any) are treated as a path expression that navigates to a field nested inside that object. The name resolves to the field at the end of the path. If this field does not exist, the value <tt>missing</tt> is returned.</p>
4915</li>
4916</ol></div>
4917 </div>
4918 </div>
4919 </div>
4920 <hr/>
4921 <footer>
4922 <div class="container-fluid">
4923 <div class="row-fluid">
4924<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
4925 feather logo, and the Apache AsterixDB project logo are either
4926 registered trademarks or trademarks of The Apache Software
4927 Foundation in the United States and other countries.
4928 All other marks mentioned may be trademarks or registered
4929 trademarks of their respective owners.
4930 </div>
4931 </div>
4932 </div>
4933 </footer>
4934 </body>
4935</html>