blob: 8d8b5df20ad6ae6a3611fd3a19e77803ea37c457 [file] [log] [blame]
Ian Maxonb2f1d3e2018-10-12 14:42:34 -07001<!DOCTYPE html>
2<!--
3 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from target/generated-site/markdown/sqlpp/manual.md at 2018-10-12
4 | Rendered using Apache Maven Fluido Skin 1.7
5-->
6<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta charset="UTF-8" />
9 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
10 <meta name="Date-Revision-yyyymmdd" content="20181012" />
11 <meta http-equiv="Content-Language" content="en" />
12 <title>AsterixDB &#x2013; The SQL++ Query Language</title>
13 <link rel="stylesheet" href="../css/apache-maven-fluido-1.7.min.css" />
14 <link rel="stylesheet" href="../css/site.css" />
15 <link rel="stylesheet" href="../css/print.css" media="print" />
16 <script type="text/javascript" src="../js/apache-maven-fluido-1.7.min.js"></script>
17
18 </head>
19 <body class="topBarDisabled">
20 <div class="container-fluid">
21 <div id="banner">
22 <div class="pull-left"><a href=".././" id="bannerLeft"><img src="../images/asterixlogo.png" alt="AsterixDB"/></a></div>
23 <div class="pull-right"></div>
24 <div class="clear"><hr/></div>
25 </div>
26
27 <div id="breadcrumbs">
28 <ul class="breadcrumb">
29 <li id="publishDate">Last Published: 2018-10-12</li>
30 <li id="projectVersion" class="pull-right">Version: 0.9.4</li>
31 <li class="pull-right"><a href="../index.html" title="Documentation Home">Documentation Home</a></li>
32 </ul>
33 </div>
34 <div class="row-fluid">
35 <div id="leftColumn" class="span2">
36 <div class="well sidebar-nav">
37 <ul class="nav nav-list">
38 <li class="nav-header">Get Started - Installation</li>
39 <li><a href="../ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
40 <li><a href="../ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
41 <li><a href="../aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
42 <li class="nav-header">AsterixDB Primer</li>
43 <li><a href="../sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
44 <li class="nav-header">Data Model</li>
45 <li><a href="../datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
46 <li class="nav-header">Queries</li>
47 <li class="active"><a href="#"><span class="none"></span>The SQL++ Query Language</a></li>
48 <li><a href="../sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
49 <li class="nav-header">API/SDK</li>
50 <li><a href="../api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
51 <li><a href="../csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
52 <li class="nav-header">Advanced Features</li>
53 <li><a href="../aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
54 <li><a href="../feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
55 <li><a href="../udf.html" title="User Defined Functions"><span class="none"></span>User Defined Functions</a></li>
56 <li><a href="../sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
57 <li><a href="../sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
58 <li><a href="../sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
59 <li class="nav-header">Deprecated</li>
60 <li><a href="../aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
61 <li><a href="../aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
62 <li><a href="../aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
63</ul>
64 <hr />
65 <div id="poweredBy">
66 <div class="clear"></div>
67 <div class="clear"></div>
68 <div class="clear"></div>
69 <div class="clear"></div>
70<a href=".././" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="../images/asterixlogo.png" /></a>
71 </div>
72 </div>
73 </div>
74 <div id="bodyColumn" class="span10" >
75<!--
76 ! Licensed to the Apache Software Foundation (ASF) under one
77 ! or more contributor license agreements. See the NOTICE file
78 ! distributed with this work for additional information
79 ! regarding copyright ownership. The ASF licenses this file
80 ! to you under the Apache License, Version 2.0 (the
81 ! "License"); you may not use this file except in compliance
82 ! with the License. You may obtain a copy of the License at
83 !
84 ! http://www.apache.org/licenses/LICENSE-2.0
85 !
86 ! Unless required by applicable law or agreed to in writing,
87 ! software distributed under the License is distributed on an
88 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
89 ! KIND, either express or implied. See the License for the
90 ! specific language governing permissions and limitations
91 ! under the License.
92 !-->
93<h1>The SQL++ Query Language</h1>
94<ul>
95
96<li><a href="#Introduction">1. Introduction</a></li>
97<li><a href="#Expressions">2. Expressions</a>
98<ul>
99
100<li><a href="#Operator_expressions">Operator Expressions</a>
101<ul>
102
103<li><a href="#Arithmetic_operators">Arithmetic Operators</a></li>
104<li><a href="#Collection_operators">Collection Operators</a></li>
105<li><a href="#Comparison_operators">Comparison Operators</a></li>
106<li><a href="#Logical_operators">Logical Operators</a></li>
107</ul>
108</li>
109<li><a href="#Case_expressions">Case Expressions</a></li>
110<li><a href="#Quantified_expressions">Quantified Expressions</a></li>
111<li><a href="#Path_expressions">Path Expressions</a></li>
112<li><a href="#Primary_expressions">Primary Expressions</a>
113<ul>
114
115<li><a href="#Literals">Literals</a></li>
116<li><a href="#Variable_references">Variable References</a></li>
117<li><a href="#Parenthesized_expressions">Parenthesized Expressions</a></li>
118<li><a href="#Function_call_expressions">Function call Expressions</a></li>
119<li><a href="#Constructors">Constructors</a></li>
120</ul>
121</li>
122</ul>
123</li>
124<li><a href="#Queries">3. Queries</a>
125<ul>
126
127<li><a href="#Declarations">Declarations</a></li>
128<li><a href="#SELECT_statements">SELECT Statements</a></li>
129<li><a href="#Select_clauses">SELECT Clauses</a>
130<ul>
131
132<li><a href="#Select_element">Select Element/Value/Raw</a></li>
133<li><a href="#SQL_select">SQL-style Select</a></li>
134<li><a href="#Select_star">Select *</a></li>
135<li><a href="#Select_distinct">Select Distinct</a></li>
136<li><a href="#Unnamed_projections">Unnamed Projections</a></li>
137<li><a href="#Abbreviated_field_access_expressions">Abbreviated Field Access Expressions</a></li>
138</ul>
139</li>
140<li><a href="#Unnest_clauses">UNNEST Clauses</a>
141<ul>
142
143<li><a href="#Inner_unnests">Inner Unnests</a></li>
144<li><a href="#Left_outer_unnests">Left Outer Unnests</a></li>
145<li><a href="#Expressing_joins_using_unnests">Expressing Joins Using Unnests</a></li>
146</ul>
147</li>
148<li><a href="#From_clauses">FROM clauses</a>
149<ul>
150
151<li><a href="#Binding_expressions">Binding Expressions</a></li>
152<li><a href="#Multiple_from_terms">Multiple From Terms</a></li>
153<li><a href="#Expressing_joins_using_from_terms">Expressing Joins Using From Terms</a></li>
154<li><a href="#Implicit_binding_variables">Implicit Binding Variables</a></li>
155</ul>
156</li>
157<li><a href="#Join_clauses">JOIN Clauses</a>
158<ul>
159
160<li><a href="#Inner_joins">Inner Joins</a></li>
161<li><a href="#Left_outer_joins">Left Outer Joins</a></li>
162</ul>
163</li>
164<li><a href="#Group_By_clauses">GROUP BY Clauses</a>
165<ul>
166
167<li><a href="#Group_variables">Group Variables</a></li>
168<li><a href="#Implicit_group_key_variables">Implicit Group Key Variables</a></li>
169<li><a href="#Implicit_group_variables">Implicit Group Variables</a></li>
170<li><a href="#Aggregation_functions">Aggregation Functions</a></li>
171<li><a href="#SQL-92_aggregation_functions">SQL-92 Aggregation Functions</a></li>
172<li><a href="#SQL-92_compliant_gby">SQL-92 Compliant GROUP BY Aggregations</a></li>
173<li><a href="#Column_aliases">Column Aliases</a></li>
174</ul>
175</li>
176<li><a href="#Where_having_clauses">WHERE Clauses and HAVING Clauses</a></li>
177<li><a href="#Order_By_clauses">ORDER BY Clauses</a></li>
178<li><a href="#Limit_clauses">LIMIT Clauses</a></li>
179<li><a href="#With_clauses">WITH Clauses</a></li>
180<li><a href="#Let_clauses">LET Clauses</a></li>
181<li><a href="#Union_all">UNION ALL</a></li>
182<li><a href="#Vs_SQL-92">SQL++ Vs. SQL-92</a></li>
183</ul>
184</li>
185<li><a href="#Errors">4. Errors</a>
186<ul>
187
188<li><a href="#Syntax_errors">Syntax Errors</a></li>
189<li><a href="#Identifier_resolution_errors">Identifier Resolution Errors</a></li>
190<li><a href="#Type_errors">Type Errors</a></li>
191<li><a href="#Resource_errors">Resource Errors</a></li>
192</ul>
193</li>
194<li><a href="#DDL_and_DML_statements">5. DDL and DML Statements</a>
195<ul>
196
197<li><a href="#Lifecycle_management_statements">Lifecycle Management Statements</a>
198<ul>
199
200<li><a href="#Dataverses">Dataverses</a></li>
201<li><a href="#Types">Types</a></li>
202<li><a href="#Datasets">Datasets</a></li>
203<li><a href="#Indices">Indices</a></li>
204<li><a href="#Functions">Functions</a></li>
205<li><a href="#Removal">Removal</a></li>
206<li><a href="#Load_statement">Load Statement</a></li>
207</ul>
208</li>
209<li><a href="#Modification_statements">Modification Statements</a>
210<ul>
211
212<li><a href="#Inserts">Inserts</a></li>
213<li><a href="#Upserts">Upserts</a></li>
214<li><a href="#Deletes">Deletes</a></li>
215</ul>
216</li>
217</ul>
218</li>
219<li><a href="#Reserved_keywords">Appendix 1. Reserved Keywords</a></li>
220<li><a href="#Performance_tuning">Appendix 2. Performance Tuning</a>
221<ul>
222
223<li><a href="#Parallelism_parameter">Parallelism Parameter</a></li>
224<li><a href="#Memory_parameters">Memory Parameters</a></li>
225</ul>
226</li>
227<li><a href="#Variable_bindings_and_name_resolution">Appendix 3. Variable Bindings and Name Resolution</a></li>
228</ul><!--
229 ! Licensed to the Apache Software Foundation (ASF) under one
230 ! or more contributor license agreements. See the NOTICE file
231 ! distributed with this work for additional information
232 ! regarding copyright ownership. The ASF licenses this file
233 ! to you under the Apache License, Version 2.0 (the
234 ! "License"); you may not use this file except in compliance
235 ! with the License. You may obtain a copy of the License at
236 !
237 ! http://www.apache.org/licenses/LICENSE-2.0
238 !
239 ! Unless required by applicable law or agreed to in writing,
240 ! software distributed under the License is distributed on an
241 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
242 ! KIND, either express or implied. See the License for the
243 ! specific language governing permissions and limitations
244 ! under the License.
245 !-->
246
247<h1><a name="Introduction" id="Introduction">1. Introduction</a><font size="3" /></h1>
248<p>This document is intended as a reference guide to the full syntax and semantics of the SQL++ Query Language, a SQL-inspired language for working with semistructured data. SQL++ has much in common with SQL, but some differences do exist due to the different data models that the two languages were designed to serve. SQL was designed in the 1970&#x2019;s for interacting with the flat, schema-ified world of relational databases, while SQL++ is much newer and targets the nested, schema-optional (or even schema-less) world of modern NoSQL systems.</p>
249<p>In the context of Apache AsterixDB, SQL++ is intended for working with the Asterix Data Model (<a href="../datamodel.html">ADM</a>),a data model based on a superset of JSON with an enriched and flexible type system. New AsterixDB users are encouraged to read and work through the (much friendlier) guide &#x201c;<a href="primer-sqlpp.html">AsterixDB 101: An ADM and SQL++ Primer</a>&#x201d; before attempting to make use of this document. In addition, readers are advised to read through the <a href="../datamodel.html">Asterix Data Model (ADM) reference guide</a> first as well, as an understanding of the data model is a prerequisite to understanding SQL++.</p>
250<p>In what follows, we detail the features of the SQL++ language in a grammar-guided manner. We list and briefly explain each of the productions in the SQL++ grammar, offering examples (and results) for clarity.</p><!--
251 ! Licensed to the Apache Software Foundation (ASF) under one
252 ! or more contributor license agreements. See the NOTICE file
253 ! distributed with this work for additional information
254 ! regarding copyright ownership. The ASF licenses this file
255 ! to you under the Apache License, Version 2.0 (the
256 ! "License"); you may not use this file except in compliance
257 ! with the License. You may obtain a copy of the License at
258 !
259 ! http://www.apache.org/licenses/LICENSE-2.0
260 !
261 ! Unless required by applicable law or agreed to in writing,
262 ! software distributed under the License is distributed on an
263 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
264 ! KIND, either express or implied. See the License for the
265 ! specific language governing permissions and limitations
266 ! under the License.
267 !-->
268
269<h1><a name="Expressions" id="Expressions">2. Expressions</a></h1><!--
270 ! Licensed to the Apache Software Foundation (ASF) under one
271 ! or more contributor license agreements. See the NOTICE file
272 ! distributed with this work for additional information
273 ! regarding copyright ownership. The ASF licenses this file
274 ! to you under the Apache License, Version 2.0 (the
275 ! "License"); you may not use this file except in compliance
276 ! with the License. You may obtain a copy of the License at
277 !
278 ! http://www.apache.org/licenses/LICENSE-2.0
279 !
280 ! Unless required by applicable law or agreed to in writing,
281 ! software distributed under the License is distributed on an
282 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
283 ! KIND, either express or implied. See the License for the
284 ! specific language governing permissions and limitations
285 ! under the License.
286 !-->
287
288<p>SQL++ is a highly composable expression language. Each SQL++ expression returns zero or more data model instances. There are three major kinds of expressions in SQL++. At the topmost level, a SQL++ expression can be an OperatorExpression (similar to a mathematical expression), an ConditionalExpression (to choose between alternative values), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full SQL++ grammar.</p>
289
290<div>
291<div>
292<pre class="source">Expression ::= OperatorExpression | CaseExpression | QuantifiedExpression
293</pre></div></div>
294
295<p>Note that in the following text, words enclosed in angle brackets denote keywords that are not case-sensitive.</p>
296<div class="section">
297<h2><a name="Operator_Expressions"></a><a name="Operator_expressions" id="Operator_expressions">Operator Expressions</a></h2>
298<p>Operators perform a specific operation on the input values or expressions. The syntax of an operator expression is as follows:</p>
299
300<div>
301<div>
302<pre class="source">OperatorExpression ::= PathExpression
303 | Operator OperatorExpression
304 | OperatorExpression Operator (OperatorExpression)?
305 | OperatorExpression &lt;BETWEEN&gt; OperatorExpression &lt;AND&gt; OperatorExpression
306</pre></div></div>
307
308<p>SQL++ provides a full set of operators that you can use within its statements. Here are the categories of operators:</p>
309<ul>
310
311<li><a href="#Arithmetic_operators">Arithmetic Operators</a>, to perform basic mathematical operations;</li>
312<li><a href="#Collection_operators">Collection Operators</a>, to evaluate expressions on collections or objects;</li>
313<li><a href="#Comparison_operators">Comparison Operators</a>, to compare two expressions;</li>
314<li><a href="#Logical_operators">Logical Operators</a>, to combine operators using Boolean logic.</li>
315</ul>
316<p>The following table summarizes the precedence order (from higher to lower) of the major unary and binary operators:</p>
317<table border="0" class="table table-striped">
318<thead>
319
320<tr class="a">
321<th> Operator </th>
322<th> Operation </th></tr>
323</thead><tbody>
324
325<tr class="b">
326<td> EXISTS, NOT EXISTS </td>
327<td> Collection emptiness testing </td></tr>
328<tr class="a">
329<td> ^ </td>
330<td> Exponentiation </td></tr>
331<tr class="b">
332<td> *, /, % </td>
333<td> Multiplication, division, modulo </td></tr>
334<tr class="a">
335<td> +, - </td>
336<td> Addition, subtraction </td></tr>
337<tr class="b">
338<td> || </td>
339<td> String concatenation </td></tr>
340<tr class="a">
341<td> IS NULL, IS NOT NULL, IS MISSING, IS NOT MISSING, <br />IS UNKNOWN, IS NOT UNKNOWN, IS VALUED, IS NOT VALUED </td>
342<td> Unknown value comparison </td></tr>
343<tr class="b">
344<td> BETWEEN, NOT BETWEEN </td>
345<td> Range comparison (inclusive on both sides) </td></tr>
346<tr class="a">
347<td> =, !=, &lt;&gt;, &lt;, &gt;, &lt;=, &gt;=, LIKE, NOT LIKE, IN, NOT IN </td>
348<td> Comparison </td></tr>
349<tr class="b">
350<td> NOT </td>
351<td> Logical negation </td></tr>
352<tr class="a">
353<td> AND </td>
354<td> Conjunction </td></tr>
355<tr class="b">
356<td> OR </td>
357<td> Disjunction </td></tr>
358</tbody>
359</table>
360<p>In general, if any operand evaluates to a <tt>MISSING</tt> value, the enclosing operator will return <tt>MISSING</tt>; if none of operands evaluates to a <tt>MISSING</tt> value but there is an operand evaluates to a <tt>NULL</tt> value, the enclosing operator will return <tt>NULL</tt>. However, there are a few exceptions listed in <a href="#Comparison_operators">comparison operators</a> and <a href="#Logical_operators">logical operators</a>.</p>
361<div class="section">
362<h3><a name="Arithmetic_Operators"></a><a name="Arithmetic_operators" id="Arithmetic_operators">Arithmetic Operators</a></h3>
363<p>Arithmetic operators are used to exponentiate, add, subtract, multiply, and divide numeric values, or concatenate string values.</p>
364<table border="0" class="table table-striped">
365<thead>
366
367<tr class="a">
368<th> Operator </th>
369<th> Purpose </th>
370<th> Example </th></tr>
371</thead><tbody>
372
373<tr class="b">
374<td> +, - </td>
375<td> As unary operators, they denote a <br />positive or negative expression </td>
376<td> SELECT VALUE -1; </td></tr>
377<tr class="a">
378<td> +, - </td>
379<td> As binary operators, they add or subtract </td>
380<td> SELECT VALUE 1 + 2; </td></tr>
381<tr class="b">
382<td> *, /, % </td>
383<td> Multiply, divide, modulo </td>
384<td> SELECT VALUE 4 / 2.0; </td></tr>
385<tr class="a">
386<td> ^ </td>
387<td> Exponentiation </td>
388<td> SELECT VALUE 2^3; </td></tr>
389<tr class="b">
390<td> || </td>
391<td> String concatenation </td>
392<td> SELECT VALUE &#x201c;ab&#x201d;||&#x201c;c&#x201d;||&#x201c;d&#x201d;; </td></tr>
393</tbody>
394</table></div>
395<div class="section">
396<h3><a name="Collection_Operators"></a><a name="Collection_operators" id="Collection_operators">Collection Operators</a></h3>
397<p>Collection operators are used for membership tests (IN, NOT IN) or empty collection tests (EXISTS, NOT EXISTS).</p>
398<table border="0" class="table table-striped">
399<thead>
400
401<tr class="a">
402<th> Operator </th>
403<th> Purpose </th>
404<th> Example </th></tr>
405</thead><tbody>
406
407<tr class="b">
408<td> IN </td>
409<td> Membership test </td>
410<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.lang IN [&#x201c;en&#x201d;, &#x201c;de&#x201d;]; </td></tr>
411<tr class="a">
412<td> NOT IN </td>
413<td> Non-membership test </td>
414<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.lang NOT IN [&#x201c;en&#x201d;]; </td></tr>
415<tr class="b">
416<td> EXISTS </td>
417<td> Check whether a collection is not empty </td>
418<td> SELECT * FROM ChirpMessages cm <br />WHERE EXISTS cm.referredTopics; </td></tr>
419<tr class="a">
420<td> NOT EXISTS </td>
421<td> Check whether a collection is empty </td>
422<td> SELECT * FROM ChirpMessages cm <br />WHERE NOT EXISTS cm.referredTopics; </td></tr>
423</tbody>
424</table></div>
425<div class="section">
426<h3><a name="Comparison_Operators"></a><a name="Comparison_operators" id="Comparison_operators">Comparison Operators</a></h3>
427<p>Comparison operators are used to compare values. The comparison operators fall into one of two sub-categories: missing value comparisons and regular value comparisons. SQL++ (and JSON) has two ways of representing missing information in a object - the presence of the field with a NULL for its value (as in SQL), and the absence of the field (which JSON permits). For example, the first of the following objects represents Jack, whose friend is Jill. In the other examples, Jake is friendless a la SQL, with a friend field that is NULL, while Joe is friendless in a more natural (for JSON) way, i.e., by not having a friend field.</p>
428<div class="section">
429<div class="section">
430<h5><a name="Examples"></a>Examples</h5>
431<p>{&#x201c;name&#x201d;: &#x201c;Jack&#x201d;, &#x201c;friend&#x201d;: &#x201c;Jill&#x201d;}</p>
432<p>{&#x201c;name&#x201d;: &#x201c;Jake&#x201d;, &#x201c;friend&#x201d;: NULL}</p>
433<p>{&#x201c;name&#x201d;: &#x201c;Joe&#x201d;}</p>
434<p>The following table enumerates all of SQL++&#x2019;s comparison operators.</p>
435<table border="0" class="table table-striped">
436<thead>
437
438<tr class="a">
439<th> Operator </th>
440<th> Purpose </th>
441<th> Example </th></tr>
442</thead><tbody>
443
444<tr class="b">
445<td> IS NULL </td>
446<td> Test if a value is NULL </td>
447<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NULL; </td></tr>
448<tr class="a">
449<td> IS NOT NULL </td>
450<td> Test if a value is not NULL </td>
451<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT NULL; </td></tr>
452<tr class="b">
453<td> IS MISSING </td>
454<td> Test if a value is MISSING </td>
455<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS MISSING; </td></tr>
456<tr class="a">
457<td> IS NOT MISSING </td>
458<td> Test if a value is not MISSING </td>
459<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT MISSING;</td></tr>
460<tr class="b">
461<td> IS UNKNOWN </td>
462<td> Test if a value is NULL or MISSING </td>
463<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS UNKNOWN; </td></tr>
464<tr class="a">
465<td> IS NOT UNKNOWN </td>
466<td> Test if a value is neither NULL nor MISSING </td>
467<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT UNKNOWN;</td></tr>
468<tr class="b">
469<td> IS VALUED </td>
470<td> Test if a value is neither NULL nor MISSING </td>
471<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS VALUED; </td></tr>
472<tr class="a">
473<td> IS NOT VALUED </td>
474<td> Test if a value is NULL or MISSING </td>
475<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name IS NOT VALUED;</td></tr>
476<tr class="b">
477<td> BETWEEN </td>
478<td> Test if a value is between a start value and <br />a end value. The comparison is inclusive <br />to both start and end values. </td>
479<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId BETWEEN 10 AND 20;</td></tr>
480<tr class="a">
481<td> = </td>
482<td> Equality test </td>
483<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId=10; </td></tr>
484<tr class="b">
485<td> != </td>
486<td> Inequality test </td>
487<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId!=10;</td></tr>
488<tr class="a">
489<td> &lt;&gt; </td>
490<td> Inequality test </td>
491<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;&gt;10;</td></tr>
492<tr class="b">
493<td> &lt; </td>
494<td> Less than </td>
495<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;10; </td></tr>
496<tr class="a">
497<td> &gt; </td>
498<td> Greater than </td>
499<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&gt;10; </td></tr>
500<tr class="b">
501<td> &lt;= </td>
502<td> Less than or equal to </td>
503<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&lt;=10; </td></tr>
504<tr class="a">
505<td> &gt;= </td>
506<td> Greater than or equal to </td>
507<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.chirpId&gt;=10; </td></tr>
508<tr class="b">
509<td> LIKE </td>
510<td> Test if the left side matches a<br /> pattern defined on the right<br /> side; in the pattern, &#x201c;%&#x201d; matches <br />any string while &#x201c;_&#x201d; matches <br /> any character. </td>
511<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name LIKE &#x201c;%Giesen%&#x201d;;</td></tr>
512<tr class="a">
513<td> NOT LIKE </td>
514<td> Test if the left side does not <br />match a pattern defined on the right<br /> side; in the pattern, &#x201c;%&#x201d; matches <br />any string while &#x201c;_&#x201d; matches <br /> any character. </td>
515<td> SELECT * FROM ChirpMessages cm <br />WHERE cm.user.name NOT LIKE &#x201c;%Giesen%&#x201d;;</td></tr>
516</tbody>
517</table>
518<p>The following table summarizes how the missing value comparison operators work.</p>
519<table border="0" class="table table-striped">
520<thead>
521
522<tr class="a">
523<th> Operator </th>
524<th> Non-NULL/Non-MISSING value </th>
525<th> NULL </th>
526<th> MISSING </th></tr>
527</thead><tbody>
528
529<tr class="b">
530<td> IS NULL </td>
531<td> FALSE </td>
532<td> TRUE </td>
533<td> MISSING </td></tr>
534<tr class="a">
535<td> IS NOT NULL </td>
536<td> TRUE </td>
537<td> FALSE </td>
538<td> MISSING </td></tr>
539<tr class="b">
540<td> IS MISSING </td>
541<td> FALSE </td>
542<td> FALSE </td>
543<td> TRUE </td></tr>
544<tr class="a">
545<td> IS NOT MISSING </td>
546<td> TRUE </td>
547<td> TRUE </td>
548<td> FALSE </td></tr>
549<tr class="b">
550<td> IS UNKNOWN </td>
551<td> FALSE </td>
552<td> TRUE </td>
553<td> TRUE </td></tr>
554<tr class="a">
555<td> IS NOT UNKNOWN </td>
556<td> TRUE </td>
557<td> FALSE </td>
558<td> FALSE</td></tr>
559<tr class="b">
560<td> IS VALUED </td>
561<td> TRUE </td>
562<td> FALSE </td>
563<td> FALSE </td></tr>
564<tr class="a">
565<td> IS NOT VALUED </td>
566<td> FALSE </td>
567<td> TRUE </td>
568<td> TRUE </td></tr>
569</tbody>
570</table></div></div></div>
571<div class="section">
572<h3><a name="Logical_Operators"></a><a name="Logical_operators" id="Logical_operators">Logical Operators</a></h3>
573<p>Logical operators perform logical <tt>NOT</tt>, <tt>AND</tt>, and <tt>OR</tt> operations over Boolean values (<tt>TRUE</tt> and <tt>FALSE</tt>) plus <tt>NULL</tt> and <tt>MISSING</tt>.</p>
574<table border="0" class="table table-striped">
575<thead>
576
577<tr class="a">
578<th> Operator </th>
579<th> Purpose </th>
580<th> Example </th></tr>
581</thead><tbody>
582
583<tr class="b">
584<td> NOT </td>
585<td> Returns true if the following condition is false, otherwise returns false </td>
586<td> SELECT VALUE NOT TRUE; </td></tr>
587<tr class="a">
588<td> AND </td>
589<td> Returns true if both branches are true, otherwise returns false </td>
590<td> SELECT VALUE TRUE AND FALSE; </td></tr>
591<tr class="b">
592<td> OR </td>
593<td> Returns true if one branch is true, otherwise returns false </td>
594<td> SELECT VALUE FALSE OR FALSE; </td></tr>
595</tbody>
596</table>
597<p>The following table is the truth table for <tt>AND</tt> and <tt>OR</tt>.</p>
598<table border="0" class="table table-striped">
599<thead>
600
601<tr class="a">
602<th> A </th>
603<th> B </th>
604<th> A AND B </th>
605<th> A OR B </th></tr>
606</thead><tbody>
607
608<tr class="b">
609<td> TRUE </td>
610<td> TRUE </td>
611<td> TRUE </td>
612<td> TRUE </td></tr>
613<tr class="a">
614<td> TRUE </td>
615<td> FALSE </td>
616<td> FALSE </td>
617<td> TRUE </td></tr>
618<tr class="b">
619<td> TRUE </td>
620<td> NULL </td>
621<td> NULL </td>
622<td> TRUE </td></tr>
623<tr class="a">
624<td> TRUE </td>
625<td> MISSING </td>
626<td> MISSING </td>
627<td> TRUE </td></tr>
628<tr class="b">
629<td> FALSE </td>
630<td> FALSE </td>
631<td> FALSE </td>
632<td> FALSE </td></tr>
633<tr class="a">
634<td> FALSE </td>
635<td> NULL </td>
636<td> FALSE </td>
637<td> NULL </td></tr>
638<tr class="b">
639<td> FALSE </td>
640<td> MISSING </td>
641<td> FALSE </td>
642<td> MISSING </td></tr>
643<tr class="a">
644<td> NULL </td>
645<td> NULL </td>
646<td> NULL </td>
647<td> NULL </td></tr>
648<tr class="b">
649<td> NULL </td>
650<td> MISSING </td>
651<td> MISSING </td>
652<td> NULL </td></tr>
653<tr class="a">
654<td> MISSING </td>
655<td> MISSING </td>
656<td> MISSING </td>
657<td> MISSING </td></tr>
658</tbody>
659</table>
660<p>The following table demonstrates the results of <tt>NOT</tt> on all possible inputs.</p>
661<table border="0" class="table table-striped">
662<thead>
663
664<tr class="a">
665<th> A </th>
666<th> NOT A </th></tr>
667</thead><tbody>
668
669<tr class="b">
670<td> TRUE </td>
671<td> FALSE </td></tr>
672<tr class="a">
673<td> FALSE </td>
674<td> TRUE </td></tr>
675<tr class="b">
676<td> NULL </td>
677<td> NULL </td></tr>
678<tr class="a">
679<td> MISSING </td>
680<td> MISSING </td></tr>
681</tbody>
682</table></div></div>
683<div class="section">
684<h2><a name="Case_Expressions"></a><a name="Case_expressions" id="Case_expressions">Case Expressions</a></h2>
685
686<div>
687<div>
688<pre class="source">CaseExpression ::= SimpleCaseExpression | SearchedCaseExpression
689SimpleCaseExpression ::= &lt;CASE&gt; Expression ( &lt;WHEN&gt; Expression &lt;THEN&gt; Expression )+ ( &lt;ELSE&gt; Expression )? &lt;END&gt;
690SearchedCaseExpression ::= &lt;CASE&gt; ( &lt;WHEN&gt; Expression &lt;THEN&gt; Expression )+ ( &lt;ELSE&gt; Expression )? &lt;END&gt;
691</pre></div></div>
692
693<p>In a simple <tt>CASE</tt> expression, the query evaluator searches for the first <tt>WHEN</tt> &#x2026; <tt>THEN</tt> pair in which the <tt>WHEN</tt> expression is equal to the expression following <tt>CASE</tt> and returns the expression following <tt>THEN</tt>. If none of the <tt>WHEN</tt> &#x2026; <tt>THEN</tt> pairs meet this condition, and an <tt>ELSE</tt> branch exists, it returns the <tt>ELSE</tt> expression. Otherwise, <tt>NULL</tt> is returned.</p>
694<p>In a searched CASE expression, the query evaluator searches from left to right until it finds a <tt>WHEN</tt> expression that is evaluated to <tt>TRUE</tt>, and then returns its corresponding <tt>THEN</tt> expression. If no condition is found to be <tt>TRUE</tt>, and an <tt>ELSE</tt> branch exists, it returns the <tt>ELSE</tt> expression. Otherwise, it returns <tt>NULL</tt>.</p>
695<p>The following example illustrates the form of a case expression.</p>
696<div class="section">
697<div class="section">
698<div class="section">
699<h5><a name="Example"></a>Example</h5>
700
701<div>
702<div>
703<pre class="source">CASE (2 &lt; 3) WHEN true THEN &quot;yes&quot; ELSE &quot;no&quot; END
704</pre></div></div>
705</div></div></div></div>
706<div class="section">
707<h2><a name="Quantified_Expressions"></a><a name="Quantified_expressions" id="Quantified_expressions">Quantified Expressions</a></h2>
708
709<div>
710<div>
711<pre class="source">QuantifiedExpression ::= ( (&lt;ANY&gt;|&lt;SOME&gt;) | &lt;EVERY&gt; ) Variable &lt;IN&gt; Expression ( &quot;,&quot; Variable &quot;in&quot; Expression )*
712 &lt;SATISFIES&gt; Expression (&lt;END&gt;)?
713</pre></div></div>
714
715<p>Quantified expressions are used for expressing existential or universal predicates involving the elements of a collection.</p>
716<p>The following pair of examples illustrate the use of a quantified expression to test that every (or some) element in the set [1, 2, 3] of integers is less than three. The first example yields <tt>FALSE</tt> and second example yields <tt>TRUE</tt>.</p>
717<p>It is useful to note that if the set were instead the empty set, the first expression would yield <tt>TRUE</tt> (&#x201c;every&#x201d; value in an empty set satisfies the condition) while the second expression would yield <tt>FALSE</tt> (since there isn&#x2019;t &#x201c;some&#x201d; value, as there are no values in the set, that satisfies the condition).</p>
718<p>A quantified expression will return a <tt>NULL</tt> (or <tt>MISSING</tt>) if the first expression in it evaluates to <tt>NULL</tt> (or <tt>MISSING</tt>). A type error will be raised if the first expression in a quantified expression does not return a collection.</p>
719<div class="section">
720<div class="section">
721<div class="section">
722<h5><a name="Examples"></a>Examples</h5>
723
724<div>
725<div>
726<pre class="source">EVERY x IN [ 1, 2, 3 ] SATISFIES x &lt; 3
727SOME x IN [ 1, 2, 3 ] SATISFIES x &lt; 3
728</pre></div></div>
729</div></div></div></div>
730<div class="section">
731<h2><a name="Path_Expressions"></a><a name="Path_expressions" id="Path_expressions">Path Expressions</a></h2>
732
733<div>
734<div>
735<pre class="source">PathExpression ::= PrimaryExpression ( Field | Index )*
736Field ::= &quot;.&quot; Identifier
737Index ::= &quot;[&quot; ( Expression | &quot;?&quot; ) &quot;]&quot;
738</pre></div></div>
739
740<p>Components of complex types in the data model are accessed via path expressions. Path access can be applied to the result of a SQL++ expression that yields an instance of a complex type, for example, a object or array instance. For objects, path access is based on field names. For arrays, path access is based on (zero-based) array-style indexing. SQL++ also supports an &#x201c;I&#x2019;m feeling lucky&#x201d; style index accessor, [?], for selecting an arbitrary element from an array. Attempts to access non-existent fields or out-of-bound array elements produce the special value <tt>MISSING</tt>. Type errors will be raised for inappropriate use of a path expression, such as applying a field accessor to a numeric value.</p>
741<p>The following examples illustrate field access for a object, index-based element access for an array, and also a composition thereof.</p>
742<div class="section">
743<div class="section">
744<div class="section">
745<h5><a name="Examples"></a>Examples</h5>
746
747<div>
748<div>
749<pre class="source">({&quot;name&quot;: &quot;MyABCs&quot;, &quot;array&quot;: [ &quot;a&quot;, &quot;b&quot;, &quot;c&quot;]}).array
750
751([&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])[2]
752
753({&quot;name&quot;: &quot;MyABCs&quot;, &quot;array&quot;: [ &quot;a&quot;, &quot;b&quot;, &quot;c&quot;]}).array[2]
754</pre></div></div>
755</div></div></div></div>
756<div class="section">
757<h2><a name="Primary_Expressions"></a><a name="Primary_expressions" id="Primary_expressions">Primary Expressions</a></h2>
758
759<div>
760<div>
761<pre class="source">PrimaryExpr ::= Literal
762 | VariableReference
763 | ParenthesizedExpression
764 | FunctionCallExpression
765 | Constructor
766</pre></div></div>
767
768<p>The most basic building block for any SQL++ expression is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, or a newly constructed instance of the data model (such as a newly constructed object, array, or multiset of data model instances).</p></div>
769<div class="section">
770<h2><a name="Literals" id="Literals">Literals</a></h2>
771
772<div>
773<div>
774<pre class="source">Literal ::= StringLiteral
775 | IntegerLiteral
776 | FloatLiteral
777 | DoubleLiteral
778 | &lt;NULL&gt;
779 | &lt;MISSING&gt;
780 | &lt;TRUE&gt;
781 | &lt;FALSE&gt;
782StringLiteral ::= &quot;\&quot;&quot; (
783 &lt;EscapeQuot&gt;
784 | &lt;EscapeBslash&gt;
785 | &lt;EscapeSlash&gt;
786 | &lt;EscapeBspace&gt;
787 | &lt;EscapeFormf&gt;
788 | &lt;EscapeNl&gt;
789 | &lt;EscapeCr&gt;
790 | &lt;EscapeTab&gt;
791 | ~[&quot;\&quot;&quot;,&quot;\\&quot;])*
792 &quot;\&quot;&quot;
793 | &quot;\'&quot;(
794 &lt;EscapeApos&gt;
795 | &lt;EscapeBslash&gt;
796 | &lt;EscapeSlash&gt;
797 | &lt;EscapeBspace&gt;
798 | &lt;EscapeFormf&gt;
799 | &lt;EscapeNl&gt;
800 | &lt;EscapeCr&gt;
801 | &lt;EscapeTab&gt;
802 | ~[&quot;\'&quot;,&quot;\\&quot;])*
803 &quot;\'&quot;
804&lt;ESCAPE_Apos&gt; ::= &quot;\\\'&quot;
805&lt;ESCAPE_Quot&gt; ::= &quot;\\\&quot;&quot;
806&lt;EscapeBslash&gt; ::= &quot;\\\\&quot;
807&lt;EscapeSlash&gt; ::= &quot;\\/&quot;
808&lt;EscapeBspace&gt; ::= &quot;\\b&quot;
809&lt;EscapeFormf&gt; ::= &quot;\\f&quot;
810&lt;EscapeNl&gt; ::= &quot;\\n&quot;
811&lt;EscapeCr&gt; ::= &quot;\\r&quot;
812&lt;EscapeTab&gt; ::= &quot;\\t&quot;
813
814IntegerLiteral ::= &lt;DIGITS&gt;
815&lt;DIGITS&gt; ::= [&quot;0&quot; - &quot;9&quot;]+
816FloatLiteral ::= &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; )
817 | &lt;DIGITS&gt; ( &quot;.&quot; &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; ) )?
818 | &quot;.&quot; &lt;DIGITS&gt; ( &quot;f&quot; | &quot;F&quot; )
819DoubleLiteral ::= &lt;DIGITS&gt; &quot;.&quot; &lt;DIGITS&gt;
820 | &quot;.&quot; &lt;DIGITS&gt;
821</pre></div></div>
822
823<p>Literals (constants) in SQL++ can be strings, integers, floating point values, double values, boolean constants, or special constant values like <tt>NULL</tt> and <tt>MISSING</tt>. The <tt>NULL</tt> value is like a <tt>NULL</tt> in SQL; it is used to represent an unknown field value. The specialy value <tt>MISSING</tt> is only meaningful in the context of SQL++ field accesses; it occurs when the accessed field simply does not exist at all in a object being accessed.</p>
824<p>The following are some simple examples of SQL++ literals.</p>
825<div class="section">
826<div class="section">
827<div class="section">
828<h5><a name="Examples"></a>Examples</h5>
829
830<div>
831<div>
832<pre class="source">'a string'
833&quot;test string&quot;
83442
835</pre></div></div>
836
837<p>Different from standard SQL, double quotes play the same role as single quotes and may be used for string literals in SQL++.</p></div></div></div>
838<div class="section">
839<h3><a name="Variable_References"></a><a name="Variable_references" id="Variable_references">Variable References</a></h3>
840
841<div>
842<div>
843<pre class="source">VariableReference ::= &lt;IDENTIFIER&gt;|&lt;DelimitedIdentifier&gt;
844&lt;IDENTIFIER&gt; ::= (&lt;LETTER&gt; | &quot;_&quot;) (&lt;LETTER&gt; | &lt;DIGIT&gt; | &quot;_&quot; | &quot;$&quot;)*
845&lt;LETTER&gt; ::= [&quot;A&quot; - &quot;Z&quot;, &quot;a&quot; - &quot;z&quot;]
846DelimitedIdentifier ::= &quot;`&quot; (&lt;EscapeQuot&gt;
847 | &lt;EscapeBslash&gt;
848 | &lt;EscapeSlash&gt;
849 | &lt;EscapeBspace&gt;
850 | &lt;EscapeFormf&gt;
851 | &lt;EscapeNl&gt;
852 | &lt;EscapeCr&gt;
853 | &lt;EscapeTab&gt;
854 | ~[&quot;`&quot;,&quot;\\&quot;])*
855 &quot;`&quot;
856</pre></div></div>
857
858<p>A variable in SQL++ can be bound to any legal data model value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the <tt>FROM</tt>, <tt>WITH</tt> or <tt>LET</tt> clauses of a <tt>SELECT</tt> statement or from an input parameter in the context of a function body.) Backticks, for example, `id`, are used for delimited identifiers. Delimiting is needed when a variable&#x2019;s desired name clashes with a SQL++ keyword or includes characters not allowed in regular identifiers. More information on exactly how variable references are resolved can be found in the appendix section on Variable Resolution.</p>
859<div class="section">
860<div class="section">
861<h5><a name="Examples"></a>Examples</h5>
862
863<div>
864<div>
865<pre class="source">tweet
866id
867`SELECT`
868`my-function`
869</pre></div></div>
870</div></div></div>
871<div class="section">
872<h3><a name="Parenthesized_Expressions"></a><a name="Parenthesized_expressions" id="Parenthesized_expressions">Parenthesized Expressions</a></h3>
873
874<div>
875<div>
876<pre class="source">ParenthesizedExpression ::= &quot;(&quot; Expression &quot;)&quot; | Subquery
877</pre></div></div>
878
879<p>An expression can be parenthesized to control the precedence order or otherwise clarify a query. In SQL++, for composability, a subquery is also an parenthesized expression.</p>
880<p>The following expression evaluates to the value 2.</p>
881<div class="section">
882<div class="section">
883<h5><a name="Example"></a>Example</h5>
884
885<div>
886<div>
887<pre class="source">( 1 + 1 )
888</pre></div></div>
889</div></div></div>
890<div class="section">
891<h3><a name="Function_Call_Expressions"></a><a name="Function_call_expressions" id="Function_call_expressions">Function Call Expressions</a></h3>
892
893<div>
894<div>
895<pre class="source">FunctionCallExpression ::= FunctionName &quot;(&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;)&quot;
896</pre></div></div>
897
898<p>Functions are included in SQL++, like most languages, as a way to package useful functionality or to componentize complicated or reusable SQL++ computations. A function call is a legal SQL++ query expression that represents the value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any SQL++ expressions.</p>
899<p>The following example is a (built-in) function call expression whose value is 8.</p>
900<div class="section">
901<div class="section">
902<h5><a name="Example"></a>Example</h5>
903
904<div>
905<div>
906<pre class="source">length('a string')
907</pre></div></div>
908</div></div></div>
909<div class="section">
910<h3><a name="Constructors" id="Constructors">Constructors</a></h3>
911
912<div>
913<div>
914<pre class="source">Constructor ::= ArrayConstructor | MultisetConstructor | ObjectConstructor
915ArrayConstructor ::= &quot;[&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;]&quot;
916MultisetConstructor ::= &quot;{{&quot; ( Expression ( &quot;,&quot; Expression )* )? &quot;}}&quot;
917ObjectConstructor ::= &quot;{&quot; ( FieldBinding ( &quot;,&quot; FieldBinding )* )? &quot;}&quot;
918FieldBinding ::= Expression &quot;:&quot; Expression
919</pre></div></div>
920
921<p>A major feature of SQL++ is its ability to construct new data model instances. This is accomplished using its constructors for each of the model&#x2019;s complex object structures, namely arrays, multisets, and objects. Arrays are like JSON arrays, while multisets have bag semantics. Objects are built from fields that are field-name/field-value pairs, again like JSON.</p>
922<p>The following examples illustrate how to construct a new array with 4 items and a new object with 2 fields respectively. Array elements can be homogeneous (as in the first example), which is the common case, or they may be heterogeneous (as in the second example). The data values and field name values used to construct arrays, multisets, and objects in constructors are all simply SQL++ expressions. Thus, the collection elements, field names, and field values used in constructors can be simple literals or they can come from query variable references or even arbitrarily complex SQL++ expressions (subqueries). Type errors will be raised if the field names in an object are not strings, and duplicate field errors will be raised if they are not distinct.</p>
923<div class="section">
924<div class="section">
925<h5><a name="Examples"></a>Examples</h5>
926
927<div>
928<div>
929<pre class="source">[ 'a', 'b', 'c', 'c' ]
930
931[ 42, &quot;forty-two!&quot;, { &quot;rank&quot; : &quot;Captain&quot;, &quot;name&quot;: &quot;America&quot; }, 3.14159 ]
932
933{
934 'project name': 'Hyracks',
935 'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras', 'tillw' ]
936}
937</pre></div></div>
938<!--
939 ! Licensed to the Apache Software Foundation (ASF) under one
940 ! or more contributor license agreements. See the NOTICE file
941 ! distributed with this work for additional information
942 ! regarding copyright ownership. The ASF licenses this file
943 ! to you under the Apache License, Version 2.0 (the
944 ! "License"); you may not use this file except in compliance
945 ! with the License. You may obtain a copy of the License at
946 !
947 ! http://www.apache.org/licenses/LICENSE-2.0
948 !
949 ! Unless required by applicable law or agreed to in writing,
950 ! software distributed under the License is distributed on an
951 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
952 ! KIND, either express or implied. See the License for the
953 ! specific language governing permissions and limitations
954 ! under the License.
955 !-->
956
957<h1><a name="Queries" id="Queries">3. Queries</a></h1>
958<p>A SQL++ query can be any legal SQL++ expression or <tt>SELECT</tt> statement. A SQL++ query always ends with a semicolon.</p>
959
960<div>
961<div>
962<pre class="source">Query ::= (Expression | SelectStatement) &quot;;&quot;
963</pre></div></div>
964<!--
965 ! Licensed to the Apache Software Foundation (ASF) under one
966 ! or more contributor license agreements. See the NOTICE file
967 ! distributed with this work for additional information
968 ! regarding copyright ownership. The ASF licenses this file
969 ! to you under the Apache License, Version 2.0 (the
970 ! "License"); you may not use this file except in compliance
971 ! with the License. You may obtain a copy of the License at
972 !
973 ! http://www.apache.org/licenses/LICENSE-2.0
974 !
975 ! Unless required by applicable law or agreed to in writing,
976 ! software distributed under the License is distributed on an
977 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
978 ! KIND, either express or implied. See the License for the
979 ! specific language governing permissions and limitations
980 ! under the License.
981 !-->
982</div></div></div></div>
983<div class="section">
984<h2><a name="Declarations" id="Declarations">Declarations</a></h2>
985
986<div>
987<div>
988<pre class="source">DatabaseDeclaration ::= &quot;USE&quot; Identifier
989</pre></div></div>
990
991<p>At the uppermost level, the world of data is organized into data namespaces called <b>dataverses</b>. To set the default dataverse for a series of statements, the USE statement is provided in SQL++.</p>
992<p>As an example, the following statement sets the default dataverse to be &#x201c;TinySocial&#x201d;.</p>
993<div class="section">
994<div class="section">
995<div class="section">
996<h5><a name="Example"></a>Example</h5>
997
998<div>
999<div>
1000<pre class="source">USE TinySocial;
1001</pre></div></div>
1002<!--
1003 ! Licensed to the Apache Software Foundation (ASF) under one
1004 ! or more contributor license agreements. See the NOTICE file
1005 ! distributed with this work for additional information
1006 ! regarding copyright ownership. The ASF licenses this file
1007 ! to you under the Apache License, Version 2.0 (the
1008 ! "License"); you may not use this file except in compliance
1009 ! with the License. You may obtain a copy of the License at
1010 !
1011 ! http://www.apache.org/licenses/LICENSE-2.0
1012 !
1013 ! Unless required by applicable law or agreed to in writing,
1014 ! software distributed under the License is distributed on an
1015 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
1016 ! KIND, either express or implied. See the License for the
1017 ! specific language governing permissions and limitations
1018 ! under the License.
1019 !-->
1020
1021<p>When writing a complex SQL++ query, it can sometimes be helpful to define one or more auxilliary functions that each address a sub-piece of the overall query. The declare function statement supports the creation of such helper functions. In general, the function body (expression) can be any legal SQL++ query expression.</p>
1022
1023<div>
1024<div>
1025<pre class="source">FunctionDeclaration ::= &quot;DECLARE&quot; &quot;FUNCTION&quot; Identifier ParameterList &quot;{&quot; Expression &quot;}&quot;
1026ParameterList ::= &quot;(&quot; ( &lt;VARIABLE&gt; ( &quot;,&quot; &lt;VARIABLE&gt; )* )? &quot;)&quot;
1027</pre></div></div>
1028
1029<p>The following is a simple example of a temporary SQL++ function definition and its use.</p></div>
1030<div class="section">
1031<h5><a name="Example"></a>Example</h5>
1032
1033<div>
1034<div>
1035<pre class="source">DECLARE FUNCTION friendInfo(userId) {
1036 (SELECT u.id, u.name, len(u.friendIds) AS friendCount
1037 FROM GleambookUsers u
1038 WHERE u.id = userId)[0]
1039 };
1040
1041SELECT VALUE friendInfo(2);
1042</pre></div></div>
1043
1044<p>For our sample data set, this returns:</p>
1045
1046<div>
1047<div>
1048<pre class="source">[
1049 { &quot;id&quot;: 2, &quot;name&quot;: &quot;IsbelDull&quot;, &quot;friendCount&quot;: 2 }
1050]
1051</pre></div></div>
1052<!--
1053 ! Licensed to the Apache Software Foundation (ASF) under one
1054 ! or more contributor license agreements. See the NOTICE file
1055 ! distributed with this work for additional information
1056 ! regarding copyright ownership. The ASF licenses this file
1057 ! to you under the Apache License, Version 2.0 (the
1058 ! "License"); you may not use this file except in compliance
1059 ! with the License. You may obtain a copy of the License at
1060 !
1061 ! http://www.apache.org/licenses/LICENSE-2.0
1062 !
1063 ! Unless required by applicable law or agreed to in writing,
1064 ! software distributed under the License is distributed on an
1065 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
1066 ! KIND, either express or implied. See the License for the
1067 ! specific language governing permissions and limitations
1068 ! under the License.
1069 !-->
1070</div></div></div></div>
1071<div class="section">
1072<h2><a name="SELECT_Statements"></a><a name="SELECT_statements" id="SELECT_statements">SELECT Statements</a></h2>
1073<p>The following shows the (rich) grammar for the <tt>SELECT</tt> statement in SQL++.</p>
1074
1075<div>
1076<div>
1077<pre class="source">SelectStatement ::= ( WithClause )?
1078 SelectSetOperation (OrderbyClause )? ( LimitClause )?
1079SelectSetOperation ::= SelectBlock (&lt;UNION&gt; &lt;ALL&gt; ( SelectBlock | Subquery ) )*
1080Subquery ::= &quot;(&quot; SelectStatement &quot;)&quot;
1081
1082SelectBlock ::= SelectClause
1083 ( FromClause ( LetClause )?)?
1084 ( WhereClause )?
1085 ( GroupbyClause ( LetClause )? ( HavingClause )? )?
1086 |
1087 FromClause ( LetClause )?
1088 ( WhereClause )?
1089 ( GroupbyClause ( LetClause )? ( HavingClause )? )?
1090 SelectClause
1091
1092SelectClause ::= &lt;SELECT&gt; ( &lt;ALL&gt; | &lt;DISTINCT&gt; )? ( SelectRegular | SelectValue )
1093SelectRegular ::= Projection ( &quot;,&quot; Projection )*
1094SelectValue ::= ( &lt;VALUE&gt; | &lt;ELEMENT&gt; | &lt;RAW&gt; ) Expression
1095Projection ::= ( Expression ( &lt;AS&gt; )? Identifier | &quot;*&quot; )
1096
1097FromClause ::= &lt;FROM&gt; FromTerm ( &quot;,&quot; FromTerm )*
1098FromTerm ::= Expression (( &lt;AS&gt; )? Variable)?
1099 ( ( JoinType )? ( JoinClause | UnnestClause ) )*
1100
1101JoinClause ::= &lt;JOIN&gt; Expression (( &lt;AS&gt; )? Variable)? &lt;ON&gt; Expression
1102UnnestClause ::= ( &lt;UNNEST&gt; ) Expression
1103 ( &lt;AS&gt; )? Variable ( &lt;AT&gt; Variable )?
1104JoinType ::= ( &lt;INNER&gt; | &lt;LEFT&gt; ( &lt;OUTER&gt; )? )
1105
1106WithClause ::= &lt;WITH&gt; WithElement ( &quot;,&quot; WithElement )*
1107LetClause ::= (&lt;LET&gt; | &lt;LETTING&gt;) LetElement ( &quot;,&quot; LetElement )*
1108LetElement ::= Variable &quot;=&quot; Expression
1109WithElement ::= Variable &lt;AS&gt; Expression
1110
1111WhereClause ::= &lt;WHERE&gt; Expression
1112
1113GroupbyClause ::= &lt;GROUP&gt; &lt;BY&gt; Expression ( ( (&lt;AS&gt;)? Variable )?
1114 ( &quot;,&quot; Expression ( (&lt;AS&gt;)? Variable )? )* )
1115 ( &lt;GROUP&gt; &lt;AS&gt; Variable
1116 (&quot;(&quot; VariableReference &lt;AS&gt; Identifier
1117 (&quot;,&quot; VariableReference &lt;AS&gt; Identifier )* &quot;)&quot;)?
1118 )?
1119HavingClause ::= &lt;HAVING&gt; Expression
1120
1121OrderbyClause ::= &lt;ORDER&gt; &lt;BY&gt; Expression ( &lt;ASC&gt; | &lt;DESC&gt; )?
1122 ( &quot;,&quot; Expression ( &lt;ASC&gt; | &lt;DESC&gt; )? )*
1123LimitClause ::= &lt;LIMIT&gt; Expression ( &lt;OFFSET&gt; Expression )?
1124</pre></div></div>
1125
1126<p>In this section, we will make use of two stored collections of objects (datasets), <tt>GleambookUsers</tt> and <tt>GleambookMessages</tt>, in a series of running examples to explain <tt>SELECT</tt> queries. The contents of the example collections are as follows:</p>
1127<p><tt>GleambookUsers</tt> collection (or, dataset):</p>
1128
1129<div>
1130<div>
1131<pre class="source">[ {
1132 &quot;id&quot;:1,
1133 &quot;alias&quot;:&quot;Margarita&quot;,
1134 &quot;name&quot;:&quot;MargaritaStoddard&quot;,
1135 &quot;nickname&quot;:&quot;Mags&quot;,
1136 &quot;userSince&quot;:&quot;2012-08-20T10:10:00&quot;,
1137 &quot;friendIds&quot;:[2,3,6,10],
1138 &quot;employment&quot;:[{
1139 &quot;organizationName&quot;:&quot;Codetechno&quot;,
1140 &quot;start-date&quot;:&quot;2006-08-06&quot;
1141 },
1142 {
1143 &quot;organizationName&quot;:&quot;geomedia&quot;,
1144 &quot;start-date&quot;:&quot;2010-06-17&quot;,
1145 &quot;end-date&quot;:&quot;2010-01-26&quot;
1146 }],
1147 &quot;gender&quot;:&quot;F&quot;
1148},
1149{
1150 &quot;id&quot;:2,
1151 &quot;alias&quot;:&quot;Isbel&quot;,
1152 &quot;name&quot;:&quot;IsbelDull&quot;,
1153 &quot;nickname&quot;:&quot;Izzy&quot;,
1154 &quot;userSince&quot;:&quot;2011-01-22T10:10:00&quot;,
1155 &quot;friendIds&quot;:[1,4],
1156 &quot;employment&quot;:[{
1157 &quot;organizationName&quot;:&quot;Hexviafind&quot;,
1158 &quot;startDate&quot;:&quot;2010-04-27&quot;
1159 }]
1160},
1161{
1162 &quot;id&quot;:3,
1163 &quot;alias&quot;:&quot;Emory&quot;,
1164 &quot;name&quot;:&quot;EmoryUnk&quot;,
1165 &quot;userSince&quot;:&quot;2012-07-10T10:10:00&quot;,
1166 &quot;friendIds&quot;:[1,5,8,9],
1167 &quot;employment&quot;:[{
1168 &quot;organizationName&quot;:&quot;geomedia&quot;,
1169 &quot;startDate&quot;:&quot;2010-06-17&quot;,
1170 &quot;endDate&quot;:&quot;2010-01-26&quot;
1171 }]
1172} ]
1173</pre></div></div>
1174
1175<p><tt>GleambookMessages</tt> collection (or, dataset):</p>
1176
1177<div>
1178<div>
1179<pre class="source">[ {
1180 &quot;messageId&quot;:2,
1181 &quot;authorId&quot;:1,
1182 &quot;inResponseTo&quot;:4,
1183 &quot;senderLocation&quot;:[41.66,80.87],
1184 &quot;message&quot;:&quot; dislike x-phone its touch-screen is horrible&quot;
1185},
1186{
1187 &quot;messageId&quot;:3,
1188 &quot;authorId&quot;:2,
1189 &quot;inResponseTo&quot;:4,
1190 &quot;senderLocation&quot;:[48.09,81.01],
1191 &quot;message&quot;:&quot; like product-y the plan is amazing&quot;
1192},
1193{
1194 &quot;messageId&quot;:4,
1195 &quot;authorId&quot;:1,
1196 &quot;inResponseTo&quot;:2,
1197 &quot;senderLocation&quot;:[37.73,97.04],
1198 &quot;message&quot;:&quot; can't stand acast the network is horrible:(&quot;
1199},
1200{
1201 &quot;messageId&quot;:6,
1202 &quot;authorId&quot;:2,
1203 &quot;inResponseTo&quot;:1,
1204 &quot;senderLocation&quot;:[31.5,75.56],
1205 &quot;message&quot;:&quot; like product-z its platform is mind-blowing&quot;
1206}
1207{
1208 &quot;messageId&quot;:8,
1209 &quot;authorId&quot;:1,
1210 &quot;inResponseTo&quot;:11,
1211 &quot;senderLocation&quot;:[40.33,80.87],
1212 &quot;message&quot;:&quot; like ccast the 3G is awesome:)&quot;
1213},
1214{
1215 &quot;messageId&quot;:10,
1216 &quot;authorId&quot;:1,
1217 &quot;inResponseTo&quot;:12,
1218 &quot;senderLocation&quot;:[42.5,70.01],
1219 &quot;message&quot;:&quot; can't stand product-w the touch-screen is terrible&quot;
1220},
1221{
1222 &quot;messageId&quot;:11,
1223 &quot;authorId&quot;:1,
1224 &quot;inResponseTo&quot;:1,
1225 &quot;senderLocation&quot;:[38.97,77.49],
1226 &quot;message&quot;:&quot; can't stand acast its plan is terrible&quot;
1227} ]
1228</pre></div></div>
1229</div>
1230<div class="section">
1231<h2><a name="SELECT_Clause"></a><a name="Select_clauses" id="Select_clauses">SELECT Clause</a></h2>
1232<p>The SQL++ <tt>SELECT</tt> clause always returns a collection value as its result (even if the result is empty or a singleton).</p>
1233<div class="section">
1234<h3><a name="Select_Element.2FValue.2FRaw"></a><a name="Select_element" id="Select_element">Select Element/Value/Raw</a></h3>
1235<p>The <tt>SELECT VALUE</tt> clause in SQL++ returns an array or multiset that contains the results of evaluating the <tt>VALUE</tt> expression, with one evaluation being performed per &#x201c;binding tuple&#x201d; (i.e., per <tt>FROM</tt> clause item) satisfying the statement&#x2019;s selection criteria. For historical reasons SQL++ also allows the keywords <tt>ELEMENT</tt> or <tt>RAW</tt> to be used in place of <tt>VALUE</tt> (not recommended).</p>
1236<p>If there is no FROM clause, the expression after <tt>VALUE</tt> is evaluated once with no binding tuples (except those inherited from an outer environment).</p>
1237<div class="section">
1238<div class="section">
1239<h5><a name="Example"></a>Example</h5>
1240
1241<div>
1242<div>
1243<pre class="source">SELECT VALUE 1;
1244</pre></div></div>
1245
1246<p>This query returns:</p>
1247
1248<div>
1249<div>
1250<pre class="source">[
1251 1
1252]
1253</pre></div></div>
1254
1255<p>The following example shows a query that selects one user from the GleambookUsers collection.</p></div>
1256<div class="section">
1257<h5><a name="Example"></a>Example</h5>
1258
1259<div>
1260<div>
1261<pre class="source">SELECT VALUE user
1262FROM GleambookUsers user
1263WHERE user.id = 1;
1264</pre></div></div>
1265
1266<p>This query returns:</p>
1267
1268<div>
1269<div>
1270<pre class="source">[{
1271 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
1272 &quot;friendIds&quot;: [
1273 2,
1274 3,
1275 6,
1276 10
1277 ],
1278 &quot;gender&quot;: &quot;F&quot;,
1279 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1280 &quot;nickname&quot;: &quot;Mags&quot;,
1281 &quot;alias&quot;: &quot;Margarita&quot;,
1282 &quot;id&quot;: 1,
1283 &quot;employment&quot;: [
1284 {
1285 &quot;organizationName&quot;: &quot;Codetechno&quot;,
1286 &quot;start-date&quot;: &quot;2006-08-06&quot;
1287 },
1288 {
1289 &quot;end-date&quot;: &quot;2010-01-26&quot;,
1290 &quot;organizationName&quot;: &quot;geomedia&quot;,
1291 &quot;start-date&quot;: &quot;2010-06-17&quot;
1292 }
1293 ]
1294} ]
1295</pre></div></div>
1296</div></div></div>
1297<div class="section">
1298<h3><a name="SQL-style_SELECT"></a><a name="SQL_select" id="SQL_select">SQL-style SELECT</a></h3>
1299<p>In SQL++, the traditional SQL-style <tt>SELECT</tt> syntax is also supported. This syntax can also be reformulated in a <tt>SELECT VALUE</tt> based manner in SQL++. (E.g., <tt>SELECT expA AS fldA, expB AS fldB</tt> is syntactic sugar for <tt>SELECT VALUE { 'fldA': expA, 'fldB': expB }</tt>.) Unlike in SQL, the result of an SQL++ query does not preserve the order of expressions in the <tt>SELECT</tt> clause.</p>
1300<div class="section">
1301<div class="section">
1302<h5><a name="Example"></a>Example</h5>
1303
1304<div>
1305<div>
1306<pre class="source">SELECT user.alias user_alias, user.name user_name
1307FROM GleambookUsers user
1308WHERE user.id = 1;
1309</pre></div></div>
1310
1311<p>Returns:</p>
1312
1313<div>
1314<div>
1315<pre class="source">[ {
1316 &quot;user_name&quot;: &quot;MargaritaStoddard&quot;,
1317 &quot;user_alias&quot;: &quot;Margarita&quot;
1318} ]
1319</pre></div></div>
1320</div></div></div>
1321<div class="section">
1322<h3><a name="SELECT_.2A"></a><a name="Select_star" id="Select_star">SELECT *</a></h3>
1323<p>In SQL++, <tt>SELECT *</tt> returns a object with a nested field for each input tuple. Each field has as its field name the name of a binding variable generated by either the <tt>FROM</tt> clause or <tt>GROUP BY</tt> clause in the current enclosing <tt>SELECT</tt> statement, and its field value is the value of that binding variable.</p>
1324<p>Note that the result of <tt>SELECT *</tt> is different from the result of query that selects all the fields of an object.</p>
1325<div class="section">
1326<div class="section">
1327<h5><a name="Example"></a>Example</h5>
1328
1329<div>
1330<div>
1331<pre class="source">SELECT *
1332FROM GleambookUsers user;
1333</pre></div></div>
1334
1335<p>Since <tt>user</tt> is the only binding variable generated in the <tt>FROM</tt> clause, this query returns:</p>
1336
1337<div>
1338<div>
1339<pre class="source">[ {
1340 &quot;user&quot;: {
1341 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
1342 &quot;friendIds&quot;: [
1343 2,
1344 3,
1345 6,
1346 10
1347 ],
1348 &quot;gender&quot;: &quot;F&quot;,
1349 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1350 &quot;nickname&quot;: &quot;Mags&quot;,
1351 &quot;alias&quot;: &quot;Margarita&quot;,
1352 &quot;id&quot;: 1,
1353 &quot;employment&quot;: [
1354 {
1355 &quot;organizationName&quot;: &quot;Codetechno&quot;,
1356 &quot;start-date&quot;: &quot;2006-08-06&quot;
1357 },
1358 {
1359 &quot;end-date&quot;: &quot;2010-01-26&quot;,
1360 &quot;organizationName&quot;: &quot;geomedia&quot;,
1361 &quot;start-date&quot;: &quot;2010-06-17&quot;
1362 }
1363 ]
1364 }
1365}, {
1366 &quot;user&quot;: {
1367 &quot;userSince&quot;: &quot;2011-01-22T10:10:00.000Z&quot;,
1368 &quot;friendIds&quot;: [
1369 1,
1370 4
1371 ],
1372 &quot;name&quot;: &quot;IsbelDull&quot;,
1373 &quot;nickname&quot;: &quot;Izzy&quot;,
1374 &quot;alias&quot;: &quot;Isbel&quot;,
1375 &quot;id&quot;: 2,
1376 &quot;employment&quot;: [
1377 {
1378 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1379 &quot;startDate&quot;: &quot;2010-04-27&quot;
1380 }
1381 ]
1382 }
1383}, {
1384 &quot;user&quot;: {
1385 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
1386 &quot;friendIds&quot;: [
1387 1,
1388 5,
1389 8,
1390 9
1391 ],
1392 &quot;name&quot;: &quot;EmoryUnk&quot;,
1393 &quot;alias&quot;: &quot;Emory&quot;,
1394 &quot;id&quot;: 3,
1395 &quot;employment&quot;: [
1396 {
1397 &quot;organizationName&quot;: &quot;geomedia&quot;,
1398 &quot;endDate&quot;: &quot;2010-01-26&quot;,
1399 &quot;startDate&quot;: &quot;2010-06-17&quot;
1400 }
1401 ]
1402 }
1403} ]
1404</pre></div></div>
1405</div>
1406<div class="section">
1407<h5><a name="Example"></a>Example</h5>
1408
1409<div>
1410<div>
1411<pre class="source">SELECT *
1412FROM GleambookUsers u, GleambookMessages m
1413WHERE m.authorId = u.id and u.id = 2;
1414</pre></div></div>
1415
1416<p>This query does an inner join that we will discuss in <a href="#Multiple_from_terms">multiple from terms</a>. Since both <tt>u</tt> and <tt>m</tt> are binding variables generated in the <tt>FROM</tt> clause, this query returns:</p>
1417
1418<div>
1419<div>
1420<pre class="source">[ {
1421 &quot;u&quot;: {
1422 &quot;userSince&quot;: &quot;2011-01-22T10:10:00&quot;,
1423 &quot;friendIds&quot;: [
1424 1,
1425 4
1426 ],
1427 &quot;name&quot;: &quot;IsbelDull&quot;,
1428 &quot;nickname&quot;: &quot;Izzy&quot;,
1429 &quot;alias&quot;: &quot;Isbel&quot;,
1430 &quot;id&quot;: 2,
1431 &quot;employment&quot;: [
1432 {
1433 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1434 &quot;startDate&quot;: &quot;2010-04-27&quot;
1435 }
1436 ]
1437 },
1438 &quot;m&quot;: {
1439 &quot;senderLocation&quot;: [
1440 31.5,
1441 75.56
1442 ],
1443 &quot;inResponseTo&quot;: 1,
1444 &quot;messageId&quot;: 6,
1445 &quot;authorId&quot;: 2,
1446 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1447 }
1448}, {
1449 &quot;u&quot;: {
1450 &quot;userSince&quot;: &quot;2011-01-22T10:10:00&quot;,
1451 &quot;friendIds&quot;: [
1452 1,
1453 4
1454 ],
1455 &quot;name&quot;: &quot;IsbelDull&quot;,
1456 &quot;nickname&quot;: &quot;Izzy&quot;,
1457 &quot;alias&quot;: &quot;Isbel&quot;,
1458 &quot;id&quot;: 2,
1459 &quot;employment&quot;: [
1460 {
1461 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
1462 &quot;startDate&quot;: &quot;2010-04-27&quot;
1463 }
1464 ]
1465 },
1466 &quot;m&quot;: {
1467 &quot;senderLocation&quot;: [
1468 48.09,
1469 81.01
1470 ],
1471 &quot;inResponseTo&quot;: 4,
1472 &quot;messageId&quot;: 3,
1473 &quot;authorId&quot;: 2,
1474 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1475 }
1476} ]
1477</pre></div></div>
1478</div></div></div>
1479<div class="section">
1480<h3><a name="SELECT_DISTINCT"></a><a name="Select_distinct" id="Select_distinct">SELECT DISTINCT</a></h3>
1481<p>SQL++&#x2019;s <tt>DISTINCT</tt> keyword is used to eliminate duplicate items in results. The following example shows how it works.</p>
1482<div class="section">
1483<div class="section">
1484<h5><a name="Example"></a>Example</h5>
1485
1486<div>
1487<div>
1488<pre class="source">SELECT DISTINCT * FROM [1, 2, 2, 3] AS foo;
1489</pre></div></div>
1490
1491<p>This query returns:</p>
1492
1493<div>
1494<div>
1495<pre class="source">[ {
1496 &quot;foo&quot;: 1
1497}, {
1498 &quot;foo&quot;: 2
1499}, {
1500 &quot;foo&quot;: 3
1501} ]
1502</pre></div></div>
1503</div>
1504<div class="section">
1505<h5><a name="Example"></a>Example</h5>
1506
1507<div>
1508<div>
1509<pre class="source">SELECT DISTINCT VALUE foo FROM [1, 2, 2, 3] AS foo;
1510</pre></div></div>
1511
1512<p>This version of the query returns:</p>
1513
1514<div>
1515<div>
1516<pre class="source">[ 1
1517, 2
1518, 3
1519 ]
1520</pre></div></div>
1521</div></div></div>
1522<div class="section">
1523<h3><a name="Unnamed_Projections"></a><a name="Unnamed_projections" id="Unnamed_projections">Unnamed Projections</a></h3>
1524<p>Similar to standard SQL, SQL++ supports unnamed projections (a.k.a, unnamed <tt>SELECT</tt> clause items), for which names are generated. Name generation has three cases:</p>
1525<ul>
1526
1527<li>If a projection expression is a variable reference expression, its generated name is the name of the variable.</li>
1528<li>If a projection expression is a field access expression, its generated name is the last identifier in the expression.</li>
1529<li>For all other cases, the query processor will generate a unique name.</li>
1530</ul>
1531<div class="section">
1532<div class="section">
1533<h5><a name="Example"></a>Example</h5>
1534
1535<div>
1536<div>
1537<pre class="source">SELECT substr(user.name, 10), user.alias
1538FROM GleambookUsers user
1539WHERE user.id = 1;
1540</pre></div></div>
1541
1542<p>This query outputs:</p>
1543
1544<div>
1545<div>
1546<pre class="source">[ {
1547 &quot;alias&quot;: &quot;Margarita&quot;,
1548 &quot;$1&quot;: &quot;Stoddard&quot;
1549} ]
1550</pre></div></div>
1551
1552<p>In the result, <tt>$1</tt> is the generated name for <tt>substr(user.name, 1)</tt>, while <tt>alias</tt> is the generated name for <tt>user.alias</tt>.</p></div></div></div>
1553<div class="section">
1554<h3><a name="Abbreviated_Field_Access_Expressions"></a><a name="Abbreviated_field_access_expressions" id="Abbreviated_field_access_expressions">Abbreviated Field Access Expressions</a></h3>
1555<p>As in standard SQL, SQL++ field access expressions can be abbreviated (not recommended!) when there is no ambiguity. In the next example, the variable <tt>user</tt> is the only possible variable reference for fields <tt>id</tt>, <tt>name</tt> and <tt>alias</tt> and thus could be omitted in the query. More information on abbbreviated field access can be found in the appendix section on Variable Resolution.</p>
1556<div class="section">
1557<div class="section">
1558<h5><a name="Example"></a>Example</h5>
1559
1560<div>
1561<div>
1562<pre class="source">SELECT substr(name, 10) AS lname, alias
1563FROM GleambookUsers user
1564WHERE id = 1;
1565</pre></div></div>
1566
1567<p>Outputs:</p>
1568
1569<div>
1570<div>
1571<pre class="source">[ {
1572 &quot;lname&quot;: &quot;Stoddard&quot;,
1573 &quot;alias&quot;: &quot;Margarita&quot;
1574} ]
1575</pre></div></div>
1576</div></div></div></div>
1577<div class="section">
1578<h2><a name="UNNEST_Clause"></a><a name="Unnest_clauses" id="Unnest_clauses">UNNEST Clause</a></h2>
1579<p>For each of its input tuples, the <tt>UNNEST</tt> clause flattens a collection-valued expression into individual items, producing multiple tuples, each of which is one of the expression&#x2019;s original input tuples augmented with a flattened item from its collection.</p>
1580<div class="section">
1581<h3><a name="Inner_UNNEST"></a><a name="Inner_unnests" id="Inner_unnests">Inner UNNEST</a></h3>
1582<p>The following example is a query that retrieves the names of the organizations that a selected user has worked for. It uses the <tt>UNNEST</tt> clause to unnest the nested collection <tt>employment</tt> in the user&#x2019;s object.</p>
1583<div class="section">
1584<div class="section">
1585<h5><a name="Example"></a>Example</h5>
1586
1587<div>
1588<div>
1589<pre class="source">SELECT u.id AS userId, e.organizationName AS orgName
1590FROM GleambookUsers u
1591UNNEST u.employment e
1592WHERE u.id = 1;
1593</pre></div></div>
1594
1595<p>This query returns:</p>
1596
1597<div>
1598<div>
1599<pre class="source">[ {
1600 &quot;orgName&quot;: &quot;Codetechno&quot;,
1601 &quot;userId&quot;: 1
1602}, {
1603 &quot;orgName&quot;: &quot;geomedia&quot;,
1604 &quot;userId&quot;: 1
1605} ]
1606</pre></div></div>
1607
1608<p>Note that <tt>UNNEST</tt> has SQL&#x2019;s inner join semantics &#x2014; that is, if a user has no employment history, no tuple corresponding to that user will be emitted in the result.</p></div></div></div>
1609<div class="section">
1610<h3><a name="Left_Outer_UNNEST"></a><a name="Left_outer_unnests" id="Left_outer_unnests">Left Outer UNNEST</a></h3>
1611<p>As an alternative, the <tt>LEFT OUTER UNNEST</tt> clause offers SQL&#x2019;s left outer join semantics. For example, no collection-valued field named <tt>hobbies</tt> exists in the object for the user whose id is 1, but the following query&#x2019;s result still includes user 1.</p>
1612<div class="section">
1613<div class="section">
1614<h5><a name="Example"></a>Example</h5>
1615
1616<div>
1617<div>
1618<pre class="source">SELECT u.id AS userId, h.hobbyName AS hobby
1619FROM GleambookUsers u
1620LEFT OUTER UNNEST u.hobbies h
1621WHERE u.id = 1;
1622</pre></div></div>
1623
1624<p>Returns:</p>
1625
1626<div>
1627<div>
1628<pre class="source">[ {
1629 &quot;userId&quot;: 1
1630} ]
1631</pre></div></div>
1632
1633<p>Note that if <tt>u.hobbies</tt> is an empty collection or leads to a <tt>MISSING</tt> (as above) or <tt>NULL</tt> value for a given input tuple, there is no corresponding binding value for variable <tt>h</tt> for an input tuple. A <tt>MISSING</tt> value will be generated for <tt>h</tt> so that the input tuple can still be propagated.</p></div></div></div>
1634<div class="section">
1635<h3><a name="Expressing_Joins_Using_UNNEST"></a><a name="Expressing_joins_using_unnests" id="Expressing_joins_using_unnests">Expressing Joins Using UNNEST</a></h3>
1636<p>The SQL++ <tt>UNNEST</tt> clause is similar to SQL&#x2019;s <tt>JOIN</tt> clause except that it allows its right argument to be correlated to its left argument, as in the examples above &#x2014; i.e., think &#x201c;correlated cross-product&#x201d;. The next example shows this via a query that joins two data sets, GleambookUsers and GleambookMessages, returning user/message pairs. The results contain one object per pair, with result objects containing the user&#x2019;s name and an entire message. The query can be thought of as saying &#x201c;for each Gleambook user, unnest the <tt>GleambookMessages</tt> collection and filter the output with the condition <tt>message.authorId = user.id</tt>&#x201d;.</p>
1637<div class="section">
1638<div class="section">
1639<h5><a name="Example"></a>Example</h5>
1640
1641<div>
1642<div>
1643<pre class="source">SELECT u.name AS uname, m.message AS message
1644FROM GleambookUsers u
1645UNNEST GleambookMessages m
1646WHERE m.authorId = u.id;
1647</pre></div></div>
1648
1649<p>This returns:</p>
1650
1651<div>
1652<div>
1653<pre class="source">[ {
1654 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1655 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1656}, {
1657 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1658 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1659}, {
1660 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1661 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1662}, {
1663 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1664 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1665}, {
1666 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1667 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1668}, {
1669 &quot;uname&quot;: &quot;IsbelDull&quot;,
1670 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1671}, {
1672 &quot;uname&quot;: &quot;IsbelDull&quot;,
1673 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1674} ]
1675</pre></div></div>
1676
1677<p>Similarly, the above query can also be expressed as the <tt>UNNEST</tt>ing of a correlated SQL++ subquery:</p></div>
1678<div class="section">
1679<h5><a name="Example"></a>Example</h5>
1680
1681<div>
1682<div>
1683<pre class="source">SELECT u.name AS uname, m.message AS message
1684FROM GleambookUsers u
1685UNNEST (
1686 SELECT VALUE msg
1687 FROM GleambookMessages msg
1688 WHERE msg.authorId = u.id
1689) AS m;
1690</pre></div></div>
1691</div></div></div></div>
1692<div class="section">
1693<h2><a name="FROM_clauses"></a><a name="From_clauses" id="From_clauses">FROM clauses</a></h2>
1694<p>A <tt>FROM</tt> clause is used for enumerating (i.e., conceptually iterating over) the contents of collections, as in SQL.</p>
1695<div class="section">
1696<h3><a name="Binding_expressions" id="Binding_expressions">Binding expressions</a></h3>
1697<p>In SQL++, in addition to stored collections, a <tt>FROM</tt> clause can iterate over any intermediate collection returned by a valid SQL++ expression. In the tuple stream generated by a <tt>FROM</tt> clause, the ordering of the input tuples are not guaranteed to be preserved.</p>
1698<div class="section">
1699<div class="section">
1700<h5><a name="Example"></a>Example</h5>
1701
1702<div>
1703<div>
1704<pre class="source">SELECT VALUE foo
1705FROM [1, 2, 2, 3] AS foo
1706WHERE foo &gt; 2;
1707</pre></div></div>
1708
1709<p>Returns:</p>
1710
1711<div>
1712<div>
1713<pre class="source">[
1714 3
1715]
1716</pre></div></div>
1717</div></div></div>
1718<div class="section">
1719<h3><a name="Multiple_FROM_Terms"></a><a name="Multiple_from_terms" id="Multiple_from_terms">Multiple FROM Terms</a></h3>
1720<p>SQL++ permits correlations among <tt>FROM</tt> terms. Specifically, a <tt>FROM</tt> binding expression can refer to variables defined to its left in the given <tt>FROM</tt> clause. Thus, the first unnesting example above could also be expressed as follows:</p>
1721<div class="section">
1722<div class="section">
1723<h5><a name="Example"></a>Example</h5>
1724
1725<div>
1726<div>
1727<pre class="source">SELECT u.id AS userId, e.organizationName AS orgName
1728FROM GleambookUsers u, u.employment e
1729WHERE u.id = 1;
1730</pre></div></div>
1731</div></div></div>
1732<div class="section">
1733<h3><a name="Expressing_Joins_Using_FROM_Terms"></a><a name="Expressing_joins_using_from_terms" id="Expressing_joins_using_from_terms">Expressing Joins Using FROM Terms</a></h3>
1734<p>Similarly, the join intentions of the other <tt>UNNEST</tt>-based join examples above could be expressed as:</p>
1735<div class="section">
1736<div class="section">
1737<h5><a name="Example"></a>Example</h5>
1738
1739<div>
1740<div>
1741<pre class="source">SELECT u.name AS uname, m.message AS message
1742FROM GleambookUsers u, GleambookMessages m
1743WHERE m.authorId = u.id;
1744</pre></div></div>
1745</div>
1746<div class="section">
1747<h5><a name="Example"></a>Example</h5>
1748
1749<div>
1750<div>
1751<pre class="source">SELECT u.name AS uname, m.message AS message
1752FROM GleambookUsers u,
1753 (
1754 SELECT VALUE msg
1755 FROM GleambookMessages msg
1756 WHERE msg.authorId = u.id
1757 ) AS m;
1758</pre></div></div>
1759
1760<p>Note that the first alternative is one of the SQL-92 approaches to expressing a join.</p></div></div></div>
1761<div class="section">
1762<h3><a name="Implicit_Binding_Variables"></a><a name="Implicit_binding_variables" id="Implicit_binding_variables">Implicit Binding Variables</a></h3>
1763<p>Similar to standard SQL, SQL++ supports implicit <tt>FROM</tt> binding variables (i.e., aliases), for which a binding variable is generated. SQL++ variable generation falls into three cases:</p>
1764<ul>
1765
1766<li>If the binding expression is a variable reference expression, the generated variable&#x2019;s name will be the name of the referenced variable itself.</li>
1767<li>If the binding expression is a field access expression (or a fully qualified name for a dataset), the generated variable&#x2019;s name will be the last identifier (or the dataset name) in the expression.</li>
1768<li>For all other cases, a compilation error will be raised.</li>
1769</ul>
1770<p>The next two examples show queries that do not provide binding variables in their <tt>FROM</tt> clauses.</p>
1771<div class="section">
1772<div class="section">
1773<h5><a name="Example"></a>Example</h5>
1774
1775<div>
1776<div>
1777<pre class="source">SELECT GleambookUsers.name, GleambookMessages.message
1778FROM GleambookUsers, GleambookMessages
1779WHERE GleambookMessages.authorId = GleambookUsers.id;
1780</pre></div></div>
1781
1782<p>Returns:</p>
1783
1784<div>
1785<div>
1786<pre class="source">[ {
1787 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1788 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1789}, {
1790 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1791 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1792}, {
1793 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1794 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1795}, {
1796 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1797 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1798}, {
1799 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
1800 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1801}, {
1802 &quot;name&quot;: &quot;IsbelDull&quot;,
1803 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1804}, {
1805 &quot;name&quot;: &quot;IsbelDull&quot;,
1806 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1807} ]
1808</pre></div></div>
1809</div>
1810<div class="section">
1811<h5><a name="Example"></a>Example</h5>
1812
1813<div>
1814<div>
1815<pre class="source">SELECT GleambookUsers.name, GleambookMessages.message
1816FROM GleambookUsers,
1817 (
1818 SELECT VALUE GleambookMessages
1819 FROM GleambookMessages
1820 WHERE GleambookMessages.authorId = GleambookUsers.id
1821 );
1822</pre></div></div>
1823
1824<p>Returns:</p>
1825
1826<div>
1827<div>
1828<pre class="source">Error: &quot;Syntax error: Need an alias for the enclosed expression:\n(select element GleambookMessages\n from GleambookMessages as GleambookMessages\n where (GleambookMessages.authorId = GleambookUsers.id)\n )&quot;,
1829 &quot;query_from_user&quot;: &quot;use TinySocial;\n\nSELECT GleambookUsers.name, GleambookMessages.message\n FROM GleambookUsers,\n (\n SELECT VALUE GleambookMessages\n FROM GleambookMessages\n WHERE GleambookMessages.authorId = GleambookUsers.id\n );&quot;
1830</pre></div></div>
1831
1832<p>More information on implicit binding variables can be found in the appendix section on Variable Resolution.</p></div></div></div></div>
1833<div class="section">
1834<h2><a name="JOIN_Clauses"></a><a name="Join_clauses" id="Join_clauses">JOIN Clauses</a></h2>
1835<p>The join clause in SQL++ supports both inner joins and left outer joins from standard SQL.</p>
1836<div class="section">
1837<h3><a name="Inner_joins" id="Inner_joins">Inner joins</a></h3>
1838<p>Using a <tt>JOIN</tt> clause, the inner join intent from the preceeding examples can also be expressed as follows:</p>
1839<div class="section">
1840<div class="section">
1841<h5><a name="Example"></a>Example</h5>
1842
1843<div>
1844<div>
1845<pre class="source">SELECT u.name AS uname, m.message AS message
1846FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
1847</pre></div></div>
1848</div></div></div>
1849<div class="section">
1850<h3><a name="Left_Outer_Joins"></a><a name="Left_outer_joins" id="Left_outer_joins">Left Outer Joins</a></h3>
1851<p>SQL++ supports SQL&#x2019;s notion of left outer join. The following query is an example:</p>
1852
1853<div>
1854<div>
1855<pre class="source">SELECT u.name AS uname, m.message AS message
1856FROM GleambookUsers u LEFT OUTER JOIN GleambookMessages m ON m.authorId = u.id;
1857</pre></div></div>
1858
1859<p>Returns:</p>
1860
1861<div>
1862<div>
1863<pre class="source">[ {
1864 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1865 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1866}, {
1867 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1868 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1869}, {
1870 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1871 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1872}, {
1873 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1874 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1875}, {
1876 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
1877 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1878}, {
1879 &quot;uname&quot;: &quot;IsbelDull&quot;,
1880 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
1881}, {
1882 &quot;uname&quot;: &quot;IsbelDull&quot;,
1883 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
1884}, {
1885 &quot;uname&quot;: &quot;EmoryUnk&quot;
1886} ]
1887</pre></div></div>
1888
1889<p>For non-matching left-side tuples, SQL++ produces <tt>MISSING</tt> values for the right-side binding variables; that is why the last object in the above result doesn&#x2019;t have a <tt>message</tt> field. Note that this is slightly different from standard SQL, which instead would fill in <tt>NULL</tt> values for the right-side fields. The reason for this difference is that, for non-matches in its join results, SQL++ views fields from the right-side as being &#x201c;not there&#x201d; (a.k.a. <tt>MISSING</tt>) instead of as being &#x201c;there but unknown&#x201d; (i.e., <tt>NULL</tt>).</p>
1890<p>The left-outer join query can also be expressed using <tt>LEFT OUTER UNNEST</tt>:</p>
1891
1892<div>
1893<div>
1894<pre class="source">SELECT u.name AS uname, m.message AS message
1895FROM GleambookUsers u
1896LEFT OUTER UNNEST (
1897 SELECT VALUE message
1898 FROM GleambookMessages message
1899 WHERE message.authorId = u.id
1900 ) m;
1901</pre></div></div>
1902
1903<p>In general, in SQL++, SQL-style join queries can also be expressed by <tt>UNNEST</tt> clauses and left outer join queries can be expressed by <tt>LEFT OUTER UNNESTs</tt>.</p></div></div>
1904<div class="section">
1905<h2><a name="GROUP_BY_Clauses"></a><a name="Group_By_clauses" id="Group_By_clauses">GROUP BY Clauses</a></h2>
1906<p>The SQL++ <tt>GROUP BY</tt> clause generalizes standard SQL&#x2019;s grouping and aggregation semantics, but it also retains backward compatibility with the standard (relational) SQL <tt>GROUP BY</tt> and aggregation features.</p>
1907<div class="section">
1908<h3><a name="Group_variables" id="Group_variables">Group variables</a></h3>
1909<p>In a <tt>GROUP BY</tt> clause, in addition to the binding variable(s) defined for the grouping key(s), SQL++ allows a user to define a <i>group variable</i> by using the clause&#x2019;s <tt>GROUP AS</tt> extension to denote the resulting group. After grouping, then, the query&#x2019;s in-scope variables include the grouping key&#x2019;s binding variables as well as this group variable which will be bound to one collection value for each group. This per-group collection (i.e., multiset) value will be a set of nested objects in which each field of the object is the result of a renamed variable defined in parentheses following the group variable&#x2019;s name. The <tt>GROUP AS</tt> syntax is as follows:</p>
1910
1911<div>
1912<div>
1913<pre class="source">&lt;GROUP&gt; &lt;AS&gt; Variable (&quot;(&quot; VariableReference &lt;AS&gt; Identifier (&quot;,&quot; VariableReference &lt;AS&gt; Identifier )* &quot;)&quot;)?
1914</pre></div></div>
1915
1916<div class="section">
1917<div class="section">
1918<h5><a name="Example"></a>Example</h5>
1919
1920<div>
1921<div>
1922<pre class="source">SELECT *
1923FROM GleambookMessages message
1924GROUP BY message.authorId AS uid GROUP AS msgs(message AS msg);
1925</pre></div></div>
1926
1927<p>This first example query returns:</p>
1928
1929<div>
1930<div>
1931<pre class="source">[ {
1932 &quot;msgs&quot;: [
1933 {
1934 &quot;msg&quot;: {
1935 &quot;senderLocation&quot;: [
1936 38.97,
1937 77.49
1938 ],
1939 &quot;inResponseTo&quot;: 1,
1940 &quot;messageId&quot;: 11,
1941 &quot;authorId&quot;: 1,
1942 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
1943 }
1944 },
1945 {
1946 &quot;msg&quot;: {
1947 &quot;senderLocation&quot;: [
1948 41.66,
1949 80.87
1950 ],
1951 &quot;inResponseTo&quot;: 4,
1952 &quot;messageId&quot;: 2,
1953 &quot;authorId&quot;: 1,
1954 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
1955 }
1956 },
1957 {
1958 &quot;msg&quot;: {
1959 &quot;senderLocation&quot;: [
1960 37.73,
1961 97.04
1962 ],
1963 &quot;inResponseTo&quot;: 2,
1964 &quot;messageId&quot;: 4,
1965 &quot;authorId&quot;: 1,
1966 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
1967 }
1968 },
1969 {
1970 &quot;msg&quot;: {
1971 &quot;senderLocation&quot;: [
1972 40.33,
1973 80.87
1974 ],
1975 &quot;inResponseTo&quot;: 11,
1976 &quot;messageId&quot;: 8,
1977 &quot;authorId&quot;: 1,
1978 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
1979 }
1980 },
1981 {
1982 &quot;msg&quot;: {
1983 &quot;senderLocation&quot;: [
1984 42.5,
1985 70.01
1986 ],
1987 &quot;inResponseTo&quot;: 12,
1988 &quot;messageId&quot;: 10,
1989 &quot;authorId&quot;: 1,
1990 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
1991 }
1992 }
1993 ],
1994 &quot;uid&quot;: 1
1995}, {
1996 &quot;msgs&quot;: [
1997 {
1998 &quot;msg&quot;: {
1999 &quot;senderLocation&quot;: [
2000 31.5,
2001 75.56
2002 ],
2003 &quot;inResponseTo&quot;: 1,
2004 &quot;messageId&quot;: 6,
2005 &quot;authorId&quot;: 2,
2006 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2007 }
2008 },
2009 {
2010 &quot;msg&quot;: {
2011 &quot;senderLocation&quot;: [
2012 48.09,
2013 81.01
2014 ],
2015 &quot;inResponseTo&quot;: 4,
2016 &quot;messageId&quot;: 3,
2017 &quot;authorId&quot;: 2,
2018 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2019 }
2020 }
2021 ],
2022 &quot;uid&quot;: 2
2023} ]
2024</pre></div></div>
2025
2026<p>As we can see from the above query result, each group in the example query&#x2019;s output has an associated group variable value called <tt>msgs</tt> that appears in the <tt>SELECT *</tt>&#x2019;s result. This variable contains a collection of objects associated with the group; each of the group&#x2019;s <tt>message</tt> values appears in the <tt>msg</tt> field of the objects in the <tt>msgs</tt> collection.</p>
2027<p>The group variable in SQL++ makes more complex, composable, nested subqueries over a group possible, which is important given the more complex data model of SQL++ (relative to SQL). As a simple example of this, as we really just want the messages associated with each user, we might wish to avoid the &#x201c;extra wrapping&#x201d; of each message as the <tt>msg</tt> field of a object. (That wrapping is useful in more complex cases, but is essentially just in the way here.) We can use a subquery in the <tt>SELECT</tt> clase to tunnel through the extra nesting and produce the desired result.</p></div>
2028<div class="section">
2029<h5><a name="Example"></a>Example</h5>
2030
2031<div>
2032<div>
2033<pre class="source">SELECT uid, (SELECT VALUE g.msg FROM g) AS msgs
2034FROM GleambookMessages gbm
2035GROUP BY gbm.authorId AS uid
2036GROUP AS g(gbm as msg);
2037</pre></div></div>
2038
2039<p>This variant of the example query returns:</p>
2040
2041<div>
2042<div>
2043<pre class="source"> [ {
2044 &quot;msgs&quot;: [
2045 {
2046 &quot;senderLocation&quot;: [
2047 38.97,
2048 77.49
2049 ],
2050 &quot;inResponseTo&quot;: 1,
2051 &quot;messageId&quot;: 11,
2052 &quot;authorId&quot;: 1,
2053 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
2054 },
2055 {
2056 &quot;senderLocation&quot;: [
2057 41.66,
2058 80.87
2059 ],
2060 &quot;inResponseTo&quot;: 4,
2061 &quot;messageId&quot;: 2,
2062 &quot;authorId&quot;: 1,
2063 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2064 },
2065 {
2066 &quot;senderLocation&quot;: [
2067 37.73,
2068 97.04
2069 ],
2070 &quot;inResponseTo&quot;: 2,
2071 &quot;messageId&quot;: 4,
2072 &quot;authorId&quot;: 1,
2073 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
2074 },
2075 {
2076 &quot;senderLocation&quot;: [
2077 40.33,
2078 80.87
2079 ],
2080 &quot;inResponseTo&quot;: 11,
2081 &quot;messageId&quot;: 8,
2082 &quot;authorId&quot;: 1,
2083 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2084 },
2085 {
2086 &quot;senderLocation&quot;: [
2087 42.5,
2088 70.01
2089 ],
2090 &quot;inResponseTo&quot;: 12,
2091 &quot;messageId&quot;: 10,
2092 &quot;authorId&quot;: 1,
2093 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
2094 }
2095 ],
2096 &quot;uid&quot;: 1
2097 }, {
2098 &quot;msgs&quot;: [
2099 {
2100 &quot;senderLocation&quot;: [
2101 31.5,
2102 75.56
2103 ],
2104 &quot;inResponseTo&quot;: 1,
2105 &quot;messageId&quot;: 6,
2106 &quot;authorId&quot;: 2,
2107 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2108 },
2109 {
2110 &quot;senderLocation&quot;: [
2111 48.09,
2112 81.01
2113 ],
2114 &quot;inResponseTo&quot;: 4,
2115 &quot;messageId&quot;: 3,
2116 &quot;authorId&quot;: 2,
2117 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2118 }
2119 ],
2120 &quot;uid&quot;: 2
2121 } ]
2122</pre></div></div>
2123
2124<p>The next example shows a more interesting case involving the use of a subquery in the <tt>SELECT</tt> list. Here the subquery further processes the groups. There is no renaming in the declaration of the group variable <tt>g</tt> such that <tt>g</tt> only has one field <tt>gbm</tt> which comes from the <tt>FROM</tt> clause.</p></div>
2125<div class="section">
2126<h5><a name="Example"></a>Example</h5>
2127
2128<div>
2129<div>
2130<pre class="source">SELECT uid,
2131 (SELECT VALUE g.gbm
2132 FROM g
2133 WHERE g.gbm.message LIKE '% like%'
2134 ORDER BY g.gbm.messageId
2135 LIMIT 2) AS msgs
2136FROM GleambookMessages gbm
2137GROUP BY gbm.authorId AS uid
2138GROUP AS g;
2139</pre></div></div>
2140
2141<p>This example query returns:</p>
2142
2143<div>
2144<div>
2145<pre class="source">[ {
2146 &quot;msgs&quot;: [
2147 {
2148 &quot;senderLocation&quot;: [
2149 40.33,
2150 80.87
2151 ],
2152 &quot;inResponseTo&quot;: 11,
2153 &quot;messageId&quot;: 8,
2154 &quot;authorId&quot;: 1,
2155 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2156 }
2157 ],
2158 &quot;uid&quot;: 1
2159}, {
2160 &quot;msgs&quot;: [
2161 {
2162 &quot;senderLocation&quot;: [
2163 48.09,
2164 81.01
2165 ],
2166 &quot;inResponseTo&quot;: 4,
2167 &quot;messageId&quot;: 3,
2168 &quot;authorId&quot;: 2,
2169 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2170 },
2171 {
2172 &quot;senderLocation&quot;: [
2173 31.5,
2174 75.56
2175 ],
2176 &quot;inResponseTo&quot;: 1,
2177 &quot;messageId&quot;: 6,
2178 &quot;authorId&quot;: 2,
2179 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2180 }
2181 ],
2182 &quot;uid&quot;: 2
2183} ]
2184</pre></div></div>
2185</div></div></div>
2186<div class="section">
2187<h3><a name="Implicit_Grouping_Key_Variables"></a><a name="Implicit_group_key_variables" id="Implicit_group_key_variables">Implicit Grouping Key Variables</a></h3>
2188<p>In the SQL++ syntax, providing named binding variables for <tt>GROUP BY</tt> key expressions is optional. If a grouping key is missing a user-provided binding variable, the underlying compiler will generate one. Automatic grouping key variable naming falls into three cases in SQL++, much like the treatment of unnamed projections:</p>
2189<ul>
2190
2191<li>If the grouping key expression is a variable reference expression, the generated variable gets the same name as the referred variable;</li>
2192<li>If the grouping key expression is a field access expression, the generated variable gets the same name as the last identifier in the expression;</li>
2193<li>For all other cases, the compiler generates a unique variable (but the user query is unable to refer to this generated variable).</li>
2194</ul>
2195<p>The next example illustrates a query that doesn&#x2019;t provide binding variables for its grouping key expressions.</p>
2196<div class="section">
2197<div class="section">
2198<h5><a name="Example"></a>Example</h5>
2199
2200<div>
2201<div>
2202<pre class="source">SELECT authorId,
2203 (SELECT VALUE g.gbm
2204 FROM g
2205 WHERE g.gbm.message LIKE '% like%'
2206 ORDER BY g.gbm.messageId
2207 LIMIT 2) AS msgs
2208FROM GleambookMessages gbm
2209GROUP BY gbm.authorId
2210GROUP AS g;
2211</pre></div></div>
2212
2213<p>This query returns:</p>
2214
2215<div>
2216<div>
2217<pre class="source"> [ {
2218 &quot;msgs&quot;: [
2219 {
2220 &quot;senderLocation&quot;: [
2221 40.33,
2222 80.87
2223 ],
2224 &quot;inResponseTo&quot;: 11,
2225 &quot;messageId&quot;: 8,
2226 &quot;authorId&quot;: 1,
2227 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2228 }
2229 ],
2230 &quot;authorId&quot;: 1
2231}, {
2232 &quot;msgs&quot;: [
2233 {
2234 &quot;senderLocation&quot;: [
2235 48.09,
2236 81.01
2237 ],
2238 &quot;inResponseTo&quot;: 4,
2239 &quot;messageId&quot;: 3,
2240 &quot;authorId&quot;: 2,
2241 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2242 },
2243 {
2244 &quot;senderLocation&quot;: [
2245 31.5,
2246 75.56
2247 ],
2248 &quot;inResponseTo&quot;: 1,
2249 &quot;messageId&quot;: 6,
2250 &quot;authorId&quot;: 2,
2251 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2252 }
2253 ],
2254 &quot;authorId&quot;: 2
2255} ]
2256</pre></div></div>
2257
2258<p>Based on the three variable generation rules, the generated variable for the grouping key expression <tt>message.authorId</tt> is <tt>authorId</tt> (which is how it is referred to in the example&#x2019;s <tt>SELECT</tt> clause).</p></div></div></div>
2259<div class="section">
2260<h3><a name="Implicit_Group_Variables"></a><a name="Implicit_group_variables" id="Implicit_group_variables">Implicit Group Variables</a></h3>
2261<p>The group variable itself is also optional in SQL++&#x2019;s <tt>GROUP BY</tt> syntax. If a user&#x2019;s query does not declare the name and structure of the group variable using <tt>GROUP AS</tt>, the query compiler will generate a unique group variable whose fields include all of the binding variables defined in the <tt>FROM</tt> clause of the current enclosing <tt>SELECT</tt> statement. In this case the user&#x2019;s query will not be able to refer to the generated group variable, but is able to call SQL-92 aggregation functions as in SQL-92.</p></div>
2262<div class="section">
2263<h3><a name="Aggregation_Functions"></a><a name="Aggregation_functions" id="Aggregation_functions">Aggregation Functions</a></h3>
2264<p>In the traditional SQL, which doesn&#x2019;t support nested data, grouping always also involves the use of aggregation to compute properties of the groups (for example, the average number of messages per user rather than the actual set of messages per user). Each aggregation function in SQL++ takes a collection (for example, the group of messages) as its input and produces a scalar value as its output. These aggregation functions, being truly functional in nature (unlike in SQL), can be used anywhere in a query where an expression is allowed. The following table catalogs the SQL++ built-in aggregation functions and also indicates how each one handles <tt>NULL</tt>/<tt>MISSING</tt> values in the input collection or a completely empty input collection:</p>
2265<table border="0" class="table table-striped">
2266<thead>
2267
2268<tr class="a">
2269<th> Function </th>
2270<th> NULL </th>
2271<th> MISSING </th>
2272<th> Empty Collection </th></tr>
2273</thead><tbody>
2274
2275<tr class="b">
2276<td> COLL_COUNT </td>
2277<td> counted </td>
2278<td> counted </td>
2279<td> 0 </td></tr>
2280<tr class="a">
2281<td> COLL_SUM </td>
2282<td> returns NULL </td>
2283<td> returns NULL </td>
2284<td> returns NULL </td></tr>
2285<tr class="b">
2286<td> COLL_MAX </td>
2287<td> returns NULL </td>
2288<td> returns NULL </td>
2289<td> returns NULL </td></tr>
2290<tr class="a">
2291<td> COLL_MIN </td>
2292<td> returns NULL </td>
2293<td> returns NULL </td>
2294<td> returns NULL </td></tr>
2295<tr class="b">
2296<td> COLL_AVG </td>
2297<td> returns NULL </td>
2298<td> returns NULL </td>
2299<td> returns NULL </td></tr>
2300<tr class="a">
2301<td> ARRAY_COUNT </td>
2302<td> not counted </td>
2303<td> not counted </td>
2304<td> 0 </td></tr>
2305<tr class="b">
2306<td> ARRAY_SUM </td>
2307<td> ignores NULL </td>
2308<td> ignores NULL </td>
2309<td> returns NULL </td></tr>
2310<tr class="a">
2311<td> ARRAY_MAX </td>
2312<td> ignores NULL </td>
2313<td> ignores NULL </td>
2314<td> returns NULL </td></tr>
2315<tr class="b">
2316<td> ARRAY_MIN </td>
2317<td> ignores NULL </td>
2318<td> ignores NULL </td>
2319<td> returns NULL </td></tr>
2320<tr class="a">
2321<td> ARRAY_AVG </td>
2322<td> ignores NULL </td>
2323<td> ignores NULL </td>
2324<td> returns NULL </td></tr>
2325</tbody>
2326</table>
2327<p>Notice that SQL++ has twice as many functions listed above as there are aggregate functions in SQL-92. This is because SQL++ offers two versions of each &#x2013; one that handles <tt>UNKNOWN</tt> values in a semantically strict fashion, where unknown values in the input result in unknown values in the output &#x2013; and one that handles them in the ad hoc &#x201c;just ignore the unknown values&#x201d; fashion that the SQL standard chose to adopt.</p>
2328<div class="section">
2329<div class="section">
2330<h5><a name="Example"></a>Example</h5>
2331
2332<div>
2333<div>
2334<pre class="source">ARRAY_AVG(
2335 (
2336 SELECT VALUE ARRAY_COUNT(friendIds) FROM GleambookUsers
2337 )
2338);
2339</pre></div></div>
2340
2341<p>This example returns:</p>
2342
2343<div>
2344<div>
2345<pre class="source">3.3333333333333335
2346</pre></div></div>
2347</div>
2348<div class="section">
2349<h5><a name="Example"></a>Example</h5>
2350
2351<div>
2352<div>
2353<pre class="source">SELECT uid AS uid, ARRAY_COUNT(grp) AS msgCnt
2354FROM GleambookMessages message
2355GROUP BY message.authorId AS uid
2356GROUP AS grp(message AS msg);
2357</pre></div></div>
2358
2359<p>This query returns:</p>
2360
2361<div>
2362<div>
2363<pre class="source">[ {
2364 &quot;uid&quot;: 1,
2365 &quot;msgCnt&quot;: 5
2366}, {
2367 &quot;uid&quot;: 2,
2368 &quot;msgCnt&quot;: 2
2369} ]
2370</pre></div></div>
2371
2372<p>Notice how the query forms groups where each group involves a message author and their messages. (SQL cannot do this because the grouped intermediate result is non-1NF in nature.) The query then uses the collection aggregate function ARRAY_COUNT to get the cardinality of each group of messages.</p>
2373<p>Each aggregation function in SQL++ supports DISTINCT modifier that removes duplicate values from the input collection.</p></div>
2374<div class="section">
2375<h5><a name="Example"></a>Example</h5>
2376
2377<div>
2378<div>
2379<pre class="source">ARRAY_SUM(DISTINCT [1, 1, 2, 2, 3])
2380</pre></div></div>
2381
2382<p>This query returns:</p>
2383
2384<div>
2385<div>
2386<pre class="source">6
2387</pre></div></div>
2388</div></div></div>
2389<div class="section">
2390<h3><a name="SQL-92_Aggregation_Functions"></a><a name="SQL-92_aggregation_functions" id="SQL-92_aggregation_functions">SQL-92 Aggregation Functions</a></h3>
2391<p>For compatibility with the traditional SQL aggregation functions, SQL++ also offers SQL-92&#x2019;s aggregation function symbols (<tt>COUNT</tt>, <tt>SUM</tt>, <tt>MAX</tt>, <tt>MIN</tt>, and <tt>AVG</tt>) as supported syntactic sugar. The SQL++ compiler rewrites queries that utilize these function symbols into SQL++ queries that only use the SQL++ collection aggregate functions. The following example uses the SQL-92 syntax approach to compute a result that is identical to that of the more explicit SQL++ example above:</p>
2392<div class="section">
2393<div class="section">
2394<h5><a name="Example"></a>Example</h5>
2395
2396<div>
2397<div>
2398<pre class="source">SELECT uid, COUNT(*) AS msgCnt
2399FROM GleambookMessages msg
2400GROUP BY msg.authorId AS uid;
2401</pre></div></div>
2402
2403<p>It is important to realize that <tt>COUNT</tt> is actually <b>not</b> a SQL++ built-in aggregation function. Rather, the <tt>COUNT</tt> query above is using a special &#x201c;sugared&#x201d; function symbol that the SQL++ compiler will rewrite as follows:</p>
2404
2405<div>
2406<div>
2407<pre class="source">SELECT uid AS uid, ARRAY_COUNT( (SELECT VALUE 1 FROM `$1` as g) ) AS msgCnt
2408FROM GleambookMessages msg
2409GROUP BY msg.authorId AS uid
2410GROUP AS `$1`(msg AS msg);
2411</pre></div></div>
2412
2413<p>The same sort of rewritings apply to the function symbols <tt>SUM</tt>, <tt>MAX</tt>, <tt>MIN</tt>, and <tt>AVG</tt>. In contrast to the SQL++ collection aggregate functions, these special SQL-92 function symbols can only be used in the same way they are in standard SQL (i.e., with the same restrictions).</p>
2414<p>DISTINCT modifier is also supported for these aggregate functions.</p></div></div></div>
2415<div class="section">
2416<h3><a name="SQL-92_Compliant_GROUP_BY_Aggregations"></a><a name="SQL-92_compliant_gby" id="SQL-92_compliant_gby">SQL-92 Compliant GROUP BY Aggregations</a></h3>
2417<p>SQL++ provides full support for SQL-92 <tt>GROUP BY</tt> aggregation queries. The following query is such an example:</p>
2418<div class="section">
2419<div class="section">
2420<h5><a name="Example"></a>Example</h5>
2421
2422<div>
2423<div>
2424<pre class="source">SELECT msg.authorId, COUNT(*)
2425FROM GleambookMessages msg
2426GROUP BY msg.authorId;
2427</pre></div></div>
2428
2429<p>This query outputs:</p>
2430
2431<div>
2432<div>
2433<pre class="source">[ {
2434 &quot;authorId&quot;: 1,
2435 &quot;$1&quot;: 5
2436}, {
2437 &quot;authorId&quot;: 2,
2438 &quot;$1&quot;: 2
2439} ]
2440</pre></div></div>
2441
2442<p>In principle, a <tt>msg</tt> reference in the query&#x2019;s <tt>SELECT</tt> clause would be &#x201c;sugarized&#x201d; as a collection (as described in <a href="#Implicit_group_variables">Implicit Group Variables</a>). However, since the SELECT expression <tt>msg.authorId</tt> is syntactically identical to a GROUP BY key expression, it will be internally replaced by the generated group key variable. The following is the equivalent rewritten query that will be generated by the compiler for the query above:</p>
2443
2444<div>
2445<div>
2446<pre class="source">SELECT authorId AS authorId, ARRAY_COUNT( (SELECT g.msg FROM `$1` AS g) )
2447FROM GleambookMessages msg
2448GROUP BY msg.authorId AS authorId
2449GROUP AS `$1`(msg AS msg);
2450</pre></div></div>
2451</div></div></div>
2452<div class="section">
2453<h3><a name="Column_Aliases"></a><a name="Column_aliases" id="Column_aliases">Column Aliases</a></h3>
2454<p>SQL++ also allows column aliases to be used as <tt>ORDER BY</tt> keys.</p>
2455<div class="section">
2456<div class="section">
2457<h5><a name="Example"></a>Example</h5>
2458
2459<div>
2460<div>
2461<pre class="source">SELECT msg.authorId AS aid, COUNT(*)
2462FROM GleambookMessages msg
2463GROUP BY msg.authorId;
2464ORDER BY aid;
2465</pre></div></div>
2466
2467<p>This query returns:</p>
2468
2469<div>
2470<div>
2471<pre class="source">[ {
2472 &quot;$1&quot;: 5,
2473 &quot;aid&quot;: 1
2474}, {
2475 &quot;$1&quot;: 2,
2476 &quot;aid&quot;: 2
2477} ]
2478</pre></div></div>
2479</div></div></div></div>
2480<div class="section">
2481<h2><a name="WHERE_Clauses_and_HAVING_Clauses"></a><a name="Where_having_clauses" id="Where_having_clauses">WHERE Clauses and HAVING Clauses</a></h2>
2482<p>Both <tt>WHERE</tt> clauses and <tt>HAVING</tt> clauses are used to filter input data based on a condition expression. Only tuples for which the condition expression evaluates to <tt>TRUE</tt> are propagated. Note that if the condition expression evaluates to <tt>NULL</tt> or <tt>MISSING</tt> the input tuple will be disgarded.</p></div>
2483<div class="section">
2484<h2><a name="ORDER_BY_Clauses"></a><a name="Order_By_clauses" id="Order_By_clauses">ORDER BY Clauses</a></h2>
2485<p>The <tt>ORDER BY</tt> clause is used to globally sort data in either ascending order (i.e., <tt>ASC</tt>) or descending order (i.e., <tt>DESC</tt>). During ordering, <tt>MISSING</tt> and <tt>NULL</tt> are treated as being smaller than any other value if they are encountered in the ordering key(s). <tt>MISSING</tt> is treated as smaller than <tt>NULL</tt> if both occur in the data being sorted. The ordering of values of a given type is consistent with its type&#x2019;s &lt;= ordering; the ordering of values across types is implementation-defined but stable. The following example returns all <tt>GleambookUsers</tt> in descending order by their number of friends.</p>
2486<div class="section">
2487<div class="section">
2488<div class="section">
2489<h5><a name="Example"></a>Example</h5>
2490
2491<div>
2492<div>
2493<pre class="source"> SELECT VALUE user
2494 FROM GleambookUsers AS user
2495 ORDER BY ARRAY_COUNT(user.friendIds) DESC;
2496</pre></div></div>
2497
2498<p>This query returns:</p>
2499
2500<div>
2501<div>
2502<pre class="source"> [ {
2503 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2504 &quot;friendIds&quot;: [
2505 2,
2506 3,
2507 6,
2508 10
2509 ],
2510 &quot;gender&quot;: &quot;F&quot;,
2511 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2512 &quot;nickname&quot;: &quot;Mags&quot;,
2513 &quot;alias&quot;: &quot;Margarita&quot;,
2514 &quot;id&quot;: 1,
2515 &quot;employment&quot;: [
2516 {
2517 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2518 &quot;start-date&quot;: &quot;2006-08-06&quot;
2519 },
2520 {
2521 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2522 &quot;organizationName&quot;: &quot;geomedia&quot;,
2523 &quot;start-date&quot;: &quot;2010-06-17&quot;
2524 }
2525 ]
2526 }, {
2527 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
2528 &quot;friendIds&quot;: [
2529 1,
2530 5,
2531 8,
2532 9
2533 ],
2534 &quot;name&quot;: &quot;EmoryUnk&quot;,
2535 &quot;alias&quot;: &quot;Emory&quot;,
2536 &quot;id&quot;: 3,
2537 &quot;employment&quot;: [
2538 {
2539 &quot;organizationName&quot;: &quot;geomedia&quot;,
2540 &quot;endDate&quot;: &quot;2010-01-26&quot;,
2541 &quot;startDate&quot;: &quot;2010-06-17&quot;
2542 }
2543 ]
2544 }, {
2545 &quot;userSince&quot;: &quot;2011-01-22T10:10:00.000Z&quot;,
2546 &quot;friendIds&quot;: [
2547 1,
2548 4
2549 ],
2550 &quot;name&quot;: &quot;IsbelDull&quot;,
2551 &quot;nickname&quot;: &quot;Izzy&quot;,
2552 &quot;alias&quot;: &quot;Isbel&quot;,
2553 &quot;id&quot;: 2,
2554 &quot;employment&quot;: [
2555 {
2556 &quot;organizationName&quot;: &quot;Hexviafind&quot;,
2557 &quot;startDate&quot;: &quot;2010-04-27&quot;
2558 }
2559 ]
2560 } ]
2561</pre></div></div>
2562</div></div></div></div>
2563<div class="section">
2564<h2><a name="LIMIT_Clauses"></a><a name="Limit_clauses" id="Limit_clauses">LIMIT Clauses</a></h2>
2565<p>The <tt>LIMIT</tt> clause is used to limit the result set to a specified constant size. The use of the <tt>LIMIT</tt> clause is illustrated in the next example.</p>
2566<div class="section">
2567<div class="section">
2568<div class="section">
2569<h5><a name="Example"></a>Example</h5>
2570
2571<div>
2572<div>
2573<pre class="source"> SELECT VALUE user
2574 FROM GleambookUsers AS user
2575 ORDER BY len(user.friendIds) DESC
2576 LIMIT 1;
2577</pre></div></div>
2578
2579<p>This query returns:</p>
2580
2581<div>
2582<div>
2583<pre class="source"> [ {
2584 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2585 &quot;friendIds&quot;: [
2586 2,
2587 3,
2588 6,
2589 10
2590 ],
2591 &quot;gender&quot;: &quot;F&quot;,
2592 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2593 &quot;nickname&quot;: &quot;Mags&quot;,
2594 &quot;alias&quot;: &quot;Margarita&quot;,
2595 &quot;id&quot;: 1,
2596 &quot;employment&quot;: [
2597 {
2598 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2599 &quot;start-date&quot;: &quot;2006-08-06&quot;
2600 },
2601 {
2602 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2603 &quot;organizationName&quot;: &quot;geomedia&quot;,
2604 &quot;start-date&quot;: &quot;2010-06-17&quot;
2605 }
2606 ]
2607 } ]
2608</pre></div></div>
2609</div></div></div></div>
2610<div class="section">
2611<h2><a name="WITH_Clauses"></a><a name="With_clauses" id="With_clauses">WITH Clauses</a></h2>
2612<p>As in standard SQL, <tt>WITH</tt> clauses are available to improve the modularity of a query. The next query shows an example.</p>
2613<div class="section">
2614<div class="section">
2615<div class="section">
2616<h5><a name="Example"></a>Example</h5>
2617
2618<div>
2619<div>
2620<pre class="source">WITH avgFriendCount AS (
2621 SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
2622 FROM GleambookUsers AS user
2623)[0]
2624SELECT VALUE user
2625FROM GleambookUsers user
2626WHERE ARRAY_COUNT(user.friendIds) &gt; avgFriendCount;
2627</pre></div></div>
2628
2629<p>This query returns:</p>
2630
2631<div>
2632<div>
2633<pre class="source">[ {
2634 &quot;userSince&quot;: &quot;2012-08-20T10:10:00.000Z&quot;,
2635 &quot;friendIds&quot;: [
2636 2,
2637 3,
2638 6,
2639 10
2640 ],
2641 &quot;gender&quot;: &quot;F&quot;,
2642 &quot;name&quot;: &quot;MargaritaStoddard&quot;,
2643 &quot;nickname&quot;: &quot;Mags&quot;,
2644 &quot;alias&quot;: &quot;Margarita&quot;,
2645 &quot;id&quot;: 1,
2646 &quot;employment&quot;: [
2647 {
2648 &quot;organizationName&quot;: &quot;Codetechno&quot;,
2649 &quot;start-date&quot;: &quot;2006-08-06&quot;
2650 },
2651 {
2652 &quot;end-date&quot;: &quot;2010-01-26&quot;,
2653 &quot;organizationName&quot;: &quot;geomedia&quot;,
2654 &quot;start-date&quot;: &quot;2010-06-17&quot;
2655 }
2656 ]
2657}, {
2658 &quot;userSince&quot;: &quot;2012-07-10T10:10:00.000Z&quot;,
2659 &quot;friendIds&quot;: [
2660 1,
2661 5,
2662 8,
2663 9
2664 ],
2665 &quot;name&quot;: &quot;EmoryUnk&quot;,
2666 &quot;alias&quot;: &quot;Emory&quot;,
2667 &quot;id&quot;: 3,
2668 &quot;employment&quot;: [
2669 {
2670 &quot;organizationName&quot;: &quot;geomedia&quot;,
2671 &quot;endDate&quot;: &quot;2010-01-26&quot;,
2672 &quot;startDate&quot;: &quot;2010-06-17&quot;
2673 }
2674 ]
2675} ]
2676</pre></div></div>
2677
2678<p>The query is equivalent to the following, more complex, inlined form of the query:</p>
2679
2680<div>
2681<div>
2682<pre class="source">SELECT *
2683FROM GleambookUsers user
2684WHERE ARRAY_COUNT(user.friendIds) &gt;
2685 ( SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
2686 FROM GleambookUsers AS user
2687 ) [0];
2688</pre></div></div>
2689
2690<p>WITH can be particularly useful when a value needs to be used several times in a query.</p>
2691<p>Before proceeding further, notice that both the WITH query and its equivalent inlined variant include the syntax &#x201c;[0]&#x201d; &#x2013; this is due to a noteworthy difference between SQL++ and SQL-92. In SQL-92, whenever a scalar value is expected and it is being produced by a query expression, the SQL-92 query processor will evaluate the expression, check that there is only one row and column in the result at runtime, and then coerce the one-row/one-column tabular result into a scalar value. SQL++, being designed to deal with nested data and schema-less data, does not (and should not) do this. Collection-valued data is perfectly legal in most SQL++ contexts, and its data is schema-less, so a query processor rarely knows exactly what to expect where and such automatic conversion is often not desirable. Thus, in the queries above, the use of &#x201c;[0]&#x201d; extracts the first (i.e., 0th) element of an array-valued query expression&#x2019;s result; this is needed above, even though the result is an array of one element, to extract the only element in the singleton array and obtain the desired scalar for the comparison.</p></div></div></div></div>
2692<div class="section">
2693<h2><a name="LET_Clauses"></a><a name="Let_clauses" id="Let_clauses">LET Clauses</a></h2>
2694<p>Similar to <tt>WITH</tt> clauses, <tt>LET</tt> clauses can be useful when a (complex) expression is used several times within a query, allowing it to be written once to make the query more concise. The next query shows an example.</p>
2695<div class="section">
2696<div class="section">
2697<div class="section">
2698<h5><a name="Example"></a>Example</h5>
2699
2700<div>
2701<div>
2702<pre class="source">SELECT u.name AS uname, messages AS messages
2703FROM GleambookUsers u
2704LET messages = (SELECT VALUE m
2705 FROM GleambookMessages m
2706 WHERE m.authorId = u.id)
2707WHERE EXISTS messages;
2708</pre></div></div>
2709
2710<p>This query lists <tt>GleambookUsers</tt> that have posted <tt>GleambookMessages</tt> and shows all authored messages for each listed user. It returns:</p>
2711
2712<div>
2713<div>
2714<pre class="source">[ {
2715 &quot;uname&quot;: &quot;MargaritaStoddard&quot;,
2716 &quot;messages&quot;: [
2717 {
2718 &quot;senderLocation&quot;: [
2719 38.97,
2720 77.49
2721 ],
2722 &quot;inResponseTo&quot;: 1,
2723 &quot;messageId&quot;: 11,
2724 &quot;authorId&quot;: 1,
2725 &quot;message&quot;: &quot; can't stand acast its plan is terrible&quot;
2726 },
2727 {
2728 &quot;senderLocation&quot;: [
2729 41.66,
2730 80.87
2731 ],
2732 &quot;inResponseTo&quot;: 4,
2733 &quot;messageId&quot;: 2,
2734 &quot;authorId&quot;: 1,
2735 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2736 },
2737 {
2738 &quot;senderLocation&quot;: [
2739 37.73,
2740 97.04
2741 ],
2742 &quot;inResponseTo&quot;: 2,
2743 &quot;messageId&quot;: 4,
2744 &quot;authorId&quot;: 1,
2745 &quot;message&quot;: &quot; can't stand acast the network is horrible:(&quot;
2746 },
2747 {
2748 &quot;senderLocation&quot;: [
2749 40.33,
2750 80.87
2751 ],
2752 &quot;inResponseTo&quot;: 11,
2753 &quot;messageId&quot;: 8,
2754 &quot;authorId&quot;: 1,
2755 &quot;message&quot;: &quot; like ccast the 3G is awesome:)&quot;
2756 },
2757 {
2758 &quot;senderLocation&quot;: [
2759 42.5,
2760 70.01
2761 ],
2762 &quot;inResponseTo&quot;: 12,
2763 &quot;messageId&quot;: 10,
2764 &quot;authorId&quot;: 1,
2765 &quot;message&quot;: &quot; can't stand product-w the touch-screen is terrible&quot;
2766 }
2767 ]
2768}, {
2769 &quot;uname&quot;: &quot;IsbelDull&quot;,
2770 &quot;messages&quot;: [
2771 {
2772 &quot;senderLocation&quot;: [
2773 31.5,
2774 75.56
2775 ],
2776 &quot;inResponseTo&quot;: 1,
2777 &quot;messageId&quot;: 6,
2778 &quot;authorId&quot;: 2,
2779 &quot;message&quot;: &quot; like product-z its platform is mind-blowing&quot;
2780 },
2781 {
2782 &quot;senderLocation&quot;: [
2783 48.09,
2784 81.01
2785 ],
2786 &quot;inResponseTo&quot;: 4,
2787 &quot;messageId&quot;: 3,
2788 &quot;authorId&quot;: 2,
2789 &quot;message&quot;: &quot; like product-y the plan is amazing&quot;
2790 }
2791 ]
2792} ]
2793</pre></div></div>
2794
2795<p>This query is equivalent to the following query that does not use the <tt>LET</tt> clause:</p>
2796
2797<div>
2798<div>
2799<pre class="source">SELECT u.name AS uname, ( SELECT VALUE m
2800 FROM GleambookMessages m
2801 WHERE m.authorId = u.id
2802 ) AS messages
2803FROM GleambookUsers u
2804WHERE EXISTS ( SELECT VALUE m
2805 FROM GleambookMessages m
2806 WHERE m.authorId = u.id
2807 );
2808</pre></div></div>
2809</div></div></div></div>
2810<div class="section">
2811<h2><a name="UNION_ALL"></a><a name="Union_all" id="Union_all">UNION ALL</a></h2>
2812<p>UNION ALL can be used to combine two input arrays or multisets into one. As in SQL, there is no ordering guarantee on the contents of the output stream. However, unlike SQL, SQL++ does not constrain what the data looks like on the input streams; in particular, it allows heterogenity on the input and output streams. A type error will be raised if one of the inputs is not a collection. The following odd but legal query is an example:</p>
2813<div class="section">
2814<div class="section">
2815<div class="section">
2816<h5><a name="Example"></a>Example</h5>
2817
2818<div>
2819<div>
2820<pre class="source">SELECT u.name AS uname
2821FROM GleambookUsers u
2822WHERE u.id = 2
2823 UNION ALL
2824SELECT VALUE m.message
2825FROM GleambookMessages m
2826WHERE authorId=2;
2827</pre></div></div>
2828
2829<p>This query returns:</p>
2830
2831<div>
2832<div>
2833<pre class="source">[
2834 &quot; like product-z its platform is mind-blowing&quot;
2835 , {
2836 &quot;uname&quot;: &quot;IsbelDull&quot;
2837}, &quot; like product-y the plan is amazing&quot;
2838 ]
2839</pre></div></div>
2840</div></div></div></div>
2841<div class="section">
2842<h2><a name="Subqueries" id="Subqueries">Subqueries</a></h2>
2843<p>In SQL++, an arbitrary subquery can appear anywhere that an expression can appear. Unlike SQL-92, as was just alluded to, the subqueries in a SELECT list or a boolean predicate need not return singleton, single-column relations. Instead, they may return arbitrary collections. For example, the following query is a variant of the prior group-by query examples; it retrieves an array of up to two &#x201c;dislike&#x201d; messages per user.</p>
2844<div class="section">
2845<div class="section">
2846<div class="section">
2847<h5><a name="Example"></a>Example</h5>
2848
2849<div>
2850<div>
2851<pre class="source">SELECT uid,
2852 (SELECT VALUE m.msg
2853 FROM msgs m
2854 WHERE m.msg.message LIKE '%dislike%'
2855 ORDER BY m.msg.messageId
2856 LIMIT 2) AS msgs
2857FROM GleambookMessages message
2858GROUP BY message.authorId AS uid GROUP AS msgs(message AS msg);
2859</pre></div></div>
2860
2861<p>For our sample data set, this query returns:</p>
2862
2863<div>
2864<div>
2865<pre class="source">[ {
2866 &quot;msgs&quot;: [
2867 {
2868 &quot;senderLocation&quot;: [
2869 41.66,
2870 80.87
2871 ],
2872 &quot;inResponseTo&quot;: 4,
2873 &quot;messageId&quot;: 2,
2874 &quot;authorId&quot;: 1,
2875 &quot;message&quot;: &quot; dislike x-phone its touch-screen is horrible&quot;
2876 }
2877 ],
2878 &quot;uid&quot;: 1
2879}, {
2880 &quot;msgs&quot;: [
2881
2882 ],
2883 &quot;uid&quot;: 2
2884} ]
2885</pre></div></div>
2886
2887<p>Note that a subquery, like a top-level <tt>SELECT</tt> statment, always returns a collection &#x2013; regardless of where within a query the subquery occurs &#x2013; and again, its result is never automatically cast into a scalar.</p></div></div></div></div>
2888<div class="section">
2889<h2><a name="SQL.2B.2B_vs._SQL-92"></a><a name="Vs_SQL-92" id="Vs_SQL-92">SQL++ vs. SQL-92</a></h2>
2890<p>SQL++ offers the following additional features beyond SQL-92 (hence the &#x201c;++&#x201d; in its name):</p>
2891<ul>
2892
2893<li>Fully composable and functional: A subquery can iterate over any intermediate collection and can appear anywhere in a query.</li>
2894<li>Schema-free: The query language does not assume the existence of a static schema for any data that it processes.</li>
2895<li>Correlated FROM terms: A right-side FROM term expression can refer to variables defined by FROM terms on its left.</li>
2896<li>Powerful GROUP BY: In addition to a set of aggregate functions as in standard SQL, the groups created by the <tt>GROUP BY</tt> clause are directly usable in nested queries and/or to obtain nested results.</li>
2897<li>Generalized SELECT clause: A SELECT clause can return any type of collection, while in SQL-92, a <tt>SELECT</tt> clause has to return a (homogeneous) collection of objects.</li>
2898</ul>
2899<p>The following matrix is a quick &#x201c;SQL-92 compatibility cheat sheet&#x201d; for SQL++.</p>
2900<table border="0" class="table table-striped">
2901<thead>
2902
2903<tr class="a">
2904<th> Feature </th>
2905<th> SQL++ </th>
2906<th> SQL-92 </th>
2907<th> Why different? </th></tr>
2908</thead><tbody>
2909
2910<tr class="b">
2911<td> SELECT * </td>
2912<td> Returns nested objects </td>
2913<td> Returns flattened concatenated objects </td>
2914<td> Nested collections are 1st class citizens </td></tr>
2915<tr class="a">
2916<td> SELECT list </td>
2917<td> order not preserved </td>
2918<td> order preserved </td>
2919<td> Fields in a JSON object are not ordered </td></tr>
2920<tr class="b">
2921<td> Subquery </td>
2922<td> Returns a collection </td>
2923<td> The returned collection is cast into a scalar value if the subquery appears in a SELECT list or on one side of a comparison or as input to a function </td>
2924<td> Nested collections are 1st class citizens </td></tr>
2925<tr class="a">
2926<td> LEFT OUTER JOIN </td>
2927<td> Fills in <tt>MISSING</tt>(s) for non-matches </td>
2928<td> Fills in <tt>NULL</tt>(s) for non-matches </td>
2929<td> &#x201c;Absence&#x201d; is more appropriate than &#x201c;unknown&#x201d; here </td></tr>
2930<tr class="b">
2931<td> UNION ALL </td>
2932<td> Allows heterogeneous inputs and output </td>
2933<td> Input streams must be UNION-compatible and output field names are drawn from the first input stream </td>
2934<td> Heterogenity and nested collections are common </td></tr>
2935<tr class="a">
2936<td> IN constant_expr </td>
2937<td> The constant expression has to be an array or multiset, i.e., [..,..,&#x2026;] </td>
2938<td> The constant collection can be represented as comma-separated items in a paren pair </td>
2939<td> Nested collections are 1st class citizens </td></tr>
2940<tr class="b">
2941<td> String literal </td>
2942<td> Double quotes or single quotes </td>
2943<td> Single quotes only </td>
2944<td> Double quoted strings are pervasive </td></tr>
2945<tr class="a">
2946<td> Delimited identifiers </td>
2947<td> Backticks </td>
2948<td> Double quotes </td>
2949<td> Double quoted strings are pervasive </td></tr>
2950</tbody>
2951</table>
2952<p>The following SQL-92 features are not implemented yet. However, SQL++ does not conflict with these features:</p>
2953<ul>
2954
2955<li>CROSS JOIN, NATURAL JOIN, UNION JOIN</li>
2956<li>RIGHT and FULL OUTER JOIN</li>
2957<li>INTERSECT, EXCEPT, UNION with set semantics</li>
2958<li>CAST expression</li>
2959<li>COALESCE expression</li>
2960<li>ALL and SOME predicates for linking to subqueries</li>
2961<li>UNIQUE predicate (tests a collection for duplicates)</li>
2962<li>MATCH predicate (tests for referential integrity)</li>
2963<li>Row and Table constructors</li>
2964<li>Preserved order for expressions in a SELECT list</li>
2965</ul><!--
2966 ! Licensed to the Apache Software Foundation (ASF) under one
2967 ! or more contributor license agreements. See the NOTICE file
2968 ! distributed with this work for additional information
2969 ! regarding copyright ownership. The ASF licenses this file
2970 ! to you under the Apache License, Version 2.0 (the
2971 ! "License"); you may not use this file except in compliance
2972 ! with the License. You may obtain a copy of the License at
2973 !
2974 ! http://www.apache.org/licenses/LICENSE-2.0
2975 !
2976 ! Unless required by applicable law or agreed to in writing,
2977 ! software distributed under the License is distributed on an
2978 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
2979 ! KIND, either express or implied. See the License for the
2980 ! specific language governing permissions and limitations
2981 ! under the License.
2982 !-->
2983
2984<h1><a name="Errors" id="Errors">4. Errors</a></h1><!--
2985 ! Licensed to the Apache Software Foundation (ASF) under one
2986 ! or more contributor license agreements. See the NOTICE file
2987 ! distributed with this work for additional information
2988 ! regarding copyright ownership. The ASF licenses this file
2989 ! to you under the Apache License, Version 2.0 (the
2990 ! "License"); you may not use this file except in compliance
2991 ! with the License. You may obtain a copy of the License at
2992 !
2993 ! http://www.apache.org/licenses/LICENSE-2.0
2994 !
2995 ! Unless required by applicable law or agreed to in writing,
2996 ! software distributed under the License is distributed on an
2997 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
2998 ! KIND, either express or implied. See the License for the
2999 ! specific language governing permissions and limitations
3000 ! under the License.
3001 !-->
3002
3003<p>A SQL++ query can potentially result in one of the following errors:</p>
3004<ul>
3005
3006<li>syntax error,</li>
3007<li>identifier resolution error,</li>
3008<li>type error,</li>
3009<li>resource error.</li>
3010</ul>
3011<p>If the query processor runs into any error, it will terminate the ongoing processing of the query and immediately return an error message to the client.</p></div>
3012<div class="section">
3013<h2><a name="Syntax_Errors"></a><a name="Syntax_errors" id="Syntax_errors">Syntax Errors</a></h2>
3014<p>An valid SQL++ query must satisfy the SQL++ grammar rules. Otherwise, a syntax error will be raised.</p>
3015<div class="section">
3016<div class="section">
3017<div class="section">
3018<h5><a name="Example"></a>Example</h5>
3019
3020<div>
3021<div>
3022<pre class="source">SELECT *
3023GleambookUsers user
3024</pre></div></div>
3025
3026<p>Since the query misses a <tt>FROM</tt> keyword before the dataset <tt>GleambookUsers</tt>, we will get a syntax error as follows:</p>
3027
3028<div>
3029<div>
3030<pre class="source">Syntax error: In line 2 &gt;&gt;GleambookUsers user;&lt;&lt; Encountered &lt;IDENTIFIER&gt; \&quot;GleambookUsers\&quot; at column 1.
3031</pre></div></div>
3032</div>
3033<div class="section">
3034<h5><a name="Example"></a>Example</h5>
3035
3036<div>
3037<div>
3038<pre class="source">SELECT *
3039FROM GleambookUsers user
3040WHERE type=&quot;advertiser&quot;;
3041</pre></div></div>
3042
3043<p>Since &#x201c;type&#x201d; is a reserved keyword in the SQL++ parser, we will get a syntax error as follows:</p>
3044
3045<div>
3046<div>
3047<pre class="source">Error: Syntax error: In line 3 &gt;&gt;WHERE type=&quot;advertiser&quot;;&lt;&lt; Encountered 'type' &quot;type&quot; at column 7.
3048==&gt; WHERE type=&quot;advertiser&quot;;
3049</pre></div></div>
3050</div></div></div></div>
3051<div class="section">
3052<h2><a name="Identifier_Resolution_Errors"></a><a name="Identifier_resolution_errors" id="Identifier_resolution_errors">Identifier Resolution Errors</a></h2>
3053<p>Referring to an undefined identifier can cause an error if the identifier cannot be successfully resolved as a valid field access.</p>
3054<div class="section">
3055<div class="section">
3056<div class="section">
3057<h5><a name="Example"></a>Example</h5>
3058
3059<div>
3060<div>
3061<pre class="source">SELECT *
3062FROM GleambookUser user;
3063</pre></div></div>
3064
3065<p>If we have a typo as above in &#x201c;GleambookUsers&#x201d; that misses the dataset name&#x2019;s ending &#x201c;s&#x201d;, we will get an identifier resolution error as follows:</p>
3066
3067<div>
3068<div>
3069<pre class="source">Error: Cannot find dataset GleambookUser in dataverse Default nor an alias with name GleambookUser!
3070</pre></div></div>
3071</div>
3072<div class="section">
3073<h5><a name="Example"></a>Example</h5>
3074
3075<div>
3076<div>
3077<pre class="source">SELECT name, message
3078FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
3079</pre></div></div>
3080
3081<p>If the compiler cannot figure out how to resolve an unqualified field name, which will occur if there is more than one variable in scope (e.g., <tt>GleambookUsers u</tt> and <tt>GleambookMessages m</tt> as above), we will get an identifier resolution error as follows:</p>
3082
3083<div>
3084<div>
3085<pre class="source">Error: Cannot resolve ambiguous alias reference for undefined identifier name
3086</pre></div></div>
3087</div></div></div></div>
3088<div class="section">
3089<h2><a name="Type_Errors"></a><a name="Type_errors" id="Type_errors">Type Errors</a></h2>
3090<p>The SQL++ compiler does type checks based on its available type information. In addition, the SQL++ runtime also reports type errors if a data model instance it processes does not satisfy the type requirement.</p>
3091<div class="section">
3092<div class="section">
3093<div class="section">
3094<h5><a name="Example"></a>Example</h5>
3095
3096<div>
3097<div>
3098<pre class="source">abs(&quot;123&quot;);
3099</pre></div></div>
3100
3101<p>Since function <tt>abs</tt> can only process numeric input values, we will get a type error as follows:</p>
3102
3103<div>
3104<div>
3105<pre class="source">Error: Type mismatch: function abs expects its 1st input parameter to be type tinyint, smallint, integer, bigint, float or double, but the actual input type is string
3106</pre></div></div>
3107</div></div></div></div>
3108<div class="section">
3109<h2><a name="Resource_Errors"></a><a name="Resource_errors" id="Resource_errors">Resource Errors</a></h2>
3110<p>A query can potentially exhaust system resources, such as the number of open files and disk spaces. For instance, the following two resource errors could be potentially be seen when running the system:</p>
3111
3112<div>
3113<div>
3114<pre class="source">Error: no space left on device
3115Error: too many open files
3116</pre></div></div>
3117
3118<p>The &#x201c;no space left on device&#x201d; issue usually can be fixed by cleaning up disk spaces and reserving more disk spaces for the system. The &#x201c;too many open files&#x201d; issue usually can be fixed by a system administrator, following the instructions <a class="externalLink" href="https://easyengine.io/tutorials/linux/increase-open-files-limit/">here</a>.</p><!--
3119 ! Licensed to the Apache Software Foundation (ASF) under one
3120 ! or more contributor license agreements. See the NOTICE file
3121 ! distributed with this work for additional information
3122 ! regarding copyright ownership. The ASF licenses this file
3123 ! to you under the Apache License, Version 2.0 (the
3124 ! "License"); you may not use this file except in compliance
3125 ! with the License. You may obtain a copy of the License at
3126 !
3127 ! http://www.apache.org/licenses/LICENSE-2.0
3128 !
3129 ! Unless required by applicable law or agreed to in writing,
3130 ! software distributed under the License is distributed on an
3131 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3132 ! KIND, either express or implied. See the License for the
3133 ! specific language governing permissions and limitations
3134 ! under the License.
3135 !-->
3136
3137<h1><a name="DDL_and_DML_statements" id="DDL_and_DML_statements">5. DDL and DML statements</a></h1>
3138
3139<div>
3140<div>
3141<pre class="source">Statement ::= ( ( SingleStatement )? ( &quot;;&quot; )+ )* &lt;EOF&gt;
3142SingleStatement ::= DatabaseDeclaration
3143 | FunctionDeclaration
3144 | CreateStatement
3145 | DropStatement
3146 | LoadStatement
3147 | SetStatement
3148 | InsertStatement
3149 | DeleteStatement
3150 | Query
3151</pre></div></div>
3152
3153<p>In addition to queries, an implementation of SQL++ needs to support statements for data definition and manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions. This section details the DDL and DML statements supported in the SQL++ language as realized today in Apache AsterixDB.</p><!--
3154 ! Licensed to the Apache Software Foundation (ASF) under one
3155 ! or more contributor license agreements. See the NOTICE file
3156 ! distributed with this work for additional information
3157 ! regarding copyright ownership. The ASF licenses this file
3158 ! to you under the Apache License, Version 2.0 (the
3159 ! "License"); you may not use this file except in compliance
3160 ! with the License. You may obtain a copy of the License at
3161 !
3162 ! http://www.apache.org/licenses/LICENSE-2.0
3163 !
3164 ! Unless required by applicable law or agreed to in writing,
3165 ! software distributed under the License is distributed on an
3166 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3167 ! KIND, either express or implied. See the License for the
3168 ! specific language governing permissions and limitations
3169 ! under the License.
3170 !-->
3171</div>
3172<div class="section">
3173<h2><a name="Lifecycle_Management_Statements"></a><a name="Lifecycle_management_statements" id="Lifecycle_management_statements">Lifecycle Management Statements</a></h2>
3174
3175<div>
3176<div>
3177<pre class="source">CreateStatement ::= &quot;CREATE&quot; ( DatabaseSpecification
3178 | TypeSpecification
3179 | DatasetSpecification
3180 | IndexSpecification
3181 | FunctionSpecification )
3182
3183QualifiedName ::= Identifier ( &quot;.&quot; Identifier )?
3184DoubleQualifiedName ::= Identifier &quot;.&quot; Identifier ( &quot;.&quot; Identifier )?
3185</pre></div></div>
3186
3187<p>The CREATE statement in SQL++ is used for creating dataverses as well as other persistent artifacts in a dataverse. It can be used to create new dataverses, datatypes, datasets, indexes, and user-defined SQL++ functions.</p>
3188<div class="section">
3189<h3><a name="Dataverses" id="Dataverses"> Dataverses</a></h3>
3190
3191<div>
3192<div>
3193<pre class="source">DatabaseSpecification ::= &quot;DATAVERSE&quot; Identifier IfNotExists
3194</pre></div></div>
3195
3196<p>The CREATE DATAVERSE statement is used to create new dataverses. To ease the authoring of reusable SQL++ scripts, an optional IF NOT EXISTS clause is included to allow creation to be requested either unconditionally or only if the dataverse does not already exist. If this clause is absent, an error is returned if a dataverse with the indicated name already exists.</p>
3197<p>The following example creates a new dataverse named TinySocial if one does not already exist.</p>
3198<div class="section">
3199<div class="section">
3200<h5><a name="Example"></a>Example</h5>
3201
3202<div>
3203<div>
3204<pre class="source">CREATE DATAVERSE TinySocial IF NOT EXISTS;
3205</pre></div></div>
3206</div></div></div>
3207<div class="section">
3208<h3><a name="Types" id="Types"> Types</a></h3>
3209
3210<div>
3211<div>
3212<pre class="source">TypeSpecification ::= &quot;TYPE&quot; FunctionOrTypeName IfNotExists &quot;AS&quot; ObjectTypeDef
3213FunctionOrTypeName ::= QualifiedName
3214IfNotExists ::= ( &lt;IF&gt; &lt;NOT&gt; &lt;EXISTS&gt; )?
3215TypeExpr ::= ObjectTypeDef | TypeReference | ArrayTypeDef | MultisetTypeDef
3216ObjectTypeDef ::= ( &lt;CLOSED&gt; | &lt;OPEN&gt; )? &quot;{&quot; ( ObjectField ( &quot;,&quot; ObjectField )* )? &quot;}&quot;
3217ObjectField ::= Identifier &quot;:&quot; ( TypeExpr ) ( &quot;?&quot; )?
3218NestedField ::= Identifier ( &quot;.&quot; Identifier )*
3219IndexField ::= NestedField ( &quot;:&quot; TypeReference )?
3220TypeReference ::= Identifier
3221ArrayTypeDef ::= &quot;[&quot; ( TypeExpr ) &quot;]&quot;
3222MultisetTypeDef ::= &quot;{{&quot; ( TypeExpr ) &quot;}}&quot;
3223</pre></div></div>
3224
3225<p>The CREATE TYPE statement is used to create a new named datatype. This type can then be used to create stored collections or utilized when defining one or more other datatypes. Much more information about the data model is available in the <a href="../datamodel.html">data model reference guide</a>. A new type can be a object type, a renaming of another type, an array type, or a multiset type. A object type can be defined as being either open or closed. Instances of a closed object type are not permitted to contain fields other than those specified in the create type statement. Instances of an open object type may carry additional fields, and open is the default for new types if neither option is specified.</p>
3226<p>The following example creates a new object type called GleambookUser type. Since it is defined as (defaulting to) being an open type, instances will be permitted to contain more than what is specified in the type definition. The first four fields are essentially traditional typed name/value pairs (much like SQL fields). The friendIds field is a multiset of integers. The employment field is an array of instances of another named object type, EmploymentType.</p>
3227<div class="section">
3228<div class="section">
3229<h5><a name="Example"></a>Example</h5>
3230
3231<div>
3232<div>
3233<pre class="source">CREATE TYPE GleambookUserType AS {
3234 id: int,
3235 alias: string,
3236 name: string,
3237 userSince: datetime,
3238 friendIds: {{ int }},
3239 employment: [ EmploymentType ]
3240};
3241</pre></div></div>
3242
3243<p>The next example creates a new object type, closed this time, called MyUserTupleType. Instances of this closed type will not be permitted to have extra fields, although the alias field is marked as optional and may thus be NULL or MISSING in legal instances of the type. Note that the type of the id field in the example is UUID. This field type can be used if you want to have this field be an autogenerated-PK field. (Refer to the Datasets section later for more details on such fields.)</p></div>
3244<div class="section">
3245<h5><a name="Example"></a>Example</h5>
3246
3247<div>
3248<div>
3249<pre class="source">CREATE TYPE MyUserTupleType AS CLOSED {
3250 id: uuid,
3251 alias: string?,
3252 name: string
3253};
3254</pre></div></div>
3255</div></div></div>
3256<div class="section">
3257<h3><a name="Datasets" id="Datasets"> Datasets</a></h3>
3258
3259<div>
3260<div>
3261<pre class="source">DatasetSpecification ::= ( &lt;INTERNAL&gt; )? &lt;DATASET&gt; QualifiedName &quot;(&quot; QualifiedName &quot;)&quot; IfNotExists
3262 PrimaryKey ( &lt;ON&gt; Identifier )? ( &lt;HINTS&gt; Properties )?
3263 ( &quot;USING&quot; &quot;COMPACTION&quot; &quot;POLICY&quot; CompactionPolicy ( Configuration )? )?
3264 ( &lt;WITH&gt; &lt;FILTER&gt; &lt;ON&gt; Identifier )?
3265 |
3266 &lt;EXTERNAL&gt; &lt;DATASET&gt; QualifiedName &quot;(&quot; QualifiedName &quot;)&quot; IfNotExists &lt;USING&gt; AdapterName
3267 Configuration ( &lt;HINTS&gt; Properties )?
3268 ( &lt;USING&gt; &lt;COMPACTION&gt; &lt;POLICY&gt; CompactionPolicy ( Configuration )? )?
3269AdapterName ::= Identifier
3270Configuration ::= &quot;(&quot; ( KeyValuePair ( &quot;,&quot; KeyValuePair )* )? &quot;)&quot;
3271KeyValuePair ::= &quot;(&quot; StringLiteral &quot;=&quot; StringLiteral &quot;)&quot;
3272Properties ::= ( &quot;(&quot; Property ( &quot;,&quot; Property )* &quot;)&quot; )?
3273Property ::= Identifier &quot;=&quot; ( StringLiteral | IntegerLiteral )
3274FunctionSignature ::= FunctionOrTypeName &quot;@&quot; IntegerLiteral
3275PrimaryKey ::= &lt;PRIMARY&gt; &lt;KEY&gt; NestedField ( &quot;,&quot; NestedField )* ( &lt;AUTOGENERATED&gt; )?
3276CompactionPolicy ::= Identifier
3277</pre></div></div>
3278
3279<p>The CREATE DATASET statement is used to create a new dataset. Datasets are named, multisets of object type instances; they are where data lives persistently and are the usual targets for SQL++ queries. Datasets are typed, and the system ensures that their contents conform to their type definitions. An Internal dataset (the default kind) is a dataset whose content lives within and is managed by the system. It is required to have a specified unique primary key field which uniquely identifies the contained objects. (The primary key is also used in secondary indexes to identify the indexed primary data objects.)</p>
3280<p>Internal datasets contain several advanced options that can be specified when appropriate. One such option is that random primary key (UUID) values can be auto-generated by declaring the field to be UUID and putting &#x201c;AUTOGENERATED&#x201d; after the &#x201c;PRIMARY KEY&#x201d; identifier. In this case, unlike other non-optional fields, a value for the auto-generated PK field should not be provided at insertion time by the user since each object&#x2019;s primary key field value will be auto-generated by the system.</p>
3281<p>Another advanced option, when creating an Internal dataset, is to specify the merge policy to control which of the underlying LSM storage components to be merged. (The system supports Log-Structured Merge tree based physical storage for Internal datasets.) Currently the system supports four different component merging policies that can be chosen per dataset: no-merge, constant, prefix, and correlated-prefix. The no-merge policy simply never merges disk components. The constant policy merges disk components when the number of components reaches a constant number k that can be configured by the user. The prefix policy relies on both component sizes and the number of components to decide which components to merge. It works by first trying to identify the smallest ordered (oldest to newest) sequence of components such that the sequence does not contain a single component that exceeds some threshold size M and that either the sum of the component&#x2019;s sizes exceeds M or the number of components in the sequence exceeds another threshold C. If such a sequence exists, the components in the sequence are merged together to form a single component. Finally, the correlated-prefix policy is similar to the prefix policy, but it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index. When the correlated-prefix policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), then it will issue successive merge requests on behalf of all other indexes associated with the same dataset. The system&#x2019;s default policy is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.</p>
3282<p>Another advanced option shown in the syntax above, related to performance and mentioned above, is that a <b>filter</b> can optionally be created on a field to further optimize range queries with predicates on the filter&#x2019;s field. Filters allow some range queries to avoid searching all LSM components when the query conditions match the filter. (Refer to <a href="../filters.html">Filter-Based LSM Index Acceleration</a> for more information about filters.)</p>
3283<p>An External dataset, in contrast to an Internal dataset, has data stored outside of the system&#x2019;s control. Files living in HDFS or in the local filesystem(s) of a cluster&#x2019;s nodes are currently supported. External dataset support allows SQL++ queries to treat foreign data as though it were stored in the system, making it possible to query &#x201c;legacy&#x201d; file data (for example, Hive data) without having to physically import it. When defining an External dataset, an appropriate adapter type must be selected for the desired external data. (See the <a href="../externaldata.html">Guide to External Data</a> for more information on the available adapters.)</p>
3284<p>The following example creates an Internal dataset for storing FacefookUserType objects. It specifies that their id field is their primary key.</p>
3285<div class="section">
3286<h4><a name="Example"></a>Example</h4>
3287
3288<div>
3289<div>
3290<pre class="source">CREATE INTERNAL DATASET GleambookUsers(GleambookUserType) PRIMARY KEY id;
3291</pre></div></div>
3292
3293<p>The next example creates another Internal dataset (the default kind when no dataset kind is specified) for storing MyUserTupleType objects. It specifies that the id field should be used as the primary key for the dataset. It also specifies that the id field is an auto-generated field, meaning that a randomly generated UUID value should be assigned to each incoming object by the system. (A user should therefore not attempt to provide a value for this field.) Note that the id field&#x2019;s declared type must be UUID in this case.</p></div>
3294<div class="section">
3295<h4><a name="Example"></a>Example</h4>
3296
3297<div>
3298<div>
3299<pre class="source">CREATE DATASET MyUsers(MyUserTupleType) PRIMARY KEY id AUTOGENERATED;
3300</pre></div></div>
3301
3302<p>The next example creates an External dataset for querying LineItemType objects. The choice of the <tt>hdfs</tt> adapter means that this dataset&#x2019;s data actually resides in HDFS. The example CREATE statement also provides parameters used by the hdfs adapter: the URL and path needed to locate the data in HDFS and a description of the data format.</p></div>
3303<div class="section">
3304<h4><a name="Example"></a>Example</h4>
3305
3306<div>
3307<div>
3308<pre class="source">CREATE EXTERNAL DATASET LineItem(LineItemType) USING hdfs (
3309 (&quot;hdfs&quot;=&quot;hdfs://HOST:PORT&quot;),
3310 (&quot;path&quot;=&quot;HDFS_PATH&quot;),
3311 (&quot;input-format&quot;=&quot;text-input-format&quot;),
3312 (&quot;format&quot;=&quot;delimited-text&quot;),
3313 (&quot;delimiter&quot;=&quot;|&quot;));
3314</pre></div></div>
3315</div></div>
3316<div class="section">
3317<h3><a name="Indices" id="Indices">Indices</a></h3>
3318
3319<div>
3320<div>
3321<pre class="source">IndexSpecification ::= &lt;INDEX&gt; Identifier IfNotExists &lt;ON&gt; QualifiedName
3322 &quot;(&quot; ( IndexField ) ( &quot;,&quot; IndexField )* &quot;)&quot; ( &quot;type&quot; IndexType &quot;?&quot;)?
3323 ( (&lt;NOT&gt;)? &lt;ENFORCED&gt; )?
3324IndexType ::= &lt;BTREE&gt; | &lt;RTREE&gt; | &lt;KEYWORD&gt; | &lt;NGRAM&gt; &quot;(&quot; IntegerLiteral &quot;)&quot;
3325</pre></div></div>
3326
3327<p>The CREATE INDEX statement creates a secondary index on one or more fields of a specified dataset. Supported index types include <tt>BTREE</tt> for totally ordered datatypes, <tt>RTREE</tt> for spatial data, and <tt>KEYWORD</tt> and <tt>NGRAM</tt> for textual (string) data. An index can be created on a nested field (or fields) by providing a valid path expression as an index field identifier.</p>
3328<p>An indexed field is not required to be part of the datatype associated with a dataset if the dataset&#x2019;s datatype is declared as open <b>and</b> if the field&#x2019;s type is provided along with its name and if the <tt>ENFORCED</tt> keyword is specified at the end of the index definition. <tt>ENFORCING</tt> an open field introduces a check that makes sure that the actual type of the indexed field (if the optional field exists in the object) always matches this specified (open) field type.</p>
3329<p>The following example creates a btree index called gbAuthorIdx on the authorId field of the GleambookMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the author-id field.</p>
3330<div class="section">
3331<h4><a name="Example"></a>Example</h4>
3332
3333<div>
3334<div>
3335<pre class="source">CREATE INDEX gbAuthorIdx ON GleambookMessages(authorId) TYPE BTREE;
3336</pre></div></div>
3337
3338<p>The following example creates an open btree index called gbSendTimeIdx on the (non-predeclared) sendTime field of the GleambookMessages dataset having datetime type. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the sendTime field. The index is enforced so that records that do not have the &#x201c;sendTime&#x201d; field or have a mismatched type on the field cannot be inserted into the dataset.</p></div>
3339<div class="section">
3340<h4><a name="Example"></a>Example</h4>
3341
3342<div>
3343<div>
3344<pre class="source">CREATE INDEX gbSendTimeIdx ON GleambookMessages(sendTime: datetime?) TYPE BTREE ENFORCED;
3345</pre></div></div>
3346
3347<p>The following example creates a btree index called crpUserScrNameIdx on screenName, a nested field residing within a object-valued user field in the ChirpMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the nested screenName field. Such nested fields must be singular, i.e., one cannot index through (or on) an array-valued field.</p></div>
3348<div class="section">
3349<h4><a name="Example"></a>Example</h4>
3350
3351<div>
3352<div>
3353<pre class="source">CREATE INDEX crpUserScrNameIdx ON ChirpMessages(user.screenName) TYPE BTREE;
3354</pre></div></div>
3355
3356<p>The following example creates an rtree index called gbSenderLocIdx on the sender-location field of the GleambookMessages dataset. This index can be useful for accelerating queries that use the <a href="functions.html#spatial-intersect"><tt>spatial-intersect</tt> function</a> in a predicate involving the sender-location field.</p></div>
3357<div class="section">
3358<h4><a name="Example"></a>Example</h4>
3359
3360<div>
3361<div>
3362<pre class="source">CREATE INDEX gbSenderLocIndex ON GleambookMessages(&quot;sender-location&quot;) TYPE RTREE;
3363</pre></div></div>
3364
3365<p>The following example creates a 3-gram index called fbUserIdx on the name field of the GleambookUsers dataset. This index can be used to accelerate some similarity or substring maching queries on the name field. For details refer to the document on <a href="similarity.html#NGram_Index">similarity queries</a>.</p></div>
3366<div class="section">
3367<h4><a name="Example"></a>Example</h4>
3368
3369<div>
3370<div>
3371<pre class="source">CREATE INDEX fbUserIdx ON GleambookUsers(name) TYPE NGRAM(3);
3372</pre></div></div>
3373
3374<p>The following example creates a keyword index called fbMessageIdx on the message field of the GleambookMessages dataset. This keyword index can be used to optimize queries with token-based similarity predicates on the message field. For details refer to the document on <a href="similarity.html#Keyword_Index">similarity queries</a>.</p></div>
3375<div class="section">
3376<h4><a name="Example"></a>Example</h4>
3377
3378<div>
3379<div>
3380<pre class="source">CREATE INDEX fbMessageIdx ON GleambookMessages(message) TYPE KEYWORD;
3381</pre></div></div>
3382<!--
3383 ! Licensed to the Apache Software Foundation (ASF) under one
3384 ! or more contributor license agreements. See the NOTICE file
3385 ! distributed with this work for additional information
3386 ! regarding copyright ownership. The ASF licenses this file
3387 ! to you under the Apache License, Version 2.0 (the
3388 ! "License"); you may not use this file except in compliance
3389 ! with the License. You may obtain a copy of the License at
3390 !
3391 ! http://www.apache.org/licenses/LICENSE-2.0
3392 !
3393 ! Unless required by applicable law or agreed to in writing,
3394 ! software distributed under the License is distributed on an
3395 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3396 ! KIND, either express or implied. See the License for the
3397 ! specific language governing permissions and limitations
3398 ! under the License.
3399 !-->
3400
3401<p>The following example creates an open btree index called gbReadTimeIdx on the (non-predeclared) readTime field of the GleambookMessages dataset having datetime type. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the <tt>readTime</tt> field. The index is not enforced so that records that do not have the <tt>readTime</tt> field or have a mismatched type on the field can still be inserted into the dataset.</p></div>
3402<div class="section">
3403<h4><a name="Example"></a>Example</h4>
3404
3405<div>
3406<div>
3407<pre class="source">CREATE INDEX gbReadTimeIdx ON GleambookMessages(readTime: datetime?);
3408</pre></div></div>
3409<!--
3410 ! Licensed to the Apache Software Foundation (ASF) under one
3411 ! or more contributor license agreements. See the NOTICE file
3412 ! distributed with this work for additional information
3413 ! regarding copyright ownership. The ASF licenses this file
3414 ! to you under the Apache License, Version 2.0 (the
3415 ! "License"); you may not use this file except in compliance
3416 ! with the License. You may obtain a copy of the License at
3417 !
3418 ! http://www.apache.org/licenses/LICENSE-2.0
3419 !
3420 ! Unless required by applicable law or agreed to in writing,
3421 ! software distributed under the License is distributed on an
3422 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3423 ! KIND, either express or implied. See the License for the
3424 ! specific language governing permissions and limitations
3425 ! under the License.
3426 !-->
3427</div></div>
3428<div class="section">
3429<h3><a name="Functions" id="Functions"> Functions</a></h3>
3430<p>The create function statement creates a <b>named</b> function that can then be used and reused in SQL++ queries. The body of a function can be any SQL++ expression involving the function&#x2019;s parameters.</p>
3431
3432<div>
3433<div>
3434<pre class="source">FunctionSpecification ::= &quot;FUNCTION&quot; FunctionOrTypeName IfNotExists ParameterList &quot;{&quot; Expression &quot;}&quot;
3435</pre></div></div>
3436
3437<p>The following is an example of a CREATE FUNCTION statement which is similar to our earlier DECLARE FUNCTION example. It differs from that example in that it results in a function that is persistently registered by name in the specified dataverse (the current dataverse being used, if not otherwise specified).</p>
3438<div class="section">
3439<div class="section">
3440<h5><a name="Example"></a>Example</h5>
3441
3442<div>
3443<div>
3444<pre class="source">CREATE FUNCTION friendInfo(userId) {
3445 (SELECT u.id, u.name, len(u.friendIds) AS friendCount
3446 FROM GleambookUsers u
3447 WHERE u.id = userId)[0]
3448 };
3449</pre></div></div>
3450</div></div></div>
3451<div class="section">
3452<h3><a name="Removal" id="Removal"> Removal</a></h3>
3453
3454<div>
3455<div>
3456<pre class="source">DropStatement ::= &quot;DROP&quot; ( &quot;DATAVERSE&quot; Identifier IfExists
3457 | &quot;TYPE&quot; FunctionOrTypeName IfExists
3458 | &quot;DATASET&quot; QualifiedName IfExists
3459 | &quot;INDEX&quot; DoubleQualifiedName IfExists
3460 | &quot;FUNCTION&quot; FunctionSignature IfExists )
3461IfExists ::= ( &quot;IF&quot; &quot;EXISTS&quot; )?
3462</pre></div></div>
3463
3464<p>The DROP statement in SQL++ is the inverse of the CREATE statement. It can be used to drop dataverses, datatypes, datasets, indexes, and functions.</p>
3465<p>The following examples illustrate some uses of the DROP statement.</p>
3466<div class="section">
3467<div class="section">
3468<h5><a name="Example"></a>Example</h5>
3469
3470<div>
3471<div>
3472<pre class="source">DROP DATASET GleambookUsers IF EXISTS;
3473
3474DROP INDEX GleambookMessages.gbSenderLocIndex;
3475
3476DROP TYPE TinySocial2.GleambookUserType;
3477
3478DROP FUNCTION friendInfo@1;
3479
3480DROP DATAVERSE TinySocial;
3481</pre></div></div>
3482
3483<p>When an artifact is dropped, it will be droppped from the current dataverse if none is specified (see the DROP DATASET example above) or from the specified dataverse (see the DROP TYPE example above) if one is specified by fully qualifying the artifact name in the DROP statement. When specifying an index to drop, the index name must be qualified by the dataset that it indexes. When specifying a function to drop, since SQL++ allows functions to be overloaded by their number of arguments, the identifying name of the function to be dropped must explicitly include that information. (<tt>friendInfo@1</tt> above denotes the 1-argument function named friendInfo in the current dataverse.)</p></div></div></div>
3484<div class="section">
3485<h3><a name="Load_Statement"></a><a name="Load_statement" id="Load_statement">Load Statement</a></h3>
3486
3487<div>
3488<div>
3489<pre class="source">LoadStatement ::= &lt;LOAD&gt; &lt;DATASET&gt; QualifiedName &lt;USING&gt; AdapterName Configuration ( &lt;PRE-SORTED&gt; )?
3490</pre></div></div>
3491
3492<p>The LOAD statement is used to initially populate a dataset via bulk loading of data from an external file. An appropriate adapter must be selected to handle the nature of the desired external data. The LOAD statement accepts the same adapters and the same parameters as discussed earlier for External datasets. (See the <a href="externaldata.html">guide to external data</a> for more information on the available adapters.) If a dataset has an auto-generated primary key field, the file to be imported should not include that field in it.</p>
3493<p>The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM (Asterix Data Model) format.</p>
3494<div class="section">
3495<div class="section">
3496<h5><a name="Example"></a>Example</h5>
3497
3498<div>
3499<div>
3500<pre class="source"> LOAD DATASET GleambookUsers USING localfs
3501 ((&quot;path&quot;=&quot;127.0.0.1:///Users/bignosqlfan/tinysocialnew/gbu.adm&quot;),(&quot;format&quot;=&quot;adm&quot;));
3502</pre></div></div>
3503<!--
3504 ! Licensed to the Apache Software Foundation (ASF) under one
3505 ! or more contributor license agreements. See the NOTICE file
3506 ! distributed with this work for additional information
3507 ! regarding copyright ownership. The ASF licenses this file
3508 ! to you under the Apache License, Version 2.0 (the
3509 ! "License"); you may not use this file except in compliance
3510 ! with the License. You may obtain a copy of the License at
3511 !
3512 ! http://www.apache.org/licenses/LICENSE-2.0
3513 !
3514 ! Unless required by applicable law or agreed to in writing,
3515 ! software distributed under the License is distributed on an
3516 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3517 ! KIND, either express or implied. See the License for the
3518 ! specific language governing permissions and limitations
3519 ! under the License.
3520 !-->
3521</div></div></div></div>
3522<div class="section">
3523<h2><a name="Modification_statements" id="Modification_statements">Modification statements</a></h2>
3524<div class="section">
3525<h3><a name="INSERTs"></a><a name="Inserts" id="Inserts">INSERTs</a></h3>
3526
3527<div>
3528<div>
3529<pre class="source">InsertStatement ::= &lt;INSERT&gt; &lt;INTO&gt; QualifiedName Query
3530</pre></div></div>
3531
3532<p>The SQL++ INSERT statement is used to insert new data into a dataset. The data to be inserted comes from a SQL++ query expression. This expression can be as simple as a constant expression, or in general it can be any legal SQL++ query. If the target dataset has an auto-generated primary key field, the insert statement should not include a value for that field in it. (The system will automatically extend the provided object with this additional field and a corresponding value.) Insertion will fail if the dataset already has data with the primary key value(s) being inserted.</p>
3533<p>Inserts are processed transactionally by the system. The transactional scope of each insert transaction is the insertion of a single object plus its affiliated secondary index entries (if any). If the query part of an insert returns a single object, then the INSERT statement will be a single, atomic transaction. If the query part returns multiple objects, each object being inserted will be treated as a separate tranaction. The following example illustrates a query-based insertion.</p>
3534<div class="section">
3535<div class="section">
3536<h5><a name="Example"></a>Example</h5>
3537
3538<div>
3539<div>
3540<pre class="source">INSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
3541</pre></div></div>
3542</div></div></div>
3543<div class="section">
3544<h3><a name="UPSERTs"></a><a name="Upserts" id="Upserts">UPSERTs</a></h3>
3545
3546<div>
3547<div>
3548<pre class="source">UpsertStatement ::= &lt;UPSERT&gt; &lt;INTO&gt; QualifiedName Query
3549</pre></div></div>
3550
3551<p>The SQL++ UPSERT statement syntactically mirrors the INSERT statement discussed above. The difference lies in its semantics, which for UPSERT are &#x201c;add or replace&#x201d; instead of the INSERT &#x201c;add if not present, else error&#x201d; semantics. Whereas an INSERT can fail if another object already exists with the specified key, the analogous UPSERT will replace the previous object&#x2019;s value with that of the new object in such cases.</p>
3552<p>The following example illustrates a query-based upsert operation.</p>
3553<div class="section">
3554<div class="section">
3555<h5><a name="Example"></a>Example</h5>
3556
3557<div>
3558<div>
3559<pre class="source">UPSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
3560</pre></div></div>
3561
3562<p>*Editor&#x2019;s note: Upserts currently work in AQL but are not yet enabled (at the moment) in SQL++.</p></div></div></div>
3563<div class="section">
3564<h3><a name="DELETEs"></a><a name="Deletes" id="Deletes">DELETEs</a></h3>
3565
3566<div>
3567<div>
3568<pre class="source">DeleteStatement ::= &lt;DELETE&gt; &lt;FROM&gt; QualifiedName ( ( &lt;AS&gt; )? Variable )? ( &lt;WHERE&gt; Expression )?
3569</pre></div></div>
3570
3571<p>The SQL++ DELETE statement is used to delete data from a target dataset. The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the DELETE statement.</p>
3572<p>Deletes are processed transactionally by the system. The transactional scope of each delete transaction is the deletion of a single object plus its affiliated secondary index entries (if any). If the boolean expression for a delete identifies a single object, then the DELETE statement itself will be a single, atomic transaction. If the expression identifies multiple objects, then each object deleted will be handled as a separate transaction.</p>
3573<p>The following examples illustrate single-object deletions.</p>
3574<div class="section">
3575<div class="section">
3576<h5><a name="Example"></a>Example</h5>
3577
3578<div>
3579<div>
3580<pre class="source">DELETE FROM GleambookUsers user WHERE user.id = 8;
3581</pre></div></div>
3582</div>
3583<div class="section">
3584<h5><a name="Example"></a>Example</h5>
3585
3586<div>
3587<div>
3588<pre class="source">DELETE FROM GleambookUsers WHERE id = 5;
3589</pre></div></div>
3590<!--
3591 ! Licensed to the Apache Software Foundation (ASF) under one
3592 ! or more contributor license agreements. See the NOTICE file
3593 ! distributed with this work for additional information
3594 ! regarding copyright ownership. The ASF licenses this file
3595 ! to you under the Apache License, Version 2.0 (the
3596 ! "License"); you may not use this file except in compliance
3597 ! with the License. You may obtain a copy of the License at
3598 !
3599 ! http://www.apache.org/licenses/LICENSE-2.0
3600 !
3601 ! Unless required by applicable law or agreed to in writing,
3602 ! software distributed under the License is distributed on an
3603 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3604 ! KIND, either express or implied. See the License for the
3605 ! specific language governing permissions and limitations
3606 ! under the License.
3607 !-->
3608
3609<h1><a name="Reserved_keywords" id="Reserved_keywords">Appendix 1. Reserved keywords</a></h1><!--
3610 ! Licensed to the Apache Software Foundation (ASF) under one
3611 ! or more contributor license agreements. See the NOTICE file
3612 ! distributed with this work for additional information
3613 ! regarding copyright ownership. The ASF licenses this file
3614 ! to you under the Apache License, Version 2.0 (the
3615 ! "License"); you may not use this file except in compliance
3616 ! with the License. You may obtain a copy of the License at
3617 !
3618 ! http://www.apache.org/licenses/LICENSE-2.0
3619 !
3620 ! Unless required by applicable law or agreed to in writing,
3621 ! software distributed under the License is distributed on an
3622 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3623 ! KIND, either express or implied. See the License for the
3624 ! specific language governing permissions and limitations
3625 ! under the License.
3626 !-->
3627
3628<p>All reserved keywords are listed in the following table:</p>
3629<table border="0" class="table table-striped">
3630<thead>
3631
3632<tr class="a">
3633<th> </th>
3634<th> </th>
3635<th> </th>
3636<th> </th>
3637<th> </th>
3638<th> </th></tr>
3639</thead><tbody>
3640
3641<tr class="b">
3642<td> AND </td>
3643<td> ANY </td>
3644<td> APPLY </td>
3645<td> AS </td>
3646<td> ASC </td>
3647<td> AT </td></tr>
3648<tr class="a">
3649<td> AUTOGENERATED </td>
3650<td> BETWEEN </td>
3651<td> BTREE </td>
3652<td> BY </td>
3653<td> CASE </td>
3654<td> CLOSED </td></tr>
3655<tr class="b">
3656<td> CREATE </td>
3657<td> COMPACTION </td>
3658<td> COMPACT </td>
3659<td> CONNECT </td>
3660<td> CORRELATE </td>
3661<td> DATASET </td></tr>
3662<tr class="a">
3663<td> COLLECTION </td>
3664<td> DATAVERSE </td>
3665<td> DECLARE </td>
3666<td> DEFINITION </td>
3667<td> DECLARE </td>
3668<td> DEFINITION </td></tr>
3669<tr class="b">
3670<td> DELETE </td>
3671<td> DESC </td>
3672<td> DISCONNECT </td>
3673<td> DISTINCT </td>
3674<td> DROP </td>
3675<td> ELEMENT </td></tr>
3676<tr class="a">
3677<td> ELEMENT </td>
3678<td> EXPLAIN </td>
3679<td> ELSE </td>
3680<td> ENFORCED </td>
3681<td> END </td>
3682<td> EVERY </td></tr>
3683<tr class="b">
3684<td> EXCEPT </td>
3685<td> EXIST </td>
3686<td> EXTERNAL </td>
3687<td> FEED </td>
3688<td> FILTER </td>
3689<td> FLATTEN </td></tr>
3690<tr class="a">
3691<td> FOR </td>
3692<td> FROM </td>
3693<td> FULL </td>
3694<td> FUNCTION </td>
3695<td> GROUP </td>
3696<td> HAVING </td></tr>
3697<tr class="b">
3698<td> HINTS </td>
3699<td> IF </td>
3700<td> INTO </td>
3701<td> IN </td>
3702<td> INDEX </td>
3703<td> INGESTION </td></tr>
3704<tr class="a">
3705<td> INNER </td>
3706<td> INSERT </td>
3707<td> INTERNAL </td>
3708<td> INTERSECT </td>
3709<td> IS </td>
3710<td> JOIN </td></tr>
3711<tr class="b">
3712<td> KEYWORD </td>
3713<td> LEFT </td>
3714<td> LETTING </td>
3715<td> LET </td>
3716<td> LIKE </td>
3717<td> LIMIT </td></tr>
3718<tr class="a">
3719<td> LOAD </td>
3720<td> NODEGROUP </td>
3721<td> NGRAM </td>
3722<td> NOT </td>
3723<td> OFFSET </td>
3724<td> ON </td></tr>
3725<tr class="b">
3726<td> OPEN </td>
3727<td> OR </td>
3728<td> ORDER </td>
3729<td> OUTER </td>
3730<td> OUTPUT </td>
3731<td> PATH </td></tr>
3732<tr class="a">
3733<td> POLICY </td>
3734<td> PRE-SORTED </td>
3735<td> PRIMARY </td>
3736<td> RAW </td>
3737<td> REFRESH </td>
3738<td> RETURN </td></tr>
3739<tr class="b">
3740<td> RTREE </td>
3741<td> RUN </td>
3742<td> SATISFIES </td>
3743<td> SECONDARY </td>
3744<td> SELECT </td>
3745<td> SET </td></tr>
3746<tr class="a">
3747<td> SOME </td>
3748<td> TEMPORARY </td>
3749<td> THEN </td>
3750<td> TYPE </td>
3751<td> UNKNOWN </td>
3752<td> UNNEST </td></tr>
3753<tr class="b">
3754<td> UPDATE </td>
3755<td> USE </td>
3756<td> USING </td>
3757<td> VALUE </td>
3758<td> WHEN </td>
3759<td> WHERE </td></tr>
3760<tr class="a">
3761<td> WITH </td>
3762<td> WRITE </td>
3763<td> </td>
3764<td> </td>
3765<td> </td>
3766<td> </td></tr>
3767</tbody>
3768</table><!--
3769 ! Licensed to the Apache Software Foundation (ASF) under one
3770 ! or more contributor license agreements. See the NOTICE file
3771 ! distributed with this work for additional information
3772 ! regarding copyright ownership. The ASF licenses this file
3773 ! to you under the Apache License, Version 2.0 (the
3774 ! "License"); you may not use this file except in compliance
3775 ! with the License. You may obtain a copy of the License at
3776 !
3777 ! http://www.apache.org/licenses/LICENSE-2.0
3778 !
3779 ! Unless required by applicable law or agreed to in writing,
3780 ! software distributed under the License is distributed on an
3781 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3782 ! KIND, either express or implied. See the License for the
3783 ! specific language governing permissions and limitations
3784 ! under the License.
3785 !-->
3786</div></div></div></div>
3787<div class="section">
3788<h2><a name="Appendix_2._Performance_Tuning"></a><a name="Performance_tuning" id="Performance_tuning">Appendix 2. Performance Tuning</a></h2><!--
3789 ! Licensed to the Apache Software Foundation (ASF) under one
3790 ! or more contributor license agreements. See the NOTICE file
3791 ! distributed with this work for additional information
3792 ! regarding copyright ownership. The ASF licenses this file
3793 ! to you under the Apache License, Version 2.0 (the
3794 ! "License"); you may not use this file except in compliance
3795 ! with the License. You may obtain a copy of the License at
3796 !
3797 ! http://www.apache.org/licenses/LICENSE-2.0
3798 !
3799 ! Unless required by applicable law or agreed to in writing,
3800 ! software distributed under the License is distributed on an
3801 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3802 ! KIND, either express or implied. See the License for the
3803 ! specific language governing permissions and limitations
3804 ! under the License.
3805 !-->
3806
3807<p>The SET statement can be used to override some cluster-wide configuration parameters for a specific request:</p>
3808
3809<div>
3810<div>
3811<pre class="source">SET &lt;IDENTIFIER&gt; &lt;STRING_LITERAL&gt;
3812</pre></div></div>
3813
3814<p>As parameter identifiers are qualified names (containing a &#x2018;.&#x2019;) they have to be escaped using backticks (``). Note that changing query parameters will not affect query correctness but only impact performance characteristics, such as response time and throughput.</p></div>
3815<div class="section">
3816<h2><a name="Parallelism_Parameter"></a><a name="Parallelism_parameter" id="Parallelism_parameter">Parallelism Parameter</a></h2>
3817<p>The system can execute each request using multiple cores on multiple machines (a.k.a., partitioned parallelism) in a cluster. A user can manually specify the maximum execution parallelism for a request to scale it up and down using the following parameter:</p>
3818<ul>
3819
3820<li><b>compiler.parallelism</b>: the maximum number of CPU cores can be used to process a query. There are three cases of the value <i>p</i> for compiler.parallelism:
3821<ul>
3822
3823<li>
3824
3825<p><i>p</i> &lt; 0 or <i>p</i> &gt; the total number of cores in a cluster: the system will use all available cores in the cluster;</p>
3826</li>
3827<li>
3828
3829<p><i>p</i> = 0 (the default): the system will use the storage parallelism (the number of partitions of stored datasets) as the maximum parallelism for query processing;</p>
3830</li>
3831<li>
3832
3833<p>all other cases: the system will use the user-specified number as the maximum number of CPU cores to use for executing the query.</p>
3834</li>
3835</ul>
3836</li>
3837</ul>
3838<div class="section">
3839<div class="section">
3840<div class="section">
3841<h5><a name="Example"></a>Example</h5>
3842
3843<div>
3844<div>
3845<pre class="source">SET `compiler.parallelism` &quot;16&quot;;
3846
3847SELECT u.name AS uname, m.message AS message
3848FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
3849</pre></div></div>
3850</div></div></div></div>
3851<div class="section">
3852<h2><a name="Memory_Parameters"></a><a name="Memory_parameters" id="Memory_parameters">Memory Parameters</a></h2>
3853<p>In the system, each blocking runtime operator such as join, group-by and order-by works within a fixed memory budget, and can gracefully spill to disks if the memory budget is smaller than the amount of data they have to hold. A user can manually configure the memory budget of those operators within a query. The supported configurable memory parameters are:</p>
3854<ul>
3855
3856<li>
3857
3858<p><b>compiler.groupmemory</b>: the memory budget that each parallel group-by operator instance can use; 32MB is the default budget.</p>
3859</li>
3860<li>
3861
3862<p><b>compiler.sortmemory</b>: the memory budget that each parallel sort operator instance can use; 32MB is the default budget.</p>
3863</li>
3864<li>
3865
3866<p><b>compiler.joinmemory</b>: the memory budget that each parallel hash join operator instance can use; 32MB is the default budget.</p>
3867</li>
3868</ul>
3869<p>For each memory budget value, you can use a 64-bit integer value with a 1024-based binary unit suffix (for example, B, KB, MB, GB). If there is no user-provided suffix, &#x201c;B&#x201d; is the default suffix. See the following examples.</p>
3870<div class="section">
3871<div class="section">
3872<div class="section">
3873<h5><a name="Example"></a>Example</h5>
3874
3875<div>
3876<div>
3877<pre class="source">SET `compiler.groupmemory` &quot;64MB&quot;;
3878
3879SELECT msg.authorId, COUNT(*)
3880FROM GleambookMessages msg
3881GROUP BY msg.authorId;
3882</pre></div></div>
3883</div>
3884<div class="section">
3885<h5><a name="Example"></a>Example</h5>
3886
3887<div>
3888<div>
3889<pre class="source">SET `compiler.sortmemory` &quot;67108864&quot;;
3890
3891SELECT VALUE user
3892FROM GleambookUsers AS user
3893ORDER BY ARRAY_LENGTH(user.friendIds) DESC;
3894</pre></div></div>
3895</div>
3896<div class="section">
3897<h5><a name="Example"></a>Example</h5>
3898
3899<div>
3900<div>
3901<pre class="source">SET `compiler.joinmemory` &quot;132000KB&quot;;
3902
3903SELECT u.name AS uname, m.message AS message
3904FROM GleambookUsers u JOIN GleambookMessages m ON m.authorId = u.id;
3905</pre></div></div>
3906<!--
3907 ! Licensed to the Apache Software Foundation (ASF) under one
3908 ! or more contributor license agreements. See the NOTICE file
3909 ! distributed with this work for additional information
3910 ! regarding copyright ownership. The ASF licenses this file
3911 ! to you under the Apache License, Version 2.0 (the
3912 ! "License"); you may not use this file except in compliance
3913 ! with the License. You may obtain a copy of the License at
3914 !
3915 ! http://www.apache.org/licenses/LICENSE-2.0
3916 !
3917 ! Unless required by applicable law or agreed to in writing,
3918 ! software distributed under the License is distributed on an
3919 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3920 ! KIND, either express or implied. See the License for the
3921 ! specific language governing permissions and limitations
3922 ! under the License.
3923 !-->
3924</div></div></div></div>
3925<div class="section">
3926<h2><a name="Controlling_Index-Only-Plan_Parameter"></a><a name="Index_Only" id="Index_Only">Controlling Index-Only-Plan Parameter</a></h2>
3927<p>By default, the system tries to build an index-only plan whenever utilizing a secondary index is possible. For example, if a SELECT or JOIN query can utilize an enforced B+Tree or R-Tree index on a field, the optimizer checks whether a secondary-index search alone can generate the result that the query asks for. It mainly checks two conditions: (1) predicates used in WHERE only uses the primary key field and/or secondary key field and (2) the result does not return any other fields. If these two conditions hold, it builds an index-only plan. Since an index-only plan only searches a secondary-index to answer a query, it is faster than a non-index-only plan that needs to search the primary index. However, this index-only plan can be turned off per query by setting the following parameter.</p>
3928<ul>
3929
3930<li><b>noindexonly</b>: if this is set to true, the index-only-plan will not be applied; the default value is false.</li>
3931</ul>
3932<div class="section">
3933<div class="section">
3934<div class="section">
3935<h5><a name="Example"></a>Example</h5>
3936
3937<div>
3938<div>
3939<pre class="source">SET noindexonly 'true';
3940
3941SELECT m.message AS message
3942FROM GleambookMessages m where m.message = &quot; love product-b its shortcut-menu is awesome:)&quot;;
3943</pre></div></div>
3944<!--
3945 ! Licensed to the Apache Software Foundation (ASF) under one
3946 ! or more contributor license agreements. See the NOTICE file
3947 ! distributed with this work for additional information
3948 ! regarding copyright ownership. The ASF licenses this file
3949 ! to you under the Apache License, Version 2.0 (the
3950 ! "License"); you may not use this file except in compliance
3951 ! with the License. You may obtain a copy of the License at
3952 !
3953 ! http://www.apache.org/licenses/LICENSE-2.0
3954 !
3955 ! Unless required by applicable law or agreed to in writing,
3956 ! software distributed under the License is distributed on an
3957 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3958 ! KIND, either express or implied. See the License for the
3959 ! specific language governing permissions and limitations
3960 ! under the License.
3961 !-->
3962</div></div></div></div>
3963<div class="section">
3964<h2><a name="Appendix_3._Variable_Bindings_and_Name_Resolution"></a><a name="Variable_bindings_and_name_resolution" id="Variable_bindings_and_name_resolution">Appendix 3. Variable Bindings and Name Resolution</a></h2><!--
3965 ! Licensed to the Apache Software Foundation (ASF) under one
3966 ! or more contributor license agreements. See the NOTICE file
3967 ! distributed with this work for additional information
3968 ! regarding copyright ownership. The ASF licenses this file
3969 ! to you under the Apache License, Version 2.0 (the
3970 ! "License"); you may not use this file except in compliance
3971 ! with the License. You may obtain a copy of the License at
3972 !
3973 ! http://www.apache.org/licenses/LICENSE-2.0
3974 !
3975 ! Unless required by applicable law or agreed to in writing,
3976 ! software distributed under the License is distributed on an
3977 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
3978 ! KIND, either express or implied. See the License for the
3979 ! specific language governing permissions and limitations
3980 ! under the License.
3981 !-->
3982
3983<p>In this Appendix, we&#x2019;ll look at how variables are bound and how names are resolved. Names can appear in every clause of a query. Sometimes a name consists of just a single identifier, e.g., <tt>region</tt> or <tt>revenue</tt>. More often a name will consist of two identifiers separated by a dot, e.g., <tt>customer.address</tt>. Occasionally a name may have more than two identifiers, e.g., <tt>policy.owner.address.zipcode</tt>. <i>Resolving</i> a name means determining exactly what the (possibly multi-part) name refers to. It is necessary to have well-defined rules for how to resolve a name in cases of ambiguity. (In the absence of schemas, such cases arise more commonly, and also differently, than they do in SQL.)</p>
3984<p>The basic job of each clause in a query block is to bind variables. Each clause sees the variables bound by previous clauses and may bind additional variables. Names are always resolved with respect to the variables that are bound (&#x201c;in scope&#x201d;) at the place where the name use in question occurs. It is possible that the name resolution process will fail, which may lead to an empty result or an error message.</p>
3985<p>One important bit of background: When the system is reading a query and resolving its names, it has a list of all the available dataverses and datasets. As a result, it knows whether <tt>a.b</tt> is a valid name for dataset <tt>b</tt> in dataverse <tt>a</tt>. However, the system does not in general have knowledge of the schemas of the data inside the datasets; remember that this is a much more open world. As a result, in general the system cannot know whether any object in a particular dataset will have a field named <tt>c</tt>. These assumptions affect how errors are handled. If you try to access dataset <tt>a.b</tt> and no dataset by that name exists, you will get an error and your query will not run. However, if you try to access a field <tt>c</tt> in a collection of objects, your query will run and return <tt>missing</tt> for each object that doesn&#x2019;t have a field named <tt>c</tt> &#x2013; this is because it&#x2019;s possible that some object (someday) could have such a field.</p></div>
3986<div class="section">
3987<h2><a name="Binding_Variables"></a><a name="Binding_variables" id="Binding_variables">Binding Variables</a></h2>
3988<p>Variables can be bound in the following ways:</p>
3989<ol style="list-style-type: decimal">
3990
3991<li>
3992
3993<p>WITH and LET clauses bind a variable to the result of an expression in a straightforward way</p>
3994<p>Examples:</p>
3995<p><tt>WITH cheap_parts AS (SELECT partno FROM parts WHERE price &lt; 100)</tt> binds the variable <tt>cheap_parts</tt> to the result of the subquery.</p>
3996<p><tt>LET pay = salary + bonus</tt> binds the variable <tt>pay</tt> to the result of evaluating the expression <tt>salary + bonus</tt>.</p>
3997</li>
3998<li>
3999
4000<p>FROM, GROUP BY, and SELECT clauses have optional AS subclauses that contain an expression and a name (called an <i>iteration variable</i> in a FROM clause, or an alias in GROUP BY or SELECT.)</p>
4001<p>Examples:</p>
4002<p><tt>FROM customer AS c, order AS o</tt></p>
4003<p><tt>GROUP BY salary + bonus AS total_pay</tt></p>
4004<p><tt>SELECT MAX(price) AS highest_price</tt></p>
4005<p>An AS subclause always binds the name (as a variable) to the result of the expression (or, in the case of a FROM clause, to the <i>individual members</i> of the collection identified by the expression.)</p>
4006<p>It&#x2019;s always a good practice to use the keyword AS when defining an alias or iteration variable. However, as in SQL, the syntax allows the keyword AS to be omitted. For example, the FROM clause above could have been written like this:</p>
4007<p><tt>FROM customer c, order o</tt></p>
4008<p>Omitting the keyword AS does not affect the binding of variables. The FROM clause in this example binds variables c and o whether the keyword AS is used or not.</p>
4009<p>In certain cases, a variable is automatically bound even if no alias or variable-name is specified. Whenever an expression could have been followed by an AS subclause, if the expression consists of a simple name or a path expression, that expression binds a variable whose name is the same as the simple name or the last step in the path expression. Here are some examples:</p>
4010<p><tt>FROM customer, order</tt> binds iteration variables named <tt>customer</tt> and <tt>order</tt></p>
4011<p><tt>GROUP BY address.zipcode</tt> binds a variable named <tt>zipcode</tt></p>
4012<p><tt>SELECT item[0].price</tt> binds a variable named <tt>price</tt></p>
4013<p>Note that a FROM clause iterates over a collection (usually a dataset), binding a variable to each member of the collection in turn. The name of the collection remains in scope, but it is not a variable. For example, consider this FROM clause used in a self-join:</p>
4014<p><tt>FROM customer AS c1, customer AS c2</tt></p>
4015<p>This FROM clause joins the customer dataset to itself, binding the iteration variables c1 and c2 to objects in the left-hand-side and right-hand-side of the join, respectively. After the FROM clause, c1 and c2 are in scope as variables, and customer remains accessible as a dataset name but not as a variable.</p>
4016</li>
4017<li>
4018
4019<p>Special rules for GROUP BY:</p>
4020<ol style="list-style-type: decimal">
4021
4022<li>
4023
4024<p>If a GROUP BY clause specifies an expression that has no explicit alias, it binds a pseudo-variable that is lexicographically identical to the expression itself. For example:</p>
4025<p><tt>GROUP BY salary + bonus</tt> binds a pseudo-variable named <tt>salary + bonus</tt>.</p>
4026<p>This rule allows subsequent clauses to refer to the grouping expression (salary + bonus) even though its constituent variables (salary and bonus) are no longer in scope. For example, the following query is valid:</p>
4027
4028<div>
4029<div>
4030<pre class="source">FROM employee
4031GROUP BY salary + bonus
4032HAVING salary + bonus &gt; 1000
4033SELECT salary + bonus, COUNT(*) AS how_many
4034</pre></div></div>
4035
4036<p>While it might have been more elegant to explicitly require an alias in cases like this, the pseudo-variable rule is retained for SQL compatibility. Note that the expression <tt>salary + bonus</tt> is not <i>actually</i> evaluated in the HAVING and SELECT clauses (and could not be since <tt>salary</tt> and <tt>bonus</tt> are no longer individually in scope). Instead, the expression <tt>salary + bonus</tt> is treated as a reference to the pseudo-variable defined in the GROUP BY clause.</p>
4037</li>
4038<li>
4039
4040<p>A GROUP BY clause may be followed by a GROUP AS clause that binds a variable to the group. The purpose of this variable is to make the individual objects inside the group visible to subqueries that may need to iterate over them.</p>
4041<p>The GROUP AS variable is bound to a multiset of objects. Each object represents one of the members of the group. Since the group may have been formed from a join, each of the member-objects contains a nested object for each variable bound by the nearest FROM clause (and its LET subclause, if any). These nested objects, in turn, contain the actual fields of the group-member. To understand this process, consider the following query fragment:</p>
4042
4043<div>
4044<div>
4045<pre class="source">FROM parts AS p, suppliers AS s
4046WHERE p.suppno = s.suppno
4047GROUP BY p.color GROUP AS g
4048</pre></div></div>
4049
4050<p>Suppose that the objects in <tt>parts</tt> have fields <tt>partno</tt>, <tt>color</tt>, and <tt>suppno</tt>. Suppose that the objects in suppliers have fields <tt>suppno</tt> and <tt>location</tt>.</p>
4051<p>Then, for each group formed by the GROUP BY, the variable g will be bound to a multiset with the following structure:</p>
4052
4053<div>
4054<div>
4055<pre class="source">[ { &quot;p&quot;: { &quot;partno&quot;: &quot;p1&quot;, &quot;color&quot;: &quot;red&quot;, &quot;suppno&quot;: &quot;s1&quot; },
4056 &quot;s&quot;: { &quot;suppno&quot;: &quot;s1&quot;, &quot;location&quot;: &quot;Denver&quot; } },
4057 { &quot;p&quot;: { &quot;partno&quot;: &quot;p2&quot;, &quot;color&quot;: &quot;red&quot;, &quot;suppno&quot;: &quot;s2&quot; },
4058 &quot;s&quot;: { &quot;suppno&quot;: &quot;s2&quot;, &quot;location&quot;: &quot;Atlanta&quot; } },
4059 ...
4060]
4061</pre></div></div>
4062</li>
4063</ol>
4064</li>
4065</ol></div>
4066<div class="section">
4067<h2><a name="Scoping" id="Scoping">Scoping</a></h2>
4068<p>In general, the variables that are in scope at a particular position are those variables that were bound earlier in the current query block, in outer (enclosing) query blocks, or in a WITH clause at the beginning of the query. More specific rules follow.</p>
4069<p>The clauses in a query block are conceptually processed in the following order:</p>
4070<ul>
4071
4072<li>FROM (followed by LET subclause, if any)</li>
4073<li>WHERE</li>
4074<li>GROUP BY (followed by LET subclause, if any)</li>
4075<li>HAVING</li>
4076<li>SELECT or SELECT VALUE</li>
4077<li>ORDER BY</li>
4078<li>OFFSET</li>
4079<li>LIMIT</li>
4080</ul>
4081<p>During processing of each clause, the variables that are in scope are those variables that are bound in the following places:</p>
4082<ol style="list-style-type: decimal">
4083
4084<li>
4085
4086<p>In earlier clauses of the same query block (as defined by the ordering given above).</p>
4087<p>Example: <tt>FROM orders AS o SELECT o.date</tt> The variable <tt>o</tt> in the SELECT clause is bound, in turn, to each object in the dataset <tt>orders</tt>.</p>
4088</li>
4089<li>
4090
4091<p>In outer query blocks in which the current query block is nested. In case of duplication, the innermost binding wins.</p>
4092</li>
4093<li>
4094
4095<p>In the WITH clause (if any) at the beginning of the query.</p>
4096</li>
4097</ol>
4098<p>However, in a query block where a GROUP BY clause is present:</p>
4099<ol style="list-style-type: decimal">
4100
4101<li>
4102
4103<p>In clauses processed before GROUP BY, scoping rules are the same as though no GROUP BY were present.</p>
4104</li>
4105<li>
4106
4107<p>In clauses processed after GROUP BY, the variables bound in the nearest FROM-clause (and its LET subclause, if any) are removed from scope and replaced by the variables bound in the GROUP BY clause (and its LET subclause, if any). However, this replacement does not apply inside the arguments of the five SQL special aggregating functions (MIN, MAX, AVG, SUM, and COUNT). These functions still need to see the individual data items over which they are computing an aggregation. For example, after <tt>FROM employee AS e GROUP BY deptno</tt>, it would not be valid to reference <tt>e.salary</tt>, but <tt>AVG(e.salary)</tt> would be valid.</p>
4108</li>
4109</ol>
4110<p>Special case: In an expression inside a FROM clause, a variable is in scope if it was bound in an earlier expression in the same FROM clause. Example:</p>
4111
4112<div>
4113<div>
4114<pre class="source">FROM orders AS o, o.items AS i
4115</pre></div></div>
4116
4117<p>The reason for this special case is to support iteration over nested collections.</p>
4118<p>Note that, since the SELECT clause comes <i>after</i> the WHERE and GROUP BY clauses in conceptual processing order, any variables defined in SELECT are not visible in WHERE or GROUP BY. Therefore the following query will not return what might be the expected result (since in the WHERE clause, <tt>pay</tt> will be interpreted as a field in the <tt>emp</tt> object rather than as the computed value <tt>salary + bonus</tt>):</p>
4119
4120<div>
4121<div>
4122<pre class="source">SELECT name, salary + bonus AS pay
4123FROM emp
4124WHERE pay &gt; 1000
4125ORDER BY pay
4126</pre></div></div>
4127
4128<p>The likely intent of the query above can be accomplished as follows:</p>
4129
4130<div>
4131<div>
4132<pre class="source">FROM emp AS e
4133LET pay = e.salary + e.bonus
4134WHERE pay &gt; 1000
4135SELECT e.name, pay
4136ORDER BY pay
4137</pre></div></div>
4138</div>
4139<div class="section">
4140<h2><a name="Resolving_Names"></a><a name="Resolving_names" id="Resolving_names">Resolving Names</a></h2>
4141<p>The process of name resolution begins with the leftmost identifier in the name. The rules for resolving the leftmost identifier are:</p>
4142<ol style="list-style-type: decimal">
4143
4144<li>
4145
4146<p><i>In a FROM clause</i>: Names in a FROM clause identify the collections over which the query block will iterate. These collections may be stored datasets or may be the results of nested query blocks. A stored dataset may be in a named dataverse or in the default dataverse. Thus, if the two-part name <tt>a.b</tt> is in a FROM clause, a might represent a dataverse and <tt>b</tt> might represent a dataset in that dataverse. Another example of a two-part name in a FROM clause is <tt>FROM orders AS o, o.items AS i</tt>. In <tt>o.items</tt>, <tt>o</tt> represents an order object bound earlier in the FROM clause, and items represents the items object inside that order.</p>
4147<p>The rules for resolving the leftmost identifier in a FROM clause (including a JOIN subclause), or in the expression following IN in a quantified predicate, are as follows:</p>
4148<ol style="list-style-type: decimal">
4149
4150<li>
4151
4152<p>If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (Note that in the case of a subquery, an in-scope variable might have been bound in an outer query block; this is called a correlated subquery.)</p>
4153</li>
4154<li>
4155
4156<p>Otherwise, if the identifier is the first part of a two-part name like <tt>a.b</tt>, the name is treated as dataverse.dataset. If the identifier stands alone as a one-part name, it is treated as the name of a dataset in the default dataverse. An error will result if the designated dataverse or dataset does not exist.</p>
4157</li>
4158</ol>
4159</li>
4160<li>
4161
4162<p><i>Elsewhere in a query block</i>: In clauses other than FROM, a name typically identifies a field of some object. For example, if the expression <tt>a.b</tt> is in a SELECT or WHERE clause, it&#x2019;s likely that <tt>a</tt> represents an object and <tt>b</tt> represents a field in that object.</p>
4163<p>The rules for resolving the leftmost identifier in clauses other than the ones listed in Rule 1 are:</p>
4164<ol style="list-style-type: decimal">
4165
4166<li>
4167
4168<p>If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (In the case of a correlated subquery, the in-scope variable might have been bound in an outer query block.)</p>
4169</li>
4170<li>
4171
4172<p>(The &#x201c;Single Variable Rule&#x201d;): Otherwise, if the FROM clause (or a LET clause if there is no FROM clause) in the current query block binds exactly one variable, the identifier is treated as a field access on the object bound to that variable. For example, in the query <tt>FROM customer SELECT address</tt>, the identifier address is treated as a field in the object bound to the variable customer. At runtime, if the object bound to customer has no <tt>address</tt> field, the <tt>address</tt> expression will return <tt>missing</tt>. If the FROM clause (and its LET subclause, if any) in the current query block binds multiple variables, name resolution fails with an &#x201c;ambiguous name&#x201d; error. Note that the Single Variable Rule searches for bound variables only in the current query block, not in outer (containing) blocks. The purpose of this rule is to permit the compiler to resolve field-references unambiguously without relying on any schema information.</p>
4173<p>Exception: In a query that has a GROUP BY clause, the Single Variable Rule does not apply in any clauses that occur after the GROUP BY because, in these clauses, the variables bound by the FROM clause are no longer in scope. In clauses after GROUP BY, only Rule 2.1 applies.</p>
4174</li>
4175</ol>
4176</li>
4177<li>
4178
4179<p>In an ORDER BY clause following a UNION ALL expression:</p>
4180<p>The leftmost identifier is treated as a field-access on the objects that are generated by the UNION ALL. For example:</p>
4181
4182<div>
4183<div>
4184<pre class="source">query-block-1
4185UNION ALL
4186query-block-2
4187ORDER BY salary
4188</pre></div></div>
4189
4190<p>In the result of this query, objects that have a foo field will be ordered by the value of this field; objects that have no foo field will appear at at the beginning of the query result (in ascending order) or at the end (in descending order.)</p>
4191</li>
4192<li>
4193
4194<p>Once the leftmost identifier has been resolved, the following dots and identifiers in the name (if any) are treated as a path expression that navigates to a field nested inside that object. The name resolves to the field at the end of the path. If this field does not exist, the value <tt>missing</tt> is returned.</p>
4195</li>
4196</ol></div>
4197 </div>
4198 </div>
4199 </div>
4200 <hr/>
4201 <footer>
4202 <div class="container-fluid">
4203 <div class="row-fluid">
4204<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
4205 feather logo, and the Apache AsterixDB project logo are either
4206 registered trademarks or trademarks of The Apache Software
4207 Foundation in the United States and other countries.
4208 All other marks mentioned may be trademarks or registered
4209 trademarks of their respective owners.
4210 </div>
4211 </div>
4212 </div>
4213 </footer>
4214 </body>
4215</html>