Merge branch 'master' of https://code.google.com/p/asterixdb
BECAUSE GIT IS MAKING ME?
diff --git a/asterix-doc/src/site/markdown/AsterixQueryLanguageReference.md b/asterix-doc/src/site/markdown/AsterixQueryLanguageReference.md
index 1ed876e..319212a 100644
--- a/asterix-doc/src/site/markdown/AsterixQueryLanguageReference.md
+++ b/asterix-doc/src/site/markdown/AsterixQueryLanguageReference.md
@@ -1,13 +1,32 @@
# The Asterix Query Language, Version 1.0
## 1. Introduction
-This document provides an overview of the Asterix Query language.
-
+This document is intended to serve as a reference-style guide to the full syntax
+and semantics of the Asterix Query Language (AQL), the language for talking to AsterixDB.
+This guide covers both the data manipulation language (DML) aspects of AQL, including
+its support for queries and data modification, as well as its data definition language
+(DDL) aspects.
+ New AsterixDB users are encouraged to read and work through the (friendlier) guide
+"AsterixDB 101: An ADM and AQL Primer" before attempting to make use of this document.
+In addition, readers are advised to read and understand the Asterix Data Model (ADM)
+reference guide, as a basic understanding of ADM is a prerequisite to understanding AQL.
+In what follows, we detail the features of the AQL language in a grammar-guided manner:
+We list and briefly explain each of the productions in the AQL grammar, offering brief
+examples for clarity in cases where doing so seems needed.
## 2. Expressions
- Expression ::= ( OperatorExpr | IfThenElse | FLWOGR | QuantifiedExpression )
+ Expression ::= ( OperatorExpr | IfThenElse | FLWOR | QuantifiedExpression )
+AQL is a fully composable expression language.
+Each AQL query is an expression that returns a collection
+of zero or more Asterix Data Model (ADM) instances.
+There are four major kinds of query expression in AQL.
+At the top level, a query expression can be an
+OperatorExpr (similar to a mathematical expression),
+an IfThenElse (to choose between two alternative values),
+a FLWOR expression (the heart of AQL, pronounced "flower expression"),
+or a QuantifiedExpression (which yields a boolean value).
### Primary Expressions
@@ -18,7 +37,15 @@
| DatasetAccessExpression
| ListConstructor
| RecordConstructor
-
+
+The most basic building block for any AQL expression is the PrimaryExpr.
+This can be a simple literal (constant) value,
+a reference to a query variable that is in scope,
+a parenthesized expression,
+a function call,
+an expression accessing the ADM contents of a dataset,
+a newly constructed list of ADM instances,
+or a newly constructed ADM record.
#### Literals
@@ -31,50 +58,88 @@
| "false"
StringLiteral ::= <STRING_LITERAL>
+Literals (constants) in AQL can be strings, integers, floating point values,
+double values, boolean constants, or the constant value null.
+The null value in AQL has "unknown" or "missing" value semantics, similar to
+(though not identical to) nulls in the relational query language SQL.
+
+The following are some simple examples of AQL literals.
+Since AQL is an expression language, each example is also a complete, legal AQL query (!).
+
##### Examples
"a string"
42
-
#### Variable References
VariableRef ::= <VARIABLE>
-##### Example
+A variable in AQL can be bound to any legal ADM value.
+A variable reference refers to the value to which an in-scope variable is bound.
+(E.g., a variable binding may originate from one of the for or let clauses of a
+FLWOR expression or from an input parameter in the context of an AQL function body.)
- $id
-
+##### Examples
+
+ $tweet
+ $id
#### Parenthesized Expressions
-
+
ParenthesizedExpression ::= "(" Expression ")"
+As in most languages, an expression may be parenthesized.
+
+Since AQL is an expression language, the following example expression is
+actually also a complete, legal AQL query whose result is the value 2.
+(As such, you can have Big Fun explaining to your boss how AsterixDB and AQL can turn
+your 1000-node shared-nothing Big Data cluster into a $5M calculator in its spare time.)
+
##### Example
( 1 + 1 )
-
#### Function Calls
FunctionCallExpr ::= FunctionOrTypeName "(" ( Expression ( "," Expression )* )? ")"
+Functions are included in AQL, like most languages, as a way to package useful
+functionality or to componentize complicated or reusable AQL computations.
+A function call is a legal AQL query expression that represents the ADM value
+resulting from the evaluation of its body expression with the given parameter
+bindings; the parameter value bindings can themselves be any AQL expressions.
+
+The following example is a (built-in) function call expression whose value is 8.
+
##### Example
string-length("a string")
-
#### Dataset Access
DatasetAccessExpression ::= "dataset" ( ( Identifier ( "." Identifier )? )
| ( "(" Expression ")" ) )
Identifier ::= <IDENTIFIER> | StringLiteral
+Querying Big Data is the main point of AsterixDB and AQL.
+Data in AsterixDB reside in datasets (collections of ADM records),
+each of which in turn resides in some namespace known as a dataverse (data universe).
+Data access in a query expression is accomplished via a DatasetAccessExpression.
+Dataset access expressions are most commonly used in FLWOR expressions, where variables
+are bound to their contents.
+
+
+The following are three examples of legal dataset access expressions.
+The first one accesses a dataset called Customers in the dataverse called SalesDV.
+The second one accesses the Customers dataverse in whatever the current dataverse is.
+The third one does the same thing as the second but uses a slightly older AQL syntax.
+
##### Examples
- dataset customers
- dataset (string-join("customers", $country))
-
+ dataset SalesDV.Customers
+ dataset Customers
+ dataset("Customers")
#### Constructors
@@ -84,17 +149,31 @@
RecordConstructor ::= "{" ( FieldBinding ( "," FieldBinding )* )? "}"
FieldBinding ::= Expression ":" Expression
+A major feature of AQL is its ability to construct new ADM data instances.
+This is accomplished using its constructors for each of the major ADM complex object structures,
+namely lists (ordered or unordered) and records.
+Ordered lists are like JSON arrays, while unordered lists have bag (multiset) semantics.
+Records are built from attributes that are field-name/field-value pairs, again like JSON.
+(See the AsterixDB Data Model document for more details on each.)
+
+The following examples illustrate how to construct a new ordered list with 3 items,
+a new unordered list with 4 items, and a new record with 2 fields, respectively.
+List elements can be homogeneous (as in the first example), which is the common case,
+or they may be heterogeneous (as in the second example).
+The data values and field name values used to construct lists and records in constructors are all simply AQL expressions.
+Thus the list elements, field names, and field values used in constructors can be simple literals (as in these three examples)
+or they can come from query variable references or even arbitrarily complex AQL expressions.
+
##### Examples
[ "a", "b", "c" ]
-
- {{ 42, "forty-two", "AsterixDB!" }}
-
- {
- "project name" : "AsterixDB"
- "project members" : {{ "vinayakb", "dtabass", "chenli" }}
- }
+ {{ 42, "forty-two", "AsterixDB!", 3.14f }}
+
+ {
+ "project name": "AsterixDB"
+ "project members": {{ "vinayakb", "dtabass", "chenli" }}
+ }
### Path Expressions
@@ -102,48 +181,69 @@
Field ::= "." Identifier
Index ::= "[" ( Expression | "?" ) "]"
+Components of complex types in ADM are accessed via path expressions.
+Path access can be applied to the result of an AQL expression that yields an instance of such a type, e.g., a record or list instance.
+For records, path access is based on field names.
+For ordered lists, path access is based on (zero-based) array-style indexing.
+AQL also supports an "I'm feeling lucky" style index accessor, [?], for selecting an arbitrary element from an ordered list.
+Attempts to access non-existent fields or list elements produce a null (i.e., missing information) result as opposed to signaling a runtime error.
+
+The following examples illustrate field access for a record, index-based element access for an ordered list, and also a composition thereof.
+
##### Examples
- { "list" : [ "a", "b", "c"] }.list
-
- [ "a", "b", "c"][2]
-
- { "list" : [ "a", "b", "c"] }.list[2]
+ ({"list": [ "a", "b", "c"]}).list
+ (["a", "b", "c"])[2]
+
+ ({ "list": [ "a", "b", "c"]}).list[2]
### Logical Expressions
OperatorExpr ::= AndExpr ( "or" AndExpr )*
AndExpr ::= RelExpr ( "and" RelExpr )*
-
+
+As in most languages, boolean expressions can be built up from smaller expressions by combining them with the logical connectives and/or.
+Legal boolean values in AQL are true, false, and null.
+(Nulls in AQL are treated much like SQL treats its unknown truth value in boolean expressions.)
+
+The following is an example of a conjuctive range predicate in AQL.
+It will yield true if $a is bound to 4, null if $a is bound to null, and false otherwise.
+
##### Example
$a > 3 and $a < 5
-
### Comparison Expressions
RelExpr ::= AddExpr ( ( "<" | ">" | "<=" | ">=" | "=" | "!=" | "~=" ) AddExpr )?
-
+
+AQL has the usual list of suspects, plus one, for comparing pairs of atomic values.
+The "plus one" is the last operator listed above, which is the "roughly equal" operator provided for similarity queries.
+(See the separate document on AsterixDB Similarity Queries for more details on similarity matching.)
+
+An example comparison expression (which yields the boolean value true) is shown below.
+
##### Example
5 > 3
-
### Arithmetic Expressions
AddExpr ::= MultExpr ( ( "+" | "-" ) MultExpr )*
MultExpr ::= UnaryExpr ( ( "*" | "/" | "%" | <CARET> | "idiv" ) UnaryExpr )*
UnaryExpr ::= ( ( "+" | "-" ) )? ValueExpr
+AQL also supports the usual cast of characters for arithmetic expressions.
+The example below evaluates to 25.
+
##### Example
3 ^ 2 + 4 ^ 2
+### FLWOR Expression
-### FLWOGR Expression
-
- FLWOGR ::= ( ForClause | LetClause ) ( Clause )* "return" Expression
+ FLWOR ::= ( ForClause | LetClause ) ( Clause )* "return" Expression
Clause ::= ForClause | LetClause | WhereClause | OrderbyClause
| GroupClause | LimitClause | DistinctClause
ForClause ::= "for" Variable ( "at" Variable )? "in" ( Expression )
@@ -151,19 +251,46 @@
WhereClause ::= "where" Expression
OrderbyClause ::= "order" "by" Expression ( ( "asc" ) | ( "desc" ) )?
( "," Expression ( ( "asc" ) | ( "desc" ) )? )*
- GroupClause ::= "group" "by" ( Variable ":=" )? Expression ( "," ( Variable ":=" )? Expression )*
+ GroupClause ::= "group" "by" ( Variable ":=" )? Expression ( "," ( Variable ":=" )? Expression )*
"with" VariableRef ( "," VariableRef )*
LimitClause ::= "limit" Expression ( "offset" Expression )?
DistinctClause ::= "distinct" "by" Expression ( "," Expression )*
Variable ::= <VARIABLE>
+The heart of AQL is the FLWOR (for-let-where-orderby-return) expression.
+The roots of this expression were borrowed from the expression of the same name in XQuery.
+A FLWOR expression starts with one or more clauses that establish variable bindings.
+A for clause binds a variable incrementally to each element of its associated expression;
+it includes an optional positional variable for counting/numbering the bindings.
+By default no ordering is implied or assumed by a for clause.
+A let clause binds a variable to the collection of elements computed by its associated expression.
+
+Following the initial for or let clause(s), a FLWOR expression may contain an arbitrary sequence of other clauses.
+The where clause in a FLWOR expression filters the preceding bindings via a boolean expression, much like a where clause does in a SQL query.
+The order by clause in a FLWOR expression induces an ordering on the data.
+The group by clause, discussed further below, forms groups based on its group by expressions,
+optionally naming the expressions' values (which together form the grouping key for the expression).
+The with subclause of a group by clause specifies the variable(s) whose values should be grouped based
+on the grouping key(s); following the grouping clause, only the grouping key(s) and the variables named
+in the with subclause remain in scope, and the named grouping variables now contain lists formed from their input values.
+The limit clause caps the number of values returned, optionally starting its result count from a specified offset.
+(Web applications can use this feature for doing pagination.)
+The distinct clause is similar to the group-by clause, but it forms no groups; it serves only to eliminate duplicate values.
+As indicated by the grammar, the clauses in an AQL query can appear in any order.
+To interpret a query, one can think of data as flowing down through the query from the first clause to the return clause.
+
+The following example shows a FLWOR expression that selects and returns one user from the dataset FacebookUsers.
##### Example
for $user in dataset FacebookUsers
where $user.id = 8
return $user
-
+
+The next example shows a FLWOR expression that joins two datasets, FacebookUsers and FacebookMessages,
+returning user/message pairs.
+The results contain one record per pair, with result records containing the user's name and an entire message.
+
##### Example
for $user in dataset FacebookUsers
@@ -173,8 +300,11 @@
{
"uname": $user.name,
"message": $message.message
- };
-
+ };
+
+In the next example, a let clause is used to bind a variable to all of a user's FacebookMessages.
+The query returns one record per user, with result records containing the user's name and the set of all messages by that user.
+
##### Example
for $user in dataset FacebookUsers
@@ -186,15 +316,22 @@
{
"uname": $user.name,
"messages": $messages
- };
-
+ };
+
+The following example returns all TwitterUsers ordered by their followers count (most followers first) and language.
+Null is treated as being smaller than any other value if nulls are encountered in the ordering key(s).
+
##### Example
-
+
for $user in dataset TwitterUsers
order by $user.followers_count desc, $user.lang asc
return $user
-
-* null is smaller than any other value
+
+The next example illustrates the use of the group by clause in AQL.
+After the group by clause in the query, only variables that are either in the group by list or in the with list are in scope.
+The variables in the clause's with list will each contain a collection of items following the group by clause;
+the collected items are the values that the source variable was bound to in the tuples that formed the group.
+Null is handled as a single value for grouping.
##### Example
@@ -207,9 +344,7 @@
"message" : $messages
}
-* after group by only variables that are either in the group-by-list or in the with-list are in scope
-* the variables in the with-clause contain a collection of items after the group by clause (all the values that the variable was bound to in the tuples that make up the group)
-* null is handled as a single value for grouping
+The use of the limit clause is illustrated in thise next example.
##### Example
@@ -218,8 +353,15 @@
limit 2
return $user
-##### Example (currently not working)
-
+The final example shows how AQL's distinct by clause works.
+Each variable in scope before the distinct clause is also in scope after the distinct clause.
+This clause works similarly to group by, but for each variable that contains more than
+one value after the distinct by clause, one value is picked nondeterministically.
+(If the variable is in the disctict by list, then its value will be deterministic.)
+Nulls are treated as a single value when they occur in a grouping field.
+
+##### Example
+
for $x in dataset FacebookMessages
distinct by $x.sender-location
return
@@ -228,25 +370,32 @@
"message" : $x.message
}
-* every variable that is in-scope before the distinct clause is also in scope after the distinct clause
-* works a lot like group by, but for every variable that contains more than one value after the distinct-by clause, one value is picked non-deterministically
-* if the variable is in the disctict-by list, then value is deterministic
-* null is a single value
-
### Conditional Expression
-
+
IfThenElse ::= "if" "(" Expression ")" "then" Expression "else" Expression
+A conditional expression is useful for choosing between two alternative values based on a
+boolean condition. If its first (if) expression is true, its second (then) expression's
+value is returned, and otherwise its third (else) expression is returned.
+
+The following example illustrates the form of a conditional expression.
##### Example
if (2 < 3) then "yes" else "no"
-
### Quantified Expressions
-
- QuantifiedExpression ::= ( ( "some" ) | ( "every" ) ) Variable "in" Expression
+
+ QuantifiedExpression ::= ( ( "some" ) | ( "every" ) ) Variable "in" Expression
( "," Variable "in" Expression )* "satisfies" Expression
-
+
+Quantified expressions are used for expressing existential or universal predicates involving the elements of a collection.
+
+The following pair of examples, each of which returns true, illustrate the use of a quantified
+expression to test that every (or some) element in the set [1, 2, 3] of integers is less than three.
+It is useful to note that if the set were instead the empty set, the first expression would yield true
+("every" value in an empty set satisfies the condition) while the second expression would yield false
+(since there isn't "some" value, as there are no values in the set, that satisfies the condition).
+
##### Examples
every $x in [ 1, 2, 3] satisfies $x < 3
@@ -264,9 +413,9 @@
| InsertStatement
| DeleteStatement
| Query
-
-### Declarations
-
+
+### Declarations
+
DataverseDeclaration ::= "use" "dataverse" Identifier
SetStatement ::= "set" Identifier StringLiteral
FunctionDeclaration ::= "declare" "function" Identifier ParameterList "{" Expression "}"
@@ -275,7 +424,7 @@
##### Example
use dataverse TinySocial;
-
+
##### Example
set simfunction "jaccard";
@@ -307,7 +456,6 @@
DataverseSpecification ::= "dataverse" Identifier IfNotExists ( "with format" StringLiteral )?
-
##### Example
create dataverse TinySocial;
@@ -335,7 +483,6 @@
employment: [EmploymentType]
}
-
#### Datasets
DatasetSpecification ::= "internal"? "dataset" QualifiedName "(" Identifier ")" IfNotExists
@@ -351,7 +498,6 @@
FunctionSignature ::= FunctionOrTypeName "@" <INTEGER_LITERAL>
PrimaryKey ::= "primary" "key" Identifier ( "," Identifier )*
-
##### Example
create internal dataset FacebookUsers(FacebookUserType) primary key id;
@@ -385,7 +531,6 @@
create index fbMessageIdx on FacebookMessages(message) type keyword;
-
#### Functions
FunctionSpecification ::= "function" FunctionOrTypeName IfNotExists ParameterList "{" Expression "}"
@@ -396,7 +541,6 @@
$a + $b
};
-
#### Removal
DropStatement ::= "drop" ( "dataverse" Identifier IfExists
@@ -425,7 +569,6 @@
##### Example
drop function add;
-
### Import/Export Statements
@@ -436,7 +579,6 @@
load dataset FacebookUsers using localfs
(("path"="localhost:///Users/zuck/AsterixDB/load/fbu.adm"),("format"="adm"));
-
### Modification Statements
InsertStatement ::= "insert" "into" "dataset" QualifiedName Query
@@ -450,7 +592,6 @@
delete $user from dataset FacebookUsers where $user.id = 8;
-
### Queries
Query ::= Expression
@@ -459,4 +600,5 @@
for $praise in {{ "great", "brilliant", "awesome" }}
return
- string-concat(["AsterixDB is ", $praise])
+ string-concat(["AsterixDB is ", $praise]
+