SQL++ doc/grammar cleanup

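- remove review comments that have been addressed
- adapt grammar according to feedback (add ESCAPE_QUOT, drop WITH FORMAT from
  CREATE DATAVERSE, restrict CREATE TYPE to RecordTypeDef)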

Change-Id: I6b4f5c7ae48c022a6b8f8c48b3927e1981b70598
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1233
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Yingyi Bu <buyingyi@gmail.com>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
index 0e834e6..79f9da0 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
@@ -25,8 +25,9 @@
                        | <TRUE>
                        | <FALSE>
     StringLiteral  ::= "\'" (<ESCAPE_APOS> | ~["\'"])* "\'"
-                       | "\"" (<ESCAPE_APOS> | ~["\'"])* "\""
+                       | "\"" (<ESCAPE_QUOT> | ~["\'"])* "\""
     <ESCAPE_APOS>  ::= "\\\'"
+    <ESCAPE_QUOT>  ::= "\\\""
     IntegerLiteral ::= <DIGITS>
     <DIGITS>       ::= ["0" - "9"]+
     FloatLiteral   ::= <DIGITS> ( "f" | "F" )
@@ -36,10 +37,6 @@
                      | <DIGITS> ( "." <DIGITS> )?
                      | "." <DIGITS>
 
-> MC: I tentatively deleted the following unused ESCAPE_QUOTE definition: &lt;ESCAPE_QUOT&gt;  ::= "\\\""
-> 		&lt;ESCAPE_QUOT&gt;  ::= "\\\""
-> Also, I moved the DelimitedIdentifier down further per TW's suggestion.
-
 Literals (constants) in SQL++ can be strings, integers, floating point values, double values, boolean constants, or special constant values like `NULL` and `MISSING`. The `NULL` value is like a `NULL` in SQL; it is used to represent an unknown field value. The specialy value `MISSING` is only meaningful in the context of SQL++ field accesses; it occurs when the accessed field simply does not exist at all in a record being accessed.
 
 The following are some simple examples of SQL++ literals.
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index 04d8e5b..66fd8f1 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -8,8 +8,6 @@
 
 The following shows the (rich) grammar for the `SELECT` statement in SQL++.
 
-> TW: Should we replace SelectElement with SelectValue? MC: Yes, and done below.
-
     SelectStatement    ::= ( WithClause )?
                            SelectSetOperation (OrderbyClause )? ( LimitClause )?
     SelectSetOperation ::= SelectBlock (<UNION> <ALL> ( SelectBlock | Subquery ) )*
@@ -642,8 +640,6 @@
     GROUP BY msg.authorId AS uid GROUP AS `$1`(msg AS msg);
 
 > TW: We really need to do something about `COLL_SQL-COUNT`.
-> MC: You mean about its name? And inconsistent dashing? I agree...!  :-)
-> Also, do we need to say anything about the (mandatory) double parens here?
 
 The same sort of rewritings apply to the function symbols `SUM`, `MAX`, `MIN`, and `AVG`.
 In contrast to the SQL++ collection aggregate functions, these special SQL-92 function symbols
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
index 9dc6947..e83e47d 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
@@ -15,9 +15,6 @@
 manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions.
 This section details the DDL and DML statements supported in the SQL++ language as realized in Apache AsterixDB.
 
-> TW: AsterixDB?
-> MC: Good question here - I eradicated the preceding references except in the Intro, which needs a rewrite, but here it is really still about AsterixDB, I think?  (Since most of these statements will be hidden in the Couchbase case?)
-
 ## <a id="Declarations">Declarations</a>
 
     DatabaseDeclaration ::= "USE" Identifier
@@ -55,7 +52,6 @@
 
     [
       { "id": 2, "name": "IsbelDull", "friendCount": 2 }
-
     ]
 
 ## <a id="Lifecycle_management_statements">Lifecycle management statements</a>
@@ -74,17 +70,12 @@
 
 ### <a id="Dataverses"> Dataverses</a>
 
-    DatabaseSpecification ::= "DATAVERSE" Identifier IfNotExists ( "WITH" "FORMAT" StringLiteral )?
+    DatabaseSpecification ::= "DATAVERSE" Identifier IfNotExists
 
 The CREATE DATAVERSE statement is used to create new dataverses.
 To ease the authoring of reusable SQL++ scripts, an optional IF NOT EXISTS clause is included to allow
 creation to be requested either unconditionally or only if the dataverse does not already exist.
 If this clause is absent, an error is returned if a dataverse with the indicated name already exists.
-(Note: The `WITH FORMAT` clause in the syntax above is a placeholder for possible `future functionality
-that can safely be ignored here.)
-
-> MC: Should we get rid of WITH FORMAT? (I think we should - here and in the system - if we ever do it
-I would actually expect it to be more fine-grained than the dataverse level.)
 
 The following example creates a new dataverse named TinySocial if one does not already exist.
 
@@ -94,7 +85,7 @@
 
 ### <a id="Types"> Types</a>
 
-    TypeSpecification    ::= "TYPE" FunctionOrTypeName IfNotExists "AS" TypeExpr
+    TypeSpecification    ::= "TYPE" FunctionOrTypeName IfNotExists "AS" RecordTypeDef
     FunctionOrTypeName   ::= QualifiedName
     IfNotExists          ::= ( <IF> <NOT> <EXISTS> )?
     TypeExpr             ::= RecordTypeDef | TypeReference | OrderedListTypeDef | UnorderedListTypeDef
@@ -106,9 +97,6 @@
     OrderedListTypeDef   ::= "[" ( TypeExpr ) "]"
     UnorderedListTypeDef ::= "{{" ( TypeExpr ) "}}"
 
-> TW: How should we refer to the data model? "Asterix Data Model" seems system specific.
-> MC: Agreed that this is an issue. Let's first decide and I can handle the issue in a later pass.
-
 The CREATE TYPE statement is used to create a new named ADM datatype.
 This type can then be used to create stored collections or utilized when defining one or more other ADM datatypes.
 Much more information about the Asterix Data Model (ADM) is available in the [data model reference guide](datamodel.html) to ADM.
@@ -117,8 +105,6 @@
 Instances of a closed record type are not permitted to contain fields other than those specified in the create type statement.
 Instances of an open record type may carry additional fields, and open is the default for new types if neither option is specified.
 
-> MC: I had forgotten about options other than using CREATE TYPE to introduce new record types! (Are all of the other AS TypeExpr possibilities actually well-tested?)
-
 The following example creates a new ADM record type called GleambookUser type.
 Since it is defined as (defaulting to) being an open type,
 instances will be permitted to contain more than what is specified in the type definition.
@@ -171,11 +157,6 @@
     PrimaryKey           ::= <PRIMARY> <KEY> NestedField ( "," NestedField )* ( <AUTOGENERATED> )?
     CompactionPolicy     ::= Identifier
 
-> TW: Again, a lot of AsterixDB in the following paragraph.
-> Also, while I'm sure that this was always like this, the separation of `Configuration`
-> from `Properties` looks pretty confusing ...
-> MC: Not sure what we should do about all this, actually! (I don't disagree. New JSON syntax coming, too?)
-
 The CREATE DATASET statement is used to create a new dataset.
 Datasets are named, unordered collections of ADM record type instances;
 they are where data lives persistently and are the usual targets for SQL++ queries.
@@ -188,9 +169,6 @@
 One such option is that random primary key (UUID) values can be auto-generated by declaring the field to be UUID and putting "AUTOGENERATED" after the "PRIMARY KEY" identifier.
 In this case, unlike other non-optional fields, a value for the auto-generated PK field should not be provided at insertion time by the user since each record's primary key field value will be auto-generated by the system.
 
-> TW: "The Filter-Based LSM Index Acceleration" seems to be quite system specific ...
-> MC: Indeed, but that is always inescapable in DDL reference manuals, no? (We have to decide what to say where. :-))
-
 Another advanced option, when creating an Internal dataset, is to specify the merge policy to control which of the
 underlying LSM storage components to be merged.
 (AsterixDB supports Log-Structured Merge tree based physical storage for Internal datasets.)
@@ -268,8 +246,6 @@
 `ENFORCING` an open field introduces a check that makes sure that the actual type of the indexed field
 (if the optional field exists in the record) always matches this specified (open) field type.
 
-*Editor's note: The ? shown above after the type is intended to be mandatory, and we need to make that happen.*
-
 The following example creates a btree index called gbAuthorIdx on the authorId field of the GleambookMessages dataset.
 This index can be useful for accelerating exact-match queries, range search queries, and joins involving the author-id
 field.
@@ -285,8 +261,6 @@
 
     CREATE INDEX gbSendTimeIdx ON GleambookMessages(sendTime: datetime?) TYPE BTREE ENFORCED;
 
-> MC: The above works in my branch (with ? mandatory) but not in the main branch. We need to change that. :-)
-
 The following example creates a btree index called crpUserScrNameIdx on screenName,
 a nested field residing within a record-valued user field in the ChirpMessages dataset.
 This index can be useful for accelerating exact-match queries, range search queries,
@@ -389,10 +363,6 @@
 
     InsertStatement ::= <INSERT> <INTO> QualifiedName Query
 
-> TW: AsterixDB-specifc transactions semantics ...
-> Also, do we also support `UPSERT`?
-> MC: Yes to both. :-) Whoops. Wait, maybe not. We do have upsert in AQL, but not in SQL++ today, it seems. I'll document it anyway...? :-)
-
 The SQL++ INSERT statement is used to insert new data into a dataset.
 The data to be inserted comes from a SQL++ query expression.
 This expression can be as simple as a constant expression, or in general it can be any legal SQL++ query.
@@ -430,7 +400,7 @@
 
 ### <a id="Deletes">DELETEs</a>
 
-    DeleteStatement ::= <DELETE> <FROM> QualifiedName ( (<AS>)? Variable )? ( <WHERE> Expression )?
+    DeleteStatement ::= <DELETE> <FROM> QualifiedName ( ( <AS> )? Variable )? ( <WHERE> Expression )?
 
 The SQL++ DELETE statement is used to delete data from a target dataset.
 The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the DELETE statement.
diff --git a/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj b/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
index f330f40..6c4bc5c 100644
--- a/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
+++ b/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
@@ -414,7 +414,7 @@
 }
 {
   <TYPE> nameComponents = TypeName() ifNotExists = IfNotExists()
-  <AS> typeExpr = TypeExpr()
+  <AS> typeExpr = RecordTypeDef()
     {
       long numValues = -1;
       String filename = null;
@@ -683,14 +683,12 @@
 {
   String dvName = null;
   boolean ifNotExists = false;
-  String format = null;
 }
 {
   <DATAVERSE> dvName = Identifier()
   ifNotExists = IfNotExists()
-  ( LOOKAHEAD(1) <WITH> <FORMAT> format = ConstantString() )?
     {
-      return new CreateDataverseStatement(new Identifier(dvName), format, ifNotExists);
+      return new CreateDataverseStatement(new Identifier(dvName), null, ifNotExists);
     }
 }
 
@@ -3086,7 +3084,6 @@
   | <FILTER : "filter">
   | <FLATTEN : "flatten">
   | <FOR : "for">
-  | <FORMAT : "format">
   | <FROM : "from">
   | <FULL : "full">
   | <FUNCTION : "function">