Made the correlated-prefix the default policy for filtered-datasets. Documented merge policies and filters.

Change-Id: I6c93d87b9fcee1f69601f5fd05192e56dd288879
Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/176
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>
diff --git a/asterix-app/src/main/java/edu/uci/ics/asterix/aql/translator/AqlTranslator.java b/asterix-app/src/main/java/edu/uci/ics/asterix/aql/translator/AqlTranslator.java
index 0caa397..5bac705 100644
--- a/asterix-app/src/main/java/edu/uci/ics/asterix/aql/translator/AqlTranslator.java
+++ b/asterix-app/src/main/java/edu/uci/ics/asterix/aql/translator/AqlTranslator.java
@@ -494,13 +494,21 @@
 
                     String ngName = ngNameId != null ? ngNameId.getValue() : configureNodegroupForDataset(dd,
                             dataverseName, mdTxnCtx);
-                    if (compactionPolicy == null) {
-                        compactionPolicy = GlobalConfig.DEFAULT_COMPACTION_POLICY_NAME;
-                        compactionPolicyProperties = GlobalConfig.DEFAULT_COMPACTION_POLICY_PROPERTIES;
-                    } else {
-                        validateCompactionPolicy(compactionPolicy, compactionPolicyProperties, mdTxnCtx, false);
-                    }
                     String filterField = ((InternalDetailsDecl) dd.getDatasetDetailsDecl()).getFilterField();
+                    if (compactionPolicy == null) {
+    					if (filterField != null) {
+    						// If the dataset has a filter and the user didn't specify a merge policy, then we will pick the
+    						// correlated-prefix as the default merge policy.
+    						compactionPolicy = GlobalConfig.DEFAULT_FILTERED_DATASET_COMPACTION_POLICY_NAME;
+    						compactionPolicyProperties = GlobalConfig.DEFAULT_COMPACTION_POLICY_PROPERTIES;
+    					} else {
+    						compactionPolicy = GlobalConfig.DEFAULT_COMPACTION_POLICY_NAME;
+    						compactionPolicyProperties = GlobalConfig.DEFAULT_COMPACTION_POLICY_PROPERTIES;
+    					}
+    				} else {
+    					validateCompactionPolicy(compactionPolicy,
+    							compactionPolicyProperties, mdTxnCtx, false);
+    				}
                     if (filterField != null) {
                         aRecordType.validateFilterField(filterField);
                     }
diff --git a/asterix-common/src/main/java/edu/uci/ics/asterix/common/config/GlobalConfig.java b/asterix-common/src/main/java/edu/uci/ics/asterix/common/config/GlobalConfig.java
index ab1c8ef..b52f8c7 100644
--- a/asterix-common/src/main/java/edu/uci/ics/asterix/common/config/GlobalConfig.java
+++ b/asterix-common/src/main/java/edu/uci/ics/asterix/common/config/GlobalConfig.java
@@ -38,6 +38,8 @@
     public static int DEFAULT_INPUT_DATA_COLUMN = 0;
 
     public static final String DEFAULT_COMPACTION_POLICY_NAME = "prefix";
+    
+    public static final String DEFAULT_FILTERED_DATASET_COMPACTION_POLICY_NAME = "correlated-prefix";
 
     public static final Map<String, String> DEFAULT_COMPACTION_POLICY_PROPERTIES;
     static {
diff --git a/asterix-doc/src/site/markdown/aql/filters.md b/asterix-doc/src/site/markdown/aql/filters.md
new file mode 100644
index 0000000..6c29b62
--- /dev/null
+++ b/asterix-doc/src/site/markdown/aql/filters.md
@@ -0,0 +1,139 @@
+# Filter-Based LSM Index Acceleration #
+
+## <a id="toc">Table of Contents</a> ##
+
+* [Motivation](#Motivation)
+* [Filters in AsterixDB](#FiltersInAsterixDB)
+* [Filters and Merge Policies](#FiltersAndMergePolicies)
+
+## <a id="Motivation">Motivation</a> <font size="4"><a
+   href="#toc">[Back to TOC]</a></font> ##
+
+Traditional relational databases usually employ conventional index
+structures such as B+ trees due to their low read latency.  However,
+such traditional index structures use in-place writes to perform
+updates, resulting in costly random writes to disk. Today's emerging
+applications often involve insert-intensive workloads for which the
+cost of random writes prohibits efficient ingestion of
+data. Consequently, popular NoSQL systems such as Cassandra, HBase,
+LevelDB, BigTable, etc. have adopted Log-Structured Merge (LSM) Trees
+as their storage structure. LSM-trees avoids the cost of random writes
+by batching updates into a component of the index that resides in main
+memory -- an \textit{in-memory component}. When the space occupancy of
+the in-memory component exceeds a specified threshold, its entries are
+\textit{flushed} to disk forming a new component -- a \textit{disk
+component}. As disk components accumulate on disk, they are
+periodically merged together subject to a \textit{merge policy} that
+decides when and what to merge. The benefit of the LSM-trees comes at
+the cost of possibly sacrificing read efficiency, but, it has been
+shown in previous studies that these inefficiencies can be mostly
+mitigated.
+
+AsterixDB has also embraced LSM-trees, not just by using them as
+primary indexes, but also by using the same LSM-ification technique
+for all of its secondary index structures. In particular, AsterixDB
+adopted a generic framework for converting a class of indexes (that
+includes conventional B+ trees, R trees, and inverted indexes) into
+LSM-based secondary indexes, allowing higher data ingestion rates. In
+fact, for certain index structures, our results have shown that using
+an LSM-based version of an index can be made to significantly
+outperform its conventional counterpart for \textit{both} ingestion
+and query speed (an example of such an index being the R-tree for
+spatial data).
+
+Since an LSM-based index naturally partitions data into multiple disk
+components, it is possible, when answering certain queries, to exploit
+partitioning to only access some components and safely filter out the
+remaining components, thus reducing query times. For instance,
+referring to our
+[TinySocial](primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB)
+example, suppose a user always retrieves tweets from the
+`TweetMessages` dataset based on the `send-time` field (e.g., tweets
+posted in the last 24 hours). Since there is not a secondary index on
+the `send-time` field, the only available option for AsterixDB would
+be to scan the whole `TweetMessages` dataset and then apply the
+predicate as a post-processing step. However, if disk components of
+the primary index were tagged with the minimum and maximum timestamp
+values of the records they contain, we could utilize the tagged
+information to directly access the primary index and prune components
+that do not match the query predicate. Thus, we could save substantial
+cost by avoiding scanning the whole dataset and only access the
+relevant components. We simply call such tagging information that are
+associated with components, filters. (Note that even if there were a
+secondary index on `send-time` field, using filters could save
+substantial cost by avoiding accessing the secondary index, followed
+by probing the primary index for every fetched entry.) Moreover, the
+same filtering technique can also be used with any secondary LSM index
+(e.g., an LSM R-tree), in case the query contains multiple predicates
+(e.g., spatial and temporal predicates), to obtain similar pruning
+power.
+
+## <a id="FiltersInAsterixDB">Filters in AsterixDB</a> <font
+   size="4"><a href="#toc">[Back to TOC]</a></font> ##
+
+
+We have added support for LSM-based filters to all of AsterixDB's
+index types. To enable the use of filters, the user must specify the
+filter's key when creating a dataset, as shown below:
+
+#### Creating a Dataset with a Filter  ####
+
+        create dataset Tweets(TweetType) primary key tweetid with filter on send-time;
+
+
+Filters can be created on any totally ordered datatype (i.e., any
+field that can be indexed using a B+ -tree), such as integers,
+doubles, floats, UUIDs, datetimes, etc.
+
+
+When a dataset with a filter is created, the name of the filter's key
+field is persisted in the ``dataset'' dataset (which is the metadata
+dataset that stores the details of each dataset in an AsterixDB
+instance) so that DML operations against the dataset can recognize the
+existence of filters and can update them or utilize them
+accordingly. Creating a dataset with a filter in AsterixDB implies
+that the primary and all secondary indexes of that dataset will
+maintain filters on their disk components. Once a filtered dataset is
+created, the user can use the dataset normally (just like any other
+dataset). AsterixDB will automatically maintain the filters and will
+leverage them to efficiently answer queries whenever possible (i.e.,
+when a query has predicates on the filter's key).
+
+
+## <a id="FiltersAndMergePolicies">Filters and Merge Policies</a>
+   <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+
+
+The AsterixDB default merge policy, the prefix merge policy, relies on
+component sizes and the number of components to decide which
+components to merge. This merge policy has proven to provide excellent
+performance for both ingestion and queries. However, when evaluating
+our filtering solution with the prefix policy, we observed a behavior
+that can reduce filter effectiveness. In particular, we noticed that
+under the prefix merge policy, the disk components of a secondary
+index tend to be constantly merged into a single component. This is
+because the prefix policy relies on a single size parameter for all of
+the indexes of a dataset. This parameter is typically chosen based on
+the sizes of the disk components of the primary index, which tend to
+be much larger than the sizes of the secondary indexes' disk
+components. This difference caused the prefix merge policy to behave
+similarly to the constant merge policy (i.e., relatively poorly) when
+applied to secondary indexes in the sense that the secondary indexes
+are constantly merged into a single disk component. Consequently, the
+effectiveness of filters on secondary indexes was greatly reduced
+under the prefix-merge policy, but they were still effective when
+probing the primary index.  Based on this behavior, we developed a new
+merge policy, an improved version of the prefix policy, called the
+correlated-prefix policy. The basic idea of this policy is that it
+delegates the decision of merging the disk components of all the
+indexes in a dataset to the primary index. When the policy decides
+that the primary index needs to be merged (using the same decision
+criteria as for the prefix policy), then it will issue successive
+merge requests to the I/O scheduler on behalf of all other indexes
+associated with the same dataset. The end result is that secondary
+indexes will always have the same number of disk components as their
+primary index under the correlated-prefix merge policy. This has
+improved query performance, since disk components of secondary indexes
+now have a much better chance of being pruned.
+
+
diff --git a/asterix-doc/src/site/markdown/aql/manual.md b/asterix-doc/src/site/markdown/aql/manual.md
index 7c3a8fe..a892fb9 100644
--- a/asterix-doc/src/site/markdown/aql/manual.md
+++ b/asterix-doc/src/site/markdown/aql/manual.md
@@ -564,9 +564,12 @@
 #### Datasets
 
     DatasetSpecification ::= "internal"? "dataset" QualifiedName "(" Identifier ")" IfNotExists
-                             PrimaryKey ( "on" Identifier )? ( "hints" Properties )? 
+                             PrimaryKey ( "on" Identifier )? ( "hints" Properties )?
+                             ( "using" "compaction" "policy" CompactionPolicy ( Configuration )? )? 
+                             ( "with filter on" Identifier )?
                            | "external" "dataset" QualifiedName "(" Identifier ")" IfNotExists 
                              "using" AdapterName Configuration ( "hints" Properties )?
+                             ( "using" "compaction" "policy" CompactionPolicy ( Configuration )? )? 
     AdapterName          ::= Identifier
     Configuration        ::= "(" ( KeyValuePair ( "," KeyValuePair )* )? ")"
     KeyValuePair         ::= "(" StringLiteral "=" StringLiteral ")"
@@ -574,6 +577,7 @@
     Property             ::= Identifier "=" ( StringLiteral | IntegerLiteral )
     FunctionSignature    ::= FunctionOrTypeName "@" IntegerLiteral
     PrimaryKey           ::= "primary" "key" Identifier ( "," Identifier )*
+    CompactionPolicy     ::= Identifier
 
 The create dataset statement is used to create a new dataset.
 Datasets are named, unordered collections of ADM record instances; they
@@ -582,12 +586,39 @@
 An Internal dataset (the default) is a dataset that is stored in and managed by AsterixDB.
 It must have a specified unique primary key that can be used to partition data across nodes of an AsterixDB cluster.
 The primary key is also used in secondary indexes to uniquely identify the indexed primary data records.
+Optionally, a filter can be created on a field to further optimize range queries with predicates on the filter's field.
+(Refer to [Filter-Based LSM Index Acceleration](filters.html) for more information about filters.)
+
 An External dataset is stored outside of AsterixDB (currently datasets in HDFS or on the local filesystem(s) of the cluster's nodes are supported).
 External dataset support allows AQL queries to treat external data as though it were stored in AsterixDB,
 making it possible to query "legacy" file data (e.g., Hive data) without having to physically import it into AsterixDB.
 For an external dataset, an appropriate adapter must be selected to handle the nature of the desired external data.
 (See the [guide to external data](externaldata.html) for more information on the available adapters.)
 
+When creating a dataset, it is possible to choose a merge policy that controls
+which of the underlaying LSM storage components to be merged.  Currently,
+AsterixDB provides four different merge policies that can be
+configured per dataset: no-merge, constant, prefix, and correlated-prefix. The
+no-merge policy simply never merges disk components. While the constant policy merges disk components when the
+number of components reaches some constant number k, which can be
+configured by the user. The prefix policy relies on component sizes and the number of
+components to decide which components to merge. Specifically, it works
+by first trying to identify the smallest ordered (oldest to newest)
+sequence of components such that the sequence does not contain a
+single component that exceeds some threshold size M and that either
+the sum of the component's sizes exceeds M or the number of
+components in the sequence exceeds another threshold C. If such a
+sequence of components exists, then each of the components in the
+sequence are merged together to form a single component. Finally, the correlated-prefix is similar to the prefix policy but it
+delegates the decision of merging the disk components of all the
+indexes in a dataset to the primary index. When the policy decides
+that the primary index needs to be merged (using the same decision
+criteria as for the prefix policy), then it will issue successive
+merge requests on behalf of all other indexes
+associated with the same dataset. The default policy for
+AsterixDB is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.
+
+
 The following example creates an internal dataset for storing FacefookUserType records.
 It specifies that their id field is their primary key.
 
diff --git a/asterix-doc/src/site/site.xml b/asterix-doc/src/site/site.xml
index feef881..37c3474 100644
--- a/asterix-doc/src/site/site.xml
+++ b/asterix-doc/src/site/site.xml
@@ -79,6 +79,7 @@
       <item name="AQL Allen's Relations Functions" href="aql/allens.html"/>
       <item name="AQL Support of Similarity Queries" href="aql/similarity.html"/>
       <item name="Accessing External Data" href="aql/externaldata.html"/>
+      <item name="Filter-Based LSM Index Acceleration" href="aql/filters.html"/>
       <item name="REST API to AsterixDB" href="api.html"/>
     </menu>