Include stable docs, use Apache URLs
Change-Id: Iadb5074e130d4a21b2af123aa405e9fc21a14aed
Reviewed-on: https://asterix-gerrit.ics.uci.edu/519
Reviewed-by: Till Westmann <tillw@apache.org>
diff --git a/docs/0.8.7-incubating/aql/allens.html b/docs/0.8.7-incubating/aql/allens.html
new file mode 100644
index 0000000..ec566c2
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/allens.html
@@ -0,0 +1,660 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – AsterixDB Temporal Functions: Allens Relations</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>AsterixDB Temporal Functions: Allen’s Relations</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#AboutAllensRelations">About Allen’s Relations</a></li>
+
+<li><a href="#AllensRelatonsFunctions">Allen’s Relations Functions</a></li>
+</ul></div>
+<div class="section">
+<h2><a name="About_Allens_Relations_Back_to_TOC"></a><a name="AboutAllensRelations" id="AboutAllensRelations">About Allen’s Relations</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB supports Allen’s relations over interval types. Allen’s relations are also called Allen’s interval algebra. This algebra describes 13 base relations, all of which are supported in AsterixDB (note that <tt>interval-equals</tt> is provided by the <tt>=</tt> comparison operator, so there is no separate function for it). </p>
+<p>A detailed description of Allen’s relations can be found in the <a class="externalLink" href="http://en.wikipedia.org/wiki/Allen's_interval_algebra">Wikipedia entry</a>. </p></div>
+<div class="section">
+<h2><a name="Allens_Relations_Functions_Back_to_TOC"></a><a name="AllensRelatonsFunctions" id="AllensRelatonsFunctions">Allen’s Relations Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="interval-before_interval-after"></a>interval-before, interval-after</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-before(interval1, interval2)
+interval-after(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These two functions check whether an interval happens before/after another interval. </p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-before(interval1, interval2)</tt> is true if and only if <tt>interval1.end < interval2.start</tt>, and <tt>interval-after(interval1, interval2)</tt> is true if and only if <tt>interval1.start > interval2.end</tt>. If any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2005-05-01", "2012-09-09")
+return {"interval-before": interval-before($itv1, $itv2), "interval-after": interval-after($itv2, $itv1)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "interval-before": true, "interval-after": true }
+</pre></div></div></li>
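+
+<li>
+<p>As a contrasting sketch (not one of the original examples; the interval values are chosen here purely for illustration), two intervals that share a span of time satisfy neither relation. By the definition above, both calls below should evaluate to <tt>false</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2004-05-01", "2012-09-09")
+return {"interval-before": interval-before($itv1, $itv2), "interval-after": interval-after($itv2, $itv1)}
+</pre></div></div></li>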
+</ul></div>
+<div class="section">
+<h3><a name="interval-covers_interval-covered-by"></a>interval-covers, interval-covered-by</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-covers(interval1, interval2)
+interval-covered-by(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These two functions check whether one interval covers the other interval.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-covers(interval1, interval2)</tt> is true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval1.start <= interval2.start
+AND interval1.end >= interval2.end
+</pre></div></div>
+<p><tt>interval-covered-by(interval1, interval2)</tt> is true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval2.start <= interval1.start
+AND interval2.end >= interval1.end
+</pre></div></div>
+<p>For both functions, if any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2000-03-01", "2004-09-09")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2004-09-10", "2012-08-01")
+return {"interval-covers": interval-covers($itv1, $itv2), "interval-covered-by": interval-covered-by($itv3, $itv4)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "interval-covers": true, "interval-covered-by": true }
+</pre></div></div></li>
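+
+<li>
+<p>Because both comparisons in the definition are non-strict, two intervals with identical endpoints cover each other. The following is a minimal sketch with illustrative values (not taken from the original examples); by the definition above, both results should be <tt>true</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2000-01-01", "2005-01-01")
+return {"interval-covers": interval-covers($itv1, $itv2), "interval-covered-by": interval-covered-by($itv1, $itv2)}
+</pre></div></div></li>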
+</ul></div>
+<div class="section">
+<h3><a name="interval-overlaps_interval-overlapped-by"></a>interval-overlaps, interval-overlapped-by</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-overlaps(interval1, interval2)
+interval-overlapped-by(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These functions check whether two intervals overlap with each other.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-overlaps(interval1, interval2)</tt> is true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval1.start < interval2.start
+AND interval2.end > interval1.end
+AND interval1.end > interval2.start
+</pre></div></div>
+<p><tt>interval-overlapped-by(interval1, interval2)</tt> is true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval2.start < interval1.start
+AND interval1.end > interval2.end
+AND interval2.end > interval1.start
+</pre></div></div>
+<p>For all these functions, if any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p>
+<p>Note that <tt>interval-overlaps</tt> and <tt>interval-overlapped-by</tt> follow Allen’s definition of overlap, in which every comparison is strict: intervals that only touch at an endpoint do not overlap in this sense.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2004-05-01", "2012-09-09")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2004-09-10", "2006-12-31")
+return {"overlaps": interval-overlaps($itv1, $itv2),
+ "overlapped-by": interval-overlapped-by($itv3, $itv4)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "overlaps": true, "overlapped-by": true }
+</pre></div></div></li>
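+
+<li>
+<p>To illustrate the strict comparisons, here is a sketch with illustrative values (an assumption, not part of the original examples) in which the first interval ends exactly where the second begins. Since <tt>interval1.end > interval2.start</tt> does not hold, both calls should evaluate to <tt>false</tt> under the definitions above:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2005-01-01", "2012-09-09")
+return {"overlaps": interval-overlaps($itv1, $itv2),
+ "overlapped-by": interval-overlapped-by($itv2, $itv1)}
+</pre></div></div></li>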
+</ul></div>
+<div class="section">
+<h3><a name="interval-overlapping"></a>interval-overlapping</h3>
+<p>Note that <tt>interval-overlapping</tt> is not one of Allen’s relations, but syntactic sugar added for the case where the intersection of two intervals is not empty. This function returns true if any of these functions returns true: <tt>interval-overlaps</tt>, <tt>interval-overlapped-by</tt>, <tt>interval-covers</tt>, or <tt>interval-covered-by</tt>.</p>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-overlapping(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>This function checks whether two intervals share any points with each other. </p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-overlapping(interval1, interval2)</tt> is true if</p>
+
+<div class="source">
+<div class="source">
+<pre>(interval2.start >= interval1.start
+AND interval2.start < interval1.end)
+OR
+(interval2.end > interval1.start
+AND interval2.end <= interval1.end)
+</pre></div></div>
+<p>If any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2004-05-01", "2012-09-09")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2004-09-10", "2006-12-31")
+return {"overlapping1": interval-overlapping($itv1, $itv2),
+ "overlapping2": interval-overlapping($itv3, $itv4)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "overlapping1": true, "overlapping2": true }
+</pre></div></div></li>
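+
+<li>
+<p>A further sketch, using illustrative values that are not part of the original examples, contrasts a covered interval with a disjoint one. Applying the condition above, <tt>$itv2</tt> starts inside <tt>$itv1</tt>, so the first call should be <tt>true</tt>, while <tt>$itv3</tt> starts after <tt>$itv1</tt> ends, so the second should be <tt>false</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2000-03-01", "2004-09-09")
+let $itv3 := interval-from-date("2005-05-01", "2012-09-09")
+return {"covered": interval-overlapping($itv1, $itv2),
+ "disjoint": interval-overlapping($itv1, $itv3)}
+</pre></div></div></li>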
+</ul></div>
+<div class="section">
+<h3><a name="interval-meets_interval-met-by"></a>interval-meets, interval-met-by</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-meets(interval1, interval2)
+interval-met-by(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These two functions check whether an interval meets with another interval. </p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-meets(interval1, interval2)</tt> is true if and only if <tt>interval1.end = interval2.start</tt>, and <tt>interval-met-by(interval1, interval2)</tt> is true if and only if <tt>interval1.start = interval2.end</tt>. If any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2005-01-01", "2012-09-09")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2004-09-10", "2006-08-01")
+return {"meets": interval-meets($itv1, $itv2), "metby": interval-met-by($itv3, $itv4)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "meets": true, "metby": true }
+</pre></div></div></li>
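+
+<li>
+<p>Argument order matters for these two functions. In the sketch below (illustrative values, not from the original examples), <tt>$itv2</tt> starts where <tt>$itv1</tt> ends, so by the definition above <tt>interval-meets($itv2, $itv1)</tt> should be <tt>false</tt> while <tt>interval-met-by($itv2, $itv1)</tt> should be <tt>true</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2005-01-01", "2012-09-09")
+return {"meets-reversed": interval-meets($itv2, $itv1), "metby": interval-met-by($itv2, $itv1)}
+</pre></div></div></li>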
+</ul></div>
+<div class="section">
+<h3><a name="interval-starts_interval-started-by"></a>interval-starts, interval-started-by</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-starts(interval1, interval2)
+interval-started-by(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These two functions check whether one interval starts with the other interval.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-starts(interval1, interval2)</tt> returns true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval1.start = interval2.start
+AND interval1.end <= interval2.end
+</pre></div></div>
+<p><tt>interval-started-by(interval1, interval2)</tt> returns true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval1.start = interval2.start
+AND interval2.end <= interval1.end
+</pre></div></div>
+<p>For both functions, if any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("2000-01-01", "2012-09-09")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2006-08-01", "2006-08-01")
+return {"interval-starts": interval-starts($itv1, $itv2), "interval-started-by": interval-started-by($itv3, $itv4)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "interval-starts": true, "interval-started-by": true }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-ends_interval-ended-by"></a>interval-ends, interval-ended-by</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-ends(interval1, interval2)
+interval-ended-by(interval1, interval2)
+</pre></div></div></li>
+
+<li>
+<p>These two functions check whether one interval ends with the other interval.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval1</tt>, <tt>interval2</tt>: two intervals to be compared</li>
+ </ul></li>
+
+<li>
+<p>Return Value:</p>
+<p>A <tt>boolean</tt> value. Specifically, <tt>interval-ends(interval1, interval2)</tt> returns true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval1.end = interval2.end
+AND interval1.start >= interval2.start
+</pre></div></div>
+<p><tt>interval-ended-by(interval1, interval2)</tt> returns true if and only if</p>
+
+<div class="source">
+<div class="source">
+<pre>interval2.end = interval1.end
+AND interval2.start >= interval1.start
+</pre></div></div>
+<p>For both functions, if any of the two inputs is <tt>null</tt>, <tt>null</tt> is returned.</p></li>
+
+<li>
+<p>Examples:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+let $itv2 := interval-from-date("1998-01-01", "2005-01-01")
+let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+let $itv4 := interval-from-date("2006-09-10", "2007-03-01")
+return {"interval-ends": interval-ends($itv1, $itv2), "interval-ended-by": interval-ended-by($itv3, $itv4) }
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "interval-ends": true, "interval-ended-by": true }
+</pre></div></div></li>
+</ul></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/datamodel.html b/docs/0.8.7-incubating/aql/datamodel.html
new file mode 100644
index 0000000..7f756e6
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/datamodel.html
@@ -0,0 +1,783 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – Asterix Data Model (ADM)</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Asterix Data Model (ADM)</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#PrimitiveTypes">Primitive Types</a>
+
+<ul>
+
+<li><a href="#PrimitiveTypesBoolean">Boolean</a></li>
+
+<li><a href="#PrimitiveTypesInt">Int8 / Int16 / Int32 / Int64</a></li>
+
+<li><a href="#PrimitiveTypesFloat">Float</a></li>
+
+<li><a href="#PrimitiveTypesDouble">Double</a></li>
+
+<li><a href="#PrimitiveTypesString">String</a></li>
+
+<li><a href="#PrimitiveTypesPoint">Point</a></li>
+
+<li><a href="#PrimitiveTypesLine">Line</a></li>
+
+<li><a href="#PrimitiveTypesRectangle">Rectangle</a></li>
+
+<li><a href="#PrimitiveTypesCircle">Circle</a></li>
+
+<li><a href="#PrimitiveTypesPolygon">Polygon</a></li>
+
+<li><a href="#PrimitiveTypesDate">Date</a></li>
+
+<li><a href="#PrimitiveTypesTime">Time</a></li>
+
+<li><a href="#PrimitiveTypesDateTime">Datetime</a></li>
+
+<li><a href="#PrimitiveTypesDuration">Duration/Year-month-duration/Day-time-duration</a></li>
+
+<li><a href="#PrimitiveTypesInterval">Interval</a></li>
+
+<li><a href="#PrimitiveTypesUUID">UUID</a></li>
+ </ul></li>
+
+<li><a href="#DerivedTypes">Derived Types</a>
+
+<ul>
+
+<li><a href="#DerivedTypesRecord">Record</a></li>
+
+<li><a href="#DerivedTypesOrderedList">OrderedList</a></li>
+
+<li><a href="#DerivedTypesUnorderedList">UnorderedList</a></li>
+ </ul></li>
+</ul>
+<p>An instance of the Asterix Data Model (ADM) can be a <i><i>primitive type</i></i> (<tt>int32</tt>, <tt>int64</tt>, <tt>string</tt>, <tt>float</tt>, <tt>double</tt>, <tt>date</tt>, <tt>time</tt>, <tt>datetime</tt>, etc., or <tt>null</tt>) or a <i><i>derived type</i></i>.</p></div>
+<div class="section">
+<h2><a name="Primitive_Types_Back_to_TOC"></a><a name="PrimitiveTypes" id="PrimitiveTypes">Primitive Types</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="BooleanBack_to_TOC"></a><a name="PrimitiveTypesBoolean" id="PrimitiveTypesBoolean">Boolean</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>The <tt>boolean</tt> data type has one of two values: <i><i>true</i></i> or <i><i>false</i></i>.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $t := true
+let $f := false
+return { "true": $t, "false": $f }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "true": true, "false": false }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="Int8__Int16__Int32__Int64_Back_to_TOC"></a><a name="PrimitiveTypesInt" id="PrimitiveTypesInt">Int8 / Int16 / Int32 / Int64</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>Integer types using 8, 16, 32, or 64 bits. The ranges of these types are:</p>
+
+<ul>
+
+<li><tt>int8</tt>: -128 to 127</li>
+
+<li><tt>int16</tt>: -32768 to 32767</li>
+
+<li><tt>int32</tt>: -2147483648 to 2147483647</li>
+
+<li><tt>int64</tt>: -9223372036854775808 to 9223372036854775807</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v8 := int8("125")
+let $v16 := int16("32765")
+let $v32 := 294967295
+let $v64 := int64("1700000000000000000")
+return { "int8": $v8, "int16": $v16, "int32": $v32, "int64": $v64}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "int8": 125i8, "int16": 32765i16, "int32": 294967295, "int64": 1700000000000000000i64 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="Float_Back_to_TOC"></a><a name="PrimitiveTypesFloat" id="PrimitiveTypesFloat">Float</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>float</tt> represents approximate numeric data values using 4 bytes. The magnitude of a float value ranges from 2^(-149) to (2-2^(-23))·2^(127), for both positive and negative values. Values beyond this range become <tt>INF</tt> or <tt>-INF</tt>.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := float("NaN")
+let $v2 := float("INF")
+let $v3 := float("-INF")
+let $v4 := float("-2013.5")
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": NaNf, "v2": Infinityf, "v3": -Infinityf, "v4": -2013.5f }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="Double_Back_to_TOC"></a><a name="PrimitiveTypesDouble" id="PrimitiveTypesDouble">Double</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>double</tt> represents approximate numeric data values using 8 bytes. The magnitude of a double value ranges from 2^(-1022) to (2-2^(-52))·2^(1023), for both positive and negative values. Values beyond this range become <tt>INF</tt> or <tt>-INF</tt>.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := double("NaN")
+let $v2 := double("INF")
+let $v3 := double("-INF")
+let $v4 := double("-2013.593823748327284")
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": NaNd, "v2": Infinityd, "v3": -Infinityd, "v4": -2013.5938237483274d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="String_Back_to_TOC"></a><a name="PrimitiveTypesString" id="PrimitiveTypesString">String</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>string</tt> represents a sequence of characters.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := string("This is a string.")
+let $v2 := string("\"This is a quoted string\"")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": "This is a string.", "v2": "\"This is a quoted string\"" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="Point_Back_to_TOC"></a><a name="PrimitiveTypesPoint" id="PrimitiveTypesPoint">Point</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>point</tt> is the fundamental two-dimensional building block for spatial types. It consists of two <tt>double</tt> coordinates x and y.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := point("80.10d, -10E5")
+let $v2 := point("5.10E-10d, -10E5")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": point("80.1,-1000000.0"), "v2": point("5.1E-10,-1000000.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="Line_Back_to_TOC"></a><a name="PrimitiveTypesLine" id="PrimitiveTypesLine">Line</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>line</tt> consists of two points that represent the start and the end points of a line segment.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := line("10.1234,11.1e-1 +10.2E-2,-11.22")
+let $v2 := line("0.1234,-1.00e-10 +10.5E-2,-01.02")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": line("10.1234,1.11 0.102,-11.22"), "v2": line("0.1234,-1.0E-10 0.105,-1.02") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="RectangleBack_to_TOC"></a><a name="PrimitiveTypesRectangle" id="PrimitiveTypesRectangle">Rectangle</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>rectangle</tt> consists of two points that represent the <i><i>bottom left</i></i> and <i><i>upper right</i></i> corners of a rectangle.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := rectangle("5.1,11.8 87.6,15.6548")
+let $v2 := rectangle("0.1234,-1.00e-10 5.5487,0.48765")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": rectangle("5.1,11.8 87.6,15.6548"), "v2": rectangle("0.1234,-1.0E-10 5.5487,0.48765") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="CircleBack_to_TOC"></a><a name="PrimitiveTypesCircle" id="PrimitiveTypesCircle">Circle</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>circle</tt> consists of one point that represents the center of the circle and a radius of type <tt>double</tt>.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := circle("10.1234,11.1e-1 +10.2E-2")
+let $v2 := circle("0.1234,-1.00e-10 +10.5E-2")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": circle("10.1234,1.11 0.102"), "v2": circle("0.1234,-1.0E-10 0.105") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="PolygonBack_to_TOC"></a><a name="PrimitiveTypesPolygon" id="PrimitiveTypesPolygon">Polygon</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>polygon</tt> consists of <i><i>n</i></i> points that represent the vertices of a <i><i>simple closed</i></i> polygon.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := polygon("-1.2,+1.3e2 -2.14E+5,2.15 -3.5e+2,03.6 -4.6E-3,+4.81")
+let $v2 := polygon("-1.0,+10.5e2 -02.15E+50,2.5 -1.0,+3.3e3 -2.50E+05,20.15 +3.5e+2,03.6 -4.60E-3,+4.75 -2,+1.0e2 -2.00E+5,20.10 30.5,03.25 -4.33E-3,+4.75")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": polygon("-1.2,130.0 -214000.0,2.15 -350.0,3.6 -0.0046,4.81"), "v2": polygon("-1.0,1050.0 -2.15E50,2.5 -1.0,3300.0 -250000.0,20.15 350.0,3.6 -0.0046,4.75 -2.0,100.0 -200000.0,20.1 30.5,3.25 -0.00433,4.75") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="DateBack_to_TOC"></a><a name="PrimitiveTypesDate" id="PrimitiveTypesDate">Date</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>date</tt> represents a time point along the Gregorian calendar system specified by the year, month and day. ASTERIX supports dates from <tt>-9999-01-01</tt> to <tt>9999-12-31</tt>.</p>
+<p>A date value can be represented in two formats: extended format and basic format.</p>
+
+<ul>
+
+<li>Extended format is written as <tt>[-]yyyy-mm-dd</tt> for <tt>year-month-day</tt>. Each field should be zero-padded if it has fewer digits than the format specifies.</li>
+
+<li>Basic format is written as <tt>[-]yyyymmdd</tt>.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := date("2013-01-01")
+let $v2 := date("-19700101")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": date("2013-01-01"), "v2": date("-1970-01-01") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="TimeBack_to_TOC"></a><a name="PrimitiveTypesTime" id="PrimitiveTypesTime">Time</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>time</tt> describes a time within the range of a day. It is represented by three fields: hour, minute and second. The millisecond field is optional and is written as the fraction of the second field. The extended format is <tt>hh:mm:ss[.mmm]</tt> and the basic format is <tt>hhmmss[mmm]</tt>. The value domain is from <tt>00:00:00.000</tt> to <tt>23:59:59.999</tt>.</p>
+<p>The timezone field is optional for a time value. A timezone is represented as <tt>[+|-]hh:mm</tt> in extended format or <tt>[+|-]hhmm</tt> in basic format. Note that the sign designator cannot be omitted. <tt>Z</tt> can also be used to denote UTC. If no timezone information is given, UTC is assumed.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := time("12:12:12.039Z")
+let $v2 := time("000000000-0800")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": time("12:12:12.039Z"), "v2": time("08:00:00.000Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="DatetimeBack_to_TOC"></a><a name="PrimitiveTypesDateTime" id="PrimitiveTypesDateTime">Datetime</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>A <tt>datetime</tt> value is a combination of a <tt>date</tt> and a <tt>time</tt>, representing a fixed time point along the Gregorian calendar system. The value lies between <tt>-9999-01-01 00:00:00.000</tt> and <tt>9999-12-31 23:59:59.999</tt>.</p>
+<p>A <tt>datetime</tt> value is written as the representation of its <tt>date</tt> part followed by that of its <tt>time</tt> part, separated by the separator <tt>T</tt>. Either extended or basic format can be used, but both parts must use the same format.</p>
+<p>The millisecond field and the timezone field are optional, as specified for the <tt>time</tt> type.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := datetime("2013-01-01T12:12:12.039Z")
+let $v2 := datetime("-19700101T000000000-0800")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": datetime("2013-01-01T12:12:12.039Z"), "v2": datetime("-1970-01-01T08:00:00.000Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="DurationYear-month-durationDay-time-durationBack_to_TOC"></a><a name="PrimitiveTypesDuration" id="PrimitiveTypesDuration">Duration/Year-month-duration/Day-time-duration</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>duration</tt> represents a duration of time. A duration value is specified by integers on at least one of the following fields: year, month, day, hour, minute, second, and millisecond.</p>
+<p>A duration value is in the format of <tt>[-]PnYnMnDTnHnMn.mmmS</tt>. The millisecond part (as the fraction of the second field) is optional, and when no millisecond field is used, the decimal point should also be absent.</p>
+<p>Negative durations are also supported for arithmetic operations on the time instance types (<tt>date</tt>, <tt>time</tt> and <tt>datetime</tt>), and are used to roll the time back by the given duration. For example, <tt>date("2012-01-01") + duration("-P3D")</tt> will return <tt>date("2011-12-29")</tt>.</p>
+<p>There are also two sub-duration types, namely <tt>year-month-duration</tt> and <tt>day-time-duration</tt>. <tt>year-month-duration</tt> represents only the years and months of a duration, while <tt>day-time-duration</tt> represents only the day to millisecond fields. Unlike the <tt>duration</tt> type, both of these subtypes are totally ordered, so they can be used for comparison and index construction.</p>
+<p>Note that a canonical representation of the duration is always returned, regardless of whether the user’s input was in canonical form. More information about the canonical representation can be found in <a class="externalLink" href="http://www.w3.org/TR/xpath-functions/#canonical-dayTimeDuration">XPath dayTimeDuration Canonical Representation</a> and <a class="externalLink" href="http://www.w3.org/TR/xpath-functions/#canonical-yearMonthDuration">yearMonthDuration Canonical Representation</a>.</p>
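+<p>The arithmetic and ordering described above can be sketched as follows. This is a minimal illustration, not a definitive reference: it assumes that a <tt>year-month-duration</tt> constructor taking a string is available in your version, and the variable and field names are chosen only for this example.</p>
+
+<div class="source">
+<div class="source">
+<pre>let $rolled := date("2012-01-01") + duration("-P3D")
+let $sorted := (for $d in [year-month-duration("P1Y"), year-month-duration("P11M")]
+                order by $d
+                return $d)
+return { "rolled-back": $rolled, "sorted": $sorted }
+</pre></div></div>
+<p>Per the description above, <tt>rolled-back</tt> should be <tt>date("2011-12-29")</tt>. The <tt>order by</tt> clause relies on the total ordering of <tt>year-month-duration</tt> values.</p>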
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := duration("P100Y12MT12M")
+let $v2 := duration("-PT20.943S")
+return { "v1": $v1, "v2": $v2 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": duration("P101YT12M"), "v2": duration("-PT20.943S") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="IntervalBack_to_TOC"></a><a name="PrimitiveTypesInterval" id="PrimitiveTypesInterval">Interval</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>interval</tt> represents inclusive-exclusive ranges of time. It is defined by two time point values of the same temporal type (<tt>date</tt>, <tt>time</tt> or <tt>datetime</tt>).</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := interval-from-date(date("2013-01-01"), date("20130505"))
+let $v2 := interval-from-time(time("00:01:01"), time("213901049+0800"))
+let $v3 := interval-from-datetime(datetime("2013-01-01T00:01:01"), datetime("20130505T213901049+0800"))
+return { "v1": $v1, "v2": $v2, "v3": $v3 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": interval-date("2013-01-01, 2013-05-05"), "v2": interval-time("00:01:01.000Z, 13:39:01.049Z"), "v3": interval-datetime("2013-01-01T00:01:01.000Z, 2013-05-05T13:39:01.049Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="UUIDBack_to_TOC"></a><a name="PrimitiveTypesUUID" id="PrimitiveTypesUUID">UUID</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p><tt>uuid</tt> represents a UUID value, which stands for universally unique identifier. It is written in a canonical format using hexadecimal text with inserted hyphen characters (e.g., 5a28ce1e-6a74-4201-9e8f-683256e5706f). This type is generally used to store auto-generated primary key values.</p>
+
+<ul>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
+return { "v1":$v1 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": uuid("5c848e5c-6b6a-498f-8452-8847a2957421") }
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Derived_TypesBack_to_TOC"></a><a name="DerivedTypes" id="DerivedTypes">Derived Types</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="RecordBack_to_TOC"></a><a name="DerivedTypesRecord" id="DerivedTypesRecord">Record</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>A <tt>record</tt> contains a set of fields, where each field is described by its name and type. A record type is either open or closed. Open records can contain fields that are not part of the type definition, while closed records cannot. Syntactically, record constructors are surrounded by curly braces “{…}”.</p>
+<p>An example would be</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 213508, "name": "Alice Bob" }
+</pre></div></div></div>
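+<p>To make the open/closed distinction for record types concrete, here is a minimal sketch of two type definitions. The type and field names are hypothetical and serve only as an illustration.</p>
+
+<div class="source">
+<div class="source">
+<pre>create type OpenUserType as open {
+    id: int32,
+    name: string
+}
+
+create type ClosedUserType as closed {
+    id: int32,
+    name: string
+}
+</pre></div></div>
+<p>Under these definitions, a record that carries an additional undeclared field (say, <tt>nickname</tt>) would be acceptable for <tt>OpenUserType</tt> but not for <tt>ClosedUserType</tt>.</p>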
+<div class="section">
+<h3><a name="OrderedListBack_to_TOC"></a><a name="DerivedTypesOrderedList" id="DerivedTypesOrderedList">OrderedList</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>An <tt>orderedList</tt> is a sequence of values for which the order is determined by creation or insertion. OrderedList constructors are denoted by brackets: “[…]”.</p>
+<p>An example would be</p>
+
+<div class="source">
+<div class="source">
+<pre> ["alice", 123, "bob", null]
+</pre></div></div></div>
+<div class="section">
+<h3><a name="UnorderedListBack_to_TOC"></a><a name="DerivedTypesUnorderedList" id="DerivedTypesUnorderedList">UnorderedList</a><font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>An <tt>unorderedList</tt> is an unordered sequence of values, similar to bags in SQL. UnorderedList constructors are denoted by two opening curly braces, followed by the data, followed by two closing curly braces: “{{…}}”.</p>
+<p>An example would be</p>
+
+<div class="source">
+<div class="source">
+<pre> {{"hello", 9328, "world", [1, 2, null]}}
+</pre></div></div></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/externaldata.html b/docs/0.8.7-incubating/aql/externaldata.html
new file mode 100644
index 0000000..b0ca194
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/externaldata.html
@@ -0,0 +1,633 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – Accessing External Data in AsterixDB</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Accessing External Data in AsterixDB</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#Introduction">Introduction</a></li>
+
+<li><a href="#IntroductionAdapterForAnExternalDataset">Adapter for an External Dataset</a></li>
+
+<li><a href="#IntroductionCreatingAnExternalDataset">Creating an External Dataset</a></li>
+
+<li><a href="#WritingQueriesAgainstAnExternalDataset">Writing Queries against an External Dataset</a></li>
+
+<li><a href="#BuildingIndexesOverExternalDatasets">Building Indexes over External Datasets</a></li>
+
+<li><a href="#ExternalDataSnapshot">External Data Snapshots</a></li>
+
+<li><a href="#FAQ">Frequently Asked Questions</a></li>
+</ul></div>
+<div class="section">
+<h2><a name="Introduction_Back_to_TOC"></a><a name="Introduction" id="Introduction">Introduction</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>Data that needs to be processed by AsterixDB could be residing outside AsterixDB storage. Examples include data files on a distributed file system such as HDFS or on the local file system of a machine that is part of an AsterixDB cluster. For AsterixDB to process such data, an end-user may create a regular dataset in AsterixDB (a.k.a. an internal dataset) and load the dataset with the data. AsterixDB also supports ‘‘external datasets’’ so that it is not necessary to “load” all data prior to using it. This also avoids creating multiple copies of data and the need to keep the copies in sync.</p>
+<div class="section">
+<h3><a name="Adapter_for_an_External_Dataset_Back_to_TOC"></a><a name="IntroductionAdapterForAnExternalDataset" id="IntroductionAdapterForAnExternalDataset">Adapter for an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>External data is accessed using wrappers (adapters in AsterixDB) that abstract away the mechanism of connecting with an external service, receiving its data and transforming the data into ADM records that are understood by AsterixDB. AsterixDB comes with built-in adapters for common storage systems such as HDFS or the local file system.</p></div>
+<div class="section">
+<h3><a name="Creating_an_External_Dataset_Back_to_TOC"></a><a name="IntroductionCreatingAnExternalDataset" id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h3>
+<p>As an example, we consider the Lineitem dataset from the <a class="externalLink" href="http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSTPCHLinkedData/tpch.sql">TPCH schema</a>. We assume that you have successfully created an AsterixDB instance following the instructions at <a href="../install.html">Installing AsterixDB Using Managix</a>. <i>For constructing an example, we assume a single machine setup.</i></p>
+<p>Similar to a regular dataset, an external dataset has an associated datatype. We shall first create the datatype associated with each record in Lineitem data. Paste the following in the query textbox on the webpage at <a class="externalLink" href="http://127.0.0.1:19001">http://127.0.0.1:19001</a> and hit ‘Execute’.</p>
+
+<div class="source">
+<div class="source">
+<pre> create dataverse ExternalFileDemo;
+ use dataverse ExternalFileDemo;
+
+ create type LineitemType as closed {
+ l_orderkey:int32,
+ l_partkey: int32,
+ l_suppkey: int32,
+ l_linenumber: int32,
+ l_quantity: double,
+ l_extendedprice: double,
+ l_discount: double,
+ l_tax: double,
+ l_returnflag: string,
+ l_linestatus: string,
+ l_shipdate: string,
+ l_commitdate: string,
+ l_receiptdate: string,
+ l_shipinstruct: string,
+ l_shipmode: string,
+ l_comment: string}
+</pre></div></div>
+<p>Here, we describe two scenarios.</p>
+<div class="section">
+<h4><a name="a1_Data_file_resides_on_the_local_file_system_of_a_host"></a>1) Data file resides on the local file system of a host</h4>
+<p>Prerequisite: The host is a part of the ASTERIX cluster.</p>
+<p>Earlier, we assumed a single machine ASTERIX setup. To satisfy the prerequisite, log in to the machine running ASTERIX.</p>
+
+<ul>
+
+<li>Download the <a href="../data/lineitem.tbl">data file</a> to an appropriate location. We denote this location by SOURCE_PATH.</li>
+</ul>
+<p>ASTERIX provides a built-in adapter for data residing on the local file system. The adapter is referred to by its alias, ‘localfs’. We create an external dataset named Lineitem and use the ‘localfs’ adapter.</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using localfs
+</pre></div></div>
+<p>Above, the definition is not complete as we need to provide a set of parameters that are specific to the source file.</p>
+
+<table border="0" class="table table-striped">
+
+<tr class="a">
+
+<td> Parameter </td>
+
+<td> Description </td>
+</tr>
+
+<tr class="b">
+
+<td> path </td>
+
+<td> A fully qualified path of the form <tt>host://&lt;absolute path&gt;</tt>.
+ Use a comma separated list if there are multiple files.
+ E.g. <tt>host1://&lt;absolute path&gt;</tt>, <tt>host2://&lt;absolute path&gt;</tt> and so forth. </td>
+</tr>
+
+<tr class="a">
+
+<td> format </td>
+
+<td> The format for the content. Use 'adm' for data in ADM (ASTERIX Data Model) or <a class="externalLink" href="http://www.json.org/">JSON</a> format. Use 'delimited-text' if fields are separated by a delimiting character (e.g., CSV). </td></tr>
+
+<tr class="b">
+
+<td>delimiter</td>
+
+<td>The delimiting character in the source file if format is 'delimited-text'</td>
+</tr>
+</table>
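+<p>As a hedged illustration of the comma-separated form of the <tt>path</tt> parameter, the following sketch defines a dataset over two files on two hosts. The dataset name, host names and file paths are hypothetical placeholders, not values from this guide.</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset LineitemParts(LineitemType)
+ using localfs
+ (("path"="host1:///data/lineitem-1.tbl,host2:///data/lineitem-2.tbl"),
+ ("format"="delimited-text"),
+ ("delimiter"="|"));
+</pre></div></div>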
+<p>As we are using a single machine ASTERIX instance, we use 127.0.0.1 as the host in the path parameter. We <i>complete the create dataset statement</i> as follows.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse ExternalFileDemo;
+
+ create external dataset Lineitem(LineitemType)
+ using localfs
+ (("path"="127.0.0.1://SOURCE_PATH"),
+ ("format"="delimited-text"),
+ ("delimiter"="|"));
+</pre></div></div>
+<p>Please substitute SOURCE_PATH with the absolute path to the source file on the local file system.</p></div>
+<div class="section">
+<h4><a name="Common_source_of_error"></a>Common source of error</h4>
+<p>An incorrect value for the path parameter will give the following exception message when the dataset is used in a query.</p>
+
+<div class="source">
+<div class="source">
+<pre> org.apache.hyracks.algebricks.common.exceptions.AlgebricksException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: Job failed.
+</pre></div></div>
+<p>Verify the correctness of the path parameter provided to the localfs adapter. Note that the path parameter must be an absolute path to the data file. For example, if you saved your file in your home directory (assume it to be /home/joe), then the path value should be</p>
+
+<div class="source">
+<div class="source">
+<pre> 127.0.0.1:///home/joe/lineitem.tbl
+</pre></div></div>
+<p>In your web-browser, navigate to 127.0.0.1:19001 and paste the above to the query text box. Finally hit ‘Execute’.</p>
+<p>Next, we move on to the section <a href="#WritingQueriesAgainstAnExternalDataset">Writing Queries against an External Dataset</a> and try a sample query against the external dataset.</p></div>
+<div class="section">
+<h4><a name="a2_Data_file_resides_on_an_HDFS_instance"></a>2) Data file resides on an HDFS instance</h4>
+<p>Prerequisite: The Namenode and the HDFS Datanodes must be reachable from the hosts that form the AsterixDB cluster. AsterixDB provides a built-in adapter for data residing on HDFS. The HDFS adapter is referred to (in AQL) by its alias, ‘hdfs’. We can create an external dataset named Lineitem and associate the HDFS adapter with it as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using hdfs(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/Lineitem.tbl"),...,("input-format"="rc-input-format"));
+</pre></div></div>
+<p>The expected parameters are described below:</p>
+
+<table border="0" class="table table-striped">
+
+<tr class="a">
+
+<td> Parameter </td>
+
+<td> Description </td>
+</tr>
+
+<tr class="b">
+
+<td> hdfs </td>
+
+<td> The HDFS URL </td>
+</tr>
+
+<tr class="a">
+
+<td> path </td>
+
+<td> The absolute path to the source HDFS file or directory. Use a comma separated list if there are multiple files or directories. </td></tr>
+
+<tr class="b">
+
+<td> input-format </td>
+
+<td> The associated input format. Use 'text-input-format' for text files, 'sequence-input-format' for Hadoop sequence files, 'rc-input-format' for Hadoop Record Columnar files, or a fully qualified name of an implementation of org.apache.hadoop.mapred.InputFormat. </td>
+</tr>
+
+<tr class="a">
+
+<td> format </td>
+
+<td> The format of the input content. Use 'adm' for text data in ADM (ASTERIX Data Model) or <a class="externalLink" href="http://www.json.org/">JSON</a> format, 'delimited-text' for text delimited data that has fields separated by a delimiting character, 'binary' for other data.</td>
+</tr>
+
+<tr class="b">
+
+<td> delimiter </td>
+
+<td> The delimiting character in the source file if format is 'delimited-text' </td>
+</tr>
+
+<tr class="a">
+
+<td> parser </td>
+
+<td> The parser used to parse HDFS records if the format is 'binary'. Use 'hive-parser' for data deserialized by a Hive SerDe (AsterixDB can understand deserialized Hive objects) or the fully qualified class name of a user-implemented parser that implements the interface org.apache.asterix.external.input.InputParser. </td>
+</tr>
+
+<tr class="b">
+
+<td> hive-serde </td>
+
+<td> The Hive SerDe used to deserialize HDFS records if the format is 'binary' and the parser is 'hive-parser'. Use the fully qualified name of a class that implements org.apache.hadoop.hive.serde2.SerDe. </td>
+</tr>
+
+<tr class="a">
+
+<td> local-socket-path </td>
+
+<td> The UNIX domain socket path if local short-circuit reads are enabled in the HDFS instance</td>
+</tr>
+</table>
+<p><i>Difference between ‘input-format’ and ‘format’</i></p>
+<p><i>input-format</i>: Files stored under HDFS have an associated storage format. For example, TextInputFormat represents plain text files. SequenceFileInputFormat indicates binary compressed files. RCFileInputFormat corresponds to records stored in a record columnar fashion. The parameter ‘input-format’ is used to distinguish between these and other HDFS input formats.</p>
+<p><i>format</i>: The parameter ‘format’ refers to the type of the data contained in the file. For example, data contained in a file could be in json or ADM format, could be in delimited-text with fields separated by a delimiting character or could be in binary format.</p>
+<p>As an example, consider the <a href="../data/lineitem.tbl">data file</a>. The file is a text file with each line representing a record. The fields in each record are separated by the ‘|’ character.</p>
+<p>We assume the HDFS URL to be <a class="externalLink" href="hdfs://localhost:54310">hdfs://localhost:54310</a>. We further assume that the example data file is copied to HDFS at a path denoted by “/asterix/Lineitem.tbl”.</p>
+<p>The complete set of parameters for our example file is as follows: <tt>(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/Lineitem.tbl"),("input-format"="text-input-format"),("format"="delimited-text"),("delimiter"="|"))</tt></p></div>
+<div class="section">
+<h4><a name="Using_the_Hive_Parser"></a>Using the Hive Parser</h4>
+<p>If a user wants to create an external dataset that uses hive-parser to parse HDFS records, the datatype associated with the dataset must match the actual data in the Hive table so that the Hive SerDe is initialized correctly. The supported Hive data types map to AsterixDB data types as follows:</p>
+
+<table border="0" class="table table-striped">
+
+<tr class="a">
+
+<td> Hive </td>
+
+<td> AsterixDB </td>
+</tr>
+
+<tr class="b">
+
+<td>BOOLEAN</td>
+
+<td>Boolean</td>
+</tr>
+
+<tr class="a">
+
+<td>BYTE(TINY INT)</td>
+
+<td>Int8</td>
+</tr>
+
+<tr class="b">
+
+<td>DOUBLE</td>
+
+<td>Double</td>
+</tr>
+
+<tr class="a">
+
+<td>FLOAT</td>
+
+<td>Float</td>
+</tr>
+
+<tr class="b">
+
+<td>INT</td>
+
+<td>Int32</td>
+</tr>
+
+<tr class="a">
+
+<td>LONG(BIG INT)</td>
+
+<td>Int64</td>
+</tr>
+
+<tr class="b">
+
+<td>SHORT(SMALL INT)</td>
+
+<td>Int16</td>
+</tr>
+
+<tr class="a">
+
+<td>STRING</td>
+
+<td>String</td>
+</tr>
+
+<tr class="b">
+
+<td>TIMESTAMP</td>
+
+<td>Datetime</td>
+</tr>
+
+<tr class="a">
+
+<td>DATE</td>
+
+<td>Date</td>
+</tr>
+
+<tr class="b">
+
+<td>STRUCT</td>
+
+<td>Nested Record</td>
+</tr>
+
+<tr class="a">
+
+<td>LIST</td>
+
+<td>OrderedList or UnorderedList</td>
+</tr>
+</table></div>
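+<p>As a sketch of how this mapping is applied, suppose a hypothetical Hive table with the columns <tt>id BIGINT</tt>, <tt>url STRING</tt>, <tt>visited_at TIMESTAMP</tt> and <tt>score DOUBLE</tt>. The table, the type name <tt>VisitType</tt> and the field names are assumptions made only for illustration. Following the conversion table above, a matching AsterixDB datatype might be declared like this:</p>
+
+<div class="source">
+<div class="source">
+<pre> create type VisitType as closed {
+   id: int64,
+   url: string,
+   visited_at: datetime,
+   score: double
+ }
+</pre></div></div>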
+<div class="section">
+<h4><a name="Examples_of_dataset_definitions_for_external_datasets"></a>Examples of dataset definitions for external datasets</h4>
+<p><i>Example 1</i>: We can modify the create external dataset statement as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using hdfs(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/Lineitem.tbl"),("input-format"="text-input-format"),("format"="delimited-text"),("delimiter"="|"));
+</pre></div></div>
+<p><i>Example 2</i>: Here, we create an external dataset of lineitem records stored in sequence files that have content in ADM format:</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using hdfs(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/SequenceLineitem.tbl"),("input-format"="sequence-input-format"),("format"="adm"));
+</pre></div></div>
+<p><i>Example 3</i>: Here, we create an external dataset of lineitem records stored in record-columnar files that have content in binary format, parsed using hive-parser with the Hive ColumnarSerDe:</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using hdfs(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/RCLineitem.tbl"),("input-format"="rc-input-format"),("format"="binary"),("parser"="hive-parser"),("hive-serde"="org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"));
+</pre></div></div></div></div></div>
+<div class="section">
+<h2><a name="Writing_Queries_against_an_External_Dataset_Back_to_TOC"></a><a name="WritingQueriesAgainstAnExternalDataset" id="WritingQueriesAgainstAnExternalDataset">Writing Queries against an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>You may write AQL queries against an external dataset in exactly the same way that queries are written against internal datasets. The following is an example of an AQL query that applies a filter and returns an ordered result.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse ExternalFileDemo;
+
+ for $c in dataset('Lineitem')
+ where $c.l_orderkey <= 3
+ order by $c.l_orderkey, $c.l_linenumber
+ return $c
+</pre></div></div></div>
+<div class="section">
+<h2><a name="Building_Indexes_over_External_Datasets_Back_to_TOC"></a><a name="BuildingIndexesOverExternalDatasets" id="BuildingIndexesOverExternalDatasets">Building Indexes over External Datasets</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB supports building B-Tree and R-Tree indexes over static data stored in the Hadoop Distributed File System. To create an index, first create an external dataset over the data as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> create external dataset Lineitem(LineitemType)
+ using hdfs(("hdfs"="hdfs://localhost:54310"),("path"="/asterix/Lineitem.tbl"),("input-format"="text-input-format"),("format"="delimited-text"),("delimiter"="|"));
+</pre></div></div>
+<p>You can then create a B-Tree index on this dataset, just as if the dataset were stored internally, as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> create index PartkeyIdx on Lineitem(l_partkey);
+</pre></div></div>
+<p>You could also create an R-Tree index as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> create index IndexName on DatasetName(attribute-name) type rtree;
+</pre></div></div>
+<p>After building the indexes, the AsterixDB query compiler can use them to access the dataset and answer queries in a more cost-effective manner. AsterixDB can read all HDFS input formats, but indexes over external datasets can currently be built only for HDFS datasets with ‘text-input-format’, ‘sequence-input-format’ or ‘rc-input-format’.</p></div>
+<div class="section">
+<h2><a name="External_Data_Snapshots_Back_to_TOC"></a><a name="ExternalDataSnapshots" id="ExternalDataSnapshots">External Data Snapshots</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>An external data snapshot represents the status of a dataset’s files in HDFS at a point in time. Upon creating the first index over an external dataset, AsterixDB captures and stores a snapshot of the dataset in HDFS. Only records present at the snapshot capture time are indexed, and any additional indexes created afterwards will only contain data that was present at the snapshot capture time, thus preserving consistency across all indexes of a dataset. To update all indexes of an external dataset and advance the snapshot time to the present time, a user can use the refresh external dataset command as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> refresh external dataset DatasetName;
+</pre></div></div>
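+<p>As a concrete sketch using the <tt>Lineitem</tt> dataset and the <tt>PartkeyIdx</tt> index from the previous section, a refresh followed by a query on the indexed field might look like this. The dataverse name is taken from the earlier query example, the key value 100 is an arbitrary illustration, and whether the optimizer actually chooses the index for this query is an assumption:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse ExternalFileDemo;
+
+ refresh external dataset Lineitem;
+
+ for $l in dataset('Lineitem')
+ where $l.l_partkey = 100
+ return $l
+</pre></div></div>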
+<p>After a refresh operation commits, all of the dataset’s indexes will reflect the status of the data as of the new snapshot capture time.</p></div>
+<div class="section">
+<h2><a name="Frequently_Asked_Questions_Back_to_TOC"></a><a name="FAQ" id="FAQ">Frequently Asked Questions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>Q. I added data to my dataset in HDFS. Will the dataset indexes in AsterixDB be updated automatically?</p>
+<p>A. No, you must use the refresh external dataset statement to make the indexes aware of any changes in the dataset files in HDFS.</p>
+<p>Q. Why doesn’t AsterixDB update external indexes automatically?</p>
+<p>A. Since external data is managed by other users/systems with mechanisms that are system dependent, AsterixDB has no way of knowing exactly when data is added or deleted in HDFS, so the responsibility of refreshing indexes is left to the user. A user can instead use internal datasets, for which AsterixDB manages the data and its indexes.</p>
+<p>Q. I created an index over an external dataset and then added some data to my HDFS dataset. Will a query that uses the index return different results from a query that doesn’t use the index?</p>
+<p>A. No, query results are access path independent, and the stored snapshot is used to determine which data are included when processing queries.</p>
+<p>Q. I created an index over an external dataset and then deleted some of my dataset’s files in HDFS. Will indexed data access still return the records in deleted files?</p>
+<p>A. No. When AsterixDB accesses external data, with or without the use of indexes, it only accesses files present in the file system at runtime.</p>
+<p>Q. I submitted a refresh command on an external dataset and a failure occurred. What has happened to my indexes?</p>
+<p>A. Each external index refresh is treated as a single transaction. In case of a failure, a rollback occurs and indexes are restored to their previous state. An error message with the cause of failure is returned to the user.</p>
+<p>Q. I was trying to refresh an external dataset while some queries were accessing the data using an index access method. Will the queries be affected by the refresh operation?</p>
+<p>A. Queries see the state of the external dataset’s indexes as of the time the queries are submitted. A query that was submitted before a refresh commits will only access data under the snapshot taken before the refresh; queries that are submitted after the refresh commits will access data under the snapshot taken after the refresh.</p>
+<p>Q. What happens when I try to create an additional index while a refresh operation is in progress or vice versa?</p>
+<p>A. The create index operation will wait until the refresh commits or aborts and then the index will be built according to the external data snapshot at the end of the refresh operation. Creating indexes and refreshing datasets are mutually exclusive operations and will not be run in parallel. Multiple indexes can be created in parallel, but not multiple refresh operations.</p></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/filters.html b/docs/0.8.7-incubating/aql/filters.html
new file mode 100644
index 0000000..1347e30
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/filters.html
@@ -0,0 +1,272 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – Filter-Based LSM Index Acceleration</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Filter-Based LSM Index Acceleration</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#Motivation">Motivation</a></li>
+
+<li><a href="#FiltersInAsterixDB">Filters in AsterixDB</a></li>
+
+<li><a href="#FiltersAndMergePolicies">Filters and Merge Policies</a></li>
+</ul></div>
+<div class="section">
+<h2><a name="Motivation_Back_to_TOC"></a><a name="Motivation" id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>Traditional relational databases usually employ conventional index structures such as B+ trees due to their low read latency. However, such traditional index structures use in-place writes to perform updates, resulting in costly random writes to disk. Today’s emerging applications often involve insert-intensive workloads for which the cost of random writes prohibits efficient ingestion of data. Consequently, popular NoSQL systems such as Cassandra, HBase, LevelDB, and BigTable have adopted Log-Structured Merge (LSM) Trees as their storage structure. LSM-trees avoid the cost of random writes by batching updates into a component of the index that resides in main memory – an <i>in-memory component</i>. When the space occupancy of the in-memory component exceeds a specified threshold, its entries are <i>flushed</i> to disk forming a new component – a <i>disk component</i>. As disk components accumulate on disk, they are periodically merged together subject to a <i>merge policy</i> that decides when and what to merge. The benefit of LSM-trees comes at the cost of possibly sacrificing read efficiency, but previous studies have shown that these inefficiencies can be mostly mitigated.</p>
+<p>AsterixDB has also embraced LSM-trees, not just by using them as primary indexes, but also by using the same LSM-ification technique for all of its secondary index structures. In particular, AsterixDB adopted a generic framework for converting a class of indexes (that includes conventional B+ trees, R trees, and inverted indexes) into LSM-based secondary indexes, allowing higher data ingestion rates. In fact, for certain index structures, our results have shown that using an LSM-based version of an index can be made to significantly outperform its conventional counterpart for <i>both</i> ingestion and query speed (an example of such an index being the R-tree for spatial data).</p>
+<p>Since an LSM-based index naturally partitions data into multiple disk components, it is possible, when answering certain queries, to exploit partitioning to only access some components and safely filter out the remaining components, thus reducing query times. For instance, referring to our <a href="primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB">TinySocial</a> example, suppose a user always retrieves tweets from the <tt>TweetMessages</tt> dataset based on the <tt>send-time</tt> field (e.g., tweets posted in the last 24 hours). Since there is no secondary index on the <tt>send-time</tt> field, the only available option for AsterixDB would be to scan the whole <tt>TweetMessages</tt> dataset and then apply the predicate as a post-processing step. However, if disk components of the primary index were tagged with the minimum and maximum timestamp values of the records they contain, we could utilize the tagged information to directly access the primary index and prune components that do not match the query predicate. Thus, we could save substantial cost by avoiding scanning the whole dataset and only accessing the relevant components. We refer to such tagging information, associated with components, simply as filters. (Note that even if there were a secondary index on the <tt>send-time</tt> field, using filters could save substantial cost by avoiding accessing the secondary index, followed by probing the primary index for every fetched entry.) Moreover, the same filtering technique can also be used with any secondary LSM index (e.g., an LSM R-tree), in case the query contains multiple predicates (e.g., spatial and temporal predicates), to obtain similar pruning power.</p></div>
+<div class="section">
+<h2><a name="Filters_in_AsterixDB_Back_to_TOC"></a><a name="FiltersInAsterixDB" id="FiltersInAsterixDB">Filters in AsterixDB</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>We have added support for LSM-based filters to all of AsterixDB’s index types. To enable the use of filters, the user must specify the filter’s key when creating a dataset, as shown below:</p>
+<div class="section">
+<div class="section">
+<h4><a name="Creating_a_Dataset_with_a_Filter"></a>Creating a Dataset with a Filter</h4>
+
+<div class="source">
+<div class="source">
+<pre> create dataset Tweets(TweetType) primary key tweetid with filter on send-time;
+</pre></div></div>
+<p>Filters can be created on any totally ordered datatype (i.e., any field that can be indexed using a B+ tree), such as integers, doubles, floats, UUIDs, datetimes, etc.</p>
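+<p>As an illustration, a query such as the following sketch has a range predicate on the filter’s key, which is the kind of query that filters are intended to help. The dataverse name <tt>TinySocial</tt> and the cutoff timestamp are assumptions made for this example:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset('Tweets')
+ where $t.send-time >= datetime("2015-11-23T00:00:00")
+ return $t
+</pre></div></div>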
+<p>When a dataset with a filter is created, the name of the filter’s key field is persisted in the <tt>Metadata.Dataset</tt> dataset (which is the metadata dataset that stores the details of each dataset in an AsterixDB instance) so that DML operations against the dataset can recognize the existence of filters and can update them or utilize them accordingly. Creating a dataset with a filter in AsterixDB implies that the primary and all secondary indexes of that dataset will maintain filters on their disk components. Once a filtered dataset is created, the user can use the dataset normally (just like any other dataset). AsterixDB will automatically maintain the filters and will leverage them to efficiently answer queries whenever possible (i.e., when a query has predicates on the filter’s key).</p></div></div></div>
+<div class="section">
+<h2><a name="Filters_and_Merge_Policies_Back_to_TOC"></a><a name="FiltersAndMergePolicies" id="FiltersAndMergePolicies">Filters and Merge Policies</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>The AsterixDB default merge policy, the prefix merge policy, relies on component sizes and the number of components to decide which components to merge. This merge policy has proven to provide excellent performance for both ingestion and queries. However, when evaluating our filtering solution with the prefix policy, we observed a behavior that can reduce filter effectiveness. In particular, we noticed that under the prefix merge policy, the disk components of a secondary index tend to be constantly merged into a single component. This is because the prefix policy relies on a single size parameter for all of the indexes of a dataset. This parameter is typically chosen based on the sizes of the disk components of the primary index, which tend to be much larger than the sizes of the secondary indexes’ disk components. This difference caused the prefix merge policy to behave similarly to the constant merge policy (i.e., relatively poorly) when applied to secondary indexes in the sense that the secondary indexes are constantly merged into a single disk component. Consequently, the effectiveness of filters on secondary indexes was greatly reduced under the prefix-merge policy, but they were still effective when probing the primary index. Based on this behavior, we developed a new merge policy, an improved version of the prefix policy, called the correlated-prefix policy. The basic idea of this policy is that it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index. When the policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), then it will issue successive merge requests to the I/O scheduler on behalf of all other indexes associated with the same dataset. The end result is that secondary indexes will always have the same number of disk components as their primary index under the correlated-prefix merge policy. This has improved query performance, since disk components of secondary indexes now have a much better chance of being pruned.</p></div>
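+<p>A dataset that combines a filter with the correlated-prefix policy could be declared along the following lines. This is only a sketch: it assumes the <tt>using compaction policy</tt> clause of the AQL create dataset statement, and the property names and values shown are illustrative assumptions that should be checked against the AQL manual for the release in use:</p>
+
+<div class="source">
+<div class="source">
+<pre> create dataset Tweets(TweetType) primary key tweetid
+ using compaction policy correlated-prefix
+ (("max-mergable-component-size"="1073741824"),("max-tolerance-component-count"="5"))
+ with filter on send-time;
+</pre></div></div>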
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/functions.html b/docs/0.8.7-incubating/aql/functions.html
new file mode 100644
index 0000000..28d5fbe
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/functions.html
@@ -0,0 +1,5111 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – Asterix: Using Functions</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>Asterix: Using Functions</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#NumericFunctions">Numeric Functions</a></li>
+
+<li><a href="#StringFunctions">String Functions</a></li>
+
+<li><a href="#AggregateFunctions">Aggregate Functions</a></li>
+
+<li><a href="#SpatialFunctions">Spatial Functions</a></li>
+
+<li><a href="#SimilarityFunctions">Similarity Functions</a></li>
+
+<li><a href="#TokenizingFunctions">Tokenizing Functions</a></li>
+
+<li><a href="#TemporalFunctions">Temporal Functions</a></li>
+
+<li><a href="#RecordFunctions">Record Functions</a></li>
+
+<li><a href="#OtherFunctions">Other Functions</a></li>
+</ul>
+<p>Asterix provides various classes of functions to support operations on numeric, string, spatial, and temporal data. This document explains how to use these functions.</p></div>
+<div class="section">
+<h2><a name="Numeric_Functions_Back_to_TOC"></a><a name="NumericFunctions" id="NumericFunctions">Numeric Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="abs"></a>abs</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>abs(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Computes the absolute value of the argument.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt>/<tt>float</tt>/<tt>double</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The absolute value of the argument with the same type as the input argument, or <tt>null</tt> if the argument is a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := abs(2013)
+let $v2 := abs(-4036)
+let $v3 := abs(0)
+let $v4 := abs(float("-2013.5"))
+let $v5 := abs(double("-2013.593823748327284"))
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": 2013, "v2": 4036, "v3": 0, "v4": 2013.5f, "v5": 2013.5938237483274d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="ceiling"></a>ceiling</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>ceiling(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Computes the smallest (closest to negative infinity) number with no fractional part that is not less than the value of the argument. If the argument is already equal to a mathematical integer, the result is the same as the argument.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt>/<tt>float</tt>/<tt>double</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The ceiling value for the given number in the same type as the input argument, or <tt>null</tt> if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := ceiling(2013)
+let $v2 := ceiling(-4036)
+let $v3 := ceiling(0.3)
+let $v4 := ceiling(float("-2013.2"))
+let $v5 := ceiling(double("-2013.893823748327284"))
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2013.0d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="floor"></a>floor</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>floor(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Computes the largest (closest to positive infinity) number with no fractional part that is not greater than the value of the argument. If the argument is already equal to a mathematical integer, the result is the same as the argument.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt>/<tt>float</tt>/<tt>double</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The floor value for the given number in the same type as the input argument, or <tt>null</tt> if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := floor(2013)
+let $v2 := floor(-4036)
+let $v3 := floor(0.8)
+let $v4 := floor(float("-2013.2"))
+let $v5 := floor(double("-2013.893823748327284"))
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": 2013, "v2": -4036, "v3": 0.0d, "v4": -2014.0f, "v5": -2014.0d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="round"></a>round</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>round(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Computes the number with no fractional part that is closest to the argument. If two such numbers are equally close, the one closer to positive infinity is returned.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt>/<tt>float</tt>/<tt>double</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The rounded value for the given number in the same type as the input argument, or <tt>null</tt> if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := round(2013)
+let $v2 := round(-4036)
+let $v3 := round(0.8)
+let $v4 := round(float("-2013.256"))
+let $v5 := round(double("-2013.893823748327284"))
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2014.0d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="round-half-to-even"></a>round-half-to-even</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>round-half-to-even(numeric_expression, [precision])
+</pre></div></div></li>
+
+<li>
+<p>Computes the closest numeric value to <tt>numeric_expression</tt> that is a multiple of ten to the power of minus <tt>precision</tt>. The <tt>precision</tt> argument is optional and defaults to <tt>0</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt>/<tt>float</tt>/<tt>double</tt> value.</li>
+
+<li><tt>precision</tt>: An optional integer representing the number of digits in the fractional part of the result.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The rounded value for the given number in the same type as the input argument, or <tt>null</tt> if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := round-half-to-even(2013)
+let $v2 := round-half-to-even(-4036)
+let $v3 := round-half-to-even(0.8)
+let $v4 := round-half-to-even(float("-2013.256"))
+let $v5 := round-half-to-even(double("-2013.893823748327284"))
+let $v6 := round-half-to-even(double("-2013.893823748327284"), 2)
+let $v7 := round-half-to-even(2013, 4)
+let $v8 := round-half-to-even(float("-2013.256"), 5)
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5, "v6": $v6, "v7": $v7, "v8": $v8 }
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2014.0d, "v6": -2013.89d, "v7": 2013, "v8": -2013.256f }
+</pre></div></div></li>
+</ul></div></div>
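+<p>The <tt>round-half-to-even</tt> example above contains no exact ties, which is the case where half-to-even rounding differs from <tt>round</tt>. The following query is a minimal sketch for probing that case; it has not been run here. Under the usual half-to-even rule, ties would be expected to resolve to the neighbor whose last retained digit is even (<tt>0.5</tt> to 0, <tt>1.5</tt> and <tt>2.5</tt> to 2, and <tt>2013.125</tt> at precision 2 to 2013.12), but this should be confirmed against a running instance.</p>
+
+<div class="source">
+<div class="source">
+<pre>let $v1 := round-half-to-even(0.5)
+let $v2 := round-half-to-even(1.5)
+let $v3 := round-half-to-even(2.5)
+let $v4 := round-half-to-even(double("2013.125"), 2)
+return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4 }
+</pre></div></div>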
+<div class="section">
+<h2><a name="String_Functions_Back_to_TOC"></a><a name="StringFunctions" id="StringFunctions">String Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="string-to-codepoint"></a>string-to-codepoint</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>string-to-codepoint(string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Converts the string <tt>string_expression</tt> to its code-based representation.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that will be converted.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> of the code points for the string <tt>string_expression</tt>.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="codepoint-to-string"></a>codepoint-to-string</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>codepoint-to-string(list_expression)
+</pre></div></div></li>
+
+<li>
+<p>Converts the ordered code-based representation <tt>list_expression</tt> to the corresponding string.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression</tt> : An <tt>OrderedList</tt> of code-points.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> representation of <tt>list_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $s := "Hello ASTERIX!"
+let $l := string-to-codepoint($s)
+let $ss := codepoint-to-string($l)
+return {"codes": $l, "string": $ss}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "codes": [ 72, 101, 108, 108, 111, 32, 65, 83, 84, 69, 82, 73, 88, 33 ], "string": "Hello ASTERIX!" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="contains"></a>contains</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>contains(string_expression, substring_to_contain)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> contains the string <tt>substring_to_contain</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might contain the given substring.</li>
+
+<li><tt>substring_to_contain</tt> : A target <tt>string</tt> that might be contained.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt> value, <tt>true</tt> if <tt>string_expression</tt> contains <tt>substring_to_contain</tt>, and <tt>false</tt> otherwise.</li>
+ </ul></li>
+
+<li>Note: An <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">n-gram index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where contains($i.message, "phone")
+return {"mid": $i.message-id, "message": $i.message}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "mid": 2, "message": " dislike iphone its touch-screen is horrible" }
+{ "mid": 13, "message": " dislike iphone the voice-command is bad:(" }
+{ "mid": 15, "message": " like iphone the voicemail-service is awesome" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="like"></a>like</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>like(string_expression, string_pattern)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> contains the string pattern <tt>string_pattern</tt>. Compared to the <tt>contains</tt> function, the <tt>like</tt> function also supports regular expressions.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might contain the pattern or <tt>null</tt>.</li>
+
+<li><tt>string_pattern</tt> : A pattern <tt>string</tt> that might be contained or <tt>null</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt> value, <tt>true</tt> if <tt>string_expression</tt> contains the pattern <tt>string_pattern</tt>, and <tt>false</tt> otherwise.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where like($i.message, "%at&t%")
+return $i.message
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" can't stand at&t the network is horrible:("
+" can't stand at&t its plan is terrible"
+" love at&t its 3G is good:)"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="starts-with"></a>starts-with</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>starts-with(string_expression, substring_to_start_with)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> starts with the string <tt>substring_to_start_with</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might start with the given string.</li>
+
+<li><tt>substring_to_start_with</tt> : A <tt>string</tt> that might be contained as the starting substring.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt>, returns <tt>true</tt> if <tt>string_expression</tt> starts with the string <tt>substring_to_start_with</tt>, and <tt>false</tt> otherwise.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where starts-with($i.message, " like")
+return $i.message
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" like samsung the plan is amazing"
+" like t-mobile its platform is mind-blowing"
+" like verizon the 3G is awesome:)"
+" like iphone the voicemail-service is awesome"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="ends-with"></a>ends-with</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>ends-with(string_expression, substring_to_end_with)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> ends with the string <tt>substring_to_end_with</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might end with the given string.</li>
+
+<li><tt>substring_to_end_with</tt> : A <tt>string</tt> that might be contained as the ending substring.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt>, returns <tt>true</tt> if <tt>string_expression</tt> ends with the string <tt>substring_to_end_with</tt>, and <tt>false</tt> otherwise.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where ends-with($i.message, ":)")
+return $i.message
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" love sprint its shortcut-menu is awesome:)"
+" like verizon the 3G is awesome:)"
+" love at&t its 3G is good:)"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="string-concat"></a>string-concat</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>string-concat(list_expression)
+</pre></div></div></li>
+
+<li>
+<p>Concatenates a list of strings <tt>list_expression</tt> into a single string.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression</tt> : An <tt>OrderedList</tt> or <tt>UnorderedList</tt> of <tt>string</tt>s (could be <tt>null</tt>) to be concatenated.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns the concatenated <tt>string</tt> value.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := "ASTERIX"
+let $j := " "
+let $k := "ROCKS!"
+return string-concat([$i, $j, $k])
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"ASTERIX ROCKS!"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="string-join"></a>string-join</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>string-join(list_expression, string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Joins a list of strings <tt>list_expression</tt> with the given separator <tt>string_expression</tt> into a single string.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression</tt> : An <tt>OrderedList</tt> or <tt>UnorderedList</tt> of strings (could be <tt>null</tt>) to be joined.</li>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> as the separator.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns the joined <tt>string</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $i := ["ASTERIX", "ROCKS~"]
+return string-join($i, "!! ")
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"ASTERIX!! ROCKS~"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="lowercase"></a>lowercase</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>lowercase(string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Converts a given string <tt>string_expression</tt> to its lowercase form.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> to be converted.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns a <tt>string</tt> as the lowercase form of the given <tt>string_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $i := "ASTERIX"
+return lowercase($i)
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>asterix
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="uppercase"></a>uppercase</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>uppercase(string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Converts a given string <tt>string_expression</tt> to its uppercase form.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> to be converted.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns a <tt>string</tt> as the uppercase form of the given <tt>string_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $i := "asterix"
+return uppercase($i)
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>ASTERIX
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="matches"></a>matches</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>matches(string_expression, string_pattern)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> matches the given pattern <tt>string_pattern</tt> (a Java regular expression pattern).</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might contain the pattern.</li>
+
+<li><tt>string_pattern</tt> : A pattern <tt>string</tt> to be matched.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt>, returns <tt>true</tt> if <tt>string_expression</tt> matches the pattern <tt>string_pattern</tt>, and <tt>false</tt> otherwise.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where matches($i.message, "dislike iphone")
+return $i.message
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" dislike iphone its touch-screen is horrible"
+" dislike iphone the voice-command is bad:("
+</pre></div></div></li>
+</ul></div>
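+<p>The <tt>matches</tt> example above uses a plain literal as the pattern. Since the pattern is described as a Java regular expression, a query along the following lines should also be possible. This is a minimal sketch that relies only on standard regular expression alternation; the result is not shown because it was not run against the TinySocial data.</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where matches($i.message, "(like|love) (iphone|sprint)")
+return $i.message
+</pre></div></div>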
+<div class="section">
+<h3><a name="replace"></a>replace</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>replace(string_expression, string_pattern, string_replacement[, string_flags])
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the string <tt>string_expression</tt> matches the given pattern <tt>string_pattern</tt>, and replaces the matched pattern <tt>string_pattern</tt> with the new pattern <tt>string_replacement</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that might contain the pattern.</li>
+
+<li><tt>string_pattern</tt> : A pattern <tt>string</tt> to be matched.</li>
+
+<li><tt>string_replacement</tt> : A pattern <tt>string</tt> to be used as the replacement.</li>
+
+<li><tt>string_flags</tt> : (Optional) A <tt>string</tt> with flags to be used during replace.</li>
+
+<li>The following modes are enabled with these flags: dotall (s), multiline (m), case-insensitive (i), and comments and whitespace (x).</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns a <tt>string</tt> that is obtained after the replacements.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where matches($i.message, " like iphone")
+return replace($i.message, " like iphone", "like android")
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"like android the voicemail-service is awesome"
+</pre></div></div></li>
+</ul></div>
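+<p>The <tt>replace</tt> example above does not use the optional flags argument. The following is a minimal sketch of passing the case-insensitive flag (<tt>i</tt>) listed above, so that an upper-case pattern can match the lower-case message text. The choice of pattern is illustrative, and the output is not shown because the query was not run here.</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where matches($i.message, " like iphone")
+return replace($i.message, " LIKE IPHONE", "like android", "i")
+</pre></div></div>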
+<div class="section">
+<h3><a name="string-length"></a>string-length</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>string-length(string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the length of the string <tt>string_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> or <tt>null</tt> that represents the string to be checked.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> that represents the length of <tt>string_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+return {"mid": $i.message-id, "message-len": string-length($i.message)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "mid": 1, "message-len": 43 }
+{ "mid": 2, "message-len": 44 }
+{ "mid": 3, "message-len": 33 }
+{ "mid": 4, "message-len": 43 }
+{ "mid": 5, "message-len": 46 }
+{ "mid": 6, "message-len": 43 }
+{ "mid": 7, "message-len": 37 }
+{ "mid": 8, "message-len": 33 }
+{ "mid": 9, "message-len": 34 }
+{ "mid": 10, "message-len": 50 }
+{ "mid": 11, "message-len": 38 }
+{ "mid": 12, "message-len": 52 }
+{ "mid": 13, "message-len": 42 }
+{ "mid": 14, "message-len": 27 }
+{ "mid": 15, "message-len": 45 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="substring"></a>substring</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>substring(string_expression, offset[, length])
+</pre></div></div></li>
+
+<li>
+<p>Returns the substring from the given string <tt>string_expression</tt> based on the given start offset <tt>offset</tt> with the optional <tt>length</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> to be extracted.</li>
+
+<li><tt>offset</tt> : An <tt>int64</tt> as the starting offset of the substring in <tt>string_expression</tt>.</li>
+
+<li><tt>length</tt> : (Optional) An <tt>int64</tt> as the length of the substring.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> that represents the substring.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where string-length($i.message) > 50
+return substring($i.message, 50)
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"G:("
+</pre></div></div></li>
+</ul></div>
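+<p>The <tt>substring</tt> example above omits the optional <tt>length</tt> argument. The following is a minimal sketch of the three-argument form. The offset and length values are arbitrary illustrative choices, and the result is not shown because it depends on the offset convention of the installed version.</p>
+
+<div class="source">
+<div class="source">
+<pre>let $s := "ASTERIX ROCKS!"
+return substring($s, 9, 5)
+</pre></div></div>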
+<div class="section">
+<h3><a name="substring-before"></a>substring-before</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>substring-before(string_expression, string_pattern)
+</pre></div></div></li>
+
+<li>
+<p>Returns the substring from the given string <tt>string_expression</tt> before the given pattern <tt>string_pattern</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> to be extracted.</li>
+
+<li><tt>string_pattern</tt> : A <tt>string</tt> pattern to be searched.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> that represents the substring.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where contains($i.message, "iphone")
+return substring-before($i.message, "iphone")
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" dislike "
+" dislike "
+" like "
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="substring-after"></a>substring-after</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>substring-after(string_expression, string_pattern)
+</pre></div></div></li>
+
+<li>
+<p>Returns the substring from the given string <tt>string_expression</tt> after the given pattern <tt>string_pattern</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> to be extracted.</li>
+
+<li><tt>string_pattern</tt> : A <tt>string</tt> pattern to be searched.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> that represents the substring.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('FacebookMessages')
+where contains($i.message, "iphone")
+return substring-after($i.message, "iphone")
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>" its touch-screen is horrible"
+" the voice-command is bad:("
+" the voicemail-service is awesome"
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Aggregate_Functions_Back_to_TOC"></a><a name="AggregateFunctions" id="AggregateFunctions">Aggregate Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="count"></a>count</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>count(list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the number of items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing the items to be counted, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> value representing the number of items in the given list. <tt>0i64</tt> is returned if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $l1 := ['hello', 'world', 1, 2, 3]
+let $l2 := for $i in dataset TwitterUsers return $i
+return {"count1": count($l1), "count2": count($l2)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "count1": 5i64, "count2": 4i64 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="avg"></a>avg</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>avg(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the average value of the items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing numeric or null values, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>double</tt> value representing the average of the numbers in the given list. <tt>null</tt> is returned if the input is <tt>null</tt>, or the input list contains <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $l := for $i in dataset TwitterUsers return $i.friends_count
+return {"avg_friend_count": avg($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "avg_friend_count": 191.5d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="sum"></a>sum</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>sum(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the sum of the items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing numeric or null values, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The sum of the numbers in the given list. The return type is the type of the item that ranks highest in the numeric type promotion order (<tt>int8</tt> -> <tt>int16</tt> -> <tt>int32</tt> -> <tt>int64</tt> -> <tt>float</tt> -> <tt>double</tt>). <tt>null</tt> is returned if the input is <tt>null</tt>, or the input list contains <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $l := for $i in dataset TwitterUsers return $i.friends_count
+return {"sum_friend_count": sum($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "sum_friend_count": 766 }
+</pre></div></div></li>
+</ul></div>
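+<p>The type promotion rule described for <tt>sum</tt> can be probed with a list of mixed numeric types. The following is a minimal sketch; it assumes that the <tt>int8</tt> and <tt>int32</tt> constructor functions are available in the same way as the <tt>float</tt> and <tt>double</tt> constructors used earlier in this document. According to the promotion order above, the result would be expected to be a <tt>float</tt>, since that is the highest-ranked type in the list, but the query was not run here.</p>
+
+<div class="source">
+<div class="source">
+<pre>let $l := [int8("1"), int32("2"), float("3.5")]
+return { "sum": sum($l) }
+</pre></div></div>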
+<div class="section">
+<h3><a name="minmax"></a>min/max</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>min(num_list), max(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the min/max value of numeric items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing the items to be compared, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The min/max value of the given list. The return type is the type of the item that ranks highest in the numeric type promotion order (<tt>int8</tt> -> <tt>int16</tt> -> <tt>int32</tt> -> <tt>int64</tt> -> <tt>float</tt> -> <tt>double</tt>). <tt>null</tt> is returned if the input is <tt>null</tt>, or the input list contains <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $l := for $i in dataset TwitterUsers return $i.friends_count
+return {"min_friend_count": min($l), "max_friend_count": max($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "min_friend_count": 18, "max_friend_count": 445 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="sql-count"></a>sql-count</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>sql-count(list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the number of non-null items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing the items to be counted, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> value representing the number of non-null items in the given list. The value <tt>0i64</tt> is returned if the input is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $l1 := ['hello', 'world', 1, 2, 3, null]
+return {"count": sql-count($l1)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "count": 5i64 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="sql-avg"></a>sql-avg</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>sql-avg(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the average value of the non-null items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing numeric or null values, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>double</tt> value representing the average of the non-null numbers in the given list. The <tt>null</tt> value is returned if the input is <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $l := [1.2, 2.3, 3.4, 0, null]
+return {"avg": sql-avg($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "avg": 1.725d }
+</pre></div></div></li>
+</ul></div>
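+<p>As a rough model of the rules above, the Python sketch below skips <tt>null</tt> items, rejects non-numeric items, and returns <tt>null</tt> for a <tt>null</tt> input. The name <tt>sql_avg</tt> is hypothetical, Python's <tt>None</tt> stands in for <tt>null</tt>, and the result for a list with no non-null items is an assumption.</p>

```python
from numbers import Number


def sql_avg(num_list):
    """Average of the non-null numeric items, modelled on the description of sql-avg."""
    if num_list is None:
        return None  # null input yields null
    values = [v for v in num_list if v is not None]  # nulls are skipped
    for v in values:
        # bool is excluded explicitly because Python treats it as a number.
        if isinstance(v, bool) or not isinstance(v, Number):
            raise TypeError(f"non-numeric item: {v!r}")
    if not values:
        return None  # assumption: not specified in the text
    return float(sum(values)) / len(values)
```

+<p>With the list from the example, the four non-null items sum to 6.9, so the model returns approximately 1.725.</p>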
+<div class="section">
+<h3><a name="sql-sum"></a>sql-sum</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>sql-sum(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the sum of the non-null items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing numeric or null values, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The sum of the non-null numbers in the given list. The return type is determined by the item type that ranks highest in the numeric type promotion order (<tt>int8</tt>-><tt>int16</tt>-><tt>int32</tt>-><tt>int64</tt>-><tt>float</tt>-><tt>double</tt>) among the items. The value <tt>null</tt> is returned if the input is <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $l := [1.2, 2.3, 3.4, 0, null]
+return {"sum": sql-sum($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "sum": 6.9d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="sql-minmax"></a>sql-min/max</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>sql-min(num_list), sql-max(num_list)
+</pre></div></div></li>
+
+<li>
+<p>Gets the min/max value of the non-null numeric items in the given list.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>num_list</tt>: An <tt>orderedList</tt> or <tt>unorderedList</tt> containing the items to be compared, or a <tt>null</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>The min/max value of the non-null items in the given list. The return type is determined by the item type that ranks highest in the numeric type promotion order (<tt>int8</tt>-><tt>int16</tt>-><tt>int32</tt>-><tt>int64</tt>-><tt>float</tt>-><tt>double</tt>) among the items. The value <tt>null</tt> is returned if the input is <tt>null</tt>. Non-numeric types in the input list will cause an error.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $l := [1.2, 2.3, 3.4, 0, null]
+return {"min": sql-min($l), "max": sql-max($l)}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "min": 0.0d, "max": 3.4d }
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Spatial_Functions_Back_to_TOC"></a><a name="SpatialFunctions" id="SpatialFunctions">Spatial Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="create-point"></a>create-point</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-point(x, y)
+</pre></div></div></li>
+
+<li>
+<p>Creates the primitive type <tt>point</tt> using an <tt>x</tt> and <tt>y</tt> value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>x</tt> : A <tt>double</tt> that represents the x-coordinate.</li>
+
+<li><tt>y</tt> : A <tt>double</tt> that represents the y-coordinate.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>point</tt> representing the ordered pair (<tt>x</tt>, <tt>y</tt>).</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $c := create-point(30.0,70.0)
+return {"point": $c}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "point": point("30.0,70.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="create-line"></a>create-line</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-line(point_expression1, point_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Creates the primitive type <tt>line</tt> using <tt>point_expression1</tt> and <tt>point_expression2</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression1</tt> : A <tt>point</tt> that represents the start point of the line.</li>
+
+<li><tt>point_expression2</tt> : A <tt>point</tt> that represents the end point of the line.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A spatial <tt>line</tt> created using the points provided in <tt>point_expression1</tt> and <tt>point_expression2</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $c := create-line(create-point(30.0,70.0), create-point(50.0,90.0))
+return {"line": $c}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "line": line("30.0,70.0 50.0,90.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="create-rectangle"></a>create-rectangle</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-rectangle(point_expression1, point_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Creates the primitive type <tt>rectangle</tt> using <tt>point_expression1</tt> and <tt>point_expression2</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression1</tt> : A <tt>point</tt> that represents the lower-left point of the rectangle.</li>
+
+<li><tt>point_expression2</tt> : A <tt>point</tt> that represents the upper-right point of the rectangle.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A spatial <tt>rectangle</tt> created using the points provided in <tt>point_expression1</tt> and <tt>point_expression2</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $c := create-rectangle(create-point(30.0,70.0), create-point(50.0,90.0))
+return {"rectangle": $c}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "rectangle": rectangle("30.0,70.0 50.0,90.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="create-circle"></a>create-circle</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-circle(point_expression, radius)
+</pre></div></div></li>
+
+<li>
+<p>Creates the primitive type <tt>circle</tt> using <tt>point_expression</tt> and <tt>radius</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression</tt> : A <tt>point</tt> that represents the center of the circle.</li>
+
+<li><tt>radius</tt> : A <tt>double</tt> that represents the radius of the circle.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A spatial <tt>circle</tt> created using the center point and the radius provided in <tt>point_expression</tt> and <tt>radius</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $c := create-circle(create-point(30.0,70.0), 5.0)
+return {"circle": $c}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "circle": circle("30.0,70.0 5.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="create-polygon"></a>create-polygon</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-polygon(list_expression)
+</pre></div></div></li>
+
+<li>
+<p>Creates the primitive type <tt>polygon</tt> using the double values provided in the argument <tt>list_expression</tt>. Starting from the first value in the list, each pair of consecutive double values represents one point. At least six double values must be specified, which yields three points.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression</tt> : An <tt>OrderedList</tt> of doubles representing the points of the polygon.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>polygon</tt> that represents a simple spatial polygon created using the points provided in <tt>list_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $c := create-polygon([1.0,1.0,2.0,2.0,3.0,3.0,4.0,4.0])
+return {"polygon": $c}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "polygon": polygon("1.0,1.0 2.0,2.0 3.0,3.0 4.0,4.0") }
+</pre></div></div></li>
+</ul></div>
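+<p>The pairing rule for <tt>create-polygon</tt> (consecutive doubles form points, with at least six values) can be sketched in Python as follows. The function name <tt>polygon_points</tt> is hypothetical, and rejecting an odd number of values is an assumption rather than documented behaviour.</p>

```python
def polygon_points(coords):
    """Group a flat list of doubles into (x, y) points, two values per point."""
    if len(coords) < 6:
        raise ValueError("a polygon needs at least six values (three points)")
    if len(coords) % 2 != 0:
        # Assumption: the text does not say what happens with a dangling value.
        raise ValueError("coordinates must come in x, y pairs")
    return [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
```

+<p>Applied to the list from the example, this produces the four points shown in the expected result.</p>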
+<div class="section">
+<h3><a name="get-xget-y"></a>get-x/get-y</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-x(point_expression) or get-y(point_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the x- or y-coordinate of a point <tt>point_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression</tt> : A <tt>point</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>double</tt> representing the x- or y-coordinate of the point <tt>point_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $point := create-point(2.3,5.0)
+return {"x-coordinate": get-x($point), "y-coordinate": get-y($point)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "x-coordinate": 2.3d, "y-coordinate": 5.0d }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-points"></a>get-points</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-points(spatial_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns an ordered list of the points forming the spatial object <tt>spatial_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>spatial_expression</tt> : A <tt>point</tt>, <tt>line</tt>, <tt>rectangle</tt>, <tt>circle</tt>, or <tt>polygon</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> of the points forming the spatial object <tt>spatial_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $line := create-line(create-point(100.6,99.4), create-point(-72.0,-76.9))
+let $rectangle := create-rectangle(create-point(9.2,49.0), create-point(77.8,111.1))
+let $polygon := create-polygon([1.0,1.0,2.0,2.0,3.0,3.0,4.0,4.0])
+let $line_list := get-points($line)
+let $rectangle_list := get-points($rectangle)
+let $polygon_list := get-points($polygon)
+return {"line-first-point": $line_list[0], "line-second-point": $line_list[1], "rectangle-left-bottom-point": $rectangle_list[0], "rectangle-top-upper-point": $rectangle_list[1], "polygon-first-point": $polygon_list[0], "polygon-second-point": $polygon_list[1], "polygon-third-point": $polygon_list[2], "polygon-fourth-point": $polygon_list[3]}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "line-first-point": point("100.6,99.4"), "line-second-point": point("-72.0,-76.9"), "rectangle-left-bottom-point": point("9.2,49.0"), "rectangle-top-upper-point": point("77.8,111.1"), "polygon-first-point": point("1.0,1.0"), "polygon-second-point": point("2.0,2.0"), "polygon-third-point": point("3.0,3.0"), "polygon-fourth-point": point("4.0,4.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-centerget-radius"></a>get-center/get-radius</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-center(circle_expression) or get-radius(circle_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the center and the radius of a circle <tt>circle_expression</tt>, respectively.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>circle_expression</tt> : A <tt>circle</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>point</tt> representing the center (for <tt>get-center</tt>) or a <tt>double</tt> representing the radius (for <tt>get-radius</tt>) of the circle <tt>circle_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $circle := create-circle(create-point(6.0,3.0), 1.0)
+return {"circle-radius": get-radius($circle), "circle-center": get-center($circle)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "circle-radius": 1.0d, "circle-center": point("6.0,3.0") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="spatial-distance"></a>spatial-distance</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>spatial-distance(point_expression1, point_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Returns the Euclidean distance between <tt>point_expression1</tt> and <tt>point_expression2</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression1</tt> : A <tt>point</tt>.</li>
+
+<li><tt>point_expression2</tt> : A <tt>point</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>double</tt> as the Euclidean distance between <tt>point_expression1</tt> and <tt>point_expression2</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset('TweetMessages')
+let $d := spatial-distance($t.sender-location, create-point(30.0,70.0))
+return {"point": $t.sender-location, "distance": $d}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "point": point("47.44,80.65"), "distance": 20.434678857275934d }
+{ "point": point("29.15,76.53"), "distance": 6.585089217315132d }
+{ "point": point("37.59,68.42"), "distance": 7.752709203884797d }
+{ "point": point("24.82,94.63"), "distance": 25.168816023007512d }
+{ "point": point("32.84,67.14"), "distance": 4.030533463451212d }
+{ "point": point("29.72,75.8"), "distance": 5.806754687430835d }
+{ "point": point("39.28,70.48"), "distance": 9.292405501268227d }
+{ "point": point("40.09,92.69"), "distance": 24.832321679617472d }
+{ "point": point("47.51,83.99"), "distance": 22.41250097601782d }
+{ "point": point("36.21,72.6"), "distance": 6.73231758015024d }
+{ "point": point("46.05,93.34"), "distance": 28.325926286707734d }
+{ "point": point("36.86,74.62"), "distance": 8.270671073135482d }
+</pre></div></div></li>
+</ul></div>
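+<p>The Euclidean distance used by <tt>spatial-distance</tt> is the square root of the sum of the squared coordinate differences. A minimal Python sketch, with points modelled as hypothetical <tt>(x, y)</tt> tuples:</p>

```python
import math


def spatial_distance(p1, p2):
    """Euclidean distance between two (x, y) tuples."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])
```

+<p>For the first row of the expected result, the differences are 17.44 and 10.65, which gives a distance of roughly 20.4347.</p>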
+<div class="section">
+<h3><a name="spatial-area"></a>spatial-area</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>spatial-area(spatial_2d_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the spatial area of <tt>spatial_2d_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>spatial_2d_expression</tt> : A <tt>rectangle</tt>, <tt>circle</tt>, or <tt>polygon</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>double</tt> representing the area of <tt>spatial_2d_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $circleArea := spatial-area(create-circle(create-point(0.0,0.0), 5.0))
+return {"Area":$circleArea}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "Area": 78.53981625d }
+</pre></div></div></li>
+</ul></div>
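+<p>The areas involved can be checked by hand. The Python sketch below uses the textbook formulas for a circle and an axis-aligned rectangle; the function names are hypothetical. Note that the expected result above, 78.53981625, equals 25 multiplied by 3.14159265, which suggests a truncated value of pi; that reading is inferred from the output only and is an assumption about the implementation.</p>

```python
import math


def circle_area(radius, pi=math.pi):
    """Area of a circle; pi can be overridden to compare approximations."""
    return pi * radius * radius


def rectangle_area(lower_left, upper_right):
    """Area of an axis-aligned rectangle given two (x, y) corner tuples."""
    return (upper_right[0] - lower_left[0]) * (upper_right[1] - lower_left[1])
```

+<p>Calling <tt>circle_area(5.0, pi=3.14159265)</tt> reproduces the documented value to within floating-point rounding, whereas the default <tt>math.pi</tt> gives a slightly larger number.</p>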
+<div class="section">
+<h3><a name="spatial-intersect"></a>spatial-intersect</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>spatial-intersect(spatial_expression1, spatial_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether <tt>spatial_expression1</tt> and <tt>spatial_expression2</tt> spatially intersect each other.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>spatial_expression1</tt> : A <tt>point</tt>, <tt>line</tt>, <tt>rectangle</tt>, <tt>circle</tt>, or <tt>polygon</tt>.</li>
+
+<li><tt>spatial_expression2</tt> : A <tt>point</tt>, <tt>line</tt>, <tt>rectangle</tt>, <tt>circle</tt>, or <tt>polygon</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt> representing whether <tt>spatial_expression1</tt> and <tt>spatial_expression2</tt> spatially intersect each other.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset('TweetMessages')
+where spatial-intersect($t.sender-location, create-rectangle(create-point(30.0,70.0), create-point(40.0,80.0)))
+return $t
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "tweetid": "4", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("39.28,70.48"), "send-time": datetime("2011-12-26T10:10:00.000Z"), "referred-topics": {{ "sprint", "voice-command" }}, "message-text": " like sprint the voice-command is mind-blowing:)" }
+{ "tweetid": "7", "user": { "screen-name": "ChangEwing_573", "lang": "en", "friends_count": 182, "statuses_count": 394, "name": "Chang Ewing", "followers_count": 32136 }, "sender-location": point("36.21,72.6"), "send-time": datetime("2011-08-25T10:10:00.000Z"), "referred-topics": {{ "samsung", "platform" }}, "message-text": " like samsung the platform is good" }
+{ "tweetid": "9", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("36.86,74.62"), "send-time": datetime("2012-07-21T10:10:00.000Z"), "referred-topics": {{ "verizon", "voicemail-service" }}, "message-text": " love verizon its voicemail-service is awesome" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="spatial-cell"></a>spatial-cell</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>spatial-cell(point_expression1, point_expression2, x_increment, y_increment)
+</pre></div></div></li>
+
+<li>
+<p>Returns the grid cell that <tt>point_expression1</tt> belongs to.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>point_expression1</tt> : A <tt>point</tt> representing the point of interest whose grid cell will be returned.</li>
+
+<li><tt>point_expression2</tt> : A <tt>point</tt> representing the origin of the grid.</li>
+
+<li><tt>x_increment</tt> : A <tt>double</tt> that represents the grid increment along the x-axis.</li>
+
+<li><tt>y_increment</tt> : A <tt>double</tt> that represents the grid increment along the y-axis.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>rectangle</tt> representing the grid cell that <tt>point_expression1</tt> belongs to.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset('TweetMessages')
+group by $c := spatial-cell($t.sender-location, create-point(20.0,50.0), 5.5, 6.0) with $t
+let $num := count($t)
+return { "cell": $c, "count": $num}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "cell": rectangle("20.0,92.0 25.5,98.0"), "count": 1i64 }
+{ "cell": rectangle("25.5,74.0 31.0,80.0"), "count": 2i64 }
+{ "cell": rectangle("31.0,62.0 36.5,68.0"), "count": 1i64 }
+{ "cell": rectangle("31.0,68.0 36.5,74.0"), "count": 1i64 }
+{ "cell": rectangle("36.5,68.0 42.0,74.0"), "count": 2i64 }
+{ "cell": rectangle("36.5,74.0 42.0,80.0"), "count": 1i64 }
+{ "cell": rectangle("36.5,92.0 42.0,98.0"), "count": 1i64 }
+{ "cell": rectangle("42.0,80.0 47.5,86.0"), "count": 1i64 }
+{ "cell": rectangle("42.0,92.0 47.5,98.0"), "count": 1i64 }
+{ "cell": rectangle("47.5,80.0 53.0,86.0"), "count": 1i64 }
+</pre></div></div></li>
+</ul></div></div>
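+<p>The grid cells in the <tt>spatial-cell</tt> result are consistent with flooring the offset from the grid origin to a whole number of increments. The Python sketch below illustrates that reading; the function name is hypothetical, points are modelled as <tt>(x, y)</tt> tuples, and the treatment of points that fall exactly on a cell boundary or below the origin is an assumption.</p>

```python
import math


def spatial_cell(point, origin, x_increment, y_increment):
    """Return (lower_left, upper_right) of the grid cell containing point."""
    # Number of whole increments between the grid origin and the point.
    col = math.floor((point[0] - origin[0]) / x_increment)
    row = math.floor((point[1] - origin[1]) / y_increment)
    x_low = origin[0] + col * x_increment
    y_low = origin[1] + row * y_increment
    return (x_low, y_low), (x_low + x_increment, y_low + y_increment)
```

+<p>For the point (24.82, 94.63) with origin (20.0, 50.0) and increments 5.5 and 6.0, the offsets cover 0 and 7 whole increments, which gives the cell from (20.0, 92.0) to (25.5, 98.0).</p>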
+<div class="section">
+<h2><a name="Similarity_Functions_Back_to_TOC"></a><a name="SimilarityFunctions" id="SimilarityFunctions">Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB supports queries with different similarity functions, including <a class="externalLink" href="http://en.wikipedia.org/wiki/Levenshtein_distance">edit distance</a> and <a class="externalLink" href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard</a>.</p>
+<div class="section">
+<h3><a name="edit-distance"></a>edit-distance</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>edit-distance(expression1, expression2)
+</pre></div></div></li>
+
+<li>
+<p>Returns the edit distance of <tt>expression1</tt> and <tt>expression2</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>expression1</tt> : A <tt>string</tt> or a homogeneous <tt>OrderedList</tt> of a comparable item type.</li>
+
+<li><tt>expression2</tt> : The same type as <tt>expression1</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> that represents the edit distance between <tt>expression1</tt> and <tt>expression2</tt>.</li>
+ </ul></li>
+
+<li>Note: An <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">n-gram index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset('FacebookUsers')
+let $ed := edit-distance($user.name, "Suzanna Tilson")
+where $ed &lt;= 2
+return $user
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+"id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }},
+"employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ]
+}
+</pre></div></div></li>
+</ul></div>
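+<p>Edit (Levenshtein) distance counts the minimum number of single-item insertions, deletions, and substitutions needed to turn one sequence into the other. The Python sketch below is the standard dynamic-programming formulation, not AsterixDB's implementation; it accepts strings or lists, mirroring the argument types listed above.</p>

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings or two lists."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a yields the empty prefix
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

+<p>For the example, "SuzannaTillson" differs from "Suzanna Tilson" by one inserted space and one deleted "l", so the distance is 2 and the record passes the threshold.</p>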
+<div class="section">
+<h3><a name="edit-distance-check"></a>edit-distance-check</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>edit-distance-check(expression1, expression2, threshold)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether <tt>expression1</tt> and <tt>expression2</tt> have an <a class="externalLink" href="http://en.wikipedia.org/wiki/Levenshtein_distance">edit distance</a> within a given threshold. The “check” version of edit distance is faster than the “non-check” version because the former can detect whether two items satisfy a given threshold using early-termination techniques, as opposed to computing their real distance. Although possible, it is not necessary for the user to write queries using the “check” versions explicitly, since a rewrite rule can perform an appropriate transformation from a “non-check” version to a “check” version.</p></li>
+
+<li>
+<p>Arguments:</p>
+
+<ul>
+
+<li><tt>expression1</tt> : A <tt>string</tt> or a homogeneous <tt>OrderedList</tt> of a comparable item type.</li>
+
+<li><tt>expression2</tt> : The same type as <tt>expression1</tt>.</li>
+
+<li><tt>threshold</tt> : An <tt>int64</tt> that represents the distance threshold.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> with two items:
+
+<ul>
+
+<li>The first item contains a <tt>boolean</tt> value representing whether <tt>expression1</tt> and <tt>expression2</tt> are similar.</li>
+
+<li>The second item contains an <tt>int64</tt> that represents the edit distance of <tt>expression1</tt> and <tt>expression2</tt> if it is within the threshold, or 0 otherwise.</li>
+ </ul></li>
+ </ul></li>
+
+<li>Note: An <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">n-gram index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset('FacebookUsers')
+let $ed := edit-distance-check($user.name, "Suzanna Tilson", 2)
+where $ed[0]
+return $ed[1]
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>2
+</pre></div></div></li>
+</ul></div>
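+<p>The early-termination idea behind the “check” version can be sketched as follows: the smallest value in each row of the dynamic-programming table never decreases, so once a whole row exceeds the threshold the final distance must exceed it too. This Python sketch is one possible way to realise that idea and makes no claim about how AsterixDB implements it; the function name is hypothetical.</p>

```python
def edit_distance_check(a, b, threshold):
    """Return [is_similar, distance], with distance 0 when not similar."""
    # A length difference larger than the threshold can never be repaired.
    if abs(len(a) - len(b)) > threshold:
        return [False, 0]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        # Early termination: every cell in this row already exceeds the threshold.
        if min(curr) > threshold:
            return [False, 0]
        prev = curr
    distance = prev[-1]
    return [True, distance] if distance <= threshold else [False, 0]
```

+<p>The return shape follows the two-item list described above: a boolean, then the distance when it is within the threshold, or 0 otherwise.</p>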
+<div class="section">
+<h3><a name="edit-distance-contains"></a>edit-distance-contains</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>edit-distance-contains(expression1, expression2, threshold)
+</pre></div></div></li>
+
+
+<li>
+<p>Checks whether <tt>expression1</tt> contains <tt>expression2</tt> with an <a class="externalLink" href="http://en.wikipedia.org/wiki/Levenshtein_distance">edit distance</a> within a given threshold.</p></li>
+
+<li>
+<p>Arguments:</p>
+
+<ul>
+
+<li><tt>expression1</tt> : A <tt>string</tt> or a homogeneous <tt>OrderedList</tt> of a comparable item type.</li>
+
+<li><tt>expression2</tt> : The same type as <tt>expression1</tt>.</li>
+
+<li><tt>threshold</tt> : An <tt>int32</tt> that represents the distance threshold.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> with two items:
+
+<ul>
+
+<li>The first item contains a <tt>boolean</tt> value representing whether <tt>expression1</tt> can contain <tt>expression2</tt>.</li>
+
+<li>The second item contains an <tt>int32</tt> that represents the required edit distance for <tt>expression1</tt> to contain <tt>expression2</tt> if the first item is true.</li>
+ </ul></li>
+ </ul></li>
+
+<li>Note: An <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">n-gram index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := edit-distance-contains("happy","hapr",2)
+return $i;
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>[ true, 1 ]
+</pre></div></div></li>
+</ul></div>
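+<p>One common way to define “contains within an edit distance” is approximate substring matching: find the substring of <tt>expression1</tt> that is closest to <tt>expression2</tt>. The Python sketch below uses the usual dynamic-programming variant in which a match may start at any position of the text at no cost. Whether AsterixDB defines the function exactly this way is an assumption, as are the function name and the value 0 returned as the second item when the check fails.</p>

```python
def edit_distance_contains(text, pattern, threshold):
    """Return [contains, distance] for the best-matching substring of text."""
    # Row 0 is all zeros: a match may begin anywhere in text for free.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i]  # matching i pattern items against nothing costs i
        for j, t in enumerate(text, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j - 1] + cost,  # substitute or match
                            prev[j] + 1,         # skip a pattern item
                            curr[j - 1] + 1))    # skip a text item
        prev = curr
    best = min(prev)  # the match may also end anywhere in text
    return [True, best] if best <= threshold else [False, 0]
```

+<p>In this model, "hapr" is one substitution away from the substring "happ" of "happy", which agrees with the expected result of <tt>[ true, 1 ]</tt>.</p>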
+<div class="section">
+<h3><a name="similarity-jaccard"></a>similarity-jaccard</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>similarity-jaccard(list_expression1, list_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Returns the <a class="externalLink" href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a> of <tt>list_expression1</tt> and <tt>list_expression2</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression1</tt> : An <tt>UnorderedList</tt> or <tt>OrderedList</tt>.</li>
+
+<li><tt>list_expression2</tt> : An <tt>UnorderedList</tt> or <tt>OrderedList</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>float</tt> that represents the Jaccard similarity of <tt>list_expression1</tt> and <tt>list_expression2</tt>.</li>
+ </ul></li>
+
+<li>Note: A <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">keyword index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset('FacebookUsers')
+let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
+where $sim >= 0.6f
+return $user
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+"id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }},
+"employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ]
+}
+{
+"id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }},
+"employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ]
+}
+</pre></div></div></li>
+</ul></div>
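+<p>Jaccard similarity is the size of the intersection divided by the size of the union. The Python sketch below treats both inputs as sets; how duplicate items are handled, and the value returned for two empty lists, are assumptions not covered by the text above.</p>

```python
def similarity_jaccard(list1, list2):
    """Jaccard similarity of two collections, treated as sets."""
    s1, s2 = set(list1), set(list2)
    union = s1 | s2
    if not union:
        return 0.0  # assumption: two empty inputs are treated as dissimilar
    return len(s1 & s2) / len(union)
```

+<p>For the query list <tt>[1,5,9,10]</tt>, the friend list <tt>{{ 1, 5, 8, 9 }}</tt> shares three of five distinct items (0.6) and <tt>{{ 1, 5, 9 }}</tt> shares three of four (0.75), so both records meet the 0.6 threshold.</p>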
+<div class="section">
+<h3><a name="similarity-jaccard-check"></a>similarity-jaccard-check</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>similarity-jaccard-check(list_expression1, list_expression2, threshold)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether <tt>list_expression1</tt> and <tt>list_expression2</tt> have a <a class="externalLink" href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a> greater than or equal to the given threshold. As with edit distance, the “check” version of Jaccard is faster than the “non-check” version.</p></li>
+
+<li>
+<p>Arguments:</p>
+
+<ul>
+
+<li><tt>list_expression1</tt> : An <tt>UnorderedList</tt> or <tt>OrderedList</tt>.</li>
+
+<li><tt>list_expression2</tt> : An <tt>UnorderedList</tt> or <tt>OrderedList</tt>.</li>
+
+<li><tt>threshold</tt> : A <tt>float</tt> that represents the similarity threshold.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> with two items:
+
+<ul>
+
+<li>The first item contains a <tt>boolean</tt> value representing whether <tt>list_expression1</tt> and <tt>list_expression2</tt> are similar.</li>
+
+<li>The second item contains a <tt>float</tt> that represents the Jaccard similarity of <tt>list_expression1</tt> and <tt>list_expression2</tt> if it is greater than or equal to the threshold, or 0 otherwise.</li>
+ </ul></li>
+ </ul></li>
+
+<li>Note: A <a href="similarity.html#UsingIndexesToSupportSimilarityQueries">keyword index</a> can be utilized for this function.</li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset('FacebookUsers')
+let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
+where $sim[0]
+return $sim[1]
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>0.6f
+0.75f
+</pre></div></div></li>
+</ul></div>
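+<p>A “check” version of Jaccard can avoid computing the intersection when the two sizes alone rule out a match, because the similarity can never exceed the smaller size divided by the larger one. The Python sketch below illustrates this length filter; it is an assumed technique for illustration, not a description of AsterixDB's implementation, and the function name is hypothetical.</p>

```python
def similarity_jaccard_check(list1, list2, threshold):
    """Return [is_similar, similarity], with similarity 0.0 when not similar."""
    s1, s2 = set(list1), set(list2)
    small, large = sorted((len(s1), len(s2)))
    # Length filter: Jaccard is bounded above by small / large.
    if large == 0 or small / large < threshold:
        return [False, 0.0]
    similarity = len(s1 & s2) / len(s1 | s2)
    return [True, similarity] if similarity >= threshold else [False, 0.0]
```

+<p>With a threshold of 0.6, a one-item list compared against a four-item list is rejected by the filter alone, since 1/4 is already below the threshold.</p>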
+<div class="section">
+<h3><a name="Similarity_Operator_"></a>Similarity Operator ~=</h3>
+
+<ul>
+
+<li>“<tt>~=</tt>” is syntactic sugar for expressing a similarity condition with a given similarity threshold.</li>
+
+<li>The similarity function and threshold for “<tt>~=</tt>” are controlled via “set” directives.</li>
+
+<li>The “<tt>~=</tt>” operator returns a <tt>boolean</tt> value that represents whether the operands are similar.</li>
+
+<li>
+<p>Example for Jaccard similarity:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+set simfunction "jaccard";
+set simthreshold "0.6f";
+
+for $user in dataset('FacebookUsers')
+where $user.friend-ids ~= [1,5,9,10]
+return $user
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+"id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }},
+"employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ]
+}
+{
+"id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }},
+"employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ]
+}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>Example for edit-distance similarity:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+set simfunction "edit-distance";
+set simthreshold "2";
+
+for $user in dataset('FacebookUsers')
+where $user.name ~= "Suzanna Tilson"
+return $user
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected output is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+"id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }},
+"employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ]
+}
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Tokenizing_Functions_Back_to_TOC"></a><a name="TokenizingFunctions" id="TokenizingFunctions">Tokenizing Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="word-tokens"></a>word-tokens</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>word-tokens(string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns a list of word tokens of <tt>string_expression</tt> using non-alphanumeric characters as delimiters.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> that will be tokenized.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>OrderedList</tt> of <tt>string</tt> word tokens.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset('TweetMessages')
+let $tokens := word-tokens($t.message-text)
+where $t.send-time >= datetime('2012-01-01T00:00:00')
+return {
+"tweetid": $t.tweetid,
+"word-tokens": $tokens
+}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "tweetid": "9", "word-tokens": [ "love", "verizon", "its", "voicemail", "service", "is", "awesome" ] }
+</pre></div></div></li>
+</ul>
+<!-- ### hashed-word-tokens ###
+ * Syntax:
+
+ hashed-word-tokens(string_expression)
+
+ * Returns a list of hashed word tokens of `string_expression`.
+ * Arguments:
+ * `string_expression` : A `string` that will be tokenized.
+ * Return Value:
+ * An `OrderedList` of `int32` hashed tokens.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ for $t in dataset('TweetMessages')
+ let $tokens := hashed-word-tokens($t.message-text)
+ where $t.send-time >= datetime('2012-01-01T00:00:00')
+ return {
+ "tweetid": $t.tweetid,
+ "hashed-word-tokens": $tokens
+ }
+
+
+ * The expected result is:
+
+ { "tweetid": "9", "hashed-word-tokens": [ -1217719622, -447857469, -1884722688, -325178649, 210976949, 285049676, 1916743959 ] }
+
+
+### counthashed-word-tokens ###
+ * Syntax:
+
+ counthashed-word-tokens(string_expression)
+
+ * Returns a list of hashed word tokens of `string_expression`. The hashing mechanism gives duplicate tokens different hash values, based on the occurrence count of that token.
+ * Arguments:
+ * `string_expression` : A `String` that will be tokenized.
+ * Return Value:
+ * An `OrderedList` of `Int32` hashed tokens.
+ * Example:
+
+ use dataverse TinySocial;
+
+ for $t in dataset('TweetMessages')
+ let $tokens := counthashed-word-tokens($t.message-text)
+ where $t.send-time >= datetime('2012-01-01T00:00:00')
+ return {
+ "tweetid": $t.tweetid,
+ "counthashed-word-tokens": $tokens
+ }
+
+
+ * The expected result is:
+
+ { "tweetid": "9", "counthashed-word-tokens": [ -1217719622, -447857469, -1884722688, -325178649, 210976949, 285049676, 1916743959 ] }
+
+
+### gram-tokens ###
+ * Syntax:
+
+ gram-tokens(string_expression, gram_length, boolean_expression)
+
+ * Returns a list of gram tokens of `string_expression`, which can be obtained by scanning the characters using a sliding window of a fixed length.
+ * Arguments:
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32` as the length of grams.
+ * `boolean_expression` : A `Boolean` value to indicate whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
+ * Return Value:
+ * An `OrderedList` of String gram tokens.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ for $t in dataset('TweetMessages')
+ let $tokens := gram-tokens($t.message-text, 3, true)
+ where $t.send-time >= datetime('2012-01-01T00:00:00')
+ return {
+ "tweetid": $t.tweetid,
+ "gram-tokens": $tokens
+ }
+
+
+ * The expected result is:
+
+ {
+ "tweetid": "9",
+ "gram-tokens": [ "## ", "# l", " lo", "lov", "ove", "ve ", "e v", " ve", "ver", "eri", "riz", "izo", "zon", "on ", "n i", " it", "its", "ts ", "s v", " vo", "voi", "oic", "ice",
+ "cem", "ema", "mai", "ail", "il-", "l-s", "-se", "ser", "erv", "rvi", "vic", "ice", "ce ", "e i", " is", "is ", "s a", " aw", "awe", "wes", "eso", "som", "ome", "me$", "e$$" ]
+ }
+
+
+### hashed-gram-tokens ###
+ * Syntax:
+
+ hashed-gram-tokens(string_expression, gram_length, boolean_expression)
+
+ * Returns a list of hashed gram tokens of `string_expression`.
+ * Arguments:
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32` as the length of grams.
+ * `boolean_expression` : A `Boolean` to indicate whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
+ * Return Value:
+ * An `OrderedList` of `Int32` hashed gram tokens.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ for $t in dataset('TweetMessages')
+ let $tokens := hashed-gram-tokens($t.message-text, 3, true)
+ where $t.send-time >= datetime('2012-01-01T00:00:00')
+ return {
+ "tweetid": $t.tweetid,
+ "hashed-gram-tokens": $tokens
+ }
+
+
+ * The expected result is:
+
+ {
+ "tweetid": "9",
+ "hashed-gram-tokens": [ 40557178, -2002241593, 161665899, -856104603, -500544946, 693410611, 395674299, -1015235909, 1115608337, 1187999872, -31006095, -219180466, -1676061637,
+ 1040194153, -1339307841, -1527110163, -1884722688, -179148713, -431014627, -1789789823, -1209719926, 684519765, -486734513, 1734740619, -1971673751, -932421915, -2064668066,
+ -937135958, -790946468, -69070309, 1561601454, 26169001, -160734571, 1330043462, -486734513, -18796768, -470303314, 113421364, 1615760212, 1688217556, 1223719184, 536568131,
+ 1682609873, 2935161, -414769471, -1027490137, 1602276102, 1050490461 ]
+ }
+
+
+### counthashed-gram-tokens ###
+ * Syntax:
+
+ counthashed-gram-tokens(string_expression, gram_length, boolean_expression)
+
+ * Returns a list of hashed gram tokens of `string_expression`. The hashing mechanism gives duplicate tokens different hash values, based on the occurrence count of that token.
+ * Arguments:
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32`, length of grams to generate.
+ * `boolean_expression` : A `Boolean`, whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
+ * Return Value:
+ * An `OrderedList` of `Int32` hashed gram tokens.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ for $t in dataset('TweetMessages')
+ let $tokens := counthashed-gram-tokens($t.message-text, 3, true)
+ where $t.send-time >= datetime('2012-01-01T00:00:00')
+ return {
+ "tweetid": $t.tweetid,
+ "counthashed-gram-tokens": $tokens
+ }
+
+
+ * The expected result is:
+
+ {
+ "tweetid": "9",
+ "counthashed-gram-tokens": [ 40557178, -2002241593, 161665899, -856104603, -500544946, 693410611, 395674299, -1015235909, 1115608337, 1187999872, -31006095, -219180466, -1676061637,
+ 1040194153, -1339307841, -1527110163, -1884722688, -179148713, -431014627, -1789789823, -1209719926, 684519765, -486734513, 1734740619, -1971673751, -932421915, -2064668066, -937135958,
+ -790946468, -69070309, 1561601454, 26169001, -160734571, 1330043462, -486734512, -18796768, -470303314, 113421364, 1615760212, 1688217556, 1223719184, 536568131, 1682609873, 2935161,
+ -414769471, -1027490137, 1602276102, 1050490461 ]
+ } --></div></div>
+<div class="section">
+<h2><a name="Temporal_Functions_Back_to_TOC"></a><a name="TemporalFunctions" id="TemporalFunctions">Temporal Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="get-yearget-monthget-dayget-hourget-minuteget-secondget-millisecond"></a>get-year/get-month/get-day/get-hour/get-minute/get-second/get-millisecond</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-year/get-month/get-day/get-hour/get-minute/get-second/get-millisecond(temporal_expression)
+</pre></div></div></li>
+
+<li>
+<p>Accessors for the individual fields of a temporal value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>temporal_expression</tt> : A temporal value of one of the following types: <tt>date</tt>, <tt>datetime</tt>, <tt>time</tt>, or <tt>duration</tt>.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> value representing the field to be extracted.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $c1 := date("2010-10-30")
+let $c2 := datetime("1987-11-19T23:49:23.938")
+let $c3 := time("12:23:34.930+07:00")
+let $c4 := duration("P3Y73M632DT49H743M3948.94S")
+
+return {"year": get-year($c1), "month": get-month($c2), "day": get-day($c1), "hour": get-hour($c3), "min": get-minute($c4), "second": get-second($c2), "ms": get-millisecond($c4)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "year": 2010, "month": 11, "day": 30, "hour": 5, "min": 28, "second": 23, "ms": 94 }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="adjust-datetime-for-timezone"></a>adjust-datetime-for-timezone</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>adjust-datetime-for-timezone(datetime_expression, string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Adjusts the given datetime <tt>datetime_expression</tt> by applying the timezone information <tt>string_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>datetime_expression</tt> : A <tt>datetime</tt> value to be adjusted.</li>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> representing the timezone information.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> value representing the new datetime after being adjusted by the timezone information.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('TweetMessages')
+return {"adjusted-send-time": adjust-datetime-for-timezone($i.send-time, "+08:00"), "message": $i.message-text}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "adjusted-send-time": "2008-04-26T18:10:00.000+08:00", "message": " love t-mobile its customization is good:)" }
+{ "adjusted-send-time": "2010-05-13T18:10:00.000+08:00", "message": " like verizon its shortcut-menu is awesome:)" }
+{ "adjusted-send-time": "2006-11-04T18:10:00.000+08:00", "message": " like motorola the speed is good:)" }
+{ "adjusted-send-time": "2011-12-26T18:10:00.000+08:00", "message": " like sprint the voice-command is mind-blowing:)" }
+{ "adjusted-send-time": "2006-08-04T18:10:00.000+08:00", "message": " can't stand motorola its speed is terrible:(" }
+{ "adjusted-send-time": "2010-05-07T18:10:00.000+08:00", "message": " like iphone the voice-clarity is good:)" }
+{ "adjusted-send-time": "2011-08-25T18:10:00.000+08:00", "message": " like samsung the platform is good" }
+{ "adjusted-send-time": "2005-10-14T18:10:00.000+08:00", "message": " like t-mobile the shortcut-menu is awesome:)" }
+{ "adjusted-send-time": "2012-07-21T18:10:00.000+08:00", "message": " love verizon its voicemail-service is awesome" }
+{ "adjusted-send-time": "2008-01-26T18:10:00.000+08:00", "message": " hate verizon its voice-clarity is OMG:(" }
+{ "adjusted-send-time": "2008-03-09T18:10:00.000+08:00", "message": " can't stand iphone its platform is terrible" }
+{ "adjusted-send-time": "2010-02-13T18:10:00.000+08:00", "message": " like samsung the voice-command is amazing:)" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="adjust-time-for-timezone"></a>adjust-time-for-timezone</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>adjust-time-for-timezone(time_expression, string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Adjusts the given time <tt>time_expression</tt> by applying the timezone information <tt>string_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>time_expression</tt> : A <tt>time</tt> value to be adjusted.</li>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> representing the timezone information.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> value representing the new time after being adjusted by the timezone information.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('TweetMessages')
+return {"adjusted-send-time": adjust-time-for-timezone(get-time-from-datetime($i.send-time), "+08:00"), "message": $i.message-text}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "adjusted-send-time": "18:10:00.000+08:00", "message": " love t-mobile its customization is good:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like verizon its shortcut-menu is awesome:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like motorola the speed is good:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like sprint the voice-command is mind-blowing:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " can't stand motorola its speed is terrible:(" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like iphone the voice-clarity is good:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like samsung the platform is good" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like t-mobile the shortcut-menu is awesome:)" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " love verizon its voicemail-service is awesome" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " hate verizon its voice-clarity is OMG:(" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " can't stand iphone its platform is terrible" }
+{ "adjusted-send-time": "18:10:00.000+08:00", "message": " like samsung the voice-command is amazing:)" }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="calendar-duration-from-datetime"></a>calendar-duration-from-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>calendar-duration-from-datetime(datetime_expression, duration_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a user-friendly representation of the duration <tt>duration_expression</tt> based on the given datetime <tt>datetime_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>datetime_expression</tt> : A <tt>datetime</tt> value to be used as the reference time point.</li>
+
+<li><tt>duration_expression</tt> : A <tt>duration</tt> value to be converted.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>duration</tt> value equal to <tt>duration_expression</tt> but expressed in a user-friendly representation.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('TweetMessages')
+where $i.send-time > datetime("2011-01-01T00:00:00")
+return {"since-2011": subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")), "since-2011-user-friendly": calendar-duration-from-datetime($i.send-time, subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")))}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "since-2011": duration("P359DT10H10M"), "since-2011-user-friendly": duration("P11M23DT10H10M") }
+{ "since-2011": duration("P236DT10H10M"), "since-2011-user-friendly": duration("P7M23DT10H10M") }
+{ "since-2011": duration("P567DT10H10M"), "since-2011-user-friendly": duration("P1Y6M18DT10H10M") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-year-month-durationget-day-time-duration"></a>get-year-month-duration/get-day-time-duration</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-year-month-duration/get-day-time-duration(duration_expression)
+</pre></div></div></li>
+
+<li>
+<p>Extracts the year-month or day-time portion of <tt>duration_expression</tt> as the corresponding <tt>duration</tt> subtype.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>duration_expression</tt> : A <tt>duration</tt> value to be converted.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>year-month-duration</tt> value or a <tt>day-time-duration</tt> value.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := get-year-month-duration(duration("P12M50DT10H"))
+return $i;
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>year-month-duration("P1Y")
+</pre></div></div></li>
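+
+<li>
+<p>A minimal sketch of the day-time variant, applied to the same input duration as above; the result described afterwards is inferred from the function description rather than taken from a recorded run:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := get-day-time-duration(duration("P12M50DT10H"))
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>This would be expected to return the day-time portion of the input, that is, a <tt>day-time-duration</tt> of 50 days and 10 hours.</p></li>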
+</ul></div>
+<div class="section">
+<h3><a name="months-from-year-month-durationmilliseconds-from-day-time-duration"></a>months-from-year-month-duration/milliseconds-from-day-time-duration</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>months-from-year-month-duration/milliseconds-from-day-time-duration(duration_expression)
+</pre></div></div></li>
+
+<li>
+<p>Extracts the number of months or the number of milliseconds from the <tt>duration</tt> subtype.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>duration_expression</tt> : A <tt>duration</tt> of the correct subtype.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int64</tt> representing the number of months/milliseconds.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := months-from-year-month-duration(get-year-month-duration(duration("P5Y7MT50M")))
+return $i;
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>67
+</pre></div></div></li>
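+
+<li>
+<p>A minimal sketch of the milliseconds variant, assuming <tt>get-day-time-duration</tt> is used to obtain the required subtype from the same input duration as above:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := milliseconds-from-day-time-duration(get-day-time-duration(duration("P5Y7MT50M")))
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>The day-time portion of this duration is 50 minutes, so the result should be 50 * 60 * 1000 = 3000000 milliseconds.</p></li>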
+</ul></div>
+<div class="section">
+<h3><a name="duration-from-monthsduration-from-ms"></a>duration-from-months/duration-from-ms</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>duration-from-months/duration-from-ms(number_expression)
+</pre></div></div></li>
+
+<li>
+<p>Creates a <tt>duration</tt> from <tt>number_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>number_expression</tt> : An <tt>int64</tt> representing the number of months/milliseconds.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>duration</tt> value of <tt>number_expression</tt> months/milliseconds.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := duration-from-months(8)
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>duration("P8M")
+</pre></div></div></li>
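+
+<li>
+<p>A similar sketch for <tt>duration-from-ms</tt>; the input value is illustrative only:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := duration-from-ms(3600000)
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>Since 3600000 milliseconds equal one hour, the returned value should represent a duration of one hour; its exact printed form is not shown here.</p></li>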
+</ul></div>
+<div class="section">
+<h3><a name="duration-from-interval"></a>duration-from-interval</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>duration-from-interval(interval_expression)
+</pre></div></div></li>
+
+<li>
+<p>Creates a <tt>duration</tt> from <tt>interval_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval_expression</tt> : An <tt>interval</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>duration</tt> representing the length of time covered by <tt>interval_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-date("2010-10-30", "2010-12-21")
+let $itv2 := interval-from-datetime("2012-06-26T01:01:01.111", "2012-07-27T02:02:02.222")
+let $itv3 := interval-from-time("12:32:38", "20:29:20")
+
+return { "dr1" : duration-from-interval($itv1),
+ "dr2" : duration-from-interval($itv2),
+ "dr3" : duration-from-interval($itv3),
+ "dr4" : duration-from-interval(null) }
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "dr1": day-time-duration("P52D"),
+ "dr2": day-time-duration("P31DT1H1M1.111S"),
+ "dr3": day-time-duration("PT7H56M42S"),
+ "dr4": null }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="current-date"></a>current-date</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>current-date()
+</pre></div></div></li>
+
+<li>
+<p>Gets the current date.</p></li>
+
+<li>Arguments: None</li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>date</tt> value of the date when the function is called.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="current-time"></a>current-time</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>current-time()
+</pre></div></div></li>
+
+<li>
+<p>Gets the current time.</p></li>
+
+<li>Arguments: None</li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>time</tt> value of the time when the function is called.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="current-datetime"></a>current-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>current-datetime()
+</pre></div></div></li>
+
+<li>
+<p>Gets the current datetime.</p></li>
+
+<li>Arguments: None</li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>datetime</tt> value of the datetime when the function is called.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>{"current-date": current-date(),
+"current-time": current-time(),
+"current-datetime": current-datetime()}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "current-date": date("2013-04-06"),
+"current-time": time("00:48:44.093Z"),
+"current-datetime": datetime("2013-04-06T00:48:44.093Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-date-from-datetime"></a>get-date-from-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-date-from-datetime(datetime_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets the date value from the given datetime value <tt>datetime_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>datetime_expression</tt>: A <tt>datetime</tt> value to be extracted from.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>date</tt> value from the datetime.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-time-from-datetime"></a>get-time-from-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-time-from-datetime(datetime_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets the time value from the given datetime value <tt>datetime_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>datetime_expression</tt>: A <tt>datetime</tt> value to be extracted from.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>time</tt> value from the datetime.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $i in dataset('TweetMessages')
+where $i.send-time > datetime("2011-01-01T00:00:00")
+return {"send-date": get-date-from-datetime($i.send-time), "send-time": get-time-from-datetime($i.send-time)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "send-date": date("2011-12-26"), "send-time": time("10:10:00.000Z") }
+{ "send-date": date("2011-08-25"), "send-time": time("10:10:00.000Z") }
+{ "send-date": date("2012-07-21"), "send-time": time("10:10:00.000Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="day-of-week"></a>day-of-week</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>day-of-week(date_expression)
+</pre></div></div></li>
+
+<li>
+<p>Finds the day of the week for a given date (1-7).</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>date_expression</tt>: A <tt>date</tt> value (can also be a <tt>datetime</tt>).</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>int8</tt> representing the day of the week (1-7).</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := day-of-week( datetime("2012-12-30T12:12:12.039Z"))
+return $i;
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>7
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="date-from-unix-time-in-days"></a>date-from-unix-time-in-days</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>date-from-unix-time-in-days(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a date representing the time after <tt>numeric_expression</tt> days since 1970-01-01.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the number of days.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>date</tt> value as the time after <tt>numeric_expression</tt> days since 1970-01-01.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="datetime-from-unix-time-in-ms"></a>datetime-from-unix-time-in-ms</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>datetime-from-unix-time-in-ms(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a datetime representing the time after <tt>numeric_expression</tt> milliseconds since 1970-01-01T00:00:00Z.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the number of milliseconds.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>datetime</tt> value as the time after <tt>numeric_expression</tt> milliseconds since 1970-01-01T00:00:00Z.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="datetime-from-unix-time-in-secs"></a>datetime-from-unix-time-in-secs</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>datetime-from-unix-time-in-secs(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a datetime representing the time after <tt>numeric_expression</tt> seconds since 1970-01-01T00:00:00Z.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the number of seconds.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>datetime</tt> value as the time after <tt>numeric_expression</tt> seconds since 1970-01-01T00:00:00Z.</li>
+ </ul></li>
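+
+<li>
+<p>A minimal sketch, reusing the instant from the example given under <tt>time-from-unix-time-in-ms</tt> (1365139700000 milliseconds) but expressed in seconds:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $dt := datetime-from-unix-time-in-secs(1365139700)
+return $dt
+</pre></div></div></li>
+
+<li>
+<p>Because both values denote the same instant, the result should match the datetime shown in that example, namely <tt>datetime("2013-04-05T05:28:20.000Z")</tt>.</p></li>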
+</ul></div>
+<div class="section">
+<h3><a name="datetime-from-date-time"></a>datetime-from-date-time</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>datetime-from-date-time(date_expression,time_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a datetime representing the combination of <tt>date_expression</tt> and <tt>time_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>date_expression</tt>: A <tt>date</tt> value.</li>
+
+<li><tt>time_expression</tt>: A <tt>time</tt> value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>datetime</tt> value obtained by combining <tt>date_expression</tt> and <tt>time_expression</tt>.</li>
+ </ul></li>
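+
+<li>
+<p>A minimal sketch, assuming the <tt>date</tt> and <tt>time</tt> constructors used elsewhere in this document; the input values are illustrative only:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $d := date("2013-04-05")
+let $t := time("05:28:20.000Z")
+return datetime-from-date-time($d, $t)
+</pre></div></div></li>
+
+<li>
+<p>This should combine the two values into a single datetime for 2013-04-05 at 05:28:20 UTC.</p></li>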
+</ul></div>
+<div class="section">
+<h3><a name="time-from-unix-time-in-ms"></a>time-from-unix-time-in-ms</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>time-from-unix-time-in-ms(numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Gets a time representing the time after <tt>numeric_expression</tt> milliseconds since 00:00:00.000Z.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the number of milliseconds.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>time</tt> value as the time after <tt>numeric_expression</tt> milliseconds since 00:00:00.000Z.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $d := date-from-unix-time-in-days(15800)
+let $dt := datetime-from-unix-time-in-ms(1365139700000)
+let $t := time-from-unix-time-in-ms(3748)
+return {"date": $d, "datetime": $dt, "time": $t}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "date": date("2013-04-05"), "datetime": datetime("2013-04-05T05:28:20.000Z"), "time": time("00:00:03.748Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="parse-dateparse-timeparse-datetime"></a>parse-date/parse-time/parse-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>parse-date/parse-time/parse-datetime(date_expression,formatting_expression)
+</pre></div></div></li>
+
+<li>
+<p>Creates a <tt>date/time/datetime</tt> value by parsing <tt>date_expression</tt> according to the format given in <tt>formatting_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>date_expression</tt>: A <tt>string</tt> value representing the <tt>date/time/datetime</tt>.</li>
+
+<li><tt>formatting_expression</tt>: A <tt>string</tt> value providing the formatting for <tt>date_expression</tt>. Characters that can be used in the formatting expression:
+
+<ul>
+
+<li><tt>h</tt> hours</li>
+
+<li><tt>m</tt> minutes</li>
+
+<li><tt>s</tt> seconds</li>
+
+<li><tt>n</tt> milliseconds</li>
+
+<li><tt>a</tt> am/pm</li>
+
+<li><tt>z</tt> timezone</li>
+
+<li><tt>Y</tt> year</li>
+
+<li><tt>M</tt> month</li>
+
+<li><tt>D</tt> day</li>
+
+<li><tt>W</tt> weekday</li>
+
+<li><tt>-</tt>, <tt>'</tt>, <tt>/</tt>, <tt>.</tt>, <tt>,</tt>, <tt>T</tt> separators for both time and date</li>
+ </ul></li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>date/time/datetime</tt> value corresponding to <tt>date_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := parse-time("30:30","m:s")
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>time("00:30:30.000Z")
+</pre></div></div></li>
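+
+<li>
+<p>A sketch for <tt>parse-date</tt>, under the assumption that format characters may be repeated to indicate the field width (for example <tt>YYYY-MM-DD</tt>); this format string is an assumption and is not taken from the example above:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := parse-date("2013-04-05","YYYY-MM-DD")
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>If the format string is accepted as written, the result should be the date 2013-04-05.</p></li>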
+</ul></div>
+<div class="section">
+<h3><a name="print-dateprint-timeprint-datetime"></a>print-date/print-time/print-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>print-date/print-time/print-datetime(date_expression,formatting_expression)
+</pre></div></div></li>
+
+<li>
+<p>Creates a <tt>string</tt> representation of the <tt>date/time/datetime</tt> value <tt>date_expression</tt>, formatted according to <tt>formatting_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>date_expression</tt>: A <tt>date/time/datetime</tt> value.</li>
+
+<li><tt>formatting_expression</tt> A <tt>string</tt> value providing the formatting for <tt>date_expression</tt>. Characters used to create date expression:</li>
+
+<li><tt>h</tt> hours</li>
+
+<li><tt>m</tt> minutes</li>
+
+<li><tt>s</tt> seconds</li>
+
+<li><tt>n</tt> milliseconds</li>
+
+<li><tt>a</tt> am/pm</li>
+
+<li><tt>z</tt> timezone</li>
+
+<li><tt>Y</tt> year</li>
+
+<li><tt>M</tt> month</li>
+
+<li><tt>D</tt> day</li>
+
+<li><tt>W</tt> weekday</li>
+
+<li><tt>-</tt>, <tt>'</tt>, <tt>/</tt>, <tt>.</tt>, <tt>,</tt>, <tt>T</tt> separators for both time and date</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>string</tt> value corresponding to <tt>date_expression</tt></li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $i := print-time(time("00:30:30.000Z"),"m:s")
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"30:30"
+</pre></div></div></li>
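+
+<li>
+<p>A similar, hedged sketch for dates: assuming the <tt>Y</tt>, <tt>M</tt> and <tt>D</tt> format characters and the <tt>/</tt> separator listed above can be combined in the same way as <tt>m:s</tt> in the example above, a <tt>date</tt> value could be printed as follows. The date is an illustrative value, and no result is listed because this query has not been verified against a running instance:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $s := print-date(date("2015-11-24"), "Y/M/D")
+return $s;
+</pre></div></div></li>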
+</ul></div>
+<div class="section">
+<h3><a name="get-interval-start_get-interval-end"></a>get-interval-start, get-interval-end</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-interval-start/get-interval-end(interval)
+</pre></div></div></li>
+
+<li>
+<p>Gets the start/end of the given interval.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval</tt>: the interval to be accessed.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>time</tt>, <tt>date</tt>, or <tt>datetime</tt> (depending on the time instances of the interval) representing the starting or ending time.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv := interval-start-from-date("1984-01-01", "P1Y")
+return {"start": get-interval-start($itv), "end": get-interval-end($itv)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "start": date("1984-01-01"), "end": date("1985-01-01") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-interval-start-dateget-interval-start-datetimeget-interval-start-time_get-interval-end-dateget-interval-end-datetimeget-interval-end-time"></a>get-interval-start-date/get-interval-start-datetime/get-interval-start-time, get-interval-end-date/get-interval-end-datetime/get-interval-end-time</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-interval-start-date/get-interval-start-datetime/get-interval-start-time/get-interval-end-date/get-interval-end-datetime/get-interval-end-time(interval)
+</pre></div></div></li>
+
+<li>
+<p>Gets the start/end of the given interval for the specific date/datetime/time type.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval</tt>: the interval to be accessed.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>time</tt>, <tt>date</tt>, or <tt>datetime</tt> (depending on the function) representing the starting or ending time.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-start-from-date("1984-01-01", "P1Y")
+let $itv2 := interval-start-from-datetime("1984-01-01T08:30:00.000", "P1Y1H")
+let $itv3 := interval-start-from-time("08:30:00.000", "P1H")
+return {"start": get-interval-start-date($itv1), "end": get-interval-end-date($itv1), "start": get-interval-start-datetime($itv2), "end": get-interval-end-datetime($itv2), "start": get-interval-start-time($itv3), "end": get-interval-end-time($itv3)}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "start": date("1984-01-01"), "end": date("1985-01-01"), "start": datetime("1984-01-01T08:30:00.000"), "end": datetime("1984-02-01T09:30:00.000"), "start": time("08:30:00.000"), "end": time("09:30:00.000") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="get-overlapping-interval"></a>get-overlapping-interval</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-overlapping-interval(interval_expression_1, interval_expression_2)
+</pre></div></div></li>
+
+<li>
+<p>Gets the interval over which the two given intervals overlap.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval_expression_1</tt>: an <tt>interval</tt> value</li>
+
+<li><tt>interval_expression_2</tt>: an <tt>interval</tt> value</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> covering the overlap between <tt>interval_expression_1</tt> and <tt>interval_expression_2</tt>. If <tt>interval_expression_1</tt> and <tt>interval_expression_2</tt> do not overlap, <tt>null</tt> is returned. Both intervals must be of the same type.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "overlap1": get-overlapping-interval(interval-from-time(time("11:23:39"), time("18:27:19")), interval-from-time(time("12:23:39"), time("23:18:00"))),
+ "overlap2": get-overlapping-interval(interval-from-time(time("12:23:39"), time("18:27:19")), interval-from-time(time("07:19:39"), time("09:18:00"))),
+ "overlap3": get-overlapping-interval(interval-from-date(date("1980-11-30"), date("1999-09-09")), interval-from-date(date("2013-01-01"), date("2014-01-01"))),
+ "overlap4": get-overlapping-interval(interval-from-date(date("1980-11-30"), date("2099-09-09")), interval-from-date(date("2013-01-01"), date("2014-01-01"))),
+ "overlap5": get-overlapping-interval(interval-from-datetime(datetime("1844-03-03T11:19:39"), datetime("2000-10-30T18:27:19")), interval-from-datetime(datetime("1989-03-04T12:23:39"), datetime("2009-10-10T23:18:00"))),
+ "overlap6": get-overlapping-interval(interval-from-datetime(datetime("1989-03-04T12:23:39"), datetime("2000-10-30T18:27:19")), interval-from-datetime(datetime("1844-03-03T11:19:39"), datetime("1888-10-10T23:18:00"))) }
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "overlap1": interval-time("12:23:39.000Z, 18:27:19.000Z"),
+ "overlap2": null,
+ "overlap3": null,
+ "overlap4": interval-date("2013-01-01, 2014-01-01"),
+ "overlap5": interval-datetime("1989-03-04T12:23:39.000Z, 2000-10-30T18:27:19.000Z"),
+ "overlap6": null }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-beforeinterval-afterinterval-meetsinterval-met-byinterval-overlapsinterval-overlapped-byinterval-overlappinginterval-startsinterval-started-byinterval-coversinterval-covered-byinterval-endsinterval-ended-by"></a>interval-before/interval-after/interval-meets/interval-met-by/interval-overlaps/interval-overlapped-by/interval-overlapping/interval-starts/interval-started-by/interval-covers/interval-covered-by/interval-ends/interval-ended-by</h3>
+<p>See <a href="allens.html">Allen’s Relations</a>.</p></div>
+<div class="section">
+<h3><a name="interval-bin"></a>interval-bin</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-bin(time-to-bin, time-bin-anchor, duration-bin-size)
+</pre></div></div></li>
+
+<li>
+<p>Returns the <tt>interval</tt> value representing the bin containing the <tt>time-to-bin</tt> value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>time-to-bin</tt>: a date/time/datetime value representing the time to be binned.</li>
+
+<li><tt>time-bin-anchor</tt>: a date/time/datetime value representing an anchor point at which a bin starts. The type of this argument should be the same as the type of the first argument, <tt>time-to-bin</tt>.</li>
+
+<li><tt>duration-bin-size</tt>: the duration value representing the size of the bin, of type year-month-duration or day-time-duration. The type of this duration should be compatible with the type of <tt>time-to-bin</tt>, so that the arithmetic operation between <tt>time-to-bin</tt> and <tt>duration-bin-size</tt> is well-defined. Currently AsterixDB supports the following arithmetic operations:
+
+<ul>
+
+<li>datetime +|- year-month-duration</li>
+
+<li>datetime +|- day-time-duration</li>
+
+<li>date +|- year-month-duration</li>
+
+<li>date +|- day-time-duration</li>
+
+<li>time +|- day-time-duration</li>
+ </ul></li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> value representing the bin containing the <tt>time-to-bin</tt> value. The internal type of this interval value is the same as the <tt>time-to-bin</tt> type.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $c1 := date("2010-10-30")
+let $c2 := datetime("-1987-11-19T23:49:23.938")
+let $c3 := time("12:23:34.930+07:00")
+
+return { "bin1": interval-bin($c1, date("1990-01-01"), year-month-duration("P1Y")),
+ "bin2": interval-bin($c2, datetime("1990-01-01T00:00:00.000Z"), year-month-duration("P6M")),
+ "bin3": interval-bin($c3, time("00:00:00"), day-time-duration("PT1M")),
+ "bin4": interval-bin($c2, datetime("2013-01-01T00:00:00.000"), day-time-duration("PT24H")) }
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "bin1": interval-date("2010-01-01, 2011-01-01"),
+ "bin2": interval-datetime("-1987-07-01T00:00:00.000Z, -1986-01-01T00:00:00.000Z"),
+ "bin3": interval-time("05:23:00.000Z, 05:24:00.000Z"),
+ "bin4": interval-datetime("-1987-11-19T00:00:00.000Z, -1987-11-20T00:00:00.000Z")}
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-from-date"></a>interval-from-date</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-from-date(string_expression1, string_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Constructor function for the <tt>interval</tt> type by parsing two date strings.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression1</tt> : The <tt>string</tt> value representing the starting date.</li>
+
+<li><tt>string_expression2</tt> : The <tt>string</tt> value representing the ending date.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> value between the two dates.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>{"date-interval": interval-from-date("2012-01-01", "2013-04-01")}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "date-interval": interval-date("2012-01-01, 2013-04-01") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-from-time"></a>interval-from-time</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-from-time(string_expression1, string_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Constructor function for the <tt>interval</tt> type by parsing two time strings.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression1</tt> : The <tt>string</tt> value representing the starting time.</li>
+
+<li><tt>string_expression2</tt> : The <tt>string</tt> value representing the ending time.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> value between the two times.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>{"time-interval": interval-from-time("12:23:34.456Z", "233445567+0800")}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "time-interval": interval-time("12:23:34.456Z, 15:34:45.567Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-from-datetime"></a>interval-from-datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-from-datetime(string_expression1, string_expression2)
+</pre></div></div></li>
+
+<li>
+<p>Constructor function for <tt>interval</tt> type by parsing two datetime strings.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>string_expression1</tt> : The <tt>string</tt> value representing the starting datetime.</li>
+
+<li><tt>string_expression2</tt> : The <tt>string</tt> value representing the ending datetime.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> value between the two datetimes.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>{"datetime-interval": interval-from-datetime("2012-01-01T12:23:34.456+08:00", "20130401T153445567Z")}
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "datetime-interval": interval-datetime("2012-01-01T04:23:34.456Z, 2013-04-01T15:34:45.567Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="interval-start-from-datetimedatetime"></a>interval-start-from-date/time/datetime</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>interval-start-from-date/time/datetime(date/time/datetime, duration)
+</pre></div></div></li>
+
+<li>
+<p>Constructs an <tt>interval</tt> value from the given starting <tt>date</tt>/<tt>time</tt>/<tt>datetime</tt> and the <tt>duration</tt> that the interval lasts.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>date/time/datetime</tt>: a <tt>string</tt> representing a <tt>date</tt>, <tt>time</tt> or <tt>datetime</tt>, or a <tt>date</tt>/<tt>time</tt>/<tt>datetime</tt> value, representing the starting time point.</li>
+
+<li><tt>duration</tt>: a <tt>string</tt> or <tt>duration</tt> value representing the duration of the interval. The duration cannot be negative.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>interval</tt> value representing the interval starting from the given time point with the length of duration.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-start-from-date("1984-01-01", "P1Y")
+let $itv2 := interval-start-from-time(time("02:23:28.394"), "PT3H24M")
+let $itv3 := interval-start-from-datetime("1999-09-09T09:09:09.999", duration("P2M30D"))
+return {"interval1": $itv1, "interval2": $itv2, "interval3": $itv3}
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "interval1": interval-date("1984-01-01, 1985-01-01"), "interval2": interval-time("02:23:28.394Z, 05:47:28.394Z"), "interval3": interval-datetime("1999-09-09T09:09:09.999Z, 1999-12-09T09:09:09.999Z") }
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="overlap-bins"></a>overlap-bins</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>overlap-bins(interval_expression, time-bin-anchor, duration-bin-size)
+</pre></div></div></li>
+
+<li>
+<p>Returns an ordered list of <tt>interval</tt> values representing each bin that overlaps the <tt>interval_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>interval_expression</tt>: an <tt>interval</tt> value</li>
+
+<li><tt>time-bin-anchor</tt>: a date/time/datetime value representing an anchor point at which a bin starts. The type of this argument should be the same as the type of the endpoints of <tt>interval_expression</tt>.</li>
+
+<li><tt>duration-bin-size</tt>: the duration value representing the size of the bin, of type year-month-duration or day-time-duration. The type of this duration should be compatible with the type of the endpoints of <tt>interval_expression</tt>, so that arithmetic between those endpoints and <tt>duration-bin-size</tt> is well-defined. Currently AsterixDB supports the following arithmetic operations:
+
+<ul>
+
+<li>datetime +|- year-month-duration</li>
+
+<li>datetime +|- day-time-duration</li>
+
+<li>date +|- year-month-duration</li>
+
+<li>date +|- day-time-duration</li>
+
+<li>time +|- day-time-duration</li>
+ </ul></li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An ordered list of <tt>interval</tt> values representing each bin that overlaps the <tt>interval_expression</tt>. The internal type of each returned interval is the same as that of <tt>interval_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $itv1 := interval-from-time(time("17:23:37"), time("18:30:21"))
+let $itv2 := interval-from-date(date("1984-03-17"), date("2013-08-22"))
+let $itv3 := interval-from-datetime(datetime("1800-01-01T23:59:48.938"), datetime("2015-07-26T13:28:30.218"))
+return { "timebins": overlap-bins($itv1, time("00:00:00"), day-time-duration("PT30M")),
+ "datebins": overlap-bins($itv2, date("1990-01-01"), year-month-duration("P20Y")),
+ "datetimebins": overlap-bins($itv3, datetime("1900-01-01T00:00:00.000"), year-month-duration("P100Y")) }
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ "timebins": [ interval-time("17:00:00.000Z, 17:30:00.000Z"), interval-time("17:30:00.000Z, 18:00:00.000Z"), interval-time("18:00:00.000Z, 18:30:00.000Z"), interval-time("18:30:00.000Z, 19:00:00.000Z") ],
+ "datebins": [ interval-date("1970-01-01, 1990-01-01"), interval-date("1990-01-01, 2010-01-01"), interval-date("2010-01-01, 2030-01-01") ],
+ "datetimebins": [ interval-datetime("1800-01-01T00:00:00.000Z, 1900-01-01T00:00:00.000Z"), interval-datetime("1900-01-01T00:00:00.000Z, 2000-01-01T00:00:00.000Z"), interval-datetime("2000-01-01T00:00:00.000Z, 2100-01-01T00:00:00.000Z") ] }
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Record_Functions_Back_to_TOC"></a><a name="RecordFunctions" id="RecordFunctions">Record Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="get-record-fields"></a>get-record-fields</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-record-fields(record_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the field names, field types, and open status of the fields of a given record.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>record_expression</tt> : a record value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An ordered list of <tt>record</tt> values, each of which includes the field-name <tt>string</tt>, the field-type <tt>string</tt>, the is-open <tt>boolean</tt>, and an optional nested <tt>orderedList</tt> describing the values of a nested record.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $r1 := {"id": 1,
+ "project": "AsterixDB",
+ "address": {"city": "Irvine", "state": "CA"},
+ "related": ["Hivestrix", "Preglix", "Apache VXQuery"] }
+return get-record-fields($r1)
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>[ { "field-name": "id", "field-type": "INT64", "is-open": false },
+ { "field-name": "project", "field-type": "STRING", "is-open": false },
+ { "field-name": "address", "field-type": "RECORD", "is-open": false, "nested": [
+ { "field-name": "city", "field-type": "STRING", "is-open": false },
+ { "field-name": "state", "field-type": "STRING", "is-open": false } ] },
+ { "field-name": "related", "field-type": "ORDEREDLIST", "is-open": false, "list": [
+ { "field-type": "STRING" },
+ { "field-type": "STRING" },
+ { "field-type": "STRING" } ] } ]
+</pre></div></div></li>
+</ul>
+</div>
+<div class="section">
+<h3><a name="get-record-field-value"></a>get-record-field-value</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>get-record-field-value(record_expression, string_expression)
+</pre></div></div></li>
+
+<li>
+<p>Returns the value of the field named by <tt>string_expression</tt> in <tt>record_expression</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>record_expression</tt> : A <tt>record</tt> value.</li>
+
+<li><tt>string_expression</tt> : A <tt>string</tt> representing the top level field name.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>any</tt> value saved in the designated field of the record.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>let $r1 := {"id": 1,
+ "project": "AsterixDB",
+ "address": {"city": "Irvine", "state": "CA"},
+ "related": ["Hivestrix", "Preglix", "Apache VXQuery"] }
+return get-record-field-value($r1, "project")
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"AsterixDB"
+</pre></div></div></li>
+</ul></div></div>
+<div class="section">
+<h2><a name="Other_Functions_Back_to_TOC"></a><a name="OtherFunctions" id="OtherFunctions">Other Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<div class="section">
+<h3><a name="create-uuid"></a>create-uuid</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>create-uuid()
+</pre></div></div></li>
+
+<li>
+<p>Generates a <tt>uuid</tt>.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li>none</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A generated <tt>uuid</tt>.</li>
+ </ul></li>
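+
+<li>
+<p>Example (a minimal sketch; the variable name is illustrative, and no expected result is listed because each call is assumed to generate a different value):</p>
+
+<div class="source">
+<div class="source">
+<pre>let $id := create-uuid()
+return $id
+</pre></div></div></li>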
+</ul></div>
+<div class="section">
+<h3><a name="is-null"></a>is-null</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>is-null(var)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the given variable is a <tt>null</tt> value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>var</tt> : A variable (any type is allowed).</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt> indicating whether the variable is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>for $m in ['hello', 'world', null]
+where not(is-null($m))
+return $m
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"hello"
+"world"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="is-system-null"></a>is-system-null</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>is-system-null(var)
+</pre></div></div></li>
+
+<li>
+<p>Checks whether the given variable is a <tt>system null</tt> value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>var</tt> : A variable (any type is allowed).</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt> indicating whether the variable is a <tt>system null</tt>.</li>
+ </ul></li>
+</ul></div>
+<div class="section">
+<h3><a name="len"></a>len</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+<p>len(list_expression)</p></li>
+
+<li>
+<p>Returns the length of the list list_expression.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>list_expression</tt> : An <tt>OrderedList</tt>, <tt>UnorderedList</tt> or <tt>null</tt>, representing the list whose length is to be returned.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>An <tt>Int32</tt> that represents the length of list_expression.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+let $l := ["ASTERIX", "Hyracks"]
+return len($l)
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>2
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="not"></a>not</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>not(var)
+</pre></div></div></li>
+
+<li>
+<p>Inverts a <tt>boolean</tt> value.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>var</tt> : A <tt>boolean</tt> (or <tt>null</tt>)</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A <tt>boolean</tt>, the inverse of <tt>var</tt>. Returns <tt>null</tt> if <tt>var</tt> is <tt>null</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>for $m in ['hello', 'world', null]
+where not(is-null($m))
+return $m
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>"hello"
+"world"
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="range"></a>range</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>range(start_numeric_expression, end_numeric_expression)
+</pre></div></div></li>
+
+<li>
+<p>Generates a series of <tt>int64</tt> values starting at <tt>start_numeric_expression</tt> and ending at <tt>end_numeric_expression</tt>. The <tt>range</tt> function must be used as the list argument of a <tt>for</tt> expression.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>start_numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the start value.</li>
+
+<li><tt>end_numeric_expression</tt>: An <tt>int8</tt>/<tt>int16</tt>/<tt>int32</tt>/<tt>int64</tt> value representing the maximum final value.</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>A series of <tt>int64</tt> values from <tt>start_numeric_expression</tt> through <tt>end_numeric_expression</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example:</p>
+
+<div class="source">
+<div class="source">
+<pre>for $i in range(0, 3)
+return $i;
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>[ 0
+, 1
+, 2
+, 3
+]
+</pre></div></div></li>
+</ul></div>
+<div class="section">
+<h3><a name="switch-case"></a>switch-case</h3>
+
+<ul>
+
+<li>
+<p>Syntax:</p>
+
+<div class="source">
+<div class="source">
+<pre>switch-case(condition,
+ case1, case1-result,
+ case2, case2-result,
+ ...,
+ default, default-result
+)
+</pre></div></div></li>
+
+<li>
+<p>Switches amongst a sequence of cases and returns the result of the first matching case. If no match is found, the result of the default case is returned.</p></li>
+
+<li>Arguments:
+
+<ul>
+
+<li><tt>condition</tt>: A variable (any type is allowed).</li>
+
+<li><tt>caseI/default</tt>: A variable (any type is allowed).</li>
+
+<li><tt>caseI/default-result</tt>: A variable (any type is allowed).</li>
+ </ul></li>
+
+<li>Return Value:
+
+<ul>
+
+<li>Returns <tt>caseI-result</tt> if <tt>condition</tt> matches <tt>caseI</tt>, otherwise <tt>default-result</tt>.</li>
+ </ul></li>
+
+<li>
+<p>Example 1:</p>
+
+<div class="source">
+<div class="source">
+<pre>switch-case("a",
+ "a", 0,
+ "x", 1,
+ "y", 2,
+ "z", 3
+)
+</pre></div></div></li>
+</ul>
+
+<ul>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>0
+</pre></div></div></li>
+
+<li>
+<p>Example 2:</p>
+
+<div class="source">
+<div class="source">
+<pre>switch-case("a",
+ "x", 1,
+ "y", 2,
+ "z", 3
+)
+</pre></div></div></li>
+
+<li>
+<p>The expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre>3
+</pre></div></div></li>
+</ul></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/js-sdk.html b/docs/0.8.7-incubating/aql/js-sdk.html
new file mode 100644
index 0000000..3f1129c
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/js-sdk.html
@@ -0,0 +1,958 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – AsterixDB Javascript SDK</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>AsterixDB JavaScript SDK</h1>
+<div class="section">
+<h2><a name="Obtaining_and_Including"></a>Obtaining and Including</h2>
+<p><a class="externalLink" href="http://asterixdb.ics.uci.edu/download/bindings/asterix-sdk-stable.js">Download</a> the JavaScript SDK and include it like any other JavaScript library by adding the following line to the appropriate HTML file:</p>
+
+<div class="source">
+<div class="source">
+<pre>&lt;script src="path/to/asterix-sdk-stable.js"&gt;&lt;/script&gt;
+</pre></div></div></div>
+<div class="section">
+<h2><a name="Interactive_Demos"></a>Interactive Demos</h2>
+<p>Two interactive demos are available for download. Both illustrate how the JavaScript API can be used in an application:</p>
+
+<ul>
+
+<li><a class="externalLink" href="http://asterixdb.ics.uci.edu/download/demos/tweetbook-demo.zip">Tweetbook Demo</a>: a contrived geospatial application over artificial tweets that supports spatial, temporal, and keyword-based filtering.</li>
+
+<li><a class="externalLink" href="http://asterixdb.ics.uci.edu/download/demos/admaql101-demo.zip">ADM/AQL 101 Demo</a>: an interactive version of all the examples in the following section.</li>
+</ul></div>
+<div class="section">
+<h2><a name="The_javascript_SDK:_by_example"></a>The JavaScript SDK: by example</h2>
+<p>This section shows how to form AQL queries using the JavaScript SDK. The queries from <a href="primer.html">AsterixDB 101: An ADM and AQL Primer</a> are used as examples. Each AQL statement is followed by the equivalent JavaScript expression and then by the results of executing the query.</p>
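+<p>The examples in this section all use the same chained-call style, where each call contributes one AQL clause. The sketch below is a minimal stand-in written for illustration only: <tt>SketchFlwor</tt> and its methods are hypothetical, take plain strings instead of SDK expression objects, and are not the SDK's implementation. It only shows how returning the builder from each call could let clauses accumulate into AQL text.</p>

```javascript
// Hypothetical stand-in for illustration only; this is NOT the SDK's source.
// It mimics the chained-call style of the examples to show how each call
// could append one AQL clause to a growing list of clauses.
class SketchFlwor {
  constructor() {
    this.clauses = [];
  }

  ForClause(variable, source) {
    this.clauses.push("for " + variable + " in " + source);
    return this; // returning `this` is what makes chaining possible
  }

  WhereClause(condition) {
    this.clauses.push("where " + condition);
    return this;
  }

  ReturnClause(expression) {
    this.clauses.push("return " + expression);
    return this;
  }

  // Join the collected clauses into one AQL string, one clause per line.
  val() {
    return this.clauses.join("\n");
  }
}

const sketch0a = new SketchFlwor()
  .ForClause("$user", "dataset FacebookUsers")
  .WhereClause("$user.id = 8")
  .ReturnClause("$user");

console.log(sketch0a.val());
```

+<p>The printed text has the same clauses, in the same order, as the AQL shown for Query 0-A below (without the <tt>use dataverse</tt> statement).</p>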
+<div class="section">
+<h3><a name="Query_0-A_-_Exact-Match_Lookup"></a>Query 0-A - Exact-Match Lookup</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+where $user.id = 8
+return $user;
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression0a = new FLWOGRExpression()
+ .ForClause("$user", new AExpression("dataset FacebookUsers"))
+ .WhereClause(new AExpression("$user.id = 8"))
+ .ReturnClause("$user");
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 8 } , "alias": "Nila", "name": "NilaMilliron", "user-since": { datetime: 1199182200000}, "friend-ids": { unorderedlist: [{ int32: 3 } ]}, "employment": { orderedlist: [{ "organization-name": "Plexlane", "start-date": { date: 1267315200000}, "end-date": null } ]} }
+</pre></div></div></div></div></div></div>
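+<p>The <tt>datetime</tt> and <tt>date</tt> values in the results are printed as large integers. A plausible reading, which this page does not state and which is therefore an assumption, is that they count milliseconds since 1970-01-01T00:00:00Z. Under that assumption, the values from the Query 0-A result can be turned into readable timestamps with the standard <tt>Date</tt> object:</p>

```javascript
// Values copied from the Query 0-A result.
const userSinceMillis = 1199182200000; // "user-since": { datetime: ... }
const startDateMillis = 1267315200000; // "start-date": { date: ... }

// Assumption: both numbers count milliseconds since 1970-01-01T00:00:00Z.
const userSince = new Date(userSinceMillis).toISOString();

// For a date value, keep only the calendar-day part (YYYY-MM-DD).
const startDate = new Date(startDateMillis).toISOString().slice(0, 10);

console.log(userSince); // 2008-01-01T10:10:00.000Z
console.log(startDate); // 2010-02-28
```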
+<div class="section">
+<h3><a name="Query_0-B_-_Range_Scan"></a>Query 0-B - Range Scan</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+where $user.id >= 2 and $user.id <= 4
+return $user;
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression0b = new FLWOGRExpression()
+ .ForClause("$user", new AExpression("dataset FacebookUsers"))
+ .WhereClause().and(new AExpression("$user.id >= 2"), new AExpression("$user.id <= 4"))
+ .ReturnClause("$user");
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 2 } , "alias": "Isbel", "name": "IsbelDull", "user-since": { datetime: 1295691000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 4 } ]}, "employment": { orderedlist: [{ "organization-name": "Hexviafind", "start-date": { date: 1272326400000}, "end-date": null } ]} }
+{ "id": { int32: 3 } , "alias": "Emory", "name": "EmoryUnk", "user-since": { datetime: 1341915000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 5 } , { int32: 8 } , { int32: 9 } ]}, "employment": { orderedlist: [{ "organization-name": "geomedia", "start-date": { date: 1276732800000}, "end-date": { date: 1264464000000} } ]} }
+{ "id": { int32: 4 } , "alias": "Nicholas", "name": "NicholasStroh", "user-since": { datetime: 1293444600000}, "friend-ids": { unorderedlist: [{ int32: 2 } ]}, "employment": { orderedlist: [{ "organization-name": "Zamcorporation", "start-date": { date: 1275955200000}, "end-date": null } ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_1_-_Other_Query_Filters"></a>Query 1 - Other Query Filters</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+where $user.user-since >= datetime('2010-07-22T00:00:00')
+and $user.user-since <= datetime('2012-07-29T23:59:59')
+return $user;
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression1 = new FLWOGRExpression()
+ .ForClause("$user", new AExpression("dataset FacebookUsers"))
+ .WhereClause().and(
+ new AExpression("$user.user-since >= datetime('2010-07-22T00:00:00')"),
+ new AExpression("$user.user-since <= datetime('2012-07-29T23:59:59')")
+ ).ReturnClause("$user");
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 2 } , "alias": "Isbel", "name": "IsbelDull", "user-since": { datetime: 1295691000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 4 } ]}, "employment": { orderedlist: [{ "organization-name": "Hexviafind", "start-date": { date: 1272326400000}, "end-date": null } ]} }
+{ "id": { int32: 10 } , "alias": "Bram", "name": "BramHatch", "user-since": { datetime: 1287223800000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 5 } , { int32: 9 } ]}, "employment": { orderedlist: [{ "organization-name": "physcane", "start-date": { date: 1181001600000}, "end-date": { date: 1320451200000} } ]} }
+{ "id": { int32: 3 } , "alias": "Emory", "name": "EmoryUnk", "user-since": { datetime: 1341915000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 5 } , { int32: 8 } , { int32: 9 } ]}, "employment": { orderedlist: [{ "organization-name": "geomedia", "start-date": { date: 1276732800000}, "end-date": { date: 1264464000000} } ]} }
+{ "id": { int32: 4 } , "alias": "Nicholas", "name": "NicholasStroh", "user-since": { datetime: 1293444600000}, "friend-ids": { unorderedlist: [{ int32: 2 } ]}, "employment": { orderedlist: [{ "organization-name": "Zamcorporation", "start-date": { date: 1275955200000}, "end-date": null } ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_2-A_-_Equijoin"></a>Query 2-A - Equijoin</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+for $message in dataset FacebookMessages
+where $message.author-id = $user.id
+return {
+ "uname": $user.name,
+ "message": $message.message
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression2a = new FLWOGRExpression()
+ .ForClause ("$user", new AExpression("dataset FacebookUsers"))
+ .ForClause ("$message", new AExpression("dataset FacebookMessages"))
+ .WhereClause(new AExpression("$message.author-id = $user.id"))
+ .ReturnClause({
+ "uname" : "$user.name",
+ "message" : "$message.message"
+ });
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+{ "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+{ "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+{ "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+{ "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+{ "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+{ "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+{ "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+{ "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+{ "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+{ "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+{ "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+{ "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+{ "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+{ "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_2-B_-_Index_join"></a>Query 2-B - Index Join</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+for $message in dataset FacebookMessages
+where $message.author-id /*+ indexnl */ = $user.id
+return {
+ "uname": $user.name,
+ "message": $message.message
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression2b = new FLWOGRExpression()
+ .ForClause ("$user", new AExpression("dataset FacebookUsers"))
+ .ForClause ("$message", new AExpression("dataset FacebookMessages"))
+ .WhereClause(new AExpression("$message.author-id /*+ indexnl */ = $user.id"))
+ .ReturnClause({
+ "uname" : "$user.name",
+ "message" : "$message.message"
+ });
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+{ "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+{ "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+{ "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+{ "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+{ "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+{ "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+{ "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+{ "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+{ "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+{ "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+{ "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+{ "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+{ "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+{ "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_3_-_Nested_Outer_Join"></a>Query 3 - Nested Outer Join</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $user in dataset FacebookUsers
+return {
+ "uname": $user.name,
+ "messages": for $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ return $message.message
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression3messages = new FLWOGRExpression()
+ .ForClause("$message", new AExpression("dataset FacebookMessages"))
+ .WhereClause(new AExpression("$message.author-id = $user.id"))
+ .ReturnClause("$message.message");
+
+var expression3 = new FLWOGRExpression()
+ .ForClause ("$user", new AExpression("dataset FacebookUsers"))
+ .ReturnClause({
+ "uname": "$user.name",
+ "messages" : expression3messages
+ });
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "uname": "MargaritaStoddard", "messages": { orderedlist: [" dislike iphone its touch-screen is horrible", " like verizon the 3G is awesome:)", " can't stand motorola the touch-screen is terrible", " can't stand at&t the network is horrible:(", " can't stand at&t its plan is terrible" ]} }
+{ "uname": "IsbelDull", "messages": { orderedlist: [" like samsung the plan is amazing", " like t-mobile its platform is mind-blowing" ]} }
+{ "uname": "NilaMilliron", "messages": { orderedlist: [ ]} }
+{ "uname": "WoodrowNehling", "messages": { orderedlist: [" love at&t its 3G is good:)" ]} }
+{ "uname": "BramHatch", "messages": { orderedlist: [" dislike iphone the voice-command is bad:(", " can't stand t-mobile its voicemail-service is OMG:(" ]} }
+{ "uname": "EmoryUnk", "messages": { orderedlist: [" love sprint its shortcut-menu is awesome:)", " love verizon its wireless is good" ]} }
+{ "uname": "WillisWynne", "messages": { orderedlist: [" love sprint the customization is mind-blowing" ]} }
+{ "uname": "SuzannaTillson", "messages": { orderedlist: [" like iphone the voicemail-service is awesome" ]} }
+{ "uname": "NicholasStroh", "messages": { orderedlist: [ ]} }
+{ "uname": "VonKemble", "messages": { orderedlist: [" dislike sprint the speed is horrible" ]} }
+</pre></div></div></div></div></div></div>
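+<p>In Query 3, the record passed to <tt>ReturnClause</tt> mixes two kinds of values: strings such as <tt>"$user.name"</tt>, which correspond to raw AQL in the query rather than quoted string literals, and a nested expression object. The sketch below shows one way such a record could be rendered to AQL text. The helper <tt>renderRecord</tt> and the stub <tt>innerSketch</tt> are hypothetical and written for illustration; the only thing assumed about an expression object is that it exposes a <tt>val()</tt> method returning its AQL text.</p>

```javascript
// Hypothetical helper for illustration; not taken from the SDK.
// String values are emitted as raw AQL; objects are asked for their AQL text.
function renderRecord(fields) {
  const parts = Object.keys(fields).map(function (key) {
    const value = fields[key];
    const text = typeof value === "string" ? value : value.val();
    return JSON.stringify(key) + ": " + text; // only the field name is quoted
  });
  return "{ " + parts.join(", ") + " }";
}

// Stub standing in for a nested expression object (assumed to expose val()).
const innerSketch = {
  val: function () {
    return "for $message in dataset FacebookMessages " +
      "where $message.author-id = $user.id " +
      "return $message.message";
  }
};

const recordSketch = renderRecord({
  "uname": "$user.name",
  "messages": innerSketch
});

console.log(recordSketch);
```

+<p>Note that the nested text refers to <tt>$user</tt>, which is bound only by the outer <tt>for</tt> clause, so the inner expression is meaningful only when embedded in the outer one.</p>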
+<div class="section">
+<h3><a name="Query_4_-_Theta_Join"></a>Query 4 - Theta Join</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset TweetMessages
+return {
+ "message": $t.message-text,
+ "nearby-messages": for $t2 in dataset TweetMessages
+ where spatial-distance($t.sender-location, $t2.sender-location) <= 1
+ return { "msgtxt":$t2.message-text}
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression4messages = new FLWOGRExpression()
+ .ForClause( "$t2", new AExpression("dataset TweetMessages"))
+ .WhereClause( new AExpression("spatial-distance($t.sender-location, $t2.sender-location) <= 1"))
+ .ReturnClause({ "msgtxt" : "$t2.message-text" });
+
+var expression4 = new FLWOGRExpression()
+ .ForClause( "$t", new AExpression("dataset TweetMessages"))
+ .ReturnClause({
+ "message" : "$t.message-text",
+ "nearby-messages" : expression4messages
+ });
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "message": " hate verizon its voice-clarity is OMG:(", "nearby-messages": { orderedlist: [{ "msgtxt": " hate verizon its voice-clarity is OMG:(" }, { "msgtxt": " like motorola the speed is good:)" } ]} }
+{ "message": " like iphone the voice-clarity is good:)", "nearby-messages": { orderedlist: [{ "msgtxt": " like iphone the voice-clarity is good:)" } ]} }
+{ "message": " like samsung the platform is good", "nearby-messages": { orderedlist: [{ "msgtxt": " like samsung the platform is good" } ]} }
+{ "message": " love t-mobile its customization is good:)", "nearby-messages": { orderedlist: [{ "msgtxt": " love t-mobile its customization is good:)" } ]} }
+{ "message": " like samsung the voice-command is amazing:)", "nearby-messages": { orderedlist: [{ "msgtxt": " like samsung the voice-command is amazing:)" } ]} }
+{ "message": " like motorola the speed is good:)", "nearby-messages": { orderedlist: [{ "msgtxt": " hate verizon its voice-clarity is OMG:(" }, { "msgtxt": " like motorola the speed is good:)" } ]} }
+{ "message": " love verizon its voicemail-service is awesome", "nearby-messages": { orderedlist: [{ "msgtxt": " love verizon its voicemail-service is awesome" } ]} }
+{ "message": " can't stand motorola its speed is terrible:(", "nearby-messages": { orderedlist: [{ "msgtxt": " can't stand motorola its speed is terrible:(" } ]} }
+{ "message": " like t-mobile the shortcut-menu is awesome:)", "nearby-messages": { orderedlist: [{ "msgtxt": " like t-mobile the shortcut-menu is awesome:)" } ]} }
+{ "message": " can't stand iphone its platform is terrible", "nearby-messages": { orderedlist: [{ "msgtxt": " can't stand iphone its platform is terrible" } ]} }
+{ "message": " like verizon its shortcut-menu is awesome:)", "nearby-messages": { orderedlist: [{ "msgtxt": " like verizon its shortcut-menu is awesome:)" } ]} }
+{ "message": " like sprint the voice-command is mind-blowing:)", "nearby-messages": { orderedlist: [{ "msgtxt": " like sprint the voice-command is mind-blowing:)" } ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_5_-_Fuzzy_Join"></a>Query 5 - Fuzzy Join</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+set simfunction "edit-distance";
+set simthreshold "3";
+
+for $fbu in dataset FacebookUsers
+return {
+ "id": $fbu.id,
+ "name": $fbu.name,
+ "similar-users": for $t in dataset TweetMessages
+ let $tu := $t.user
+ where $tu.name ~= $fbu.name
+ return {
+ "twitter-screenname": $tu.screen-name,
+ "twitter-name": $tu.name
+ }
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var similarUsersExpression = new FLWOGRExpression()
+ .ForClause("$t", new AExpression("dataset TweetMessages"))
+ .LetClause ("$tu", new AExpression("$t.user"))
+ .WhereClause(new AExpression("$tu.name ~= $fbu.name"))
+ .ReturnClause({
+ "twitter-screenname": "$tu.screen-name",
+ "twitter-name": "$tu.name"
+ });
+
+var expression5 = new FLWOGRExpression()
+ .ForClause ("$fbu", new AExpression("dataset FacebookUsers"))
+ .ReturnClause(
+ {
+ "id" : "$fbu.id",
+ "name" : "$fbu.name",
+ "similar-users" : similarUsersExpression
+ }
+ );
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 1 } , "name": "MargaritaStoddard", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 2 } , "name": "IsbelDull", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 8 } , "name": "NilaMilliron", "similar-users": { orderedlist: [{ "twitter-screenname": "NilaMilliron_tw", "twitter-name": "Nila Milliron" } ]} }
+{ "id": { int32: 9 } , "name": "WoodrowNehling", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 10 } , "name": "BramHatch", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 3 } , "name": "EmoryUnk", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 6 } , "name": "WillisWynne", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 7 } , "name": "SuzannaTillson", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 4 } , "name": "NicholasStroh", "similar-users": { orderedlist: [ ]} }
+{ "id": { int32: 5 } , "name": "VonKemble", "similar-users": { orderedlist: [ ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_6_-_Existential_Quantification"></a>Query 6 - Existential Quantification</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $fbu in dataset FacebookUsers
+where (some $e in $fbu.employment satisfies is-null($e.end-date))
+return $fbu;
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression6 = new FLWOGRExpression()
+ .ForClause ("$fbu", new AQLClause().set("dataset FacebookUsers"))
+ .WhereClause(
+ new QuantifiedExpression (
+ "some" ,
+ {"$e" : new AExpression("$fbu.employment") },
+ new FunctionExpression("is-null", new AExpression("$e.end-date"))
+ )
+ )
+ .ReturnClause("$fbu");
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 1 } , "alias": "Margarita", "name": "MargaritaStoddard", "user-since": { datetime: 1345457400000}, "friend-ids": { unorderedlist: [{ int32: 2 } , { int32: 3 } , { int32: 6 } , { int32: 10 } ]}, "employment": { orderedlist: [{ "organization-name": "Codetechno", "start-date": { date: 1154822400000}, "end-date": null } ]} }
+{ "id": { int32: 2 } , "alias": "Isbel", "name": "IsbelDull", "user-since": { datetime: 1295691000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 4 } ]}, "employment": { orderedlist: [{ "organization-name": "Hexviafind", "start-date": { date: 1272326400000}, "end-date": null } ]} }
+{ "id": { int32: 8 } , "alias": "Nila", "name": "NilaMilliron", "user-since": { datetime: 1199182200000}, "friend-ids": { unorderedlist: [{ int32: 3 } ]}, "employment": { orderedlist: [{ "organization-name": "Plexlane", "start-date": { date: 1267315200000}, "end-date": null } ]} }
+{ "id": { int32: 6 } , "alias": "Willis", "name": "WillisWynne", "user-since": { datetime: 1105956600000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 3 } , { int32: 7 } ]}, "employment": { orderedlist: [{ "organization-name": "jaydax", "start-date": { date: 1242345600000}, "end-date": null } ]} }
+{ "id": { int32: 7 } , "alias": "Suzanna", "name": "SuzannaTillson", "user-since": { datetime: 1344334200000}, "friend-ids": { unorderedlist: [{ int32: 6 } ]}, "employment": { orderedlist: [{ "organization-name": "Labzatron", "start-date": { date: 1303171200000}, "end-date": null } ]} }
+{ "id": { int32: 4 } , "alias": "Nicholas", "name": "NicholasStroh", "user-since": { datetime: 1293444600000}, "friend-ids": { unorderedlist: [{ int32: 2 } ]}, "employment": { orderedlist: [{ "organization-name": "Zamcorporation", "start-date": { date: 1275955200000}, "end-date": null } ]} }
+{ "id": { int32: 5 } , "alias": "Von", "name": "VonKemble", "user-since": { datetime: 1262686200000}, "friend-ids": { unorderedlist: [{ int32: 3 } , { int32: 6 } , { int32: 10 } ]}, "employment": { orderedlist: [{ "organization-name": "Kongreen", "start-date": { date: 1290816000000}, "end-date": null } ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_7_-_Universal_Quantification"></a>Query 7 - Universal Quantification</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $fbu in dataset FacebookUsers
+where (every $e in $fbu.employment satisfies not(is-null($e.end-date)))
+return $fbu;
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression7 = new FLWOGRExpression()
+ .ForClause("$fbu", new AExpression("dataset FacebookUsers"))
+ .WhereClause(
+ new QuantifiedExpression (
+ "every" ,
+ {"$e" : new AExpression("$fbu.employment") },
+ new FunctionExpression("not", new FunctionExpression("is-null", new AExpression("$e.end-date")))
+ )
+ )
+ .ReturnClause("$fbu");
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "id": { int32: 9 } , "alias": "Woodrow", "name": "WoodrowNehling", "user-since": { datetime: 1127211000000}, "friend-ids": { unorderedlist: [{ int32: 3 } , { int32: 10 } ]}, "employment": { orderedlist: [{ "organization-name": "Zuncan", "start-date": { date: 1050969600000}, "end-date": { date: 1260662400000} } ]} }
+{ "id": { int32: 10 } , "alias": "Bram", "name": "BramHatch", "user-since": { datetime: 1287223800000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 5 } , { int32: 9 } ]}, "employment": { orderedlist: [{ "organization-name": "physcane", "start-date": { date: 1181001600000}, "end-date": { date: 1320451200000} } ]} }
+{ "id": { int32: 3 } , "alias": "Emory", "name": "EmoryUnk", "user-since": { datetime: 1341915000000}, "friend-ids": { unorderedlist: [{ int32: 1 } , { int32: 5 } , { int32: 8 } , { int32: 9 } ]}, "employment": { orderedlist: [{ "organization-name": "geomedia", "start-date": { date: 1276732800000}, "end-date": { date: 1264464000000} } ]} }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_8_-_Simple_Aggregation"></a>Query 8 - Simple Aggregation</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+count(for $fbu in dataset FacebookUsers return $fbu);
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression8 = new FunctionExpression(
+ "count",
+ new FLWOGRExpression()
+ .ForClause("$fbu", new AExpression("dataset FacebookUsers"))
+ .ReturnClause("$fbu")
+);
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ int64: 10 }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_9-A_-_Grouping_and_Aggregation"></a>Query 9-A - Grouping and Aggregation</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset TweetMessages
+group by $uid := $t.user.screen-name with $t
+return {
+ "user": $uid,
+ "count": count($t)
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression9a = new FLWOGRExpression()
+ .ForClause("$t", new AExpression("dataset TweetMessages"))
+ .GroupClause("$uid", new AExpression("$t.user.screen-name"), "with", "$t")
+ .ReturnClause(
+ {
+ "user" : "$uid",
+ "count" : new FunctionExpression("count", new AExpression("$t"))
+ }
+ );
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "user": "ColineGeyer@63", "count": { int64: 3 } }
+{ "user": "OliJackson_512", "count": { int64: 1 } }
+{ "user": "NilaMilliron_tw", "count": { int64: 1 } }
+{ "user": "ChangEwing_573", "count": { int64: 1 } }
+{ "user": "NathanGiesen@211", "count": { int64: 6 } }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_9-B_-_Hash-Based_Grouping_and_Aggregation"></a>Query 9-B - (Hash-Based) Grouping and Aggregation</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset TweetMessages
+/*+ hash*/
+group by $uid := $t.user.screen-name with $t
+return {
+ "user": $uid,
+ "count": count($t)
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression9b = new FLWOGRExpression()
+ .ForClause("$t", new AExpression("dataset TweetMessages"))
+ .AQLClause("/*+ hash*/")
+ .GroupClause("$uid", new AExpression("$t.user.screen-name"), "with", "$t")
+ .ReturnClause(
+ {
+ "user" : "$uid",
+ "count" : new FunctionExpression("count", new AExpression("$t"))
+ }
+ );
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "user": "ColineGeyer@63", "count": { int64: 3 } }
+{ "user": "OliJackson_512", "count": { int64: 1 } }
+{ "user": "NilaMilliron_tw", "count": { int64: 1 } }
+{ "user": "ChangEwing_573", "count": { int64: 1 } }
+{ "user": "NathanGiesen@211", "count": { int64: 6 } }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_10_-_Grouping_and_Limits"></a>Query 10 - Grouping and Limits</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+for $t in dataset TweetMessages
+group by $uid := $t.user.screen-name with $t
+let $c := count($t)
+order by $c desc
+limit 3
+return {
+ "user": $uid,
+ "count": $c
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression10 = new FLWOGRExpression()
+ .ForClause("$t", new AExpression("dataset TweetMessages"))
+ .GroupClause("$uid", new AExpression("$t.user.screen-name"), "with", "$t")
+ .LetClause("$c", new FunctionExpression("count", new AExpression("$t")))
+ .OrderbyClause( new AExpression("$c"), "desc" )
+ .LimitClause(new AExpression("3"))
+ .ReturnClause(
+ {
+ "user" : "$uid",
+ "count" : "$c"
+ }
+ );
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "user": "NathanGiesen@211", "count": { int64: 6 } }
+{ "user": "ColineGeyer@63", "count": { int64: 3 } }
+{ "user": "NilaMilliron_tw", "count": { int64: 1 } }
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h3><a name="Query_11_-_Left_Outer_Fuzzy_Join"></a>Query 11 - Left Outer Fuzzy Join</h3>
+<div class="section">
+<div class="section">
+<div class="section">
+<h6><a name="AQL"></a>AQL</h6>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+set simfunction "jaccard";
+set simthreshold "0.3";
+
+for $t in dataset TweetMessages
+return {
+ "tweet": $t,
+ "similar-tweets": for $t2 in dataset TweetMessages
+ where $t2.referred-topics ~= $t.referred-topics
+ and $t2.tweetid != $t.tweetid
+ return $t2.referred-topics
+};
+</pre></div></div></div>
+<div class="section">
+<h6><a name="JS"></a>JS</h6>
+
+<div class="source">
+<div class="source">
+<pre>var expression11 = new FLWOGRExpression()
+ .ForClause( "$t", new AExpression("dataset TweetMessages"))
+ .ReturnClause({
+ "tweet" : new AExpression("$t"),
+ "similar-tweets": new FLWOGRExpression()
+ .ForClause( "$t2", new AExpression("dataset TweetMessages"))
+ .WhereClause().and(
+ new AExpression("$t2.referred-topics ~= $t.referred-topics"),
+ new AExpression("$t2.tweetid != $t.tweetid")
+ )
+ .ReturnClause("$t2.referred-topics")
+ });
+</pre></div></div></div>
+<div class="section">
+<h6><a name="Results"></a>Results</h6>
+
+<div class="source">
+<div class="source">
+<pre>{ "tweet": { "tweetid": "10", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": { int32: 121 } , "statuses_count": { int32: 362 } , "name": "Coline Geyer", "followers_count": { int32: 17159 } }, "sender-location": { point: [29.15, 76.53]}, "send-time": { datetime: 1201342200000}, "referred-topics": { unorderedlist: ["verizon", "voice-clarity" ]}, "message-text": " hate verizon its voice-clarity is OMG:(" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["iphone", "voice-clarity" ]}, { unorderedlist: ["verizon", "shortcut-menu" ]}, { unorderedlist: ["verizon", "voicemail-service" ]} ]} }
+{ "tweet": { "tweetid": "6", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": { int32: 121 } , "statuses_count": { int32: 362 } , "name": "Coline Geyer", "followers_count": { int32: 17159 } }, "sender-location": { point: [47.51, 83.99]}, "send-time": { datetime: 1273227000000}, "referred-topics": { unorderedlist: ["iphone", "voice-clarity" ]}, "message-text": " like iphone the voice-clarity is good:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["verizon", "voice-clarity" ]}, { unorderedlist: ["iphone", "platform" ]} ]} }
+{ "tweet": { "tweetid": "7", "user": { "screen-name": "ChangEwing_573", "lang": "en", "friends_count": { int32: 182 } , "statuses_count": { int32: 394 } , "name": "Chang Ewing", "followers_count": { int32: 32136 } }, "sender-location": { point: [36.21, 72.6]}, "send-time": { datetime: 1314267000000}, "referred-topics": { unorderedlist: ["samsung", "platform" ]}, "message-text": " like samsung the platform is good" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["iphone", "platform" ]}, { unorderedlist: ["samsung", "voice-command" ]} ]} }
+{ "tweet": { "tweetid": "1", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [47.44, 80.65]}, "send-time": { datetime: 1209204600000}, "referred-topics": { unorderedlist: ["t-mobile", "customization" ]}, "message-text": " love t-mobile its customization is good:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["t-mobile", "shortcut-menu" ]} ]} }
+{ "tweet": { "tweetid": "12", "user": { "screen-name": "OliJackson_512", "lang": "en", "friends_count": { int32: 445 } , "statuses_count": { int32: 164 } , "name": "Oli Jackson", "followers_count": { int32: 22649 } }, "sender-location": { point: [24.82, 94.63]}, "send-time": { datetime: 1266055800000}, "referred-topics": { unorderedlist: ["samsung", "voice-command" ]}, "message-text": " like samsung the voice-command is amazing:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["samsung", "platform" ]}, { unorderedlist: ["sprint", "voice-command" ]} ]} }
+{ "tweet": { "tweetid": "3", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [29.72, 75.8]}, "send-time": { datetime: 1162635000000}, "referred-topics": { unorderedlist: ["motorola", "speed" ]}, "message-text": " like motorola the speed is good:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["motorola", "speed" ]} ]} }
+{ "tweet": { "tweetid": "9", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [36.86, 74.62]}, "send-time": { datetime: 1342865400000}, "referred-topics": { unorderedlist: ["verizon", "voicemail-service" ]}, "message-text": " love verizon its voicemail-service is awesome" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["verizon", "voice-clarity" ]}, { unorderedlist: ["verizon", "shortcut-menu" ]} ]} }
+{ "tweet": { "tweetid": "5", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [40.09, 92.69]}, "send-time": { datetime: 1154686200000}, "referred-topics": { unorderedlist: ["motorola", "speed" ]}, "message-text": " can't stand motorola its speed is terrible:(" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["motorola", "speed" ]} ]} }
+{ "tweet": { "tweetid": "8", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [46.05, 93.34]}, "send-time": { datetime: 1129284600000}, "referred-topics": { unorderedlist: ["t-mobile", "shortcut-menu" ]}, "message-text": " like t-mobile the shortcut-menu is awesome:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["verizon", "shortcut-menu" ]}, { unorderedlist: ["t-mobile", "customization" ]} ]} }
+{ "tweet": { "tweetid": "11", "user": { "screen-name": "NilaMilliron_tw", "lang": "en", "friends_count": { int32: 445 } , "statuses_count": { int32: 164 } , "name": "Nila Milliron", "followers_count": { int32: 22649 } }, "sender-location": { point: [37.59, 68.42]}, "send-time": { datetime: 1205057400000}, "referred-topics": { unorderedlist: ["iphone", "platform" ]}, "message-text": " can't stand iphone its platform is terrible" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["iphone", "voice-clarity" ]}, { unorderedlist: ["samsung", "platform" ]} ]} }
+{ "tweet": { "tweetid": "2", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": { int32: 121 } , "statuses_count": { int32: 362 } , "name": "Coline Geyer", "followers_count": { int32: 17159 } }, "sender-location": { point: [32.84, 67.14]}, "send-time": { datetime: 1273745400000}, "referred-topics": { unorderedlist: ["verizon", "shortcut-menu" ]}, "message-text": " like verizon its shortcut-menu is awesome:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["t-mobile", "shortcut-menu" ]}, { unorderedlist: ["verizon", "voice-clarity" ]}, { unorderedlist: ["verizon", "voicemail-service" ]} ]} }
+{ "tweet": { "tweetid": "4", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": { int32: 39339 } , "statuses_count": { int32: 473 } , "name": "Nathan Giesen", "followers_count": { int32: 49416 } }, "sender-location": { point: [39.28, 70.48]}, "send-time": { datetime: 1324894200000}, "referred-topics": { unorderedlist: ["sprint", "voice-command" ]}, "message-text": " like sprint the voice-command is mind-blowing:)" }, "similar-tweets": { orderedlist: [{ unorderedlist: ["samsung", "voice-command" ]} ]} }
+</pre></div></div></div></div></div></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/manual.html b/docs/0.8.7-incubating/aql/manual.html
new file mode 100644
index 0000000..e091d43
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/manual.html
@@ -0,0 +1,1029 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – The Asterix Query Language, Version 1.0</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>The Asterix Query Language, Version 1.0</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#Introduction">1. Introduction</a></li>
+
+<li><a href="#Expressions">2. Expressions</a></li>
+
+<li><a href="#Statements">3. Statements</a></li>
+</ul></div>
+<div class="section">
+<h2><a name="a1._Introduction_Back_to_TOC"></a><a name="Introduction" id="Introduction">1. Introduction</a><font size="4"> <a href="#toc">[Back to TOC]</a></font></h2>
+<p>This document is intended as a reference guide to the full syntax and semantics of the Asterix Query Language (AQL), the language for talking to AsterixDB. This guide covers both the data manipulation language (DML) aspects of AQL, including its support for queries and data modification, as well as its data definition language (DDL) aspects. New AsterixDB users are encouraged to read and work through the (friendlier) guide “AsterixDB 101: An ADM and AQL Primer” before attempting to make use of this document. In addition, readers are advised to read and understand the Asterix Data Model (ADM) reference guide since a basic understanding of ADM concepts is a prerequisite to understanding AQL. In what follows, we detail the features of the AQL language in a grammar-guided manner: We list and briefly explain each of the productions in the AQL grammar, offering examples for clarity in cases where doing so seems needed or helpful.</p></div>
+<div class="section">
+<h2><a name="a2._Expressions_Back_to_TOC"></a><a name="Expressions" id="Expressions">2. Expressions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+
+<div class="source">
+<div class="source">
+<pre>Query ::= Expression
+</pre></div></div>
+<p>An AQL query can be any legal AQL expression.</p>
+
+<div class="source">
+<div class="source">
+<pre>Expression ::= ( OperatorExpr | IfThenElse | FLWOR | QuantifiedExpression )
+</pre></div></div>
+<p>AQL is a fully composable expression language. Each AQL expression returns zero or more Asterix Data Model (ADM) instances. There are four major kinds of expressions in AQL. At the topmost level, an AQL expression can be an OperatorExpr (similar to a mathematical expression), an IfThenElse (to choose between two alternative values), a FLWOR expression (the heart of AQL, pronounced “flower expression”), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full AQL grammar.</p>
+<div class="section">
+<h3><a name="Primary_Expressions"></a>Primary Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>PrimaryExpr ::= Literal
+ | VariableRef
+ | ParenthesizedExpression
+ | FunctionCallExpr
+ | DatasetAccessExpression
+ | ListConstructor
+ | RecordConstructor
+</pre></div></div>
+<p>The most basic building block for any AQL expression is the PrimaryExpr. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, an expression accessing the ADM contents of a dataset, a newly constructed list of ADM instances, or a newly constructed ADM record.</p>
+<div class="section">
+<h4><a name="Literals"></a>Literals</h4>
+
+<div class="source">
+<div class="source">
+<pre>Literal ::= StringLiteral
+ | IntegerLiteral
+ | FloatLiteral
+ | DoubleLiteral
+ | "null"
+ | "true"
+ | "false"
+StringLiteral ::= ("\"" (<ESCAPE_QUOT> | ~["\""])* "\"")
+ | ("\'" (<ESCAPE_APOS> | ~["\'"])* "\'")
+<ESCAPE_QUOT> ::= "\\\""
+<ESCAPE_APOS> ::= "\\\'"
+IntegerLiteral ::= <DIGITS>
+<DIGITS> ::= ["0" - "9"]+
+FloatLiteral ::= <DIGITS> ( "f" | "F" )
+ | <DIGITS> ( "." <DIGITS> ( "f" | "F" ) )?
+ | "." <DIGITS> ( "f" | "F" )
+DoubleLiteral ::= <DIGITS>
+ | <DIGITS> ( "." <DIGITS> )?
+ | "." <DIGITS>
+</pre></div></div>
+<p>Literals (constants) in AQL can be strings, integers, floating point values, double values, boolean constants, or the constant value null. The null value in AQL has “unknown” or “missing” value semantics, similar to (though not identical to) nulls in the relational query language SQL.</p>
+<p>The following are some simple examples of AQL literals. Since AQL is an expression language, each example is also a complete, legal AQL query (!).</p>
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>"a string"
+42
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Variable_References"></a>Variable References</h4>
+
+<div class="source">
+<div class="source">
+<pre>VariableRef ::= <VARIABLE>
+<VARIABLE> ::= "$" <LETTER> (<LETTER> | <DIGIT> | "_")*
+<LETTER> ::= ["A" - "Z", "a" - "z"]
+</pre></div></div>
+<p>A variable in AQL can be bound to any legal ADM value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the for or let clauses of a FLWOR expression or from an input parameter in the context of an AQL function body.)</p>
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>$tweet
+$id
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Parenthesized_Expressions"></a>Parenthesized Expressions</h4>
+
+<div class="source">
+<div class="source">
+<pre>ParenthesizedExpression ::= "(" Expression ")"
+</pre></div></div>
+<p>As in most languages, an expression may be parenthesized.</p>
+<p>Since AQL is an expression language, the following example expression is actually also a complete, legal AQL query whose result is the value 2. (As such, you can have Big Fun explaining to your boss how AsterixDB and AQL can turn your 1000-node shared-nothing Big Data cluster into a $5M calculator in its spare time.)</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>( 1 + 1 )
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Function_Calls"></a>Function Calls</h4>
+
+<div class="source">
+<div class="source">
+<pre>FunctionCallExpr ::= FunctionOrTypeName "(" ( Expression ( "," Expression )* )? ")"
+</pre></div></div>
+<p>As in most languages, functions are included in AQL as a way to package useful functionality or to componentize complicated or reusable AQL computations. A function call is a legal AQL query expression that represents the ADM value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any AQL expressions.</p>
+<p>The following example is a (built-in) function call expression whose value is 8.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>string-length("a string")
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Dataset_Access"></a>Dataset Access</h4>
+
+<div class="source">
+<div class="source">
+<pre>DatasetAccessExpression ::= "dataset" ( ( Identifier ( "." Identifier )? )
+ | ( "(" Expression ")" ) )
+Identifier ::= <IDENTIFIER> | StringLiteral
+<IDENTIFIER> ::= <LETTER> (<LETTER> | <DIGIT> | <SPECIALCHARS>)*
+<SPECIALCHARS> ::= ["$", "_", "-"]
+</pre></div></div>
+<p>Querying Big Data is the main point of AsterixDB and AQL. Data in AsterixDB reside in datasets (collections of ADM records), each of which in turn resides in some namespace known as a dataverse (data universe). Data access in a query expression is accomplished via a DatasetAccessExpression. Dataset access expressions are most commonly used in FLWOR expressions, where variables are bound to their contents.</p>
+<p>Note that the Identifier that identifies a dataset (or any other Identifier in AQL) can also be a StringLiteral. This is especially useful to avoid conflicts with AQL keywords (e.g. “dataset”, “null”, or “type”).</p>
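+<p>As a minimal sketch of this rule, and going only by the grammar above, a dataset whose name collides with a keyword could be referenced by quoting its name. The dataset name <tt>type</tt> used here is hypothetical and chosen purely to illustrate the keyword clash.</p>
+
+<div class="source">
+<div class="source">
+<pre>dataset "type"
+</pre></div></div>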
+<p>The following are three examples of legal dataset access expressions. The first one accesses a dataset called Customers in the dataverse called SalesDV. The second one accesses the Customers dataset in whatever the current dataverse is. The third one does the same thing as the second but uses a slightly older AQL syntax.</p>
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>dataset SalesDV.Customers
+dataset Customers
+dataset("Customers")
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Constructors"></a>Constructors</h4>
+
+<div class="source">
+<div class="source">
+<pre>ListConstructor ::= ( OrderedListConstructor | UnorderedListConstructor )
+OrderedListConstructor ::= "[" ( Expression ( "," Expression )* )? "]"
+UnorderedListConstructor ::= "{{" ( Expression ( "," Expression )* )? "}}"
+RecordConstructor ::= "{" ( FieldBinding ( "," FieldBinding )* )? "}"
+FieldBinding ::= Expression ":" Expression
+</pre></div></div>
+<p>A major feature of AQL is its ability to construct new ADM data instances. This is accomplished using its constructors for each of the major ADM complex object structures, namely lists (ordered or unordered) and records. Ordered lists are like JSON arrays, while unordered lists have bag (multiset) semantics. Records are built from attributes that are field-name/field-value pairs, again like JSON. (See the AsterixDB Data Model document for more details on each.)</p>
+<p>The following examples illustrate how to construct a new ordered list with 3 items, a new unordered list with 4 items, and a new record with 2 fields, respectively. List elements can be homogeneous (as in the first example), which is the common case, or they may be heterogeneous (as in the second example). The data values and field name values used to construct lists and records in constructors are all simply AQL expressions. Thus the list elements, field names, and field values used in constructors can be simple literals (as in these three examples) or they can come from query variable references or even arbitrarily complex AQL expressions.</p>
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>[ "a", "b", "c" ]
+
+{{ 42, "forty-two", "AsterixDB!", 3.14f }}
+
+{
+ "project name": "AsterixDB",
+ "project members": {{ "vinayakb", "dtabass", "chenli" }}
+}
+</pre></div></div></div>
+<div class="section">
+<h5><a name="Note"></a>Note</h5>
+<p>When constructing nested records there needs to be a space between the closing braces to avoid confusion with the <tt>}}</tt> token that ends an unordered list constructor: <tt>{ "a" : { "b" : "c" }}</tt> will fail to parse while <tt>{ "a" : { "b" : "c" } }</tt> will work.</p></div></div></div>
+<div class="section">
+<h3><a name="Path_Expressions"></a>Path Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>ValueExpr ::= PrimaryExpr ( Field | Index )*
+Field ::= "." Identifier
+Index ::= "[" ( Expression | "?" ) "]"
+</pre></div></div>
+<p>Components of complex types in ADM are accessed via path expressions. Path access can be applied to the result of an AQL expression that yields an instance of such a type, e.g., a record or list instance. For records, path access is based on field names. For ordered lists, path access is based on (zero-based) array-style indexing. AQL also supports an “I’m feeling lucky” style index accessor, [?], for selecting an arbitrary element from an ordered list. Attempts to access non-existent fields or list elements produce a null (i.e., missing information) result as opposed to signaling a runtime error.</p>
+<p>The following examples illustrate field access for a record, index-based element access for an ordered list, and also a composition thereof.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>({"list": [ "a", "b", "c"]}).list
+
+(["a", "b", "c"])[2]
+
+({ "list": [ "a", "b", "c"]}).list[2]
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Logical_Expressions"></a>Logical Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>OperatorExpr ::= AndExpr ( "or" AndExpr )*
+AndExpr ::= RelExpr ( "and" RelExpr )*
+</pre></div></div>
+<p>As in most languages, boolean expressions can be built up from smaller expressions by combining them with the logical connectives and/or. Legal boolean values in AQL are true, false, and null. (Nulls in AQL are treated much like SQL treats its unknown truth value in boolean expressions.)</p>
+<p>The following is an example of a conjunctive range predicate in AQL. It will yield true if $a is bound to 4, null if $a is bound to null, and false otherwise.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>$a > 3 and $a < 5
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Comparison_Expressions"></a>Comparison Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>RelExpr ::= AddExpr ( ( "<" | ">" | "<=" | ">=" | "=" | "!=" | "~=" ) AddExpr )?
+</pre></div></div>
+<p>AQL has the usual list of suspects, plus one, for comparing pairs of atomic values. The “plus one” is the last operator listed above, which is the “roughly equal” operator provided for similarity queries. (See the separate document on <a href="similarity.html">AsterixDB Similarity Queries</a> for more details on similarity matching.)</p>
+<p>An example comparison expression (which yields the boolean value true) is shown below.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>5 > 3
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Arithmetic_Expressions"></a>Arithmetic Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>AddExpr ::= MultExpr ( ( "+" | "-" ) MultExpr )*
+MultExpr ::= UnaryExpr ( ( "*" | "/" | "%" | "^"| "idiv" ) UnaryExpr )*
+UnaryExpr ::= ( ( "+" | "-" ) )? ValueExpr
+</pre></div></div>
+<p>AQL also supports the usual cast of characters for arithmetic expressions. The example below evaluates to 25.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>3 ^ 2 + 4 ^ 2
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="FLWOR_Expression"></a>FLWOR Expression</h3>
+
+<div class="source">
+<div class="source">
+<pre>FLWOR ::= ( ForClause | LetClause ) ( Clause )* ("return"|"select") Expression
+Clause ::= ForClause | LetClause | WhereClause | OrderbyClause
+ | GroupClause | LimitClause | DistinctClause
+ForClause ::= ("for"|"from") Variable ( "at" Variable )? "in" ( Expression )
+LetClause ::= ("let"|"with") Variable ":=" Expression
+WhereClause ::= "where" Expression
+OrderbyClause ::= "order" "by" Expression ( ( "asc" ) | ( "desc" ) )?
+ ( "," Expression ( ( "asc" ) | ( "desc" ) )? )*
+GroupClause ::= "group" "by" ( Variable ":=" )? Expression ( "," ( Variable ":=" )? Expression )*
+ ("with"|"keeping") VariableRef ( "," VariableRef )*
+LimitClause ::= "limit" Expression ( "offset" Expression )?
+DistinctClause ::= "distinct" "by" Expression ( "," Expression )*
+Variable ::= <VARIABLE>
+</pre></div></div>
+<p>The heart of AQL is the FLWOR (for-let-where-orderby-return) expression. The roots of this expression were borrowed from the expression of the same name in XQuery. A FLWOR expression starts with one or more clauses that establish variable bindings. A <tt>for</tt> clause binds a variable incrementally to each element of its associated expression; it includes an optional positional variable for counting/numbering the bindings. By default no ordering is implied or assumed by a <tt>for</tt> clause. A <tt>let</tt> clause binds a variable to the collection of elements computed by its associated expression.</p>
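+<p>As a small sketch of the optional positional variable, the following query iterates over a literal ordered list and pairs each element with its position. The variable names <tt>$item</tt> and <tt>$pos</tt> and the result field names are illustrative assumptions rather than names taken from any AsterixDB dataset.</p>
+
+<div class="source">
+<div class="source">
+<pre>for $item at $pos in [ "a", "b", "c" ]
+return { "position": $pos, "item": $item }
+</pre></div></div>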
+<p>Following the initial <tt>for</tt> or <tt>let</tt> clause(s), a FLWOR expression may contain an arbitrary sequence of other clauses. The <tt>where</tt> clause in a FLWOR expression filters the preceding bindings via a boolean expression, much like a <tt>where</tt> clause does in a SQL query. The <tt>order by</tt> clause in a FLWOR expression induces an ordering on the data. The <tt>group by</tt> clause, discussed further below, forms groups based on its group by expressions, optionally naming the expressions’ values (which together form the grouping key for the expression). The <tt>with</tt> subclause of a <tt>group by</tt> clause specifies the variable(s) whose values should be grouped based on the grouping key(s); following the grouping clause, only the grouping key(s) and the variables named in the <tt>with</tt> subclause remain in scope, and the named grouping variables now contain lists formed from their input values. The <tt>limit</tt> clause caps the number of values returned, optionally starting its result count from a specified offset. (Web applications can use this feature for doing pagination.) The <tt>distinct</tt> clause is similar to the <tt>group by</tt> clause, but it forms no groups; it serves only to eliminate duplicate values. As indicated by the grammar, the clauses in an AQL query can appear in any order. To interpret a query, one can think of data as flowing down through the query from the first clause to the <tt>return</tt> clause.</p>
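+<p>To make the pagination use of <tt>limit</tt> concrete, here is a small sketch that would return the third page of users from the FacebookUsers dataset used elsewhere in this section. The page size of 10 and the ordering on the <tt>id</tt> field are illustrative assumptions; an <tt>order by</tt> clause is included because a <tt>for</tt> clause alone implies no ordering, so pages would otherwise not be stable.</p>
+
+<div class="source">
+<div class="source">
+<pre>for $user in dataset FacebookUsers
+order by $user.id
+limit 10 offset 20
+return $user
+</pre></div></div>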
+<p>The following example shows a FLWOR expression that selects and returns one user from the dataset FacebookUsers.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>for $user in dataset FacebookUsers
+where $user.id = 8
+return $user
+</pre></div></div>
+<p>The next example shows a FLWOR expression that joins two datasets, FacebookUsers and FacebookMessages, returning user/message pairs. The results contain one record per pair, with result records containing the user’s name and an entire message.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>for $user in dataset FacebookUsers
+for $message in dataset FacebookMessages
+where $message.author-id = $user.id
+return
+ {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>
+<p>In the next example, a <tt>let</tt> clause is used to bind a variable to all of a user’s FacebookMessages. The query returns one record per user, with result records containing the user’s name and the set of all messages by that user.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>for $user in dataset FacebookUsers
+let $messages :=
+ for $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ return $message.message
+return
+ {
+ "uname": $user.name,
+ "messages": $messages
+ };
+</pre></div></div>
+<p>The following example returns all TwitterUsers ordered by their followers count (most followers first) and language. When ordering, <tt>null</tt> is treated as being smaller than any other value if <tt>null</tt>s are encountered in the ordering key(s).</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre> for $user in dataset TwitterUsers
+ order by $user.followers_count desc, $user.lang asc
+ return $user
+</pre></div></div>
+<p>The next example illustrates the use of the <tt>group by</tt> clause in AQL. After the <tt>group by</tt> clause in the query, only variables that are either in the <tt>group by</tt> list or in the <tt>with</tt> list are in scope. The variables in the clause’s <tt>with</tt> list will each contain a collection of items following the <tt>group by</tt> clause; the collected items are the values that the source variable was bound to in the tuples that formed the group. For grouping, <tt>null</tt> is handled as a single value.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre> for $x in dataset FacebookMessages
+ let $messages := $x.message
+ group by $loc := $x.sender-location with $messages
+ return
+ {
+ "location" : $loc,
+ "message" : $messages
+ }
+</pre></div></div>
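+<p>As a hedged variation on the query above (a sketch, not part of the original example set), the grouped list can be passed to an aggregate function. This version assumes the built-in <tt>count</tt> function and returns the number of messages per sender location rather than the messages themselves; the result field name "count" is only illustrative.</p>
+
+<div class="source">
+<div class="source">
+<pre> for $x in dataset FacebookMessages
+ let $messages := $x.message
+ group by $loc := $x.sender-location with $messages
+ return
+ {
+ "location" : $loc,
+ "count" : count($messages)
+ }
+</pre></div></div>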
+<p>The use of the <tt>limit</tt> clause is illustrated in the next example.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre> for $user in dataset TwitterUsers
+ order by $user.followers_count desc
+ limit 2
+ return $user
+</pre></div></div>
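+<p>The optional offset mentioned earlier can be sketched as follows. Assuming a page size of 2, this query would skip the two users with the most followers and return the next two, which is how a second page of results might be fetched. The page size and offset values are arbitrary.</p>
+
+<div class="source">
+<div class="source">
+<pre> for $user in dataset TwitterUsers
+ order by $user.followers_count desc
+ limit 2 offset 2
+ return $user
+</pre></div></div>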
+<p>The final example shows how AQL’s <tt>distinct by</tt> clause works. Each variable in scope before the distinct clause is also in scope after the <tt>distinct by</tt> clause. This clause works similarly to <tt>group by</tt>, but for each variable that contains more than one value after the <tt>distinct by</tt> clause, one value is picked nondeterministically. (If the variable is in the <tt>distinct by</tt> list, then its value will be deterministic.) Nulls are treated as a single value when they occur in a grouping field.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre> for $x in dataset FacebookMessages
+ distinct by $x.sender-location
+ return
+ {
+ "location" : $x.sender-location,
+ "message" : $x.message
+ }
+</pre></div></div>
+<p>In order to allow SQL fans to write queries in their favored ways, AQL provides synonyms: <i>from</i> for <i>for</i>, <i>select</i> for <i>return</i>, <i>with</i> for <i>let</i>, and <i>keeping</i> for <i>with</i> in the group by clause. The following query is such an example.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre> from $x in dataset FacebookMessages
+ with $messages := $x.message
+ group by $loc := $x.sender-location keeping $messages
+ select
+ {
+ "location" : $loc,
+ "message" : $messages
+ }
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Conditional_Expression"></a>Conditional Expression</h3>
+
+<div class="source">
+<div class="source">
+<pre>IfThenElse ::= "if" "(" Expression ")" "then" Expression "else" Expression
+</pre></div></div>
+<p>A conditional expression is useful for choosing between two alternative values based on a boolean condition. If its first (<tt>if</tt>) expression is true, its second (<tt>then</tt>) expression’s value is returned, and otherwise its third (<tt>else</tt>) expression is returned.</p>
+<p>The following example illustrates the form of a conditional expression.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>if (2 &lt; 3) then "yes" else "no"
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Quantified_Expressions"></a>Quantified Expressions</h3>
+
+<div class="source">
+<div class="source">
+<pre>QuantifiedExpression ::= ( ( "some" ) | ( "every" ) ) Variable "in" Expression
+ ( "," Variable "in" Expression )* "satisfies" Expression
+</pre></div></div>
+<p>Quantified expressions are used for expressing existential or universal predicates involving the elements of a collection.</p>
+<p>The following pair of examples illustrates the use of a quantified expression to test that every (or some) element in the list [1, 2, 3] of integers is less than three. The first example yields <tt>false</tt> and the second example yields <tt>true</tt>.</p>
+<p>It is useful to note that if the list were instead empty, the first expression would yield <tt>true</tt> (“every” value in an empty list satisfies the condition) while the second expression would yield <tt>false</tt> (since there isn’t “some” value, as there are no values in the list, that satisfies the condition).</p>
+<div class="section">
+<div class="section">
+<h5><a name="Examples"></a>Examples</h5>
+
+<div class="source">
+<div class="source">
+<pre>every $x in [ 1, 2, 3 ] satisfies $x &lt; 3
+some $x in [ 1, 2, 3 ] satisfies $x &lt; 3
+</pre></div></div></div></div></div></div>
+<div class="section">
+<h2><a name="a3._Statements_Back_to_TOC"></a><a name="Statements" id="Statements">3. Statements</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+
+<div class="source">
+<div class="source">
+<pre>Statement ::= ( SingleStatement ( ";" )? )* &lt;EOF&gt;
+SingleStatement ::= DataverseDeclaration
+ | FunctionDeclaration
+ | CreateStatement
+ | DropStatement
+ | LoadStatement
+ | SetStatement
+ | InsertStatement
+ | DeleteStatement
+ | Query
+</pre></div></div>
+<p>In addition to expressions for queries, AQL supports a variety of statements for data definition and manipulation purposes as well as controlling the context to be used in evaluating AQL expressions. AQL supports record-level ACID transactions that begin and terminate implicitly for each record inserted, deleted, or searched while a given AQL statement is being executed.</p>
+<p>This section details the statements supported in the AQL language.</p>
+<div class="section">
+<h3><a name="Declarations"></a>Declarations</h3>
+
+<div class="source">
+<div class="source">
+<pre>DataverseDeclaration ::= "use" "dataverse" Identifier
+</pre></div></div>
+<p>The world of data in an AsterixDB cluster is organized into data namespaces called dataverses. To set the default dataverse for a series of statements, the use dataverse statement is provided.</p>
+<p>As an example, the following statement sets the default dataverse to be TinySocial.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+</pre></div></div>
+<p>The set statement in AQL is used to control aspects of the expression evaluation context for queries.</p>
+
+<div class="source">
+<div class="source">
+<pre>SetStatement ::= "set" Identifier StringLiteral
+</pre></div></div>
+<p>As an example, the following set statements request that Jaccard similarity with a similarity threshold of 0.6 be used for set similarity matching when the ~= operator is used in a query expression.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>set simfunction "jaccard";
+set simthreshold "0.6f";
+</pre></div></div>
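+<p>To show where these settings take effect, the following sketch combines them with a query that uses the ~= operator. It assumes the FacebookUsers dataset and its friend-ids field as described elsewhere in this guide; the list of ids being compared is made up for illustration.</p>
+
+<div class="source">
+<div class="source">
+<pre>use dataverse TinySocial;
+
+set simfunction "jaccard";
+set simthreshold "0.6f";
+
+for $user in dataset FacebookUsers
+where $user.friend-ids ~= [ 1, 5, 9, 10 ]
+return $user
+</pre></div></div>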
+<p>When writing a complex AQL query, it can sometimes be helpful to define one or more auxiliary functions that each address a sub-piece of the overall query. The declare function statement supports the creation of such helper functions.</p>
+
+<div class="source">
+<div class="source">
+<pre>FunctionDeclaration ::= "declare" "function" Identifier ParameterList "{" Expression "}"
+ParameterList ::= "(" ( &lt;VARIABLE&gt; ( "," &lt;VARIABLE&gt; )* )? ")"
+</pre></div></div>
+<p>The following is a very simple example of a temporary AQL function definition.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>declare function add($a, $b) {
+ $a + $b
+};
+</pre></div></div></div></div></div>
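+<p>A declared function can then be called from a query. The sketch below assumes the declaration and the query are submitted together in the same request, since the function is temporary; the argument values are arbitrary.</p>
+
+<div class="source">
+<div class="source">
+<pre>declare function add($a, $b) {
+ $a + $b
+};
+
+add(1, 2)
+</pre></div></div>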
+<div class="section">
+<h3><a name="Lifecycle_Management_Statements"></a>Lifecycle Management Statements</h3>
+
+<div class="source">
+<div class="source">
+<pre>CreateStatement ::= "create" ( DataverseSpecification
+ | TypeSpecification
+ | DatasetSpecification
+ | IndexSpecification
+ | FunctionSpecification )
+
+QualifiedName ::= Identifier ( "." Identifier )?
+DoubleQualifiedName ::= Identifier "." Identifier ( "." Identifier )?
+</pre></div></div>
+<p>The create statement in AQL is used for creating persistent artifacts in the context of dataverses. It can be used to create new dataverses, datatypes, datasets, indexes, and user-defined AQL functions.</p>
+<div class="section">
+<h4><a name="Dataverses"></a>Dataverses</h4>
+
+<div class="source">
+<div class="source">
+<pre>DataverseSpecification ::= "dataverse" Identifier IfNotExists ( "with format" StringLiteral )?
+</pre></div></div>
+<p>The create dataverse statement is used to create new dataverses. To ease the authoring of reusable AQL scripts, its optional IfNotExists clause allows creation to be requested either unconditionally or only if the dataverse does not already exist. If this clause is absent, an error will be returned if the specified dataverse already exists. The <tt>with format</tt> clause is a placeholder for future functionality that can safely be ignored.</p>
+<p>The following example creates a dataverse named TinySocial.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create dataverse TinySocial;
+</pre></div></div></div></div>
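+<p>For a script that may be run more than once, the optional IfNotExists clause from the grammar above can be added. The following is a small sketch of that usage, reusing the TinySocial name from the previous example.</p>
+
+<div class="source">
+<div class="source">
+<pre>create dataverse TinySocial if not exists;
+</pre></div></div>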
+<div class="section">
+<h4><a name="Types"></a>Types</h4>
+
+<div class="source">
+<div class="source">
+<pre>TypeSpecification ::= "type" FunctionOrTypeName IfNotExists "as" TypeExpr
+FunctionOrTypeName ::= QualifiedName
+IfNotExists ::= ( "if not exists" )?
+TypeExpr ::= RecordTypeDef | TypeReference | OrderedListTypeDef | UnorderedListTypeDef
+RecordTypeDef ::= ( "closed" | "open" )? "{" ( RecordField ( "," RecordField )* )? "}"
+RecordField ::= Identifier ":" ( TypeExpr ) ( "?" )?
+NestedField ::= Identifier ( "." Identifier )*
+IndexField ::= NestedField ( ":" TypeReference )?
+TypeReference ::= Identifier
+OrderedListTypeDef ::= "[" ( TypeExpr ) "]"
+UnorderedListTypeDef ::= "{{" ( TypeExpr ) "}}"
+</pre></div></div>
+<p>The create type statement is used to create a new named ADM datatype. This type can then be used to create datasets or utilized when defining one or more other ADM datatypes. Much more information about the Asterix Data Model (ADM) is available in the <a href="datamodel.html">data model reference guide</a> to ADM. A new type can be a record type, a renaming of another type, an ordered list type, or an unordered list type. A record type can be defined as being either open or closed. Instances of a closed record type are not permitted to contain fields other than those specified in the create type statement. Instances of an open record type may carry additional fields, and open is the default for a new type (if neither option is specified).</p>
+<p>The following example creates a new ADM record type called FacebookUserType. Since it is closed, its instances will contain only what is specified in the type definition. The first four fields are traditional typed name/value pairs. The friend-ids field is an unordered list of 32-bit integers. The employment field is an ordered list of instances of another named record type, EmploymentType.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create type FacebookUserType as closed {
+ "id" : int32,
+ "alias" : string,
+ "name" : string,
+ "user-since" : datetime,
+ "friend-ids" : {{ int32 }},
+ "employment" : [ EmploymentType ]
+}
+</pre></div></div>
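+<p>The EmploymentType referenced above is not defined in this section. A hypothetical definition is sketched here to illustrate an open record type with an optional field (marked with <tt>?</tt>, per the RecordField rule in the grammar); the field names are assumptions chosen for illustration.</p>
+
+<div class="source">
+<div class="source">
+<pre>create type EmploymentType as open {
+ "organization-name" : string,
+ "start-date" : date,
+ "end-date" : date?
+}
+</pre></div></div>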
+<p>The next example creates a new ADM record type called FbUserType. Note that the type of the id field is UUID. You need to use this field type if you want to have this field be an autogenerated-PK field. Refer to the Datasets section later for more details.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create type FbUserType as closed {
+ "id" : uuid,
+ "alias" : string,
+ "name" : string
+}
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Datasets"></a>Datasets</h4>
+
+<div class="source">
+<div class="source">
+<pre>DatasetSpecification ::= "internal"? "dataset" QualifiedName "(" Identifier ")" IfNotExists
+ PrimaryKey ( "on" Identifier )? ( "hints" Properties )?
+ ( "using" "compaction" "policy" CompactionPolicy ( Configuration )? )?
+ ( "with filter on" Identifier )?
+ | "external" "dataset" QualifiedName "(" Identifier ")" IfNotExists
+ "using" AdapterName Configuration ( "hints" Properties )?
+ ( "using" "compaction" "policy" CompactionPolicy ( Configuration )? )?
+AdapterName ::= Identifier
+Configuration ::= "(" ( KeyValuePair ( "," KeyValuePair )* )? ")"
+KeyValuePair ::= "(" StringLiteral "=" StringLiteral ")"
+Properties ::= ( "(" Property ( "," Property )* ")" )?
+Property ::= Identifier "=" ( StringLiteral | IntegerLiteral )
+FunctionSignature ::= FunctionOrTypeName "@" IntegerLiteral
+PrimaryKey ::= "primary" "key" NestedField ( "," NestedField )* ( "autogenerated" )?
+CompactionPolicy ::= Identifier
+</pre></div></div>
+<p>The create dataset statement is used to create a new dataset. Datasets are named, unordered collections of ADM record instances; they are where data lives persistently and are the targets for queries in AsterixDB. Datasets are typed, and AsterixDB will ensure that their contents conform to their type definitions. An Internal dataset (the default) is a dataset that is stored in and managed by AsterixDB. It must have a specified unique primary key that can be used to partition data across nodes of an AsterixDB cluster. The primary key is also used in secondary indexes to uniquely identify the indexed primary data records. Random primary key (UUID) values can be auto-generated by declaring the field to be UUID and putting “autogenerated” after the “primary key” identifier. In this case, values for the auto-generated PK field should not be provided by the user since they will be auto-generated by AsterixDB. Optionally, a filter can be created on a field to further optimize range queries with predicates on the filter’s field. (Refer to <a href="filters.html">Filter-Based LSM Index Acceleration</a> for more information about filters.)</p>
+<p>An External dataset is stored outside of AsterixDB (currently datasets in HDFS or on the local filesystem(s) of the cluster’s nodes are supported). External dataset support allows AQL queries to treat external data as though it were stored in AsterixDB, making it possible to query “legacy” file data (e.g., Hive data) without having to physically import it into AsterixDB. For an external dataset, an appropriate adapter must be selected to handle the nature of the desired external data. (See the <a href="externaldata.html">guide to external data</a> for more information on the available adapters.)</p>
+<p>When creating a dataset, it is possible to choose a merge policy that controls which of the underlying LSM storage components are merged. Currently, AsterixDB provides four different merge policies that can be configured per dataset: no-merge, constant, prefix, and correlated-prefix. The no-merge policy simply never merges disk components, while the constant policy merges disk components when the number of components reaches some constant number k, which can be configured by the user. The prefix policy relies on component sizes and the number of components to decide which components to merge. Specifically, it works by first trying to identify the smallest ordered (oldest to newest) sequence of components such that the sequence does not contain a single component that exceeds some threshold size M and that either the sum of the components’ sizes exceeds M or the number of components in the sequence exceeds another threshold C. If such a sequence of components exists, then all of the components in the sequence are merged together to form a single component. Finally, the correlated-prefix policy is similar to the prefix policy, but it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index. When the policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), it issues successive merge requests on behalf of all other indexes associated with the same dataset. The default policy for AsterixDB is the prefix policy, except when there is a filter on a dataset, in which case the preferred policy is correlated-prefix.</p>
+<p>The following example creates an internal dataset for storing FacebookUserType records. It specifies that their id field is their primary key.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create internal dataset FacebookUsers(FacebookUserType) primary key id;
+</pre></div></div>
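+<p>The optional clauses in the grammar can be combined with the primary key. The following sketch assumes a hypothetical FacebookMessageType with a message-id key and a send-time field (neither is defined in this section); it names the correlated-prefix merge policy explicitly and adds a filter on send-time, following the clause order given in the DatasetSpecification rule.</p>
+
+<div class="source">
+<div class="source">
+<pre>create dataset FacebookMessages(FacebookMessageType)
+ primary key message-id
+ using compaction policy correlated-prefix
+ with filter on send-time;
+</pre></div></div>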
+<p>The following example creates an internal dataset for storing FbUserType records. It specifies that their id field is their primary key. It also specifies that the id field is an auto-generated field, meaning that a randomly generated UUID value will be assigned to each record by the system. (A user should therefore not provide a value for this field.) Note that the id field should be of type UUID.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create internal dataset FbMsgs(FbUserType) primary key id autogenerated;
+</pre></div></div>
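+<p>As a sketch of what this means for inserts (assuming the FbMsgs dataset above; the field values are made up), the record supplied to an insert statement simply omits the id field and leaves it to the system to add one.</p>
+
+<div class="source">
+<div class="source">
+<pre>insert into dataset FbMsgs (
+ { "alias" : "Margarita", "name" : "MargaritaStoddard" }
+);
+</pre></div></div>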
+<p>The next example creates an external dataset for storing LineitemType records. The choice of the <tt>hdfs</tt> adapter means that its data will reside in HDFS. The create statement provides parameters used by the hdfs adapter: the URL and path needed to locate the data in HDFS and a description of the data format.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create external dataset Lineitem(LineitemType) using hdfs (
+ ("hdfs"="hdfs://HOST:PORT"),
+ ("path"="HDFS_PATH"),
+ ("input-format"="text-input-format"),
+ ("format"="delimited-text"),
+ ("delimiter"="|"));
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Indices"></a>Indices</h4>
+
+<div class="source">
+<div class="source">
+<pre>IndexSpecification ::= "index" Identifier IfNotExists "on" QualifiedName
+ "(" ( IndexField ) ( "," IndexField )* ")" ( "type" IndexType )? ( "enforced" )?
+IndexType ::= "btree"
+ | "rtree"
+ | "keyword"
+ | "ngram" "(" IntegerLiteral ")"
+</pre></div></div>
+<p>The create index statement creates a secondary index on one or more fields of a specified dataset. Supported index types include <tt>btree</tt> for totally ordered datatypes, <tt>rtree</tt> for spatial data, and <tt>keyword</tt> and <tt>ngram</tt> for textual (string) data. An index can be created on a nested field (or fields) by providing a valid path expression as an index field identifier. An index field is not required to be part of the datatype associated with a dataset if that datatype is declared as open, the field’s type is provided along with its name, and the <tt>enforced</tt> keyword is specified at the end of the index definition. <tt>Enforcing</tt> an open field introduces a check that makes sure that the actual type of an indexed field (if the field exists in the record) always matches the specified (open) field type.</p>
+<p>The following example creates a btree index called fbAuthorIdx on the author-id field of the FacebookMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the author-id field.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index fbAuthorIdx on FacebookMessages(author-id) type btree;
+</pre></div></div>
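+<p>A query of the following shape is the kind that could benefit from such an index. This is only a sketch: the literal value is arbitrary, and whether the index is actually used is decided by the query optimizer.</p>
+
+<div class="source">
+<div class="source">
+<pre>for $message in dataset FacebookMessages
+where $message.author-id = 1
+return $message
+</pre></div></div>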
+<p>The following example creates an open btree index called fbSendTimeIdx on the open send-time field of the FacebookMessages dataset, declaring the field’s type to be datetime. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the send-time field.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index fbSendTimeIdx on FacebookMessages(send-time:datetime) type btree enforced;
+</pre></div></div>
+<p>The following example creates a btree index called twUserScrNameIdx on the screen-name field, which is a nested field of the user field in the TweetMessages dataset. This index can be useful for accelerating exact-match queries, range search queries, and joins involving the screen-name field.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index twUserScrNameIdx on TweetMessages(user.screen-name) type btree;
+</pre></div></div>
+<p>The following example creates an rtree index called fbSenderLocIndex on the sender-location field of the FacebookMessages dataset. This index can be useful for accelerating queries that use the <a href="functions.html#spatial-intersect"><tt>spatial-intersect</tt> function</a> in a predicate involving the sender-location field.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index fbSenderLocIndex on FacebookMessages(sender-location) type rtree;
+</pre></div></div>
+<p>The following example creates a 3-gram index called fbUserIdx on the name field of the FacebookUsers dataset. This index can be used to accelerate some similarity or substring matching queries on the name field. For details refer to the <a href="similarity.html#NGram_Index">document on similarity queries</a>.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index fbUserIdx on FacebookUsers(name) type ngram(3);
+</pre></div></div>
+<p>The following example creates a keyword index called fbMessageIdx on the message field of the FacebookMessages dataset. This keyword index can be used to optimize queries with token-based similarity predicates on the message field. For details refer to the <a href="similarity.html#Keyword_Index">document on similarity queries</a>.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create index fbMessageIdx on FacebookMessages(message) type keyword;
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Functions"></a>Functions</h4>
+<p>The create function statement creates a named function that can then be used and reused in AQL queries. The body of a function can be any AQL expression involving the function’s parameters.</p>
+
+<div class="source">
+<div class="source">
+<pre>FunctionSpecification ::= "function" FunctionOrTypeName IfNotExists ParameterList "{" Expression "}"
+</pre></div></div>
+<p>The following is a very simple example of a create function statement. It differs from the declare function example shown previously in that it results in a function that is persistently registered by name in the specified dataverse.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>create function add($a, $b) {
+ $a + $b
+};
+</pre></div></div></div></div>
+<div class="section">
+<h4><a name="Removal"></a>Removal</h4>
+
+<div class="source">
+<div class="source">
+<pre>DropStatement ::= "drop" ( "dataverse" Identifier IfExists
+ | "type" FunctionOrTypeName IfExists
+ | "dataset" QualifiedName IfExists
+ | "index" DoubleQualifiedName IfExists
+ | "function" FunctionSignature IfExists )
+IfExists ::= ( "if" "exists" )?
+</pre></div></div>
+<p>The drop statement in AQL is the inverse of the create statement. It can be used to drop dataverses, datatypes, datasets, indexes, and functions.</p>
+<p>The following examples illustrate uses of the drop statement.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>drop dataset FacebookUsers if exists;
+
+drop index FacebookMessages.fbSenderLocIndex;
+
+drop type FacebookUserType;
+
+drop dataverse TinySocial;
+
+drop function add@2;
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="ImportExport_Statements"></a>Import/Export Statements</h3>
+
+<div class="source">
+<div class="source">
+<pre>LoadStatement ::= "load" "dataset" QualifiedName "using" AdapterName Configuration ( "pre-sorted" )?
+</pre></div></div>
+<p>The load statement is used to initially populate a dataset via bulk loading of data from an external file. An appropriate adapter must be selected to handle the nature of the desired external data. The load statement accepts the same adapters and the same parameters as external datasets. (See the <a href="externaldata.html">guide to external data</a> for more information on the available adapters.) If a dataset has an auto-generated primary key field, a file to be imported should not include that field in it.</p>
+<p>The following example shows how to bulk load the FacebookUsers dataset from an external file containing data that has been prepared in ADM format.</p>
+<div class="section">
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>load dataset FacebookUsers using localfs
+(("path"="localhost:///Users/zuck/AsterixDB/load/fbu.adm"),("format"="adm"));
+</pre></div></div></div></div></div>
+<div class="section">
+<h3><a name="Modification_Statements"></a>Modification Statements</h3>
+<div class="section">
+<h4><a name="Insert"></a>Insert</h4>
+
+<div class="source">
+<div class="source">
+<pre>InsertStatement ::= "insert" "into" "dataset" QualifiedName Query
+</pre></div></div>
+<p>The AQL insert statement is used to insert data into a dataset. The data to be inserted comes from an AQL query expression. The expression can be as simple as a constant expression, or in general it can be any legal AQL query. Inserts in AsterixDB are processed transactionally, with the scope of each insert transaction being the insertion of a single object plus its affiliated secondary index entries (if any). If the query part of an insert returns a single object, then the insert statement itself will be a single, atomic transaction. If the query part returns multiple objects, then each object inserted will be handled independently as a transaction. If a dataset has an auto-generated primary key field, an insert statement should not include a value for that field. (The system will automatically extend the provided record with this additional field and a corresponding value.)</p>
+<p>The following example illustrates a query-based insertion.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>insert into dataset UsersCopy (for $user in dataset FacebookUsers return $user)
+</pre></div></div></div></div>
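+<p>The simplest case mentioned above, a constant expression, might look like the following sketch. It assumes that UsersCopy stores FacebookUserType records and that EmploymentType has organization-name and start-date fields; all values are invented for illustration.</p>
+
+<div class="source">
+<div class="source">
+<pre>insert into dataset UsersCopy (
+ {
+ "id" : 11,
+ "alias" : "John",
+ "name" : "JohnDoe",
+ "user-since" : datetime("2010-08-15T08:10:00"),
+ "friend-ids" : {{ 5, 9, 11 }},
+ "employment" : [ { "organization-name" : "Kongreen", "start-date" : date("2012-06-05") } ]
+ }
+);
+</pre></div></div>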
+<div class="section">
+<h4><a name="Delete"></a>Delete</h4>
+
+<div class="source">
+<div class="source">
+<pre>DeleteStatement ::= "delete" Variable "from" "dataset" QualifiedName ( "where" Expression )?
+</pre></div></div>
+<p>The AQL delete statement is used to delete data from a target dataset. The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the delete statement. Deletes in AsterixDB are processed transactionally, with the scope of each delete transaction being the deletion of a single object plus its affiliated secondary index entries (if any). If the boolean expression for a delete identifies a single object, then the delete statement itself will be a single, atomic transaction. If the expression identifies multiple objects, then each object deleted will be handled independently as a transaction.</p>
+<p>The following example illustrates a single-object deletion.</p>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>delete $user from dataset FacebookUsers where $user.id = 8;
+</pre></div></div>
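+<p>For contrast, a predicate that may match several records can be sketched as follows (the threshold value is arbitrary). As described above, each matching object would then be deleted in its own transaction.</p>
+
+<div class="source">
+<div class="source">
+<pre>delete $user from dataset FacebookUsers where $user.id &gt;= 8;
+</pre></div></div>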
+<p>We close this guide to AQL with one final example of a query expression.</p></div>
+<div class="section">
+<h5><a name="Example"></a>Example</h5>
+
+<div class="source">
+<div class="source">
+<pre>for $praise in {{ "great", "brilliant", "awesome" }}
+return
+ string-concat(["AsterixDB is ", $praise])
+</pre></div></div></div></div></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+ <?xml version="1.0" encoding="UTF-8"?>
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/primer-sql-like.html b/docs/0.8.7-incubating/aql/primer-sql-like.html
new file mode 100644
index 0000000..98e82ec
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/primer-sql-like.html
@@ -0,0 +1,886 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – AsterixDB 101: An ADM and AQL Primer (for SQL fans)</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>AsterixDB 101: An ADM and AQL Primer (for SQL fans)</h1>
+<div class="section">
+<h2><a name="Welcome_to_AsterixDB"></a>Welcome to AsterixDB!</h2>
+<p>This document introduces the main features of AsterixDB’s data model (ADM) and query language (AQL) by example. The example is a simple scenario involving (synthetic) sample data modeled after data from the social domain. This document describes a set of sample ADM datasets, together with a set of illustrative AQL queries (in a SQL-like form), to introduce you to the “AsterixDB user experience”. The complete set of steps required to create and load a handful of sample datasets, along with runnable queries and the expected results for each query, are included.</p>
+<p>This document assumes that you are at least vaguely familiar with AsterixDB and why you might want to use it. Most importantly, it assumes you already have a running instance of AsterixDB and that you know how to query it using AsterixDB’s basic web interface. For more information on these topics, you should go through the steps in <a href="../install.html">Installing Asterix Using Managix</a> before reading this document and make sure that you have a running AsterixDB instance ready to go. To get your feet wet, you should probably start with a simple local installation of AsterixDB on your favorite machine, accepting all of the default settings that Managix offers. Later you can graduate to trying AsterixDB on a cluster, its real intended home (since it targets Big Data). (Note: With the exception of specifying the correct locations where you put the source data for this example, there should be no changes needed in your ADM or AQL statements to run the examples locally and/or to run them on a cluster when you are ready to take that step.)</p>
+<p>As you read through this document, you should try each step for yourself on your own AsterixDB instance. Once you have reached the end, you will be fully armed and dangerous, with all the basic AsterixDB knowledge that you’ll need to start down the path of modeling, storing, and querying your own semistructured data.</p>
+<hr/></div>
+<div class="section">
+<h2><a name="ADM:_Modeling_Semistructed_Data_in_AsterixDB"></a>ADM: Modeling Semistructured Data in AsterixDB</h2>
+<p>In this section you will learn all about modeling Big Data using ADM, the data model of the AsterixDB BDMS.</p>
+<div class="section">
+<h3><a name="Dataverses_Datatypes_and_Datasets"></a>Dataverses, Datatypes, and Datasets</h3>
+<p>The top-level organizing concept in the AsterixDB world is the <i>dataverse</i>. A dataverse, short for “data universe”, is a place (similar to a database in a relational DBMS) in which to create and manage the types, datasets, functions, and other artifacts for a given AsterixDB application. When you start using an AsterixDB instance for the first time, it starts out “empty”; it contains no data other than the AsterixDB system catalogs (which live in a special dataverse called the Metadata dataverse). To store your data in AsterixDB, you will first create a dataverse and then use it to hold the <i>datatypes</i> and <i>datasets</i> for managing your own data. A datatype tells AsterixDB what you know (or more accurately, what you want it to know) a priori about one of the kinds of data instances that you want AsterixDB to hold for you. A dataset is a collection of data instances of a datatype, and AsterixDB makes sure that the data instances that you put in it conform to its specified type. Since AsterixDB targets semistructured data, you can use <i>open</i> datatypes and tell it as little or as much as you wish about your data up front; the more you tell it up front, the less information it will have to store repeatedly in the individual data instances that you give it. Instances of open datatypes are permitted to have additional content, beyond what the datatype says, as long as they at least contain the information prescribed by the datatype definition. Open typing allows data to vary from one instance to another and it leaves wiggle room for application evolution in terms of what might need to be stored in the future. If you want to restrict data instances in a dataset to have only what the datatype says, and nothing extra, you can define a <i>closed</i> datatype for that dataset and AsterixDB will keep users from storing objects that have extra data in them. Datatypes are open by default unless you tell AsterixDB otherwise. Let’s put these concepts to work.</p>
+<p>Our little sample scenario involves hypothetical information about users of two popular social networks, Facebook and Twitter, and their messages. We’ll start by defining a dataverse called “TinySocial” to hold our datatypes and datasets. The AsterixDB data model (ADM) is essentially a superset of JSON: it is what you get by extending JSON with more data types and additional data modeling constructs borrowed from object databases. The following is how we can create the TinySocial dataverse plus a set of ADM types for modeling Twitter users, their Tweets, Facebook users, those users’ employment information, and their messages. (Note: Keep in mind that this is just a tiny and somewhat silly example intended for illustrating some of the key features of AsterixDB. :-))</p>
+
+<div class="source">
+<div class="source">
+<pre> drop dataverse TinySocial if exists;
+ create dataverse TinySocial;
+ use dataverse TinySocial;
+
+ create type TwitterUserType as open {
+ screen-name: string,
+ lang: string,
+ friends_count: int32,
+ statuses_count: int32,
+ name: string,
+ followers_count: int32
+ }
+ create type TweetMessageType as closed {
+ tweetid: string,
+ user: TwitterUserType,
+ sender-location: point?,
+ send-time: datetime,
+ referred-topics: {{ string }},
+ message-text: string
+ }
+ create type EmploymentType as open {
+ organization-name: string,
+ start-date: date,
+ end-date: date?
+ }
+ create type FacebookUserType as closed {
+ id: int32,
+ alias: string,
+ name: string,
+ user-since: datetime,
+ friend-ids: {{ int32 }},
+ employment: [EmploymentType]
+ }
+ create type FacebookMessageType as closed {
+ message-id: int32,
+ author-id: int32,
+ in-response-to: int32?,
+ sender-location: point?,
+ message: string
+ }
+</pre></div></div>
+<p>The first three lines above tell AsterixDB to drop the old TinySocial dataverse, if one already exists, and then to create a brand new one and make it the focus of the statements that follow. The first type creation statement creates a datatype for holding information about Twitter users. It is a record type with a mix of integer and string data, very much like a (flat) relational tuple. The indicated fields are all mandatory, but because the type is open, additional fields are welcome. The second statement creates a datatype for Twitter messages; this shows how to specify a closed type. Interestingly (based on one of Twitter’s APIs), each Twitter message actually embeds an instance of the sending user’s information (current as of when the message was sent), so this is an example of a nested record in ADM. Twitter messages can optionally contain the sender’s location, which is modeled via the sender-location field of spatial type <i>point</i>; the question mark following the field type indicates its optionality. An optional field is like a nullable field in SQL: it may be present or missing, but when it’s present, its data type will conform to the datatype’s specification. The send-time field illustrates the use of a temporal primitive type, <i>datetime</i>. Lastly, the referred-topics field illustrates another way that ADM is richer than the relational model; this field holds a bag (a.k.a. an unordered list) of strings. Since the overall datatype definition for Twitter messages says “closed”, the fields that it lists are the only fields that instances of this type will be allowed to contain. The next two create type statements create a record type for holding information about one component of the employment history of a Facebook user and then a record type for holding the user information itself. The Facebook user type highlights a few additional ADM data model features. Its friend-ids field is a bag of integers, presumably the Facebook user ids for this user’s friends, and its employment field is an ordered list of employment records. The final create type statement defines a type for handling the content of a Facebook message in our hypothetical social data storage scenario.</p>
+<p>Before going on, we need to once again emphasize the idea that AsterixDB is aimed at storing and querying not just Big Data, but Big <i>Semistructured</i> Data. This means that most of the fields listed in the create type statements above could have been omitted without changing anything other than the resulting size of stored data instances on disk. AsterixDB stores its information about the fields defined a priori as separate metadata, whereas the information about other fields that are “just there” in instances of open datatypes is stored with each instance—making for more bits on disk and longer times for operations affected by data size (e.g., dataset scans). The only fields that <i>must</i> be specified a priori are the primary key and any fields that you would like to build indexes on.</p></div>
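+<p>As a rough illustration of the point above, the following sketch declares an open type that names only the field intended as a primary key. The type name MinimalTwitterUserType is hypothetical and is not used anywhere else in this tutorial. Instances of such a type could still carry lang, name, and any other fields, at the cost of storing those field names and types with every instance.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create type MinimalTwitterUserType as open {
+     screen-name: string
+ }
+</pre></div></div>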
+<div class="section">
+<h3><a name="Creating_Datasets_and_Indexes"></a>Creating Datasets and Indexes</h3>
+<p>Now that we have defined our datatypes, we can move on and create datasets to store the actual data. (If we wanted to, we could even have several named datasets based on any one of these datatypes.) We can do this as follows, utilizing the DDL capabilities of AsterixDB.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create dataset FacebookUsers(FacebookUserType)
+ primary key id;
+
+ create dataset FacebookMessages(FacebookMessageType)
+ primary key message-id;
+
+ create dataset TwitterUsers(TwitterUserType)
+ primary key screen-name;
+
+ create dataset TweetMessages(TweetMessageType)
+ primary key tweetid
+ hints(cardinality=100);
+
+ create index fbUserSinceIdx on FacebookUsers(user-since);
+ create index fbAuthorIdx on FacebookMessages(author-id) type btree;
+ create index fbSenderLocIndex on FacebookMessages(sender-location) type rtree;
+ create index fbMessageIdx on FacebookMessages(message) type keyword;
+
+ from $ds in dataset Metadata.Dataset select $ds;
+ from $ix in dataset Metadata.Index select $ix;
+</pre></div></div>
+<p>The ADM DDL statements above create four datasets for holding our social data in the TinySocial dataverse: FacebookUsers, FacebookMessages, TwitterUsers, and TweetMessages. The first statement creates the FacebookUsers dataset. It specifies that this dataset will store data instances conforming to FacebookUserType and that it has a primary key which is the id field of each instance. The primary key information is used by AsterixDB to uniquely identify instances for the purpose of later lookup and for use in secondary indexes. Each AsterixDB dataset is stored (and indexed) in the form of a B+ tree on primary key; secondary indexes point to their indexed data by primary key. In AsterixDB clusters, the primary key is also used to hash-partition (a.k.a. shard) the dataset across the nodes of the cluster. The next three create dataset statements are similar. The last one illustrates an optional clause for providing useful hints to AsterixDB. In this case, the hint tells AsterixDB that the dataset definer is anticipating that the TweetMessages dataset will contain roughly 100 objects; knowing this can help AsterixDB to more efficiently manage and query this dataset. (AsterixDB does not yet gather and maintain data statistics; it will currently, arbitrarily, assume a cardinality of one million objects per dataset in the absence of such an optional definition-time hint.)</p>
+<p>The create dataset statements above are followed by four more DDL statements, each of which creates a secondary index on a field of one of the datasets. The first one indexes the FacebookUsers dataset on its user-since field. This index will be a B+ tree index; its type is unspecified and <i>btree</i> is the default type. The other three illustrate how you can explicitly specify the desired type of index. In addition to btree, <i>rtree</i> and inverted <i>keyword</i> indexes are supported by AsterixDB. Indexes can also have composite keys, and more advanced text indexing is available as well (ngram(k), where k is the desired gram length).</p></div>
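+<p>The composite-key and ngram options mentioned above might look roughly like the following. This is a sketch only: the index names are made up for illustration, the statements are not part of this tutorial’s setup, and the exact syntax should be checked against the AQL manual for your version.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create index fbUserAliasNameIdx on FacebookUsers(alias, name) type btree;
+ create index fbMessageNgramIdx on FacebookMessages(message) type ngram(3);
+</pre></div></div>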
+<div class="section">
+<h3><a name="Querying_the_Metadata_Dataverse"></a>Querying the Metadata Dataverse</h3>
+<p>The last two statements above show how you can use queries in AQL to examine the AsterixDB system catalogs and tell what artifacts you have created. Just as relational DBMSs use their own tables to store their catalogs, AsterixDB uses its own datasets to persist descriptions of its datasets, datatypes, indexes, and so on. Running the first of the two queries above will list all of your newly created datasets, and it will also show you a full list of all the metadata datasets. (You can then explore from there on your own if you are curious.) These last two queries also illustrate one other factoid worth knowing: AsterixDB allows queries to span dataverses by allowing the optional use of fully-qualified dataset names (i.e., <i>dataversename.datasetname</i>) to reference datasets that live in a dataverse other than the one that was named in the most recently executed <i>use dataverse</i> directive.</p>
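+<p>As a small variation on the catalog queries above, the sketch below narrows the index listing to the TinySocial dataverse while TinySocial remains the current dataverse. It assumes that the records in Metadata.Index carry a field named DataverseName; run the unfiltered query first to confirm the field names in your version.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $ix in dataset Metadata.Index
+ where $ix.DataverseName = "TinySocial"
+ select $ix;
+</pre></div></div>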
+<hr/></div></div>
+<div class="section">
+<h2><a name="Loading_Data_Into_AsterixDB"></a>Loading Data Into AsterixDB</h2>
+<p>Okay, so far so good. AsterixDB is now ready for data, so let’s give it some data to store. Our next task will be to load some sample data into the four datasets that we just defined. Here we will load a tiny set of records, defined in ADM format (a superset of JSON), into each dataset. In the boxes below you can see the actual data instances contained in each of the provided sample files. In order to load this data yourself, you should first store the four corresponding <tt>.adm</tt> files (whose URLs are indicated on top of each box below) into a filesystem directory accessible to your running AsterixDB instance. Take a few minutes to look carefully at each of the sample data sets. This will give you a better sense of the nature of the data that we are about to load and query. We should note that ADM format is a textual serialization of what AsterixDB will actually store; when persisted in AsterixDB, the data format will be binary and the data in the predefined fields of the data instances will be stored separately from their associated field name and type metadata.</p>
+<p><a href="../data/twu.adm">Twitter Users</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"screen-name":"NathanGiesen@211","lang":"en","friends_count":18,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416}
+ {"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159}
+ {"screen-name":"NilaMilliron_tw","lang":"en","friends_count":445,"statuses_count":164,"name":"Nila Milliron","followers_count":22649}
+ {"screen-name":"ChangEwing_573","lang":"en","friends_count":182,"statuses_count":394,"name":"Chang Ewing","followers_count":32136}
+</pre></div></div>
+<p><a href="../data/twm.adm">Tweet Messages</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"tweetid":"1","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("47.44,80.65"),"send-time":datetime("2008-04-26T10:10:00"),"referred-topics":{{"t-mobile","customization"}},"message-text":" love t-mobile its customization is good:)"}
+ {"tweetid":"2","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("32.84,67.14"),"send-time":datetime("2010-05-13T10:10:00"),"referred-topics":{{"verizon","shortcut-menu"}},"message-text":" like verizon its shortcut-menu is awesome:)"}
+ {"tweetid":"3","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("29.72,75.8"),"send-time":datetime("2006-11-04T10:10:00"),"referred-topics":{{"motorola","speed"}},"message-text":" like motorola the speed is good:)"}
+ {"tweetid":"4","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("39.28,70.48"),"send-time":datetime("2011-12-26T10:10:00"),"referred-topics":{{"sprint","voice-command"}},"message-text":" like sprint the voice-command is mind-blowing:)"}
+ {"tweetid":"5","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("40.09,92.69"),"send-time":datetime("2006-08-04T10:10:00"),"referred-topics":{{"motorola","speed"}},"message-text":" can't stand motorola its speed is terrible:("}
+ {"tweetid":"6","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("47.51,83.99"),"send-time":datetime("2010-05-07T10:10:00"),"referred-topics":{{"iphone","voice-clarity"}},"message-text":" like iphone the voice-clarity is good:)"}
+ {"tweetid":"7","user":{"screen-name":"ChangEwing_573","lang":"en","friends_count":182,"statuses_count":394,"name":"Chang Ewing","followers_count":32136},"sender-location":point("36.21,72.6"),"send-time":datetime("2011-08-25T10:10:00"),"referred-topics":{{"samsung","platform"}},"message-text":" like samsung the platform is good"}
+ {"tweetid":"8","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("46.05,93.34"),"send-time":datetime("2005-10-14T10:10:00"),"referred-topics":{{"t-mobile","shortcut-menu"}},"message-text":" like t-mobile the shortcut-menu is awesome:)"}
+ {"tweetid":"9","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("36.86,74.62"),"send-time":datetime("2012-07-21T10:10:00"),"referred-topics":{{"verizon","voicemail-service"}},"message-text":" love verizon its voicemail-service is awesome"}
+ {"tweetid":"10","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("29.15,76.53"),"send-time":datetime("2008-01-26T10:10:00"),"referred-topics":{{"verizon","voice-clarity"}},"message-text":" hate verizon its voice-clarity is OMG:("}
+ {"tweetid":"11","user":{"screen-name":"NilaMilliron_tw","lang":"en","friends_count":445,"statuses_count":164,"name":"Nila Milliron","followers_count":22649},"sender-location":point("37.59,68.42"),"send-time":datetime("2008-03-09T10:10:00"),"referred-topics":{{"iphone","platform"}},"message-text":" can't stand iphone its platform is terrible"}
+ {"tweetid":"12","user":{"screen-name":"OliJackson_512","lang":"en","friends_count":445,"statuses_count":164,"name":"Oli Jackson","followers_count":22649},"sender-location":point("24.82,94.63"),"send-time":datetime("2010-02-13T10:10:00"),"referred-topics":{{"samsung","voice-command"}},"message-text":" like samsung the voice-command is amazing:)"}
+</pre></div></div>
+<p><a href="../data/fbu.adm">Facebook Users</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"id":1,"alias":"Margarita","name":"MargaritaStoddard","user-since":datetime("2012-08-20T10:10:00"),"friend-ids":{{2,3,6,10}},"employment":[{"organization-name":"Codetechno","start-date":date("2006-08-06")}]}
+ {"id":2,"alias":"Isbel","name":"IsbelDull","user-since":datetime("2011-01-22T10:10:00"),"friend-ids":{{1,4}},"employment":[{"organization-name":"Hexviafind","start-date":date("2010-04-27")}]}
+ {"id":3,"alias":"Emory","name":"EmoryUnk","user-since":datetime("2012-07-10T10:10:00"),"friend-ids":{{1,5,8,9}},"employment":[{"organization-name":"geomedia","start-date":date("2010-06-17"),"end-date":date("2010-01-26")}]}
+ {"id":4,"alias":"Nicholas","name":"NicholasStroh","user-since":datetime("2010-12-27T10:10:00"),"friend-ids":{{2}},"employment":[{"organization-name":"Zamcorporation","start-date":date("2010-06-08")}]}
+ {"id":5,"alias":"Von","name":"VonKemble","user-since":datetime("2010-01-05T10:10:00"),"friend-ids":{{3,6,10}},"employment":[{"organization-name":"Kongreen","start-date":date("2010-11-27")}]}
+ {"id":6,"alias":"Willis","name":"WillisWynne","user-since":datetime("2005-01-17T10:10:00"),"friend-ids":{{1,3,7}},"employment":[{"organization-name":"jaydax","start-date":date("2009-05-15")}]}
+ {"id":7,"alias":"Suzanna","name":"SuzannaTillson","user-since":datetime("2012-08-07T10:10:00"),"friend-ids":{{6}},"employment":[{"organization-name":"Labzatron","start-date":date("2011-04-19")}]}
+ {"id":8,"alias":"Nila","name":"NilaMilliron","user-since":datetime("2008-01-01T10:10:00"),"friend-ids":{{3}},"employment":[{"organization-name":"Plexlane","start-date":date("2010-02-28")}]}
+ {"id":9,"alias":"Woodrow","name":"WoodrowNehling","user-since":datetime("2005-09-20T10:10:00"),"friend-ids":{{3,10}},"employment":[{"organization-name":"Zuncan","start-date":date("2003-04-22"),"end-date":date("2009-12-13")}]}
+ {"id":10,"alias":"Bram","name":"BramHatch","user-since":datetime("2010-10-16T10:10:00"),"friend-ids":{{1,5,9}},"employment":[{"organization-name":"physcane","start-date":date("2007-06-05"),"end-date":date("2011-11-05")}]}
+</pre></div></div>
+<p><a href="../data/fbm.adm">Facebook Messages</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"message-id":1,"author-id":3,"in-response-to":2,"sender-location":point("47.16,77.75"),"message":" love sprint its shortcut-menu is awesome:)"}
+ {"message-id":2,"author-id":1,"in-response-to":4,"sender-location":point("41.66,80.87"),"message":" dislike iphone its touch-screen is horrible"}
+ {"message-id":3,"author-id":2,"in-response-to":4,"sender-location":point("48.09,81.01"),"message":" like samsung the plan is amazing"}
+ {"message-id":4,"author-id":1,"in-response-to":2,"sender-location":point("37.73,97.04"),"message":" can't stand at&t the network is horrible:("}
+ {"message-id":5,"author-id":6,"in-response-to":2,"sender-location":point("34.7,90.76"),"message":" love sprint the customization is mind-blowing"}
+ {"message-id":6,"author-id":2,"in-response-to":1,"sender-location":point("31.5,75.56"),"message":" like t-mobile its platform is mind-blowing"}
+ {"message-id":7,"author-id":5,"in-response-to":15,"sender-location":point("32.91,85.05"),"message":" dislike sprint the speed is horrible"}
+ {"message-id":8,"author-id":1,"in-response-to":11,"sender-location":point("40.33,80.87"),"message":" like verizon the 3G is awesome:)"}
+ {"message-id":9,"author-id":3,"in-response-to":12,"sender-location":point("34.45,96.48"),"message":" love verizon its wireless is good"}
+ {"message-id":10,"author-id":1,"in-response-to":12,"sender-location":point("42.5,70.01"),"message":" can't stand motorola the touch-screen is terrible"}
+ {"message-id":11,"author-id":1,"in-response-to":1,"sender-location":point("38.97,77.49"),"message":" can't stand at&t its plan is terrible"}
+ {"message-id":12,"author-id":10,"in-response-to":6,"sender-location":point("42.26,77.76"),"message":" can't stand t-mobile its voicemail-service is OMG:("}
+ {"message-id":13,"author-id":10,"in-response-to":4,"sender-location":point("42.77,78.92"),"message":" dislike iphone the voice-command is bad:("}
+ {"message-id":14,"author-id":9,"in-response-to":12,"sender-location":point("41.33,85.28"),"message":" love at&t its 3G is good:)"}
+ {"message-id":15,"author-id":7,"in-response-to":11,"sender-location":point("44.47,67.11"),"message":" like iphone the voicemail-service is awesome"}
+</pre></div></div>
+<p>It’s loading time! We can use AQL <i>load</i> statements to populate our datasets with the sample records shown above. The following shows how loading can be done for data stored in <tt>.adm</tt> files in your local filesystem. <i>Note:</i> You <i>MUST</i> replace the <tt>&lt;Host Name&gt;</tt> and <tt>&lt;Absolute File Path&gt;</tt> placeholders in each load statement below with valid values based on the host IP address (or host name) for the machine and directory that you have downloaded the provided <tt>.adm</tt> files to. As you do so, be very, very careful to retain the two slashes in the load statements, i.e., do not delete the two slashes that appear in front of the absolute path to your <tt>.adm</tt> files. (This will lead to a three-slash character sequence at the start of each load statement’s file input path specification.)</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ load dataset FacebookUsers using localfs
+ (("path"="&lt;Host Name&gt;://&lt;Absolute File Path&gt;/fbu.adm"),("format"="adm"));
+ load dataset FacebookMessages using localfs
+ (("path"="&lt;Host Name&gt;://&lt;Absolute File Path&gt;/fbm.adm"),("format"="adm"));
+ load dataset TwitterUsers using localfs
+ (("path"="&lt;Host Name&gt;://&lt;Absolute File Path&gt;/twu.adm"),("format"="adm"));
+ load dataset TweetMessages using localfs
+ (("path"="&lt;Host Name&gt;://&lt;Absolute File Path&gt;/twm.adm"),("format"="adm"));
+</pre></div></div>
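+<p>For instance, assuming the sample files were saved under /home/joe/tinysocial on a machine reachable at 127.0.0.1 (both values are hypothetical and must be replaced with your own), the first load statement would read as follows. Note the three slashes in a row: two come from the statement template and the third begins the absolute path.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ load dataset FacebookUsers using localfs
+ (("path"="127.0.0.1:///home/joe/tinysocial/fbu.adm"),("format"="adm"));
+</pre></div></div>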
+<hr/></div>
+<div class="section">
+<h2><a name="AQL:_Querying_Your_AsterixDB_Data"></a>AQL: Querying Your AsterixDB Data</h2>
+<p>Congratulations! You now have sample social data stored (and indexed) in AsterixDB. (You are part of an elite and adventurous group of individuals. :-)) Now that you have successfully loaded the provided sample data into the datasets that we defined, you can start running queries against them.</p>
+<p>The query language for AsterixDB is AQL—the Asterix Query Language. AQL is loosely based on XQuery, the language developed and standardized in the early to mid 2000’s by the World Wide Web Consortium (W3C) for querying semistructured data stored in their XML format. We have tossed all of the “XML cruft” out of their language but retained many of its core ideas. We did this because its design was developed over a period of years by a diverse committee of smart and experienced language designers, including “SQL people”, “functional programming people”, and “XML people”, all of whom were focused on how to design a new query language that operates well over semistructured data. (We decided to stand on their shoulders instead of starting from scratch and revisiting many of the same issues.) Note that AQL is not SQL and not based on SQL: In other words, AsterixDB is fully “NoSQL compliant”. :-)</p>
+<p>In this section we introduce AQL via a set of example queries, along with their expected results, based on the data above, to help you get started. Many of the most important features of AQL are presented in this set of representative queries. You can find more details in the document on the <a href="datamodel.html">Asterix Data Model (ADM)</a>, in the <a href="manual.html">AQL Reference Manual</a>, and in the <a href="functions.html">Asterix Functions</a> document, which gives a complete list of built-in functions.</p>
+<p>AQL is an expression language. Even the expression 1+1 is a valid AQL query that evaluates to 2. (Try it for yourself! Okay, maybe that’s <i>not</i> the best use of a 512-node shared-nothing compute cluster.) Most useful AQL queries will be based on the <i>FLWOR</i> (pronounced “flower”) expression structure that AQL has borrowed from XQuery (<a class="externalLink" href="http://en.wikipedia.org/wiki/FLWOR">http://en.wikipedia.org/wiki/FLWOR</a>). The FLWOR expression syntax supports both the incremental binding (<i>for</i>) of variables to ADM data instances in a dataset (or in the result of any AQL expression, actually) and the full binding (<i>let</i>) of variables to entire intermediate results in a fashion similar to temporary views in the SQL world. FLWOR is short for <i>for</i>-<i>let</i>-<i>where</i>-<i>order by</i>-<i>return</i>, naming five of the most frequently used clauses from the syntax of a full AQL query. AQL also includes <i>group by</i> and <i>limit</i> clauses, as you will see shortly. Roughly speaking, for SQL aficionados, the <i>for</i> clause in AQL is like the <i>from</i> clause in SQL, the <i>return</i> clause in AQL is like the <i>select</i> clause in SQL (but appears at the end instead of the beginning of a query), the <i>let</i> clause in AQL is like SQL’s <i>with</i> clause, and the <i>where</i> and <i>order by</i> clauses in both languages are similar.</p>
+<p>In order to allow SQL fans to write queries in their favored ways, AQL provides synonyms: <i>from</i> for <i>for</i>, <i>select</i> for <i>return</i>, <i>with</i> for <i>let</i>, and <i>keeping</i> for <i>with</i> in the group by clause.</p>
+<p>Enough talk! Let’s go ahead and try writing some queries and see about learning AQL by example.</p>
+<div class="section">
+<h3><a name="Query_0-A_-_Exact-Match_Lookup"></a>Query 0-A - Exact-Match Lookup</h3>
+<p>For our first query, let’s find a Facebook user based on his or her user id. Suppose the user we want is the user whose id is 8:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+ from $user in dataset FacebookUsers
+ where $user.id = 8
+ select $user;
+</pre></div></div>
+<p>The query’s <i>from</i> clause binds the variable <tt>$user</tt> incrementally to the data instances residing in the dataset named FacebookUsers. Its <i>where</i> clause selects only those bindings having a user id of interest, filtering out the rest. The <i>select</i> clause returns the (entire) data instance for each binding that satisfies the predicate. Since this dataset is indexed on user id (its primary key), this query will be done via a quick index lookup.</p>
+<p>The expected result for our sample data is as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 8, "alias": "Nila", "name": "NilaMilliron", "user-since": datetime("2008-01-01T10:10:00.000Z"), "friend-ids": {{ 3 }}, "employment": [ { "organization-name": "Plexlane", "start-date": date("2010-02-28"), "end-date": null } ] }
+</pre></div></div></div>
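+<p>Because <i>from</i> and <i>select</i> are synonyms for <i>for</i> and <i>return</i>, the lookup in Query 0-A can presumably be written just as well with the original FLWOR keywords. The following is a sketch that relies only on the synonym list given earlier; it is expected to return the same record for user 8:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ /* Same lookup as Query 0-A, spelled with for/return instead of from/select. */
+ for $user in dataset FacebookUsers
+ where $user.id = 8
+ return $user;
+</pre></div></div>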
+<div class="section">
+<h3><a name="Query_0-B_-_Range_Scan"></a>Query 0-B - Range Scan</h3>
+<p>AQL, like SQL, supports a variety of different predicates. For example, for our next query, let’s find the Facebook users whose ids are in the range between 2 and 4:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ where $user.id >= 2 and $user.id <= 4
+ select $user;
+</pre></div></div>
+<p>The expected result of this query, which can also be evaluated using the primary index on user id, is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_1_-_Other_Query_Filters"></a>Query 1 - Other Query Filters</h3>
+<p>AQL can do range queries on any data type that supports the appropriate set of comparators. As an example, this next query retrieves the Facebook users who joined between July 22, 2010 and July 29, 2012:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+ from $user in dataset FacebookUsers
+ where $user.user-since >= datetime('2010-07-22T00:00:00')
+ and $user.user-since <= datetime('2012-07-29T23:59:59')
+ select $user;
+</pre></div></div>
+<p>The expected result for this query, also an indexable query, is as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+ { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_2-A_-_Equijoin"></a>Query 2-A - Equijoin</h3>
+<p>In addition to simply binding variables to data instances and returning them “whole”, an AQL query can construct new ADM instances to return based on combinations of its variable bindings. This gives AQL the power to do joins much like those done using multi-table <i>from</i> clauses in SQL. For example, suppose we wanted a list of all Facebook users paired with their associated messages, with the list enumerating the author name and the message text associated with each Facebook message. We could do this as follows in AQL:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ from $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ select {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>
+<p>The result of this query is a sequence of new ADM instances, one for each author/message pair. Each instance in the result will be an ADM record with two fields, “uname” and “message”, holding the user’s name and the message text, respectively. (Note that “uname” and “message” are both simple AQL expressions themselves, so in the most general case, even the resulting field names can be computed as part of the query, making AQL a very powerful tool for slicing and dicing semistructured data.)</p>
+<p>The expected result of this example AQL join query for our sample data set is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+ { "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+ { "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+ { "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+ { "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+ { "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+ { "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+ { "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+ { "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+ { "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+ { "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+ { "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+ { "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_2-B_-_Index_join"></a>Query 2-B - Index join</h3>
+<p>By default, AsterixDB evaluates equijoin queries using hash-based join methods that work well for doing ad hoc joins of very large data sets (<a class="externalLink" href="http://en.wikipedia.org/wiki/Hash_join">http://en.wikipedia.org/wiki/Hash_join</a>). On a cluster, hash partitioning is employed as AsterixDB’s divide-and-conquer strategy for computing large parallel joins. AsterixDB includes other join methods, but in the absence of data statistics and selectivity estimates, it doesn’t (yet) have the know-how to intelligently choose among its alternatives. We therefore asked ourselves the classic question—WWOD?—What Would Oracle Do?—and in the interim, AQL includes a clunky (but useful) hint-based mechanism for addressing the occasional need to suggest to AsterixDB which join method it should use for a particular AQL query.</p>
+<p>The following query is similar to Query 2-A but includes a suggestion to AsterixDB that it should consider employing an index-based nested-loop join technique to process the query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ from $message in dataset FacebookMessages
+ where $message.author-id /*+ indexnl */ = $user.id
+ select {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>
+<p>The expected result is (of course) the same as before, modulo the order of the instances. Result ordering is (intentionally) undefined in AQL in the absence of an <i>order by</i> clause. The query result for our sample data in this case is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+ { "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+ { "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+ { "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+ { "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+ { "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+ { "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+ { "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+ { "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+ { "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+ { "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+ { "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+ { "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+</pre></div></div>
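+<p>When a stable order matters, for example when comparing the output of Query 2-A with that of Query 2-B, an <i>order by</i> clause can be added. The variant below is a sketch rather than part of the original example set; sorting on the user name and then on the message text is an arbitrary choice made purely for illustration:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ from $message in dataset FacebookMessages
+ where $message.author-id /*+ indexnl */ = $user.id
+ /* Illustrative sort keys (an assumption); any orderable expressions could be used. */
+ order by $user.name, $message.message
+ select {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>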
+<p>(It is worth knowing, with respect to influencing AsterixDB’s query evaluation, that nested <i>from</i> clauses, a.k.a. joins, are currently evaluated with the “outer” clause probing the data of the “inner” clause.)</p></div>
+<div class="section">
+<h3><a name="Query_3_-_Nested_Outer_Join"></a>Query 3 - Nested Outer Join</h3>
+<p>In order to support joins between tables with missing/dangling join tuples, the designers of SQL ended up shoe-horning a subset of the relational algebra into SQL’s <i>from</i> clause syntax—and providing a variety of join types there for users to choose from. Left outer joins are particularly important in SQL, e.g., to print a summary of customers and orders, grouped by customer, without omitting those customers who haven’t placed any orders yet.</p>
+<p>The AQL language supports nesting, both of queries and of query results, and the combination allows for an arguably cleaner/more natural approach to such queries. As an example, suppose we wanted, for each Facebook user, to produce a record that has his/her name plus a list of the messages written by that user. In SQL, this would involve a left outer join between users and messages, grouping by user, and having the user name repeated alongside each message. In AQL, this sort of use case can be handled (more naturally) as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ select {
+ "uname": $user.name,
+ "messages": from $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ select $message.message
+ };
+</pre></div></div>
+<p>This AQL query binds the variable <tt>$user</tt> to the data instances in FacebookUsers; for each user, it constructs a result record containing a “uname” field with the user’s name and a “messages” field with a nested collection of all messages for that user. The nested collection for each user is specified by using a correlated subquery. (Note: While it looks like nested loops could be involved in computing the result, AsterixDB recognizes the equivalence of such a query to an outer join, and it will use an efficient hash-based strategy when actually computing the query’s result.)</p>
+<p>Here is this example query’s expected output:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "MargaritaStoddard", "messages": [ " dislike iphone its touch-screen is horrible", " can't stand at&t the network is horrible:(", " like verizon the 3G is awesome:)", " can't stand motorola the touch-screen is terrible", " can't stand at&t its plan is terrible" ] }
+ { "uname": "IsbelDull", "messages": [ " like samsung the plan is amazing", " like t-mobile its platform is mind-blowing" ] }
+ { "uname": "EmoryUnk", "messages": [ " love sprint its shortcut-menu is awesome:)", " love verizon its wireless is good" ] }
+ { "uname": "NicholasStroh", "messages": [ ] }
+ { "uname": "VonKemble", "messages": [ " dislike sprint the speed is horrible" ] }
+ { "uname": "WillisWynne", "messages": [ " love sprint the customization is mind-blowing" ] }
+ { "uname": "SuzannaTillson", "messages": [ " like iphone the voicemail-service is awesome" ] }
+ { "uname": "NilaMilliron", "messages": [ ] }
+ { "uname": "WoodrowNehling", "messages": [ " love at&t its 3G is good:)" ] }
+ { "uname": "BramHatch", "messages": [ " dislike iphone the voice-command is bad:(", " can't stand t-mobile its voicemail-service is OMG:(" ] }
+</pre></div></div></div>
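+<p>Nesting also combines with aggregate functions. As a sketch (not one of the original examples), the correlated subquery from Query 3 can be wrapped in <i>count</i> to report how many messages each user has written instead of listing them. The field name “message-count” is made up for this illustration, and users without messages, such as NicholasStroh and NilaMilliron in the sample data, would be expected to show a count of 0:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $user in dataset FacebookUsers
+ select {
+ "uname": $user.name,
+ /* "message-count" is a hypothetical field name chosen for this sketch. */
+ "message-count": count(from $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ select $message)
+ };
+</pre></div></div>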
+<div class="section">
+<h3><a name="Query_4_-_Theta_Join"></a>Query 4 - Theta Join</h3>
+<p>Not all joins are expressible as equijoins and computable using equijoin-oriented algorithms. The join predicates for some use cases involve function calls rather than simple equality comparisons; AsterixDB supports the expression of such queries and will still evaluate them as best it can using nested-loop-based techniques (and broadcast joins in the parallel case).</p>
+<p>As an example of such a use case, suppose that we wanted, for each tweet T, to find all of the other tweets that originated from within a circle of radius 1 surrounding tweet T’s location. In AQL, this can be specified in a manner similar to the previous query using one of the built-in functions on the spatial data type instead of id equality in the correlated query’s <i>where</i> clause:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ select {
+ "message": $t.message-text,
+ "nearby-messages": from $t2 in dataset TweetMessages
+ where spatial-distance($t.sender-location, $t2.sender-location) <= 1
+ select { "msgtxt":$t2.message-text}
+ };
+</pre></div></div>
+<p>Here is the expected result for this query:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "message": " love t-mobile its customization is good:)", "nearby-messages": [ { "msgtxt": " love t-mobile its customization is good:)" } ] }
+ { "message": " hate verizon its voice-clarity is OMG:(", "nearby-messages": [ { "msgtxt": " like motorola the speed is good:)" }, { "msgtxt": " hate verizon its voice-clarity is OMG:(" } ] }
+ { "message": " can't stand iphone its platform is terrible", "nearby-messages": [ { "msgtxt": " can't stand iphone its platform is terrible" } ] }
+ { "message": " like samsung the voice-command is amazing:)", "nearby-messages": [ { "msgtxt": " like samsung the voice-command is amazing:)" } ] }
+ { "message": " like verizon its shortcut-menu is awesome:)", "nearby-messages": [ { "msgtxt": " like verizon its shortcut-menu is awesome:)" } ] }
+ { "message": " like motorola the speed is good:)", "nearby-messages": [ { "msgtxt": " hate verizon its voice-clarity is OMG:(" }, { "msgtxt": " like motorola the speed is good:)" } ] }
+ { "message": " like sprint the voice-command is mind-blowing:)", "nearby-messages": [ { "msgtxt": " like sprint the voice-command is mind-blowing:)" } ] }
+ { "message": " can't stand motorola its speed is terrible:(", "nearby-messages": [ { "msgtxt": " can't stand motorola its speed is terrible:(" } ] }
+ { "message": " like iphone the voice-clarity is good:)", "nearby-messages": [ { "msgtxt": " like iphone the voice-clarity is good:)" } ] }
+ { "message": " like samsung the platform is good", "nearby-messages": [ { "msgtxt": " like samsung the platform is good" } ] }
+ { "message": " like t-mobile the shortcut-menu is awesome:)", "nearby-messages": [ { "msgtxt": " like t-mobile the shortcut-menu is awesome:)" } ] }
+ { "message": " love verizon its voicemail-service is awesome", "nearby-messages": [ { "msgtxt": " love verizon its voicemail-service is awesome" } ] }
+</pre></div></div></div>
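+<p>Every tweet lies within distance 1 of its own location, which is why each tweet shows up in its own “nearby-messages” list in the result of Query 4. A sketch of a variant that leaves the tweet itself out, borrowing the tweetid inequality that Query 11 uses for the same purpose, might look as follows; with the sample data, most of the nested lists would then be expected to come back empty:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ select {
+ "message": $t.message-text,
+ "nearby-messages": from $t2 in dataset TweetMessages
+ where spatial-distance($t.sender-location, $t2.sender-location) <= 1
+ /* Added for this sketch: skip the tweet that is being compared with itself. */
+ and $t2.tweetid != $t.tweetid
+ select { "msgtxt":$t2.message-text}
+ };
+</pre></div></div>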
+<div class="section">
+<h3><a name="Query_5_-_Fuzzy_Join"></a>Query 5 - Fuzzy Join</h3>
+<p>As another example of a non-equijoin use case, we could ask AsterixDB to find, for each Facebook user, all Twitter users with names “similar” to their name. AsterixDB supports a variety of “fuzzy match” functions for use with textual and set-based data. As one example, we could choose to use edit distance with a threshold of 3 as the definition of name similarity, in which case we could write the following query using AQL’s operator-based syntax (~=) for testing whether or not two values are similar:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "edit-distance";
+ set simthreshold "3";
+ from $fbu in dataset FacebookUsers
+ select {
+ "id": $fbu.id,
+ "name": $fbu.name,
+ "similar-users": from $t in dataset TweetMessages
+ with $tu := $t.user
+ where $tu.name ~= $fbu.name
+ select {
+ "twitter-screenname": $tu.screen-name,
+ "twitter-name": $tu.name
+ }
+ };
+</pre></div></div>
+<p>The expected result for this query against our sample data is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 1, "name": "MargaritaStoddard", "similar-users": [ ] }
+ { "id": 2, "name": "IsbelDull", "similar-users": [ ] }
+ { "id": 3, "name": "EmoryUnk", "similar-users": [ ] }
+ { "id": 4, "name": "NicholasStroh", "similar-users": [ ] }
+ { "id": 5, "name": "VonKemble", "similar-users": [ ] }
+ { "id": 6, "name": "WillisWynne", "similar-users": [ ] }
+ { "id": 7, "name": "SuzannaTillson", "similar-users": [ ] }
+ { "id": 8, "name": "NilaMilliron", "similar-users": [ { "twitter-screenname": "NilaMilliron_tw", "twitter-name": "Nila Milliron" } ] }
+ { "id": 9, "name": "WoodrowNehling", "similar-users": [ ] }
+ { "id": 10, "name": "BramHatch", "similar-users": [ ] }
+</pre></div></div></div>
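+<p>The similarity operator in Query 5 is shorthand for a comparison against the configured similarity function. Assuming that <tt>~=</tt> with the settings shown there means “edit distance of at most 3”, roughly the same question can be asked by calling the built-in edit-distance function directly. This is only a sketch of that reading; the explicit form may well be planned and evaluated differently by AsterixDB:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $fbu in dataset FacebookUsers
+ select {
+ "id": $fbu.id,
+ "name": $fbu.name,
+ "similar-users": from $t in dataset TweetMessages
+ with $tu := $t.user
+ /* Assumption: equivalent to ~= with simfunction edit-distance and simthreshold 3. */
+ where edit-distance($tu.name, $fbu.name) <= 3
+ select {
+ "twitter-screenname": $tu.screen-name,
+ "twitter-name": $tu.name
+ }
+ };
+</pre></div></div>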
+<div class="section">
+<h3><a name="Query_6_-_Existential_Quantification"></a>Query 6 - Existential Quantification</h3>
+<p>The expressive power of AQL includes support for queries involving “some” (existentially quantified) and “all” (universally quantified) query semantics. As an example of an existential AQL query, here we show a query to list the Facebook users who are currently employed. Such users will have an employment history containing a record with a null end-date value, which leads us to the following AQL query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $fbu in dataset FacebookUsers
+ where (some $e in $fbu.employment satisfies is-null($e.end-date))
+ select $fbu;
+</pre></div></div>
+<p>The expected result in this case is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "user-since": datetime("2012-08-20T10:10:00.000Z"), "friend-ids": {{ 2, 3, 6, 10 }}, "employment": [ { "organization-name": "Codetechno", "start-date": date("2006-08-06"), "end-date": null } ] }
+ { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+ { "id": 5, "alias": "Von", "name": "VonKemble", "user-since": datetime("2010-01-05T10:10:00.000Z"), "friend-ids": {{ 3, 6, 10 }}, "employment": [ { "organization-name": "Kongreen", "start-date": date("2010-11-27"), "end-date": null } ] }
+ { "id": 6, "alias": "Willis", "name": "WillisWynne", "user-since": datetime("2005-01-17T10:10:00.000Z"), "friend-ids": {{ 1, 3, 7 }}, "employment": [ { "organization-name": "jaydax", "start-date": date("2009-05-15"), "end-date": null } ] }
+ { "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }}, "employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ] }
+ { "id": 8, "alias": "Nila", "name": "NilaMilliron", "user-since": datetime("2008-01-01T10:10:00.000Z"), "friend-ids": {{ 3 }}, "employment": [ { "organization-name": "Plexlane", "start-date": date("2010-02-28"), "end-date": null } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_7_-_Universal_Quantification"></a>Query 7 - Universal Quantification</h3>
+<p>As an example of a universal AQL query, here we show a query to list the Facebook users who are currently unemployed. Such users will have an employment history containing no records with null end-date values, leading us to the following AQL query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $fbu in dataset FacebookUsers
+ where (every $e in $fbu.employment satisfies not(is-null($e.end-date)))
+ select $fbu;
+</pre></div></div>
+<p>Here is the expected result for our sample data:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 9, "alias": "Woodrow", "name": "WoodrowNehling", "user-since": datetime("2005-09-20T10:10:00.000Z"), "friend-ids": {{ 3, 10 }}, "employment": [ { "organization-name": "Zuncan", "start-date": date("2003-04-22"), "end-date": date("2009-12-13") } ] }
+ { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_8_-_Simple_Aggregation"></a>Query 8 - Simple Aggregation</h3>
+<p>Like SQL, the AQL language of AsterixDB provides support for computing aggregates over large amounts of data. As a very simple example, the following AQL query computes the total number of Facebook users:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ count(from $fbu in dataset FacebookUsers select $fbu);
+</pre></div></div>
+<p>In AQL, aggregate functions can be applied to arbitrary subquery results; in this case, the count function is applied to the result of a query that enumerates the Facebook users. The expected result here is:</p>
+
+<div class="source">
+<div class="source">
+<pre> 10
+</pre></div></div></div>
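+<p>Since <i>count</i> accepts the result of any subquery, the enumerating query inside it can carry its own predicates. As a sketch, reusing the existential predicate from Query 6, the following counts only the currently employed Facebook users; for the sample data it would be expected to return 7, the number of records listed in the result of Query 6:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ /* Count only users with at least one employment record that has a null end-date. */
+ count(from $fbu in dataset FacebookUsers
+ where (some $e in $fbu.employment satisfies is-null($e.end-date))
+ select $fbu);
+</pre></div></div>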
+<div class="section">
+<h3><a name="Query_9-A_-_Grouping_and_Aggregation"></a>Query 9-A - Grouping and Aggregation</h3>
+<p>Also like SQL, AQL supports grouped aggregation. For every Twitter user, the following group-by/aggregate query counts the number of tweets sent by that user:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name keeping $t
+ select {
+ "user": $uid,
+ "count": count($t)
+ };
+</pre></div></div>
+<p>The <i>from</i> clause incrementally binds $t to tweets, and the <i>group by</i> clause groups the tweets by their issuers’ Twitter screen names. Unlike SQL, where data is tabular (flat), the data model underlying AQL allows for nesting. Thus, following the <i>group by</i> clause, the <i>select</i> clause in this query sees a sequence of $t groups, with each such group having an associated $uid variable value (i.e., the tweeting user’s screen name). In the context of the <i>select</i> clause, due to “… keeping $t …”, $uid is bound to the tweeter’s screen name and $t is bound to the <i>set</i> of tweets issued by that tweeter. The <i>select</i> clause constructs a result record containing the tweeter’s screen name and the count of the items in the associated tweet set. The query result will contain one such record per screen name. This query also illustrates another feature of AQL; notice that each user’s screen name is accessed via a path syntax that traverses each tweet’s nested record structure.</p>
+<p>Here is the expected result for this query over the sample data:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "ChangEwing_573", "count": 1 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+ { "user": "OliJackson_512", "count": 1 }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_9-B_-_Hash-Based_Grouping_and_Aggregation"></a>Query 9-B - (Hash-Based) Grouping and Aggregation</h3>
+<p>As with joins, AsterixDB has multiple evaluation strategies available for processing grouped aggregate queries. For grouped aggregation, the system knows how to employ both sort-based and hash-based aggregation methods, with sort-based methods being used by default and a hint being available to suggest that a different approach be used in processing a particular AQL query.</p>
+<p>The following query is similar to Query 9-A, but adds a hash-based aggregation hint:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ /*+ hash*/
+ group by $uid := $t.user.screen-name keeping $t
+ select {
+ "user": $uid,
+ "count": count($t)
+ };
+</pre></div></div>
+<p>Here is the expected result:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "OliJackson_512", "count": 1 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+ { "user": "ChangEwing_573", "count": 1 }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_10_-_Grouping_and_Limits"></a>Query 10 - Grouping and Limits</h3>
+<p>In some use cases it is not necessary to compute the entire answer to a query; just having the first <i>N</i> or top <i>N</i> results is sufficient. This is expressible in AQL using the <i>limit</i> clause combined with the <i>order by</i> clause.</p>
+<p>The following AQL query returns the top 3 Twitter users based on who has issued the most tweets:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name keeping $t
+ with $c := count($t)
+ order by $c desc
+ limit 3
+ select {
+ "user": $uid,
+ "count": $c
+ };
+</pre></div></div>
+<p>The expected result for this query is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+</pre></div></div></div>
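+<p>In the sample data, three users are tied with a count of 1 (see the result of Query 9-A), so the count alone does not determine which of them takes the third slot. A sketch of a variant that breaks such ties by screen name is shown below; the secondary sort key is an addition made for illustration, and with it ChangEwing_573 would be expected to appear third:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ from $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name keeping $t
+ with $c := count($t)
+ /* Secondary key added for this sketch so that ties on $c are ordered by screen name. */
+ order by $c desc, $uid asc
+ limit 3
+ select {
+ "user": $uid,
+ "count": $c
+ };
+</pre></div></div>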
+<div class="section">
+<h3><a name="Query_11_-_Left_Outer_Fuzzy_Join"></a>Query 11 - Left Outer Fuzzy Join</h3>
+<p>As a last example of AQL and its query power, the following query, for each tweet, finds all of the tweets that are similar based on the topics that they refer to:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "jaccard";
+ set simthreshold "0.3";
+ from $t in dataset TweetMessages
+ select {
+ "tweet": $t,
+ "similar-tweets": from $t2 in dataset TweetMessages
+ where $t2.referred-topics ~= $t.referred-topics
+ and $t2.tweetid != $t.tweetid
+ select $t2.referred-topics
+ };
+</pre></div></div>
+<p>This query illustrates several things worth knowing in order to write fuzzy queries in AQL. First, as mentioned earlier, AQL offers an operator-based syntax for seeing whether two values are “similar” to one another or not. Second, recall that the referred-topics field of records of datatype TweetMessageType is a bag of strings. Third, the query sets the context for its similarity join by requesting that Jaccard-based similarity semantics (<a class="externalLink" href="http://en.wikipedia.org/wiki/Jaccard_index">http://en.wikipedia.org/wiki/Jaccard_index</a>) be used for the query’s similarity operator and that a similarity index of 0.3 be used as its similarity threshold.</p>
+<p>The expected result for this fuzzy join query is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "tweet": { "tweetid": "1", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("47.44,80.65"), "send-time": datetime("2008-04-26T10:10:00.000Z"), "referred-topics": {{ "t-mobile", "customization" }}, "message-text": " love t-mobile its customization is good:)" }, "similar-tweets": [ {{ "t-mobile", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "10", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("29.15,76.53"), "send-time": datetime("2008-01-26T10:10:00.000Z"), "referred-topics": {{ "verizon", "voice-clarity" }}, "message-text": " hate verizon its voice-clarity is OMG:(" }, "similar-tweets": [ {{ "iphone", "voice-clarity" }}, {{ "verizon", "voicemail-service" }}, {{ "verizon", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "11", "user": { "screen-name": "NilaMilliron_tw", "lang": "en", "friends_count": 445, "statuses_count": 164, "name": "Nila Milliron", "followers_count": 22649 }, "sender-location": point("37.59,68.42"), "send-time": datetime("2008-03-09T10:10:00.000Z"), "referred-topics": {{ "iphone", "platform" }}, "message-text": " can't stand iphone its platform is terrible" }, "similar-tweets": [ {{ "iphone", "voice-clarity" }}, {{ "samsung", "platform" }} ] }
+ { "tweet": { "tweetid": "12", "user": { "screen-name": "OliJackson_512", "lang": "en", "friends_count": 445, "statuses_count": 164, "name": "Oli Jackson", "followers_count": 22649 }, "sender-location": point("24.82,94.63"), "send-time": datetime("2010-02-13T10:10:00.000Z"), "referred-topics": {{ "samsung", "voice-command" }}, "message-text": " like samsung the voice-command is amazing:)" }, "similar-tweets": [ {{ "samsung", "platform" }}, {{ "sprint", "voice-command" }} ] }
+ { "tweet": { "tweetid": "2", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("32.84,67.14"), "send-time": datetime("2010-05-13T10:10:00.000Z"), "referred-topics": {{ "verizon", "shortcut-menu" }}, "message-text": " like verizon its shortcut-menu is awesome:)" }, "similar-tweets": [ {{ "verizon", "voicemail-service" }}, {{ "verizon", "voice-clarity" }}, {{ "t-mobile", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "3", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("29.72,75.8"), "send-time": datetime("2006-11-04T10:10:00.000Z"), "referred-topics": {{ "motorola", "speed" }}, "message-text": " like motorola the speed is good:)" }, "similar-tweets": [ {{ "motorola", "speed" }} ] }
+ { "tweet": { "tweetid": "4", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("39.28,70.48"), "send-time": datetime("2011-12-26T10:10:00.000Z"), "referred-topics": {{ "sprint", "voice-command" }}, "message-text": " like sprint the voice-command is mind-blowing:)" }, "similar-tweets": [ {{ "samsung", "voice-command" }} ] }
+ { "tweet": { "tweetid": "5", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("40.09,92.69"), "send-time": datetime("2006-08-04T10:10:00.000Z"), "referred-topics": {{ "motorola", "speed" }}, "message-text": " can't stand motorola its speed is terrible:(" }, "similar-tweets": [ {{ "motorola", "speed" }} ] }
+ { "tweet": { "tweetid": "6", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("47.51,83.99"), "send-time": datetime("2010-05-07T10:10:00.000Z"), "referred-topics": {{ "iphone", "voice-clarity" }}, "message-text": " like iphone the voice-clarity is good:)" }, "similar-tweets": [ {{ "verizon", "voice-clarity" }}, {{ "iphone", "platform" }} ] }
+ { "tweet": { "tweetid": "7", "user": { "screen-name": "ChangEwing_573", "lang": "en", "friends_count": 182, "statuses_count": 394, "name": "Chang Ewing", "followers_count": 32136 }, "sender-location": point("36.21,72.6"), "send-time": datetime("2011-08-25T10:10:00.000Z"), "referred-topics": {{ "samsung", "platform" }}, "message-text": " like samsung the platform is good" }, "similar-tweets": [ {{ "iphone", "platform" }}, {{ "samsung", "voice-command" }} ] }
+ { "tweet": { "tweetid": "8", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("46.05,93.34"), "send-time": datetime("2005-10-14T10:10:00.000Z"), "referred-topics": {{ "t-mobile", "shortcut-menu" }}, "message-text": " like t-mobile the shortcut-menu is awesome:)" }, "similar-tweets": [ {{ "t-mobile", "customization" }}, {{ "verizon", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "9", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("36.86,74.62"), "send-time": datetime("2012-07-21T10:10:00.000Z"), "referred-topics": {{ "verizon", "voicemail-service" }}, "message-text": " love verizon its voicemail-service is awesome" }, "similar-tweets": [ {{ "verizon", "voice-clarity" }}, {{ "verizon", "shortcut-menu" }} ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Inserting_New_Data"></a>Inserting New Data</h3>
+<p>In addition to loading and querying data, AsterixDB supports incremental additions to datasets via the AQL <i>insert</i> statement.</p>
+<p>The following example adds a new tweet by user “<a class="externalLink" href="mailto:NathanGiesen@211">NathanGiesen@211</a>” to the TweetMessages dataset. (An astute reader may notice that this tweet was issued half an hour after his last tweet, so his counts have all gone up in the interim, although he appears not to have moved in the last half hour.)</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ insert into dataset TweetMessages
+ (
+ {"tweetid":"13",
+ "user":
+ {"screen-name":"NathanGiesen@211",
+ "lang":"en",
+ "friends_count":39345,
+ "statuses_count":479,
+ "name":"Nathan Giesen",
+ "followers_count":49420
+ },
+ "sender-location":point("47.44,80.65"),
+ "send-time":datetime("2008-04-26T10:10:35"),
+ "referred-topics":{{"tweeting"}},
+ "message-text":"tweety tweet, my fellow tweeters!"
+ }
+ );
+</pre></div></div>
+<p>In general, the data to be inserted may be specified using any valid AQL query expression. The insertion of a single object instance, as in this example, is just a special case where the query expression happens to be a record constructor involving only constants.</p></div>
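+<p>As a hedged sketch of the more general case, the statements below copy older tweets into a second dataset by using a query, rather than a constant record, as the source of the insert. The TweetMessagesArchive dataset and the cutoff date are hypothetical and are not part of the TinySocial example.</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* Hypothetical archive dataset with the same type and key as TweetMessages. */
+        create dataset TweetMessagesArchive(TweetMessageType)
+            primary key tweetid;
+
+        /* The data to insert is the result of an AQL query. */
+        insert into dataset TweetMessagesArchive
+        (
+            for $t in dataset TweetMessages
+            where $t.send-time &lt; datetime("2008-01-01T00:00:00")
+            return $t
+        );
+</pre></div></div>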
+<div class="section">
+<h3><a name="Deleting_Existing_Data"></a>Deleting Existing Data</h3>
+<p>In addition to inserting new data, AsterixDB supports deletion from datasets via the AQL <i>delete</i> statement. The statement supports “searched delete” semantics, and its <i>where</i> clause can involve any valid AQL expression.</p>
+<p>The following example deletes the tweet that we just added from user “<a class="externalLink" href="mailto:NathanGiesen@211">NathanGiesen@211</a>”. (Easy come, easy go. :-))</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ delete $tm from dataset TweetMessages where $tm.tweetid = "13";
+</pre></div></div>
+<p>One form of data change not yet supported by AsterixDB is in-place data modification (<i>update</i>); currently, only insert and delete operations are available. To achieve the effect of an update, two statements are needed: one to delete the old record from the dataset where it resides, and another to insert the new replacement record (with the same primary key but with different field values for some of the associated data content).</p></div>
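+<p>A minimal sketch of this delete-then-insert pattern follows. It assumes that a tweet with tweetid "13" is present in TweetMessages and that only its message-text should change; the replacement text is made up for illustration. Since AsterixDB does not support multi-statement transactions, the two statements are not atomic as a pair.</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* Step 1: remove the old version of the record. */
+        delete $tm from dataset TweetMessages where $tm.tweetid = "13";
+
+        /* Step 2: insert the replacement with the same primary key. */
+        insert into dataset TweetMessages
+        (
+           {"tweetid":"13",
+            "user":
+                {"screen-name":"NathanGiesen@211",
+                 "lang":"en",
+                 "friends_count":39345,
+                 "statuses_count":479,
+                 "name":"Nathan Giesen",
+                 "followers_count":49420
+                },
+            "sender-location":point("47.44,80.65"),
+            "send-time":datetime("2008-04-26T10:10:35"),
+            "referred-topics":{{"tweeting"}},
+            "message-text":"an edited tweet, my fellow tweeters!"
+           }
+        );
+</pre></div></div>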
+<div class="section">
+<h3><a name="Transaction_Support"></a>Transaction Support</h3>
+<p>AsterixDB supports record-level ACID transactions that begin and terminate implicitly for each record inserted, deleted, or searched while a given AQL statement is being executed. This is quite similar to the level of transaction support found in today’s NoSQL stores. AsterixDB does not support multi-statement transactions, and in fact an AQL statement that involves multiple records can itself involve multiple independent record-level transactions. An example consequence of this is that, when an AQL statement attempts to insert 1000 records, it is possible that the first 800 records could end up being committed while the remaining 200 records fail to be inserted. This situation could happen, for example, if a duplicate key exception occurs as the 801st insertion is attempted. If this happens, AsterixDB will report the error (e.g., a duplicate key exception) as the result of the offending AQL insert statement, and the application logic above will need to take the appropriate action(s) needed to assess the resulting state and to clean up and/or continue as appropriate.</p></div></div>
+<div class="section">
+<h2><a name="Further_Help"></a>Further Help</h2>
+<p>That’s it! You are now armed and dangerous with respect to semistructured data management using AsterixDB.</p>
+<p>AsterixDB is a powerful new BDMS—Big Data Management System—that we hope may usher in a new era of much more declarative Big Data management. AsterixDB is powerful, so use it wisely, and remember: “With great power comes great responsibility…” :-)</p>
+<p>Please e-mail the AsterixDB user group (users (at) asterixdb.incubator.apache.org) if you run into any problems or simply have further questions about the AsterixDB system, its features, or their proper use.</p></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/primer.html b/docs/0.8.7-incubating/aql/primer.html
new file mode 100644
index 0000000..053f0f3
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/primer.html
@@ -0,0 +1,896 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – AsterixDB 101: An ADM and AQL Primer</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/similarity.html" title="AQL Support of Similarity Queries">
+ <i class="none"></i>
+ AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>AsterixDB 101: An ADM and AQL Primer</h1>
+<div class="section">
+<h2><a name="Welcome_to_AsterixDB"></a>Welcome to AsterixDB!</h2>
+<p>This document introduces the main features of AsterixDB’s data model (ADM) and query language (AQL) by example. The example is a simple scenario involving (synthetic) sample data modeled after data from the social domain. This document describes a set of sample ADM datasets, together with a set of illustrative AQL queries, to introduce you to the “AsterixDB user experience”. The complete set of steps required to create and load a handful of sample datasets, along with runnable queries and the expected results for each query, are included.</p>
+<p>This document assumes that you are at least vaguely familiar with AsterixDB and why you might want to use it. Most importantly, it assumes you already have a running instance of AsterixDB and that you know how to query it using AsterixDB’s basic web interface. For more information on these topics, you should go through the steps in <a href="../install.html">Installing Asterix Using Managix</a> before reading this document and make sure that you have a running AsterixDB instance ready to go. To get your feet wet, you should probably start with a simple local installation of AsterixDB on your favorite machine, accepting all of the default settings that Managix offers. Later you can graduate to trying AsterixDB on a cluster, its real intended home (since it targets Big Data). (Note: With the exception of specifying the correct locations where you put the source data for this example, there should be no changes needed in your ADM or AQL statements to run the examples locally and/or to run them on a cluster when you are ready to take that step.)</p>
+<p>As you read through this document, you should try each step for yourself on your own AsterixDB instance. Once you have reached the end, you will be fully armed and dangerous, with all the basic AsterixDB knowledge that you’ll need to start down the path of modeling, storing, and querying your own semistructured data.</p>
+<hr /></div>
+<div class="section">
+<h2><a name="ADM:_Modeling_Semistructed_Data_in_AsterixDB"></a>ADM: Modeling Semistructured Data in AsterixDB</h2>
+<p>In this section you will learn all about modeling Big Data using ADM, the data model of the AsterixDB BDMS.</p>
+<div class="section">
+<h3><a name="Dataverses_Datatypes_and_Datasets"></a>Dataverses, Datatypes, and Datasets</h3>
+<p>The top-level organizing concept in the AsterixDB world is the <i>dataverse</i>. A dataverse (short for “data universe”) is a place, similar to a database in a relational DBMS, in which to create and manage the types, datasets, functions, and other artifacts for a given AsterixDB application. When you start using an AsterixDB instance for the first time, it starts out “empty”; it contains no data other than the AsterixDB system catalogs (which live in a special dataverse called the Metadata dataverse). To store your data in AsterixDB, you will first create a dataverse and then use it to hold the <i>datatypes</i> and <i>datasets</i> for managing your own data. A datatype tells AsterixDB what you know (or more accurately, what you want it to know) a priori about one of the kinds of data instances that you want AsterixDB to hold for you. A dataset is a collection of data instances of a datatype, and AsterixDB makes sure that the data instances that you put in it conform to its specified type.</p>
+<p>Since AsterixDB targets semistructured data, you can use <i>open</i> datatypes and tell it as little or as much as you wish about your data up front; the more you tell it up front, the less information it will have to store repeatedly in the individual data instances that you give it. Instances of open datatypes are permitted to have additional content, beyond what the datatype says, as long as they at least contain the information prescribed by the datatype definition. Open typing allows data to vary from one instance to another and it leaves wiggle room for application evolution in terms of what might need to be stored in the future. If you want to restrict data instances in a dataset to have only what the datatype says, and nothing extra, you can define a <i>closed</i> datatype for that dataset and AsterixDB will keep users from storing objects that have extra data in them. Datatypes are open by default unless you tell AsterixDB otherwise. Let’s put these concepts to work!</p>
+<p>Our little sample scenario involves hypothetical information about users of two popular social networks, Facebook and Twitter, and their messages. We’ll start by defining a dataverse called “TinySocial” to hold our datatypes and datasets. The AsterixDB data model (ADM) is essentially a superset of JSON—it’s what you get by extending JSON with more data types and additional data modeling constructs borrowed from object databases. The following is how we can create the TinySocial dataverse plus a set of ADM types for modeling Twitter users, their Tweets, Facebook users, their users’ employment information, and their messages. (Note: Keep in mind that this is just a tiny and somewhat silly example intended for illustrating some of the key features of AsterixDB. :-))</p>
+
+<div class="source">
+<div class="source">
+<pre> drop dataverse TinySocial if exists;
+ create dataverse TinySocial;
+ use dataverse TinySocial;
+
+ create type TwitterUserType as open {
+ screen-name: string,
+ lang: string,
+ friends_count: int64,
+ statuses_count: int64,
+ name: string,
+ followers_count: int64
+ }
+
+ create type TweetMessageType as closed {
+ tweetid: string,
+ user: TwitterUserType,
+ sender-location: point?,
+ send-time: datetime,
+ referred-topics: {{ string }},
+ message-text: string
+ }
+
+ create type EmploymentType as open {
+ organization-name: string,
+ start-date: date,
+ end-date: date?
+ }
+
+ create type FacebookUserType as closed {
+ id: int64,
+ alias: string,
+ name: string,
+ user-since: datetime,
+ friend-ids: {{ int64 }},
+ employment: [EmploymentType]
+ }
+
+ create type FacebookMessageType as closed {
+ message-id: int64,
+ author-id: int64,
+ in-response-to: int64?,
+ sender-location: point?,
+ message: string
+ }
+</pre></div></div>
+<p>The first three lines above tell AsterixDB to drop the old TinySocial dataverse, if one already exists, and then to create a brand new one and make it the focus of the statements that follow. The first type creation statement creates a datatype for holding information about Twitter users. It is a record type with a mix of integer and string data, very much like a (flat) relational tuple. The indicated fields are all mandatory, but because the type is open, additional fields are welcome.</p>
+<p>The second statement creates a datatype for Twitter messages; this shows how to specify a closed type. Interestingly (based on one of Twitter’s APIs), each Twitter message actually embeds an instance of the sending user’s information (current as of when the message was sent), so this is an example of a nested record in ADM. Twitter messages can optionally contain the sender’s location, which is modeled via the sender-location field of spatial type <i>point</i>; the question mark following the field type indicates its optionality. An optional field is like a nullable field in SQL: it may be present or missing, but when it’s present, its data type will conform to the datatype’s specification. The send-time field illustrates the use of a temporal primitive type, <i>datetime</i>. Lastly, the referred-topics field illustrates another way that ADM is richer than the relational model; this field holds a bag (a.k.a. an unordered list) of strings. Since the overall datatype definition for Twitter messages says “closed”, the fields that it lists are the only fields that instances of this type will be allowed to contain.</p>
+<p>The next two create type statements create a record type for holding information about one component of the employment history of a Facebook user and then a record type for holding the user information itself. The Facebook user type highlights a few additional ADM data model features. Its friend-ids field is a bag of integers, presumably the Facebook user ids for this user’s friends, and its employment field is an ordered list of employment records. The final create type statement defines a type for handling the content of a Facebook message in our hypothetical social data storage scenario.</p>
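+<p>To make open typing and optional fields concrete, here are two hypothetical ADM instances (the values and the extra verified field are invented for illustration and do not appear in the sample data files). The first conforms to the open TwitterUserType even though it carries an additional field; the second conforms to EmploymentType while leaving out the optional end-date field.</p>
+
+<div class="source">
+<div class="source">
+<pre>        {"screen-name":"ExampleUser_1","lang":"en","friends_count":10,"statuses_count":20,"name":"Example User","followers_count":30,"verified":true}
+
+        {"organization-name":"ExampleCorp","start-date":date("2010-06-01")}
+</pre></div></div>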
+<p>Before going on, we need to once again emphasize the idea that AsterixDB is aimed at storing and querying not just Big Data, but Big <i>Semistructured</i> Data. This means that most of the fields listed in the create type statements above could have been omitted without changing anything other than the resulting size of stored data instances on disk. AsterixDB stores its information about the fields defined a priori as separate metadata, whereas the information about other fields that are “just there” in instances of open datatypes is stored with each instance, making for more bits on disk and longer times for operations affected by data size (e.g., dataset scans). The only fields that <i>must</i> be specified a priori are those that make up the primary key. Indexes can be built on fields that don’t belong to the pre-specified part of a datatype’s schema as long as their type is specified at index creation time and the <i>enforced</i> keyword is provided at the end of the index definition. (The <i>enforced</i> keyword asks the system to ensure that the indexed field or fields conform to this specified type in all of the dataset’s record instances where they are present.) Additionally, indexed fields may be nested arbitrarily deep within a dataset’s records as long as the nesting does not pass through a list (be it ordered or unordered) along the way.</p></div>
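+<p>The two index statements below sketch these last points. They assume that the TwitterUsers and TweetMessages datasets have already been created (as is done in the next section), that some TwitterUsers records carry an undeclared string field named location (a hypothetical field), and that the index names are free to use. The first statement indexes the undeclared field by giving its type and the <i>enforced</i> keyword; the second indexes a field nested inside the user record of each tweet.</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* Index on an open (undeclared) field: its type is given inline, followed by "enforced". */
+        create index twUserLocationIdx on TwitterUsers(location: string) type btree enforced;
+
+        /* Index on a nested field; the path does not pass through a list. */
+        create index twmUserScreenNameIdx on TweetMessages(user.screen-name) type btree;
+</pre></div></div>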
+<div class="section">
+<h3><a name="Creating_Datasets_and_Indexes"></a>Creating Datasets and Indexes</h3>
+<p>Now that we have defined our datatypes, we can move on and create datasets to store the actual data. (If we wanted to, we could even have several named datasets based on any one of these datatypes.) We can do this as follows, utilizing the DDL capabilities of AsterixDB.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create dataset FacebookUsers(FacebookUserType)
+ primary key id;
+
+ create dataset FacebookMessages(FacebookMessageType)
+ primary key message-id;
+
+ create dataset TwitterUsers(TwitterUserType)
+ primary key screen-name;
+
+ create dataset TweetMessages(TweetMessageType)
+ primary key tweetid
+ hints(cardinality=100);
+
+ create index fbUserSinceIdx on FacebookUsers(user-since);
+ create index fbAuthorIdx on FacebookMessages(author-id) type btree;
+ create index fbSenderLocIndex on FacebookMessages(sender-location) type rtree;
+ create index fbMessageIdx on FacebookMessages(message) type keyword;
+
+ for $ds in dataset Metadata.Dataset return $ds;
+ for $ix in dataset Metadata.Index return $ix;
+</pre></div></div>
+<p>The ADM DDL statements above create four datasets for holding our social data in the TinySocial dataverse: FacebookUsers, FacebookMessages, TwitterUsers, and TweetMessages. The first statement creates the FacebookUsers dataset. It specifies that this dataset will store data instances conforming to FacebookUserType and that it has a primary key which is the id field of each instance. The primary key information is used by AsterixDB to uniquely identify instances for the purpose of later lookup and for use in secondary indexes. Each AsterixDB dataset is stored (and indexed) in the form of a B+ tree on primary key; secondary indexes point to their indexed data by primary key. In AsterixDB clusters, the primary key is also used to hash-partition (a.k.a. shard) the dataset across the nodes of the cluster. The next three create dataset statements are similar. The last one illustrates an optional clause for providing useful hints to AsterixDB. In this case, the hint tells AsterixDB that the dataset definer is anticipating that the TweetMessages dataset will contain roughly 100 objects; knowing this can help AsterixDB to more efficiently manage and query this dataset. (AsterixDB does not yet gather and maintain data statistics; it will currently, arbitrarily, assume a cardinality of one million objects per dataset in the absence of such an optional definition-time hint.)</p>
+<p>The create dataset statements above are followed by four more DDL statements, each of which creates a secondary index on a field of one of the datasets. The first one indexes the FacebookUsers dataset on its user-since field. This index will be a B+ tree index; its type is unspecified and <i>btree</i> is the default type. The other three illustrate how you can explicitly specify the desired type of index. In addition to btree, <i>rtree</i> and inverted <i>keyword</i> indexes are supported by AsterixDB. Indexes can also have composite keys, and more advanced text indexing is available as well (ngram(k), where k is the desired gram length).</p></div>
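+<p>As a hedged sketch of the last two options mentioned above, the statements below create a composite-key B+ tree index and a 3-gram text index. The index names and the choice of fields are illustrative assumptions and are not part of the TinySocial setup.</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* Composite key: a btree index over two fields. */
+        create index fbUserNameAliasIdx on FacebookUsers(name, alias) type btree;
+
+        /* N-gram text index with a gram length of 3. */
+        create index fbMessageNgramIdx on FacebookMessages(message) type ngram(3);
+</pre></div></div>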
+<div class="section">
+<h3><a name="Querying_the_Metadata_Dataverse"></a>Querying the Metadata Dataverse</h3>
+<p>The last two statements above show how you can use queries in AQL to examine the AsterixDB system catalogs and tell what artifacts you have created. Just as relational DBMSs use their own tables to store their catalogs, AsterixDB uses its own datasets to persist descriptions of its datasets, datatypes, indexes, and so on. Running the first of the two queries above will list all of your newly created datasets, and it will also show you a full list of all the metadata datasets. (You can then explore from there on your own if you are curious!) These last two queries also illustrate one other factoid worth knowing: AsterixDB allows queries to span dataverses by allowing the optional use of fully-qualified dataset names (i.e., <i>dataversename.datasetname</i>) to reference datasets that live in a dataverse other than the one that was named in the most recently executed <i>use dataverse</i> directive.</p>
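+<p>To narrow the catalog output to your own artifacts, the metadata queries can be filtered like any other AQL query. The sketch below assumes that the catalog records expose DataverseName, DatasetName, and IndexName fields, as they appear in the output of the two queries above; check that output on your own instance before relying on these names.</p>
+
+<div class="source">
+<div class="source">
+<pre>        /* Names of the datasets that live in the TinySocial dataverse. */
+        for $ds in dataset Metadata.Dataset
+        where $ds.DataverseName = "TinySocial"
+        return $ds.DatasetName;
+
+        /* Indexes in TinySocial, paired with the dataset each one belongs to. */
+        for $ix in dataset Metadata.Index
+        where $ix.DataverseName = "TinySocial"
+        return { "dataset": $ix.DatasetName, "index": $ix.IndexName };
+</pre></div></div>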
+<hr /></div></div>
+<div class="section">
+<h2><a name="Loading_Data_Into_AsterixDB"></a>Loading Data Into AsterixDB</h2>
+<p>Okay, so far so good. AsterixDB is now ready for data, so let’s give it some data to store! Our next task will be to load some sample data into the four datasets that we just defined. Here we will load a tiny set of records, defined in ADM format (a superset of JSON), into each dataset. In the boxes below you can see the actual data instances contained in each of the provided sample files. In order to load this data yourself, you should first store the four corresponding <tt>.adm</tt> files (whose URLs are indicated on top of each box below) into a filesystem directory accessible to your running AsterixDB instance. Take a few minutes to look carefully at each of the sample data sets. This will give you a better sense of the nature of the data that we are about to load and query. We should note that ADM format is a textual serialization of what AsterixDB will actually store; when persisted in AsterixDB, the data format will be binary and the data in the predefined fields of the data instances will be stored separately from their associated field name and type metadata.</p>
+<p><a href="../data/twu.adm">Twitter Users</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"screen-name":"NathanGiesen@211","lang":"en","friends_count":18,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416}
+ {"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159}
+ {"screen-name":"NilaMilliron_tw","lang":"en","friends_count":445,"statuses_count":164,"name":"Nila Milliron","followers_count":22649}
+ {"screen-name":"ChangEwing_573","lang":"en","friends_count":182,"statuses_count":394,"name":"Chang Ewing","followers_count":32136}
+</pre></div></div>
+<p><a href="../data/twm.adm">Tweet Messages</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"tweetid":"1","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("47.44,80.65"),"send-time":datetime("2008-04-26T10:10:00"),"referred-topics":{{"t-mobile","customization"}},"message-text":" love t-mobile its customization is good:)"}
+ {"tweetid":"2","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("32.84,67.14"),"send-time":datetime("2010-05-13T10:10:00"),"referred-topics":{{"verizon","shortcut-menu"}},"message-text":" like verizon its shortcut-menu is awesome:)"}
+ {"tweetid":"3","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("29.72,75.8"),"send-time":datetime("2006-11-04T10:10:00"),"referred-topics":{{"motorola","speed"}},"message-text":" like motorola the speed is good:)"}
+ {"tweetid":"4","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("39.28,70.48"),"send-time":datetime("2011-12-26T10:10:00"),"referred-topics":{{"sprint","voice-command"}},"message-text":" like sprint the voice-command is mind-blowing:)"}
+ {"tweetid":"5","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("40.09,92.69"),"send-time":datetime("2006-08-04T10:10:00"),"referred-topics":{{"motorola","speed"}},"message-text":" can't stand motorola its speed is terrible:("}
+ {"tweetid":"6","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("47.51,83.99"),"send-time":datetime("2010-05-07T10:10:00"),"referred-topics":{{"iphone","voice-clarity"}},"message-text":" like iphone the voice-clarity is good:)"}
+ {"tweetid":"7","user":{"screen-name":"ChangEwing_573","lang":"en","friends_count":182,"statuses_count":394,"name":"Chang Ewing","followers_count":32136},"sender-location":point("36.21,72.6"),"send-time":datetime("2011-08-25T10:10:00"),"referred-topics":{{"samsung","platform"}},"message-text":" like samsung the platform is good"}
+ {"tweetid":"8","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("46.05,93.34"),"send-time":datetime("2005-10-14T10:10:00"),"referred-topics":{{"t-mobile","shortcut-menu"}},"message-text":" like t-mobile the shortcut-menu is awesome:)"}
+ {"tweetid":"9","user":{"screen-name":"NathanGiesen@211","lang":"en","friends_count":39339,"statuses_count":473,"name":"Nathan Giesen","followers_count":49416},"sender-location":point("36.86,74.62"),"send-time":datetime("2012-07-21T10:10:00"),"referred-topics":{{"verizon","voicemail-service"}},"message-text":" love verizon its voicemail-service is awesome"}
+ {"tweetid":"10","user":{"screen-name":"ColineGeyer@63","lang":"en","friends_count":121,"statuses_count":362,"name":"Coline Geyer","followers_count":17159},"sender-location":point("29.15,76.53"),"send-time":datetime("2008-01-26T10:10:00"),"referred-topics":{{"verizon","voice-clarity"}},"message-text":" hate verizon its voice-clarity is OMG:("}
+ {"tweetid":"11","user":{"screen-name":"NilaMilliron_tw","lang":"en","friends_count":445,"statuses_count":164,"name":"Nila Milliron","followers_count":22649},"sender-location":point("37.59,68.42"),"send-time":datetime("2008-03-09T10:10:00"),"referred-topics":{{"iphone","platform"}},"message-text":" can't stand iphone its platform is terrible"}
+ {"tweetid":"12","user":{"screen-name":"OliJackson_512","lang":"en","friends_count":445,"statuses_count":164,"name":"Oli Jackson","followers_count":22649},"sender-location":point("24.82,94.63"),"send-time":datetime("2010-02-13T10:10:00"),"referred-topics":{{"samsung","voice-command"}},"message-text":" like samsung the voice-command is amazing:)"}
+</pre></div></div>
+<p><a href="../data/fbu.adm">Facebook Users</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"id":1,"alias":"Margarita","name":"MargaritaStoddard","user-since":datetime("2012-08-20T10:10:00"),"friend-ids":{{2,3,6,10}},"employment":[{"organization-name":"Codetechno","start-date":date("2006-08-06")}]}
+ {"id":2,"alias":"Isbel","name":"IsbelDull","user-since":datetime("2011-01-22T10:10:00"),"friend-ids":{{1,4}},"employment":[{"organization-name":"Hexviafind","start-date":date("2010-04-27")}]}
+ {"id":3,"alias":"Emory","name":"EmoryUnk","user-since":datetime("2012-07-10T10:10:00"),"friend-ids":{{1,5,8,9}},"employment":[{"organization-name":"geomedia","start-date":date("2010-06-17"),"end-date":date("2010-01-26")}]}
+ {"id":4,"alias":"Nicholas","name":"NicholasStroh","user-since":datetime("2010-12-27T10:10:00"),"friend-ids":{{2}},"employment":[{"organization-name":"Zamcorporation","start-date":date("2010-06-08")}]}
+ {"id":5,"alias":"Von","name":"VonKemble","user-since":datetime("2010-01-05T10:10:00"),"friend-ids":{{3,6,10}},"employment":[{"organization-name":"Kongreen","start-date":date("2010-11-27")}]}
+ {"id":6,"alias":"Willis","name":"WillisWynne","user-since":datetime("2005-01-17T10:10:00"),"friend-ids":{{1,3,7}},"employment":[{"organization-name":"jaydax","start-date":date("2009-05-15")}]}
+ {"id":7,"alias":"Suzanna","name":"SuzannaTillson","user-since":datetime("2012-08-07T10:10:00"),"friend-ids":{{6}},"employment":[{"organization-name":"Labzatron","start-date":date("2011-04-19")}]}
+ {"id":8,"alias":"Nila","name":"NilaMilliron","user-since":datetime("2008-01-01T10:10:00"),"friend-ids":{{3}},"employment":[{"organization-name":"Plexlane","start-date":date("2010-02-28")}]}
+ {"id":9,"alias":"Woodrow","name":"WoodrowNehling","user-since":datetime("2005-09-20T10:10:00"),"friend-ids":{{3,10}},"employment":[{"organization-name":"Zuncan","start-date":date("2003-04-22"),"end-date":date("2009-12-13")}]}
+ {"id":10,"alias":"Bram","name":"BramHatch","user-since":datetime("2010-10-16T10:10:00"),"friend-ids":{{1,5,9}},"employment":[{"organization-name":"physcane","start-date":date("2007-06-05"),"end-date":date("2011-11-05")}]}
+</pre></div></div>
+<p><a href="../data/fbm.adm">Facebook Messages</a></p>
+
+<div class="source">
+<div class="source">
+<pre> {"message-id":1,"author-id":3,"in-response-to":2,"sender-location":point("47.16,77.75"),"message":" love sprint its shortcut-menu is awesome:)"}
+ {"message-id":2,"author-id":1,"in-response-to":4,"sender-location":point("41.66,80.87"),"message":" dislike iphone its touch-screen is horrible"}
+ {"message-id":3,"author-id":2,"in-response-to":4,"sender-location":point("48.09,81.01"),"message":" like samsung the plan is amazing"}
+ {"message-id":4,"author-id":1,"in-response-to":2,"sender-location":point("37.73,97.04"),"message":" can't stand at&t the network is horrible:("}
+ {"message-id":5,"author-id":6,"in-response-to":2,"sender-location":point("34.7,90.76"),"message":" love sprint the customization is mind-blowing"}
+ {"message-id":6,"author-id":2,"in-response-to":1,"sender-location":point("31.5,75.56"),"message":" like t-mobile its platform is mind-blowing"}
+ {"message-id":7,"author-id":5,"in-response-to":15,"sender-location":point("32.91,85.05"),"message":" dislike sprint the speed is horrible"}
+ {"message-id":8,"author-id":1,"in-response-to":11,"sender-location":point("40.33,80.87"),"message":" like verizon the 3G is awesome:)"}
+ {"message-id":9,"author-id":3,"in-response-to":12,"sender-location":point("34.45,96.48"),"message":" love verizon its wireless is good"}
+ {"message-id":10,"author-id":1,"in-response-to":12,"sender-location":point("42.5,70.01"),"message":" can't stand motorola the touch-screen is terrible"}
+ {"message-id":11,"author-id":1,"in-response-to":1,"sender-location":point("38.97,77.49"),"message":" can't stand at&t its plan is terrible"}
+ {"message-id":12,"author-id":10,"in-response-to":6,"sender-location":point("42.26,77.76"),"message":" can't stand t-mobile its voicemail-service is OMG:("}
+ {"message-id":13,"author-id":10,"in-response-to":4,"sender-location":point("42.77,78.92"),"message":" dislike iphone the voice-command is bad:("}
+ {"message-id":14,"author-id":9,"in-response-to":12,"sender-location":point("41.33,85.28"),"message":" love at&t its 3G is good:)"}
+ {"message-id":15,"author-id":7,"in-response-to":11,"sender-location":point("44.47,67.11"),"message":" like iphone the voicemail-service is awesome"}
+</pre></div></div>
+<p>It’s loading time! We can use AQL <i>load</i> statements to populate our datasets with the sample records shown above. The following shows how loading can be done for data stored in <tt>.adm</tt> files in your local filesystem. <i>Note:</i> You <i>MUST</i> replace the <tt><Host Name></tt> and <tt><Absolute File Path></tt> placeholders in each load statement below with valid values: the IP address (or host name) of the machine, and the absolute path of the directory on that machine, to which you have downloaded the provided <tt>.adm</tt> files. As you do so, be very, very careful to retain the two slashes in the load statements, i.e., do not delete the two slashes that appear in front of the absolute path to your <tt>.adm</tt> files. (Because an absolute path itself begins with a slash, this will lead to a three-slash character sequence right after the host name in each load statement’s file input path specification.)</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ load dataset FacebookUsers using localfs
+ (("path"="<Host Name>://<Absolute File Path>/fbu.adm"),("format"="adm"));
+
+ load dataset FacebookMessages using localfs
+ (("path"="<Host Name>://<Absolute File Path>/fbm.adm"),("format"="adm"));
+
+ load dataset TwitterUsers using localfs
+ (("path"="<Host Name>://<Absolute File Path>/twu.adm"),("format"="adm"));
+
+ load dataset TweetMessages using localfs
+ (("path"="<Host Name>://<Absolute File Path>/twm.adm"),("format"="adm"));
+</pre></div></div>
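+<p>As an illustration only, here is how the first of these load statements might look once filled in. The host <tt>127.0.0.1</tt> and the directory <tt>/home/joe/tinysocial</tt> are hypothetical values chosen for this sketch; substitute your own. Note the three slashes that result from keeping <tt>://</tt> in front of the absolute path:</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* 127.0.0.1 and /home/joe/tinysocial are placeholder values, not defaults */
+        load dataset FacebookUsers using localfs
+            (("path"="127.0.0.1:///home/joe/tinysocial/fbu.adm"),("format"="adm"));
+</pre></div></div>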
+<hr /></div>
+<div class="section">
+<h2><a name="AQL:_Querying_Your_AsterixDB_Data"></a>AQL: Querying Your AsterixDB Data</h2>
+<p>Congratulations! You now have sample social data stored (and indexed) in AsterixDB. (You are part of an elite and adventurous group of individuals. :-)) Now that you have successfully loaded the provided sample data into the datasets that we defined, you can start running queries against them.</p>
+<p>The query language for AsterixDB is AQL, the Asterix Query Language. AQL is loosely based on XQuery, the language developed and standardized in the early to mid 2000s by the World Wide Web Consortium (W3C) for querying semistructured data stored in their XML format. We have tossed all of the “XML cruft” out of their language but retained many of its core ideas. We did this because its design was developed over a period of years by a diverse committee of smart and experienced language designers, including “SQL people”, “functional programming people”, and “XML people”, all of whom were focused on how to design a new query language that operates well over semistructured data. (We decided to stand on their shoulders instead of starting from scratch and revisiting many of the same issues.) Note that AQL is not SQL and not based on SQL: In other words, AsterixDB is fully “NoSQL compliant”. :-)</p>
+<p>In this section we introduce AQL via a set of example queries, along with their expected results, based on the data above, to help you get started. Many of the most important features of AQL are presented in this set of representative queries. You can find more details in the document on the <a href="datamodel.html">Asterix Data Model (ADM)</a>, in the <a href="manual.html">AQL Reference Manual</a>, and a complete list of built-in functions is available in the <a href="functions.html">Asterix Functions</a> document.</p>
+<p>AQL is an expression language. Even the expression 1+1 is a valid AQL query that evaluates to 2. (Try it for yourself! Okay, maybe that’s <i>not</i> the best use of a 512-node shared-nothing compute cluster.) Most useful AQL queries will be based on the <i>FLWOR</i> (pronounced “flower”) expression structure that AQL has borrowed from XQuery (<a class="externalLink" href="http://en.wikipedia.org/wiki/FLWOR">http://en.wikipedia.org/wiki/FLWOR</a>). The FLWOR expression syntax supports both the incremental binding (<i>for</i>) of variables to ADM data instances in a dataset (or in the result of any AQL expression, actually) and the full binding (<i>let</i>) of variables to entire intermediate results in a fashion similar to temporary views in the SQL world. FLWOR is an acronym that is short for <i>for</i>-<i>let</i>-<i>where</i>-<i>order by</i>-<i>return</i>, naming five of the most frequently used clauses from the syntax of a full AQL query. AQL also includes <i>group by</i> and <i>limit</i> clauses, as you will see shortly. Roughly speaking, for SQL aficionados, the <i>for</i> clause in AQL is like the <i>from</i> clause in SQL, the <i>return</i> clause in AQL is like the <i>select</i> clause in SQL (but appears at the end instead of the beginning of a query), the <i>let</i> clause in AQL is like SQL’s <i>with</i> clause, and the <i>where</i> and <i>order by</i> clauses in both languages are similar.</p>
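+<p>As a preview of that clause order, here is a minimal sketch (ours, not one of the numbered tutorial queries) that uses all five FLWOR clauses on the FacebookUsers data shown earlier. The variable name <tt>$nfriends</tt> and the threshold of 3 are arbitrary choices for illustration, and we assume that the built-in <tt>count</tt> function can be applied directly to the <tt>friend-ids</tt> collection:</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* for: bind $user to each user; let: name an intermediate value;
+           where: filter; order by: sort; return: build the output record */
+        for $user in dataset FacebookUsers
+        let $nfriends := count($user.friend-ids)
+        where $nfriends >= 3
+        order by $nfriends desc
+        return {
+            "name": $user.name,
+            "friend-count": $nfriends
+        };
+</pre></div></div>
+<p>In SQL terms, this reads roughly as: select the name and the friend count, from FacebookUsers, where the count is at least 3, ordered by the count from highest to lowest.</p>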
+<p>Enough talk! Let’s go ahead and try writing some queries and see about learning AQL by example.</p>
+<div class="section">
+<h3><a name="Query_0-A_-_Exact-Match_Lookup"></a>Query 0-A - Exact-Match Lookup</h3>
+<p>For our first query, let’s find a Facebook user based on his or her user id. Suppose the user we want is the user whose id is 8:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ where $user.id = 8
+ return $user;
+</pre></div></div>
+<p>The query’s <i>for</i> clause binds the variable <tt>$user</tt> incrementally to the data instances residing in the dataset named FacebookUsers. Its <i>where</i> clause selects only those bindings having a user id of interest, filtering out the rest. The <i>return</i> clause returns the (entire) data instance for each binding that satisfies the predicate. Since this dataset is indexed on user id (its primary key), this query will be done via a quick index lookup.</p>
+<p>The expected result for our sample data is as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 8, "alias": "Nila", "name": "NilaMilliron", "user-since": datetime("2008-01-01T10:10:00.000Z"), "friend-ids": {{ 3 }}, "employment": [ { "organization-name": "Plexlane", "start-date": date("2010-02-28"), "end-date": null } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_0-B_-_Range_Scan"></a>Query 0-B - Range Scan</h3>
+<p>AQL, like SQL, supports a variety of different predicates. For example, for our next query, let’s find the Facebook users whose ids are in the range between 2 and 4:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ where $user.id >= 2 and $user.id <= 4
+ return $user;
+</pre></div></div>
+<p>This query can also be evaluated using the primary index on user id. Its expected result is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_1_-_Other_Query_Filters"></a>Query 1 - Other Query Filters</h3>
+<p>AQL can do range queries on any data type that supports the appropriate set of comparators. As an example, this next query retrieves the Facebook users who joined between July 22, 2010 and July 29, 2012:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ where $user.user-since >= datetime('2010-07-22T00:00:00')
+ and $user.user-since <= datetime('2012-07-29T23:59:59')
+ return $user;
+</pre></div></div>
+<p>The expected result for this query, also an indexable query, is as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+ { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_2-A_-_Equijoin"></a>Query 2-A - Equijoin</h3>
+<p>In addition to simply binding variables to data instances and returning them “whole”, an AQL query can construct new ADM instances to return based on combinations of its variable bindings. This gives AQL the power to do joins much like those done using multi-table <i>from</i> clauses in SQL. For example, suppose we wanted a list of all Facebook users paired with their associated messages, with the list enumerating the author name and the message text associated with each Facebook message. We could do this as follows in AQL:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ for $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ return {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>
+<p>The result of this query is a sequence of new ADM instances, one for each author/message pair. Each instance in the result will be an ADM record containing two fields, “uname” and “message”, containing the user’s name and the message text, respectively, for each author/message pair. (Note that “uname” and “message” are both simple AQL expressions themselves—so in the most general case, even the resulting field names can be computed as part of the query, making AQL a very powerful tool for slicing and dicing semistructured data.)</p>
+<p>The expected result of this example AQL join query for our sample data set is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+ { "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+ { "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+ { "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+ { "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+ { "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+ { "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+ { "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+ { "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+ { "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+ { "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+ { "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+ { "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_2-B_-_Index_join"></a>Query 2-B - Index join</h3>
+<p>By default, AsterixDB evaluates equijoin queries using hash-based join methods that work well for doing ad hoc joins of very large data sets (<a class="externalLink" href="http://en.wikipedia.org/wiki/Hash_join">http://en.wikipedia.org/wiki/Hash_join</a>). On a cluster, hash partitioning is employed as AsterixDB’s divide-and-conquer strategy for computing large parallel joins. AsterixDB includes other join methods, but in the absence of data statistics and selectivity estimates, it doesn’t (yet) have the know-how to intelligently choose among its alternatives. We therefore asked ourselves the classic question—WWOD?—What Would Oracle Do?—and in the interim, AQL includes a clunky (but useful) hint-based mechanism for addressing the occasional need to suggest to AsterixDB which join method it should use for a particular AQL query.</p>
+<p>The following query is similar to Query 2-A but includes a suggestion to AsterixDB that it should consider employing an index-based nested-loop join technique to process the query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ for $message in dataset FacebookMessages
+ where $message.author-id /*+ indexnl */ = $user.id
+ return {
+ "uname": $user.name,
+ "message": $message.message
+ };
+</pre></div></div>
+<p>The expected result is (of course) the same as before, modulo the order of the instances. Result ordering is (intentionally) undefined in AQL in the absence of an <i>order by</i> clause. The query result for our sample data in this case is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "EmoryUnk", "message": " love sprint its shortcut-menu is awesome:)" }
+ { "uname": "MargaritaStoddard", "message": " dislike iphone its touch-screen is horrible" }
+ { "uname": "IsbelDull", "message": " like samsung the plan is amazing" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t the network is horrible:(" }
+ { "uname": "WillisWynne", "message": " love sprint the customization is mind-blowing" }
+ { "uname": "IsbelDull", "message": " like t-mobile its platform is mind-blowing" }
+ { "uname": "VonKemble", "message": " dislike sprint the speed is horrible" }
+ { "uname": "MargaritaStoddard", "message": " like verizon the 3G is awesome:)" }
+ { "uname": "EmoryUnk", "message": " love verizon its wireless is good" }
+ { "uname": "MargaritaStoddard", "message": " can't stand motorola the touch-screen is terrible" }
+ { "uname": "MargaritaStoddard", "message": " can't stand at&t its plan is terrible" }
+ { "uname": "BramHatch", "message": " can't stand t-mobile its voicemail-service is OMG:(" }
+ { "uname": "BramHatch", "message": " dislike iphone the voice-command is bad:(" }
+ { "uname": "WoodrowNehling", "message": " love at&t its 3G is good:)" }
+ { "uname": "SuzannaTillson", "message": " like iphone the voicemail-service is awesome" }
+</pre></div></div>
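+<p>If a repeatable order matters, an <i>order by</i> clause can be added. The following is a sketch of the same query with one added line; sorting on <tt>message-id</tt> is our choice for illustration, and any comparable field would do:</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        for $user in dataset FacebookUsers
+        for $message in dataset FacebookMessages
+        where $message.author-id /*+ indexnl */ = $user.id
+        /* added line: makes the output order defined */
+        order by $message.message-id
+        return {
+            "uname": $user.name,
+            "message": $message.message
+        };
+</pre></div></div>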
+<p>(It is worth knowing, with respect to influencing AsterixDB’s query evaluation, that nested <i>for</i> clauses (a.k.a. joins) are currently evaluated with the “outer” clause probing the data of the “inner” clause.)</p></div>
+<div class="section">
+<h3><a name="Query_3_-_Nested_Outer_Join"></a>Query 3 - Nested Outer Join</h3>
+<p>In order to support joins between tables with missing/dangling join tuples, the designers of SQL ended up shoe-horning a subset of the relational algebra into SQL’s <i>from</i> clause syntax—and providing a variety of join types there for users to choose from. Left outer joins are particularly important in SQL, e.g., to print a summary of customers and orders, grouped by customer, without omitting those customers who haven’t placed any orders yet.</p>
+<p>The AQL language supports nesting, both of queries and of query results, and the combination allows for an arguably cleaner/more natural approach to such queries. As an example, suppose we wanted, for each Facebook user, to produce a record that has his/her name plus a list of the messages written by that user. In SQL, this would involve a left outer join between users and messages, grouping by user, and having the user name repeated alongside each message. In AQL, this sort of use case can be handled (more naturally) as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset FacebookUsers
+ return {
+ "uname": $user.name,
+ "messages": for $message in dataset FacebookMessages
+ where $message.author-id = $user.id
+ return $message.message
+ };
+</pre></div></div>
+<p>This AQL query binds the variable <tt>$user</tt> to the data instances in FacebookUsers; for each user, it constructs a result record containing a “uname” field with the user’s name and a “messages” field with a nested collection of all messages for that user. The nested collection for each user is specified by using a correlated subquery. (Note: While it looks like nested loops could be involved in computing the result, AsterixDB recognizes the equivalence of such a query to an outer join, and it will use an efficient hash-based strategy when actually computing the query’s result.)</p>
+<p>Here is this example query’s expected output:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "uname": "MargaritaStoddard", "messages": [ " dislike iphone its touch-screen is horrible", " can't stand at&t the network is horrible:(", " like verizon the 3G is awesome:)", " can't stand motorola the touch-screen is terrible", " can't stand at&t its plan is terrible" ] }
+ { "uname": "IsbelDull", "messages": [ " like samsung the plan is amazing", " like t-mobile its platform is mind-blowing" ] }
+ { "uname": "EmoryUnk", "messages": [ " love sprint its shortcut-menu is awesome:)", " love verizon its wireless is good" ] }
+ { "uname": "NicholasStroh", "messages": [ ] }
+ { "uname": "VonKemble", "messages": [ " dislike sprint the speed is horrible" ] }
+ { "uname": "WillisWynne", "messages": [ " love sprint the customization is mind-blowing" ] }
+ { "uname": "SuzannaTillson", "messages": [ " like iphone the voicemail-service is awesome" ] }
+ { "uname": "NilaMilliron", "messages": [ ] }
+ { "uname": "WoodrowNehling", "messages": [ " love at&t its 3G is good:)" ] }
+ { "uname": "BramHatch", "messages": [ " dislike iphone the voice-command is bad:(", " can't stand t-mobile its voicemail-service is OMG:(" ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_4_-_Theta_Join"></a>Query 4 - Theta Join</h3>
+<p>Not all joins are expressible as equijoins and computable using equijoin-oriented algorithms. The join predicates for some use cases involve predicates with functions; AsterixDB supports the expression of such queries and will still evaluate them as best it can using nested loop based techniques (and broadcast joins in the parallel case).</p>
+<p>As an example of such a use case, suppose that we wanted, for each tweet T, to find all of the other tweets that originated from within a circle of radius 1 surrounding tweet T’s location. In AQL, this can be specified in a manner similar to the previous query using one of the built-in functions on the spatial data type instead of id equality in the correlated query’s <i>where</i> clause:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ return {
+ "message": $t.message-text,
+ "nearby-messages": for $t2 in dataset TweetMessages
+ where spatial-distance($t.sender-location, $t2.sender-location) <= 1
+ return { "msgtxt":$t2.message-text}
+ };
+</pre></div></div>
+<p>Here is the expected result for this query:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "message": " love t-mobile its customization is good:)", "nearby-messages": [ { "msgtxt": " love t-mobile its customization is good:)" } ] }
+ { "message": " hate verizon its voice-clarity is OMG:(", "nearby-messages": [ { "msgtxt": " like motorola the speed is good:)" }, { "msgtxt": " hate verizon its voice-clarity is OMG:(" } ] }
+ { "message": " can't stand iphone its platform is terrible", "nearby-messages": [ { "msgtxt": " can't stand iphone its platform is terrible" } ] }
+ { "message": " like samsung the voice-command is amazing:)", "nearby-messages": [ { "msgtxt": " like samsung the voice-command is amazing:)" } ] }
+ { "message": " like verizon its shortcut-menu is awesome:)", "nearby-messages": [ { "msgtxt": " like verizon its shortcut-menu is awesome:)" } ] }
+ { "message": " like motorola the speed is good:)", "nearby-messages": [ { "msgtxt": " hate verizon its voice-clarity is OMG:(" }, { "msgtxt": " like motorola the speed is good:)" } ] }
+ { "message": " like sprint the voice-command is mind-blowing:)", "nearby-messages": [ { "msgtxt": " like sprint the voice-command is mind-blowing:)" } ] }
+ { "message": " can't stand motorola its speed is terrible:(", "nearby-messages": [ { "msgtxt": " can't stand motorola its speed is terrible:(" } ] }
+ { "message": " like iphone the voice-clarity is good:)", "nearby-messages": [ { "msgtxt": " like iphone the voice-clarity is good:)" } ] }
+ { "message": " like samsung the platform is good", "nearby-messages": [ { "msgtxt": " like samsung the platform is good" } ] }
+ { "message": " like t-mobile the shortcut-menu is awesome:)", "nearby-messages": [ { "msgtxt": " like t-mobile the shortcut-menu is awesome:)" } ] }
+ { "message": " love verizon its voicemail-service is awesome", "nearby-messages": [ { "msgtxt": " love verizon its voicemail-service is awesome" } ] }
+</pre></div></div></div>
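+<p>Note that each tweet appears among its own nearby messages in the Query 4 result, since the distance from a location to itself is 0. Here is a sketch of a variant that leaves the tweet itself out, assuming that <tt>tweetid</tt> uniquely identifies a tweet:</p>
+
+<div class="source">
+<div class="source">
+<pre>        use dataverse TinySocial;
+
+        /* same distance test as Query 4, plus a check that skips the tweet itself */
+        for $t in dataset TweetMessages
+        return {
+            "message": $t.message-text,
+            "nearby-messages": for $t2 in dataset TweetMessages
+                               where spatial-distance($t.sender-location, $t2.sender-location) <= 1
+                                 and $t2.tweetid != $t.tweetid
+                               return { "msgtxt": $t2.message-text }
+        };
+</pre></div></div>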
+<div class="section">
+<h3><a name="Query_5_-_Fuzzy_Join"></a>Query 5 - Fuzzy Join</h3>
+<p>As another example of a non-equijoin use case, we could ask AsterixDB to find, for each Facebook user, all Twitter users with names “similar” to their name. AsterixDB supports a variety of “fuzzy match” functions for use with textual and set-based data. As one example, we could choose to use edit distance with a threshold of 3 as the definition of name similarity, in which case we could write the following query using AQL’s operator-based syntax (~=) for testing whether or not two values are similar:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "edit-distance";
+ set simthreshold "3";
+
+ for $fbu in dataset FacebookUsers
+ return {
+ "id": $fbu.id,
+ "name": $fbu.name,
+ "similar-users": for $t in dataset TweetMessages
+ let $tu := $t.user
+ where $tu.name ~= $fbu.name
+ return {
+ "twitter-screenname": $tu.screen-name,
+ "twitter-name": $tu.name
+ }
+ };
+</pre></div></div>
+<p>The expected result for this query against our sample data is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 1, "name": "MargaritaStoddard", "similar-users": [ ] }
+ { "id": 2, "name": "IsbelDull", "similar-users": [ ] }
+ { "id": 3, "name": "EmoryUnk", "similar-users": [ ] }
+ { "id": 4, "name": "NicholasStroh", "similar-users": [ ] }
+ { "id": 5, "name": "VonKemble", "similar-users": [ ] }
+ { "id": 6, "name": "WillisWynne", "similar-users": [ ] }
+ { "id": 7, "name": "SuzannaTillson", "similar-users": [ ] }
+ { "id": 8, "name": "NilaMilliron", "similar-users": [ { "twitter-screenname": "NilaMilliron_tw", "twitter-name": "Nila Milliron" } ] }
+ { "id": 9, "name": "WoodrowNehling", "similar-users": [ ] }
+ { "id": 10, "name": "BramHatch", "similar-users": [ ] }
+</pre></div></div></div>
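+<p>To see why exactly one Facebook user has a match in the result above, here is a worked check of the single qualifying pair, using the standard (Levenshtein) definition of edit distance:</p>
+
+<div class="source">
+<div class="source">
+<pre> "NilaMilliron"  -&gt;  "Nila Milliron"     (insert one space after "Nila")
+
+ edit distance = 1, and 1 &lt;= 3, so the pair satisfies the ~= condition
+</pre></div></div>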
+<div class="section">
+<h3><a name="Query_6_-_Existential_Quantification"></a>Query 6 - Existential Quantification</h3>
+<p>The expressive power of AQL includes support for queries involving “some” (existentially quantified) and “all” (universally quantified) query semantics. As an example of an existential AQL query, here we show a query to list the Facebook users who are currently employed. Such employees will have an employment history containing a record with a null end-date value, which leads us to the following AQL query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $fbu in dataset FacebookUsers
+ where (some $e in $fbu.employment satisfies is-null($e.end-date))
+ return $fbu;
+</pre></div></div>
+<p>The expected result in this case is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "user-since": datetime("2012-08-20T10:10:00.000Z"), "friend-ids": {{ 2, 3, 6, 10 }}, "employment": [ { "organization-name": "Codetechno", "start-date": date("2006-08-06"), "end-date": null } ] }
+ { "id": 2, "alias": "Isbel", "name": "IsbelDull", "user-since": datetime("2011-01-22T10:10:00.000Z"), "friend-ids": {{ 1, 4 }}, "employment": [ { "organization-name": "Hexviafind", "start-date": date("2010-04-27"), "end-date": null } ] }
+ { "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "user-since": datetime("2010-12-27T10:10:00.000Z"), "friend-ids": {{ 2 }}, "employment": [ { "organization-name": "Zamcorporation", "start-date": date("2010-06-08"), "end-date": null } ] }
+ { "id": 5, "alias": "Von", "name": "VonKemble", "user-since": datetime("2010-01-05T10:10:00.000Z"), "friend-ids": {{ 3, 6, 10 }}, "employment": [ { "organization-name": "Kongreen", "start-date": date("2010-11-27"), "end-date": null } ] }
+ { "id": 6, "alias": "Willis", "name": "WillisWynne", "user-since": datetime("2005-01-17T10:10:00.000Z"), "friend-ids": {{ 1, 3, 7 }}, "employment": [ { "organization-name": "jaydax", "start-date": date("2009-05-15"), "end-date": null } ] }
+ { "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }}, "employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ] }
+ { "id": 8, "alias": "Nila", "name": "NilaMilliron", "user-since": datetime("2008-01-01T10:10:00.000Z"), "friend-ids": {{ 3 }}, "employment": [ { "organization-name": "Plexlane", "start-date": date("2010-02-28"), "end-date": null } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_7_-_Universal_Quantification"></a>Query 7 - Universal Quantification</h3>
+<p>As an example of a universal AQL query, here we show a query to list the Facebook users who are currently unemployed. Such users will have an employment history containing no records with null end-date values, leading us to the following AQL query:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $fbu in dataset FacebookUsers
+ where (every $e in $fbu.employment satisfies not(is-null($e.end-date)))
+ return $fbu;
+</pre></div></div>
+<p>Here is the expected result for our sample data:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] }
+ { "id": 9, "alias": "Woodrow", "name": "WoodrowNehling", "user-since": datetime("2005-09-20T10:10:00.000Z"), "friend-ids": {{ 3, 10 }}, "employment": [ { "organization-name": "Zuncan", "start-date": date("2003-04-22"), "end-date": date("2009-12-13") } ] }
+ { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_8_-_Simple_Aggregation"></a>Query 8 - Simple Aggregation</h3>
+<p>Like SQL, the AQL language of AsterixDB provides support for computing aggregates over large amounts of data. As a very simple example, the following AQL query computes the total number of Facebook users:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ count(for $fbu in dataset FacebookUsers return $fbu);
+</pre></div></div>
+<p>In AQL, aggregate functions can be applied to arbitrary subquery results; in this case, the count function is applied to the result of a query that enumerates the Facebook users. The expected result here is:</p>
+
+<div class="source">
+<div class="source">
+<pre> 10
+</pre></div></div></div>
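+<p>Because the aggregate is applied to a subquery result, the subquery can also filter. As a minimal sketch (the choice of screen name is only for illustration), the following counts the tweets of a single Twitter user:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ count(for $t in dataset TweetMessages
+       where $t.user.screen-name = "NathanGiesen@211"
+       return $t);
+</pre></div></div>
+<p>Consistent with the per-user counts listed for Query 9-A, the expected value for the sample data is 6.</p>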
+<div class="section">
+<h3><a name="Query_9-A_-_Grouping_and_Aggregation"></a>Query 9-A - Grouping and Aggregation</h3>
+<p>Also like SQL, AQL supports grouped aggregation. For every Twitter user, the following group-by/aggregate query counts the number of tweets sent by that user:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name with $t
+ return {
+ "user": $uid,
+ "count": count($t)
+ };
+</pre></div></div>
+<p>The <i>for</i> clause incrementally binds $t to tweets, and the <i>group by</i> clause groups the tweets by their issuer’s Twitter screen-name. Unlike SQL, where data is tabular (flat), the data model underlying AQL allows for nesting. Thus, following the <i>group by</i> clause, the <i>return</i> clause in this query sees a sequence of $t groups, with each such group having an associated $uid variable value (i.e., the tweeting user’s screen name). In the context of the return clause, due to “… with $t …”, $uid is bound to the tweeter’s screen name and $t is bound to the <i>set</i> of tweets issued by that tweeter. The return clause constructs a result record containing the tweeter’s screen name and the count of the items in the associated tweet set. The query result will contain one such record per screen name. This query also illustrates another feature of AQL; notice that each user’s screen name is accessed via a path syntax that traverses each tweet’s nested record structure.</p>
+<p>Here is the expected result for this query over the sample data:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "ChangEwing_573", "count": 1 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+ { "user": "OliJackson_512", "count": 1 }
+</pre></div></div></div>
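+<p>Since $t is bound to a whole group of tweets after the <i>group by</i> clause, it can be iterated over as well as counted. The following is a sketch of that idea rather than one of the primer's original queries; the field name “messages” is made up for illustration:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name with $t
+ return {
+     "user": $uid,
+     "messages": for $m in $t return $m.message-text
+ };
+</pre></div></div>
+<p>Each result record would then carry a nested list of that user's message texts, which has no direct counterpart in a flat SQL result.</p>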
+<div class="section">
+<h3><a name="Query_9-B_-_Hash-Based_Grouping_and_Aggregation"></a>Query 9-B - (Hash-Based) Grouping and Aggregation</h3>
+<p>As for joins, AsterixDB has multiple evaluation strategies available for processing grouped aggregate queries. For grouped aggregation, the system knows how to employ both sort-based and hash-based aggregation methods, with sort-based methods being used by default and a hint being available to suggest that a different approach be used in processing a particular AQL query.</p>
+<p>The following query is similar to Query 9-A, but adds a hash-based aggregation hint:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ /*+ hash*/
+ group by $uid := $t.user.screen-name with $t
+ return {
+ "user": $uid,
+ "count": count($t)
+ };
+</pre></div></div>
+<p>Here is the expected result:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "OliJackson_512", "count": 1 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+ { "user": "ChangEwing_573", "count": 1 }
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_10_-_Grouping_and_Limits"></a>Query 10 - Grouping and Limits</h3>
+<p>In some use cases it is not necessary to compute the entire answer to a query. In some cases, just having the first <i>N</i> or top <i>N</i> results is sufficient. This is expressible in AQL using the <i>limit</i> clause combined with the <i>order by</i> clause.</p>
+<p>The following AQL query returns the top 3 Twitter users based on who has issued the most tweets:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name with $t
+ let $c := count($t)
+ order by $c desc
+ limit 3
+ return {
+ "user": $uid,
+ "count": $c
+ };
+</pre></div></div>
+<p>The expected result for this query is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "user": "NathanGiesen@211", "count": 6 }
+ { "user": "ColineGeyer@63", "count": 3 }
+ { "user": "NilaMilliron_tw", "count": 1 }
+</pre></div></div></div>
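+<p>In the sample data, three users are tied with one tweet each, so which of them fills the third slot above depends on how ties happen to be ordered. As a hedged variation (not part of the original query set), a second sort key makes the order deterministic, and the optional <i>offset</i> part of the <i>limit</i> clause skips leading results:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $t in dataset TweetMessages
+ group by $uid := $t.user.screen-name with $t
+ let $c := count($t)
+ order by $c desc, $uid asc
+ limit 2 offset 1
+ return {
+     "user": $uid,
+     "count": $c
+ };
+</pre></div></div>
+<p>This asks for the second and third entries of the ranking, with ties broken by screen name.</p>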
+<div class="section">
+<h3><a name="Query_11_-_Left_Outer_Fuzzy_Join"></a>Query 11 - Left Outer Fuzzy Join</h3>
+<p>As a last example of AQL and its query power, the following query, for each tweet, finds all of the tweets that are similar based on the topics that they refer to:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "jaccard";
+ set simthreshold "0.3";
+
+ for $t in dataset TweetMessages
+ return {
+ "tweet": $t,
+ "similar-tweets": for $t2 in dataset TweetMessages
+ where $t2.referred-topics ~= $t.referred-topics
+ and $t2.tweetid != $t.tweetid
+ return $t2.referred-topics
+ };
+</pre></div></div>
+<p>This query illustrates several things worth knowing in order to write fuzzy queries in AQL. First, as mentioned earlier, AQL offers an operator-based syntax for seeing whether two values are “similar” to one another or not. Second, recall that the referred-topics field of records of datatype TweetMessageType is a bag of strings. This query sets the context for its similarity join by requesting that Jaccard-based similarity semantics (<a class="externalLink" href="http://en.wikipedia.org/wiki/Jaccard_index">http://en.wikipedia.org/wiki/Jaccard_index</a>) be used for the query’s similarity operator and that a similarity index of 0.3 be used as its similarity threshold.</p>
+<p>The expected result for this fuzzy join query is:</p>
+
+<div class="source">
+<div class="source">
+<pre> { "tweet": { "tweetid": "1", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("47.44,80.65"), "send-time": datetime("2008-04-26T10:10:00.000Z"), "referred-topics": {{ "t-mobile", "customization" }}, "message-text": " love t-mobile its customization is good:)" }, "similar-tweets": [ {{ "t-mobile", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "10", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("29.15,76.53"), "send-time": datetime("2008-01-26T10:10:00.000Z"), "referred-topics": {{ "verizon", "voice-clarity" }}, "message-text": " hate verizon its voice-clarity is OMG:(" }, "similar-tweets": [ {{ "iphone", "voice-clarity" }}, {{ "verizon", "voicemail-service" }}, {{ "verizon", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "11", "user": { "screen-name": "NilaMilliron_tw", "lang": "en", "friends_count": 445, "statuses_count": 164, "name": "Nila Milliron", "followers_count": 22649 }, "sender-location": point("37.59,68.42"), "send-time": datetime("2008-03-09T10:10:00.000Z"), "referred-topics": {{ "iphone", "platform" }}, "message-text": " can't stand iphone its platform is terrible" }, "similar-tweets": [ {{ "iphone", "voice-clarity" }}, {{ "samsung", "platform" }} ] }
+ { "tweet": { "tweetid": "12", "user": { "screen-name": "OliJackson_512", "lang": "en", "friends_count": 445, "statuses_count": 164, "name": "Oli Jackson", "followers_count": 22649 }, "sender-location": point("24.82,94.63"), "send-time": datetime("2010-02-13T10:10:00.000Z"), "referred-topics": {{ "samsung", "voice-command" }}, "message-text": " like samsung the voice-command is amazing:)" }, "similar-tweets": [ {{ "samsung", "platform" }}, {{ "sprint", "voice-command" }} ] }
+ { "tweet": { "tweetid": "2", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("32.84,67.14"), "send-time": datetime("2010-05-13T10:10:00.000Z"), "referred-topics": {{ "verizon", "shortcut-menu" }}, "message-text": " like verizon its shortcut-menu is awesome:)" }, "similar-tweets": [ {{ "verizon", "voicemail-service" }}, {{ "verizon", "voice-clarity" }}, {{ "t-mobile", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "3", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("29.72,75.8"), "send-time": datetime("2006-11-04T10:10:00.000Z"), "referred-topics": {{ "motorola", "speed" }}, "message-text": " like motorola the speed is good:)" }, "similar-tweets": [ {{ "motorola", "speed" }} ] }
+ { "tweet": { "tweetid": "4", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("39.28,70.48"), "send-time": datetime("2011-12-26T10:10:00.000Z"), "referred-topics": {{ "sprint", "voice-command" }}, "message-text": " like sprint the voice-command is mind-blowing:)" }, "similar-tweets": [ {{ "samsung", "voice-command" }} ] }
+ { "tweet": { "tweetid": "5", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("40.09,92.69"), "send-time": datetime("2006-08-04T10:10:00.000Z"), "referred-topics": {{ "motorola", "speed" }}, "message-text": " can't stand motorola its speed is terrible:(" }, "similar-tweets": [ {{ "motorola", "speed" }} ] }
+ { "tweet": { "tweetid": "6", "user": { "screen-name": "ColineGeyer@63", "lang": "en", "friends_count": 121, "statuses_count": 362, "name": "Coline Geyer", "followers_count": 17159 }, "sender-location": point("47.51,83.99"), "send-time": datetime("2010-05-07T10:10:00.000Z"), "referred-topics": {{ "iphone", "voice-clarity" }}, "message-text": " like iphone the voice-clarity is good:)" }, "similar-tweets": [ {{ "verizon", "voice-clarity" }}, {{ "iphone", "platform" }} ] }
+ { "tweet": { "tweetid": "7", "user": { "screen-name": "ChangEwing_573", "lang": "en", "friends_count": 182, "statuses_count": 394, "name": "Chang Ewing", "followers_count": 32136 }, "sender-location": point("36.21,72.6"), "send-time": datetime("2011-08-25T10:10:00.000Z"), "referred-topics": {{ "samsung", "platform" }}, "message-text": " like samsung the platform is good" }, "similar-tweets": [ {{ "iphone", "platform" }}, {{ "samsung", "voice-command" }} ] }
+ { "tweet": { "tweetid": "8", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("46.05,93.34"), "send-time": datetime("2005-10-14T10:10:00.000Z"), "referred-topics": {{ "t-mobile", "shortcut-menu" }}, "message-text": " like t-mobile the shortcut-menu is awesome:)" }, "similar-tweets": [ {{ "t-mobile", "customization" }}, {{ "verizon", "shortcut-menu" }} ] }
+ { "tweet": { "tweetid": "9", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("36.86,74.62"), "send-time": datetime("2012-07-21T10:10:00.000Z"), "referred-topics": {{ "verizon", "voicemail-service" }}, "message-text": " love verizon its voicemail-service is awesome" }, "similar-tweets": [ {{ "verizon", "voice-clarity" }}, {{ "verizon", "shortcut-menu" }} ] }
+</pre></div></div></div>
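+<p>As a worked check of the 0.3 threshold against the result above, recall that the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. For tweets whose referred-topics contain two topics each:</p>
+
+<div class="source">
+<div class="source">
+<pre> tweet 1 vs. tweet 8:  {{ "t-mobile", "customization" }}  and  {{ "t-mobile", "shortcut-menu" }}
+     intersection = 1, union = 3, similarity = 1/3 (about 0.33) &gt;= 0.3   -&gt; similar
+
+ tweet 3 vs. tweet 5:  {{ "motorola", "speed" }}  and  {{ "motorola", "speed" }}
+     intersection = 2, union = 2, similarity = 2/2 = 1.0 &gt;= 0.3          -&gt; similar
+
+ tweet 1 vs. tweet 3:  {{ "t-mobile", "customization" }}  and  {{ "motorola", "speed" }}
+     intersection = 0, union = 4, similarity = 0/4 = 0.0 &lt; 0.3           -&gt; not similar
+</pre></div></div>
+<p>In other words, with this threshold two such tweets are reported as similar as soon as they share one topic.</p>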
+<div class="section">
+<h3><a name="Inserting_New_Data"></a>Inserting New Data</h3>
+<p>In addition to loading and querying data, AsterixDB supports incremental additions to datasets via the AQL <i>insert</i> statement.</p>
+<p>The following example adds a new tweet by user “NathanGiesen@211” to the TweetMessages dataset. (An astute reader may notice that this tweet was issued half an hour after his last tweet, so his counts have all gone up in the interim, although he appears not to have moved in the last half hour.)</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ insert into dataset TweetMessages
+ (
+ {"tweetid":"13",
+ "user":
+ {"screen-name":"NathanGiesen@211",
+ "lang":"en",
+ "friends_count":39345,
+ "statuses_count":479,
+ "name":"Nathan Giesen",
+ "followers_count":49420
+ },
+ "sender-location":point("47.44,80.65"),
+ "send-time":datetime("2008-04-26T10:10:35"),
+ "referred-topics":{{"tweeting"}},
+ "message-text":"tweety tweet, my fellow tweeters!"
+ }
+ );
+</pre></div></div>
+<p>In general, the data to be inserted may be specified using any valid AQL query expression. The insertion of a single object instance, as in this example, is just a special case where the query expression happens to be a record constructor involving only constants.</p></div>
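+<p>As a sketch of the more general case, the statement below feeds the result of a query into an insert. The target dataset <tt>ColineTweets</tt> is hypothetical (it is not part of TinySocial) and is assumed to have been created beforehand with the same datatype and primary key as TweetMessages:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ insert into dataset ColineTweets
+ (
+     for $t in dataset TweetMessages
+     where $t.user.screen-name = "ColineGeyer@63"
+     return $t
+ );
+</pre></div></div>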
+<div class="section">
+<h3><a name="Deleting_Existing_Data"></a>Deleting Existing Data</h3>
+<p>In addition to inserting new data, AsterixDB supports deletion from datasets via the AQL <i>delete</i> statement. The statement supports “searched delete” semantics, and its <i>where</i> clause can involve any valid AQL expression.</p>
+<p>The following example deletes the tweet that we just added from user “NathanGiesen@211”. (Easy come, easy go. :-))</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ delete $tm from dataset TweetMessages where $tm.tweetid = "13";
+</pre></div></div>
+<p>It should be noted that one form of data change not yet supported by AsterixDB is in-place data modification (<i>update</i>). Currently, only insert and delete operations are supported; update is not. To achieve the effect of an update, two statements are currently needed—one to delete the old record from the dataset where it resides, and another to insert the new replacement record (with the same primary key but with different field values for some of the associated data content).</p></div>
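+<p>A minimal sketch of this delete-then-insert pattern, assuming tweet “13” from the insert example is currently present and that only its message text should change (the new text is made up for illustration):</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ delete $tm from dataset TweetMessages where $tm.tweetid = "13";
+
+ insert into dataset TweetMessages
+ (
+     {"tweetid":"13",
+      "user":
+          {"screen-name":"NathanGiesen@211",
+           "lang":"en",
+           "friends_count":39345,
+           "statuses_count":479,
+           "name":"Nathan Giesen",
+           "followers_count":49420
+          },
+      "sender-location":point("47.44,80.65"),
+      "send-time":datetime("2008-04-26T10:10:35"),
+      "referred-topics":{{"tweeting"}},
+      "message-text":"tweety tweet, my fellow tweeters (edited)!"
+     }
+ );
+</pre></div></div>
+<p>These are two separate statements, so a query running in between may see the dataset with the old record already deleted and the replacement not yet inserted.</p>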
+<div class="section">
+<h3><a name="Transaction_Support"></a>Transaction Support</h3>
+<p>AsterixDB supports record-level ACID transactions that begin and terminate implicitly for each record inserted, deleted, or searched while a given AQL statement is being executed. This is quite similar to the level of transaction support found in today’s NoSQL stores. AsterixDB does not support multi-statement transactions, and in fact an AQL statement that involves multiple records can itself involve multiple independent record-level transactions. An example consequence of this is that, when an AQL statement attempts to insert 1000 records, it is possible that the first 800 records could end up being committed while the remaining 200 records fail to be inserted. This situation could happen, for example, if a duplicate key exception occurs as the 801st insertion is attempted. If this happens, AsterixDB will report the error (e.g., a duplicate key exception) as the result of the offending AQL insert statement, and the application logic above will need to take the appropriate action(s) needed to assess the resulting state and to clean up and/or continue as appropriate.</p></div></div>
+<div class="section">
+<h2><a name="Further_Help"></a>Further Help</h2>
+<p>That’s it! You are now armed and dangerous with respect to semistructured data management using AsterixDB.</p>
+<p>AsterixDB is a powerful new BDMS—Big Data Management System—that we hope may usher in a new era of much more declarative Big Data management. AsterixDB is powerful, so use it wisely, and remember: “With great power comes great responsibility…” :-)</p>
+<p>Please e-mail the AsterixDB user group (users (at) asterixdb.incubator.apache.org) if you run into any problems or simply have further questions about the AsterixDB system, its features, or their proper use.</p></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>
diff --git a/docs/0.8.7-incubating/aql/similarity.html b/docs/0.8.7-incubating/aql/similarity.html
new file mode 100644
index 0000000..ce239aa
--- /dev/null
+++ b/docs/0.8.7-incubating/aql/similarity.html
@@ -0,0 +1,431 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2015-11-24
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20151124" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>AsterixDB – AsterixDB Support of Similarity Queries</title>
+ <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+ <link rel="stylesheet" href="../css/site.css" />
+ <link rel="stylesheet" href="../css/print.css" media="print" />
+
+
+ <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+
+
+<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');</script>
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="http://asterixdb.apache.org/" id="bannerLeft">
+ <img src="../images/asterixlogo.png" alt="AsterixDB"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li id="publishDate">Last Published: 2015-11-24</li>
+
+
+
+ <li id="projectVersion" class="pull-right">Version: 0.8.7-incubating</li>
+
+ <li class="divider pull-right">|</li>
+
+ <li class="pull-right"> <a href="../index.html" title="Documentation Home">
+ Documentation Home</a>
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span3">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ <li class="nav-header">Documentation</li>
+
+ <li>
+
+ <a href="../install.html" title="Installing and Managing AsterixDB using Managix">
+ <i class="none"></i>
+ Installing and Managing AsterixDB using Managix</a>
+ </li>
+
+ <li>
+
+ <a href="../yarn.html" title="Deploying AsterixDB using YARN">
+ <i class="none"></i>
+ Deploying AsterixDB using YARN</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer.html" title="AsterixDB 101: An ADM and AQL Primer">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/primer-sql-like.html" title="AsterixDB 101: An ADM and AQL Primer (For SQL Fans)">
+ <i class="none"></i>
+ AsterixDB 101: An ADM and AQL Primer (For SQL Fans)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/js-sdk.html" title="AsterixDB Javascript SDK">
+ <i class="none"></i>
+ AsterixDB Javascript SDK</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/datamodel.html" title="Asterix Data Model (ADM)">
+ <i class="none"></i>
+ Asterix Data Model (ADM)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/manual.html" title="Asterix Query Language (AQL)">
+ <i class="none"></i>
+ Asterix Query Language (AQL)</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/functions.html" title="AQL Functions">
+ <i class="none"></i>
+ AQL Functions</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/allens.html" title="AQL Allen's Relations Functions">
+ <i class="none"></i>
+ AQL Allen's Relations Functions</a>
+ </li>
+
+ <li class="active">
+
+ <a href="#"><i class="none"></i>AQL Support of Similarity Queries</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/externaldata.html" title="Accessing External Data">
+ <i class="none"></i>
+ Accessing External Data</a>
+ </li>
+
+ <li>
+
+ <a href="../feeds/tutorial.html" title="Support for Data Ingestion in AsterixDB">
+ <i class="none"></i>
+ Support for Data Ingestion in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../udf.html" title="Support for User Defined Functions in AsterixDB">
+ <i class="none"></i>
+ Support for User Defined Functions in AsterixDB</a>
+ </li>
+
+ <li>
+
+ <a href="../aql/filters.html" title="Filter-Based LSM Index Acceleration">
+ <i class="none"></i>
+ Filter-Based LSM Index Acceleration</a>
+ </li>
+
+ <li>
+
+ <a href="../api.html" title="HTTP API to AsterixDB">
+ <i class="none"></i>
+ HTTP API to AsterixDB</a>
+ </li>
+ </ul>
+
+
+
+ <hr class="divider" />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="https://code.google.com/p/hyracks/" title="Hyracks" class="builtBy">
+ <img class="builtBy" alt="Hyracks" src="../images/hyrax_ts.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span9" >
+
+ <!-- ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ ! --><h1>AsterixDB Support of Similarity Queries</h1>
+<div class="section">
+<h2><a name="Table_of_Contents"></a><a name="toc" id="toc">Table of Contents</a></h2>
+
+<ul>
+
+<li><a href="#Motivation">Motivation</a></li>
+
+<li><a href="#DataTypesAndSimilarityFunctions">Data Types and Similarity Functions</a></li>
+
+<li><a href="#SimilaritySelectionQueries">Similarity Selection Queries</a></li>
+
+<li><a href="#SimilarityJoinQueries">Similarity Join Queries</a></li>
+
+<li><a href="#UsingIndexesToSupportSimilarityQueries">Using Indexes to Support Similarity Queries</a></li>
+</ul></div>
+<div class="section">
+<h2><a name="Motivation_Back_to_TOC"></a><a name="Motivation" id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>Similarity queries are widely used in applications where users need to find records that satisfy a similarity predicate and exact matching is not sufficient. These queries are especially important for social and Web applications, where errors, abbreviations, and inconsistencies are common. As an example, we may want to find all the movies starring Schwarzenegger even though we don’t know the exact spelling of his last name (despite his popularity in both the movie industry and politics :-)). As another example, we may want to find all the Facebook users who have similar friends. To meet these kinds of needs, AsterixDB supports similarity queries using efficient indexes and algorithms.</p></div>
+<div class="section">
+<h2><a name="Data_Types_and_Similarity_Functions_Back_to_TOC"></a><a name="DataTypesAndSimilarityFunctions" id="DataTypesAndSimilarityFunctions">Data Types and Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB supports <a class="externalLink" href="http://en.wikipedia.org/wiki/Levenshtein_distance">edit distance</a> (on strings) and <a class="externalLink" href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard</a> (on sets). For instance, in our <a href="primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB">TinySocial</a> example, the <tt>friend-ids</tt> of a Facebook user form a set of friends, and we can define a similarity between the sets of friends of two users. We can also convert a string to a set of grams of length “n” (called “n-grams”) and define the Jaccard similarity between the gram sets of two strings. Formally, the “n-grams” of a string are its substrings of length “n”. For instance, the 3-grams of the string <tt>schwarzenegger</tt> are <tt>sch</tt>, <tt>chw</tt>, <tt>hwa</tt>, …, <tt>ger</tt>.</p>
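+<p>As a worked illustration of this definition, and assuming no padding is added at the start or end of the string, a string of length L has L - n + 1 n-grams (counting repeats). The 14-character string <tt>schwarzenegger</tt> therefore has 14 - 3 + 1 = 12 3-grams:</p>
+
+<div class="source">
+<div class="source">
+<pre> sch  chw  hwa  war  arz  rze  zen  ene  neg  egg  gge  ger
+</pre></div></div>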
+<p>AsterixDB provides <a href="functions.html#Tokenizing_Functions">tokenization functions</a> to convert strings to sets, and the <a href="functions.html#Similarity_Functions">similarity functions</a>.</p></div>
+<div class="section">
+<h2><a name="Similarity_Selection_Queries_Back_to_TOC"></a><a name="SimilaritySelectionQueries" id="SimilaritySelectionQueries">Similarity Selection Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>The following query asks for all the Facebook users whose name is similar to <tt>Suzanna Tilson</tt>, i.e., their edit distance is at most 2.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $ed := edit-distance($user.name, "Suzanna Tilson")
+ where $ed &lt;= 2
+ return $user
+</pre></div></div>
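+<p>For the sample data, the name <tt>SuzannaTillson</tt> is at edit distance 2 from <tt>Suzanna Tilson</tt> (delete the space, insert one “l”), so it meets the threshold. When only a yes/no answer within a threshold is needed, the <tt>edit-distance-check</tt> function can be used instead. The sketch below assumes, per the AQL functions documentation, that it takes the threshold as a third argument and returns an ordered list whose first item is a boolean and whose second item is the distance; please verify the exact return shape there:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $ed := edit-distance-check($user.name, "Suzanna Tilson", 2)
+ where $ed[0]
+ return { "name": $user.name, "distance": $ed[1] }
+</pre></div></div>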
+<p>The following query asks for all the Facebook users whose set of friend ids is similar to <tt>[1,5,9,10]</tt>, i.e., their Jaccard similarity is at least 0.6.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
+ where $sim >= 0.6f
+ return $user
+</pre></div></div>
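+<p>As a worked check of the Jaccard computation (intersection size divided by union size), here are three of the <tt>friend-ids</tt> values from the TinySocial sample data compared against <tt>[1,5,9,10]</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre> BramHatch   {{ 1, 5, 9 }}      intersection = 3, union = 4, similarity = 3/4 = 0.75
+ EmoryUnk    {{ 1, 5, 8, 9 }}   intersection = 3, union = 5, similarity = 3/5 = 0.6
+ VonKemble   {{ 3, 6, 10 }}     intersection = 1, union = 6, similarity = 1/6 (about 0.17)
+</pre></div></div>
+<p>The first value is clearly above the 0.6 threshold, the second sits exactly at it, and the third is well below it.</p>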
+<p>AsterixDB also provides a similarity operator <tt>~=</tt> for expressing such a condition; the similarity function and threshold that it uses are defined beforehand with “set” statements. For instance, the above query can be equivalently written as:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "jaccard";
+ set simthreshold "0.6f";
+
+ for $user in dataset('FacebookUsers')
+ where $user.friend-ids ~= [1,5,9,10]
+ return $user
+</pre></div></div>
+<p>In this query, we first declare Jaccard as the similarity function using <tt>simfunction</tt> and then specify the threshold <tt>0.6f</tt> using <tt>simthreshold</tt>.</p></div>
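+<p>The same operator-based form applies to edit distance. As a sketch that mirrors the edit-distance selection query shown earlier, using the same “set” statements as the fuzzy join in the primer but with a threshold of 2:</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "edit-distance";
+ set simthreshold "2";
+
+ for $user in dataset('FacebookUsers')
+ where $user.name ~= "Suzanna Tilson"
+ return $user
+</pre></div></div>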
+<div class="section">
+<h2><a name="Similarity_Join_Queries_Back_to_TOC"></a><a name="SimilarityJoinQueries" id="SimilarityJoinQueries">Similarity Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB supports fuzzy joins between two sets. The following <a href="primer.html#Query_5_-_Fuzzy_Join">query</a> finds, for each Facebook user, all Twitter users with names similar to their name based on the edit distance.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ set simfunction "edit-distance";
+ set simthreshold "3";
+
+ for $fbu in dataset FacebookUsers
+ return {
+ "id": $fbu.id,
+ "name": $fbu.name,
+ "similar-users": for $t in dataset TweetMessages
+ let $tu := $t.user
+ where $tu.name ~= $fbu.name
+ return {
+ "twitter-screenname": $tu.screen-name,
+ "twitter-name": $tu.name
+ }
+ };
+</pre></div></div></div>
+<div class="section">
+<h2><a name="Using_Indexes_to_Support_Similarity_Queries_Back_to_TOC"></a><a name="UsingIndexesToSupportSimilarityQueries" id="UsingIndexesToSupportSimilarityQueries">Using Indexes to Support Similarity Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font></h2>
+<p>AsterixDB uses two types of indexes to support similarity queries, namely “ngram index” and “keyword index”.</p>
+<div class="section">
+<h3><a name="NGram_Index"></a>NGram Index</h3>
+<p>An “ngram index” is constructed on a set of strings. We generate n-grams for each string and build an inverted list for each n-gram that contains the ids of the strings in which that gram occurs. A similarity query can be answered efficiently by accessing the inverted lists of the grams in the query and counting the number of occurrences of each string id on these lists. A similar idea can be used to answer queries with Jaccard similarity. A detailed description of these techniques is available in this <a class="externalLink" href="http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf">paper</a>.</p>
+<p>For instance, the following DDL statements create an ngram index on the <tt>FacebookUsers.name</tt> attribute using an inverted index of 3-grams.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create index fbUserIdx on FacebookUsers(name) type ngram(3);
+</pre></div></div>
+<p>The number “3” in “ngram(3)” is the gram length “n”. This index can be used to optimize similarity queries on this attribute that use <a href="functions.html#edit-distance">edit-distance</a>, <a href="functions.html#edit-distance-check">edit-distance-check</a>, <a href="functions.html#similarity-jaccard">similarity-jaccard</a>, or <a href="functions.html#similarity-jaccard-check">similarity-jaccard-check</a>, where the similarity is defined on sets of 3-grams. It can also be used to optimize queries with the “<a href="functions.html#contains">contains()</a>” predicate (i.e., substring matching), since such a predicate can also be evaluated by counting on the inverted lists of the grams in the query string.</p>
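+<p>To make the counting argument concrete, the following sketch lists the 3-grams of a query string. It assumes that a <tt>gram-tokens</tt> function is available in this release and takes the string, the gram length, and a boolean that controls padding of the string ends; treat the exact signature as an assumption. Without padding, a string of length <tt>L</tt> has <tt>L - n + 1</tt> n-grams, so the 14-character string <tt>Suzanna Tilson</tt> has 12 3-grams. A single edit operation changes at most <tt>n</tt> grams, so a string within edit distance 2 of it must share at least <tt>12 - 2 * 3 = 6</tt> of these grams. This is the kind of occurrence threshold that can be checked by counting on the inverted lists.</p>
+
+<div class="source">
+<div class="source">
+<pre> let $grams := gram-tokens("Suzanna Tilson", 3, false)
+ return { "grams": $grams, "count": count($grams) }
+</pre></div></div>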
+<div class="section">
+<h4><a name="NGram_Index_usage_case_-_edit-distance"></a>NGram Index usage case - <a href="functions.html#edit-distance">edit-distance</a></h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $ed := edit-distance($user.name, "Suzanna Tilson")
+ where $ed <= 2
+ return $user
+</pre></div></div></div>
+<div class="section">
+<h4><a name="NGram_Index_usage_case_-_edit-distance-check"></a>NGram Index usage case - <a href="functions.html#edit-distance-check">edit-distance-check</a></h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $ed := edit-distance-check($user.name, "Suzanna Tilson", 2)
+ where $ed[0]
+ return $ed[1]
+</pre></div></div></div>
+<div class="section">
+<h4><a name="NGram_Index_usage_case_-_similarity-jaccard"></a>NGram Index usage case - <a href="functions.html#similarity-jaccard">similarity-jaccard</a></h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
+ where $sim >= 0.6f
+ return $user
+</pre></div></div></div>
+<div class="section">
+<h4><a name="NGram_Index_usage_case_-_similarity-jaccard-check"></a>NGram Index usage case - <a href="functions.html#similarity-jaccard-check">similarity-jaccard-check</a></h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $user in dataset('FacebookUsers')
+ let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
+ where $sim[0]
+ return $user
+</pre></div></div></div>
+<div class="section">
+<h4><a name="NGram_Index_usage_case_-_contains"></a>NGram Index usage case - <a href="functions.html#contains">contains()</a></h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $i in dataset('FacebookMessages')
+ where contains($i.message, "phone")
+ return {"mid": $i.message-id, "message": $i.message}
+</pre></div></div></div></div>
+<div class="section">
+<h3><a name="Keyword_Index"></a>Keyword Index</h3>
+<p>A “keyword index” is constructed on a set of strings or sets (e.g., OrderedList, UnorderedList). Instead of generating grams as in an ngram index, we generate tokens (e.g., words) and, for each token, construct an inverted list that contains the ids of the records with this token. The following two examples show how to create a keyword index on two different types:</p>
+<div class="section">
+<h4><a name="Keyword_Index_on_String_Type"></a>Keyword Index on String Type</h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ drop index FacebookMessages.fbMessageIdx if exists;
+ create index fbMessageIdx on FacebookMessages(message) type keyword;
+
+ for $o in dataset('FacebookMessages')
+ let $jacc := similarity-jaccard-check(word-tokens($o.message), word-tokens("love like verizon"), 0.2f)
+ where $jacc[0]
+ return $o
+</pre></div></div></div>
+<div class="section">
+<h4><a name="Keyword_Index_on_UnorderedList_Type"></a>Keyword Index on UnorderedList Type</h4>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ create index fbUserIdx_fids on FacebookUsers(friend-ids) type keyword;
+
+ for $c in dataset('FacebookUsers')
+ let $jacc := similarity-jaccard-check($c.friend-ids, {{3,10}}, 0.5f)
+ where $jacc[0]
+ return $c
+</pre></div></div>
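+<p>When the similarity value itself is needed in the result, the non-check function can be used instead. The following sketch assumes the same <tt>fbUserIdx_fids</tt> index; the output field names are illustrative. For example, a user whose friend ids are <tt>{{3,10,21}}</tt> (a hypothetical value) shares 2 of the 3 distinct ids with the query set, giving a Jaccard similarity of 2/3, which passes the 0.5 threshold.</p>
+
+<div class="source">
+<div class="source">
+<pre> use dataverse TinySocial;
+
+ for $c in dataset('FacebookUsers')
+ let $sim := similarity-jaccard($c.friend-ids, {{3,10}})
+ where $sim >= 0.5f
+ return { "id": $c.id, "similarity": $sim }
+</pre></div></div>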
+<p>As shown above, a keyword index can be used to optimize queries with token-based similarity predicates, including <a href="functions.html#similarity-jaccard">similarity-jaccard</a> and <a href="functions.html#similarity-jaccard-check">similarity-jaccard-check</a>.</p></div></div></div>
+ </div>
+ </div>
+ </div>
+
+ <hr/>
+
+ <footer>
+ <div class="container-fluid">
+ <div class="row span12">Copyright © 2015
+ <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+ All Rights Reserved.
+
+ </div>
+
+<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
+ feather logo, and the Apache AsterixDB project logo are either
+ registered trademarks or trademarks of The Apache Software
+ Foundation in the United States and other countries.
+ All other marks mentioned may be trademarks or registered
+ trademarks of their respective owners.</div>
+
+
+ </div>
+ </footer>
+ </body>
+</html>