Merge tag 'asterix-0.8.3' into documentation
Fix version numbers in documentation
diff --git a/asterix-doc/src/site/markdown/api.md b/asterix-doc/src/site/markdown/api.md
index b48f24a..4d6edd2 100644
--- a/asterix-doc/src/site/markdown/api.md
+++ b/asterix-doc/src/site/markdown/api.md
@@ -1,6 +1,16 @@
# REST API to AsterixDB #
-## DDL API ##
+## <a id="toc">Table of Contents</a>
+
+* [DDL API](#DdlApi)
+* [Update API](#UpdateApi)
+* [Query API](#QueryApi)
+* [Asynchronous Result API](#AsynchronousResultApi)
+* [Query Status API](#QueryStatusApi)
+* [Error Codes](#ErrorCodes)
+
+
+## <a id="DdlApi">DDL API</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
*End point for the data definition statements*
@@ -48,7 +58,7 @@
*HTTP OK 200*
`<NO PAYLOAD>`
-## Update API ##
+## <a id="UpdateApi">Update API</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
*End point for update statements (INSERT, DELETE and LOAD)*
@@ -89,7 +99,7 @@
*HTTP OK 200*
`<NO PAYLOAD>`
-## Query API ##
+## <a id="QueryApi">Query API</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
*End point for query statements*
@@ -169,7 +179,7 @@
}
-## Asynchronous Result API ##
+## <a id="AsynchronousResultApi">Asynchronous Result API</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
*End point to fetch the results of an asynchronous query*
@@ -231,7 +241,7 @@
}
-## Query Status API ##
+## <a id="QueryStatusApi">Query Status API</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
*End point to check the status of the query asynchronous*
@@ -261,7 +271,7 @@
-## Error Codes ##
+## <a id="ErrorCodes">Error Codes</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
Table of error codes and their types:
diff --git a/asterix-doc/src/site/markdown/aql/allens.md b/asterix-doc/src/site/markdown/aql/allens.md
new file mode 100644
index 0000000..a07e287
--- /dev/null
+++ b/asterix-doc/src/site/markdown/aql/allens.md
@@ -0,0 +1,229 @@
+# AsterixDB Temporal Functions: Allen's Relations #
+
+## <a id="toc">Table of Contents</a> ##
+
+* [About Allen's Relations](#AboutAllensRelations)
+* [Allen's Relations Functions](#AllensRelatonsFunctions)
+
+
+## <a id="AboutAllensRelations">About Allen's Relations</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+
+AsterixDB supports Allen's relations over interval types. Allen's relations are also called Allen's interval algebra. This algebra describes 13 base relations in total, and AsterixDB supports all of them (note that `interval-equals` is expressed with the `=` comparison operator, so there is no separate function for it).
+
+A detailed description of Allen's relations can be found in the [Wikipedia entry](http://en.wikipedia.org/wiki/Allen's_interval_algebra).
+
+## <a id="AllensRelatonsFunctions">Allen's Relations Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+
+### interval-before, interval-after ###
+
+ * Syntax:
+
+ interval-before(interval1, interval2)
+ interval-after(interval1, interval2)
+
+ * These two functions check whether an interval happens before/after another interval.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-before(interval1, interval2)` is true if and only if `interval1.end < interval2.start`, and `interval-after(interval1, interval2)` is true if and only if `interval1.start > interval2.end`. If either of the two inputs is `null`, `null` is returned.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("2005-05-01", "2012-09-09")
+ return {"interval-before": interval-before($itv1, $itv2), "interval-after": interval-after($itv2, $itv1)}
+
+ * The expected result is:
+
+ { "interval-before": true, "interval-after": true }
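The comparison rules for `interval-before` and `interval-after` can be modeled outside AsterixDB. The Python sketch below is illustrative only: the `Interval` class and the function names are hypothetical stand-ins rather than AsterixDB APIs, and `None` plays the role of `null`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for an interval value (not an AsterixDB type)."""
    start: date
    end: date


def interval_before(i1: Optional[Interval], i2: Optional[Interval]) -> Optional[bool]:
    # A missing input propagates as None, mirroring the documented null rule.
    if i1 is None or i2 is None:
        return None
    return i1.end < i2.start


def interval_after(i1: Optional[Interval], i2: Optional[Interval]) -> Optional[bool]:
    if i1 is None or i2 is None:
        return None
    return i1.start > i2.end


# Same values as the AQL example above.
itv1 = Interval(date(2000, 1, 1), date(2005, 1, 1))
itv2 = Interval(date(2005, 5, 1), date(2012, 9, 9))
result = {
    "interval-before": interval_before(itv1, itv2),
    "interval-after": interval_after(itv2, itv1),
}
print(result)
```

Both entries come out true for these values, matching the expected AQL result, and passing `None` for either argument yields `None`.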
+
+### interval-meets, interval-met-by ###
+
+ * Syntax:
+
+ interval-meets(interval1, interval2)
+ interval-met-by(interval1, interval2)
+
+ * These two functions check whether an interval meets with another interval.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-meets(interval1, interval2)` is true if and only if `interval1.end = interval2.start`, and `interval-met-by(interval1, interval2)` is true if and only if `interval1.start = interval2.end`. If either of the two inputs is `null`, `null` is returned.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("2005-01-01", "2012-09-09")
+ let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+ let $itv4 := interval-from-date("2004-09-10", "2006-08-01")
+ return {"meets": interval-meets($itv1, $itv2), "metby": interval-met-by($itv3, $itv4)}
+
+ * The expected result is:
+
+ { "meets": true, "metby": true }
+
+
+### interval-overlaps, interval-overlapped-by, overlap ###
+
+ * Syntax:
+
+ interval-overlaps(interval1, interval2)
+ interval-overlapped-by(interval1, interval2)
+ overlap(interval1, interval2)
+
+ * These functions check whether two intervals overlap with each other.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-overlaps(interval1, interval2)` is true if and only if
+
+ interval1.start < interval2.start
+ AND interval2.end > interval1.end
+ AND interval1.end > interval2.start
+
+ `interval-overlapped-by(interval1, interval2)` is true if and only if
+
+ interval2.start < interval1.start
+ AND interval1.end > interval2.end
+ AND interval2.end > interval1.start
+
+ `overlap(interval1, interval2)` is true if
+
+ (interval2.start >= interval1.start
+ AND interval2.start < interval1.end)
+ OR
+ (interval2.end > interval1.start
+ AND interval2.end <= interval1.end)
+
+ For all these functions, if either of the two inputs is `null`, `null` is returned.
+
+ Note that `interval-overlaps` and `interval-overlapped-by` follow the definition of overlap in Allen's relations. `overlap` is syntactic sugar for the case where the intersection of two intervals is not empty.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("2004-05-01", "2012-09-09")
+ let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+ let $itv4 := interval-from-date("2004-09-10", "2006-12-31")
+ return {"overlaps": interval-overlaps($itv1, $itv2),
+ "overlapped-by": interval-overlapped-by($itv3, $itv4),
+ "overlapping1": overlap($itv1, $itv2),
+ "overlapping2": overlap($itv3, $itv4)}
+
+ * The expected result is:
+
+ { "overlaps": true, "overlapped-by": true, "overlapping1": true, "overlapping2": true }
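To see how the strict Allen relations differ from the looser `overlap` check, the documented conditions can be transcribed into Python. This is a sketch under assumptions: `Interval` and the function names are hypothetical, null handling is omitted, and both clauses of `overlap` are assumed to compare against `interval1`.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for an interval value; not an AsterixDB type.
Interval = namedtuple("Interval", ["start", "end"])


def interval_overlaps(a, b):
    # Allen's "overlaps": a starts first, b ends last, and they share some time.
    return a.start < b.start and b.end > a.end and a.end > b.start


def interval_overlapped_by(a, b):
    # The converse relation: the same test with the roles swapped.
    return interval_overlaps(b, a)


def overlap(a, b):
    # Transcription of the documented `overlap` condition.
    return ((b.start >= a.start and b.start < a.end)
            or (b.end > a.start and b.end <= a.end))


# Values from the AQL example above.
itv1 = Interval(date(2000, 1, 1), date(2005, 1, 1))
itv2 = Interval(date(2004, 5, 1), date(2012, 9, 9))
itv3 = Interval(date(2006, 8, 1), date(2007, 3, 1))
itv4 = Interval(date(2004, 9, 10), date(2006, 12, 31))
example = (interval_overlaps(itv1, itv2), interval_overlapped_by(itv3, itv4),
           overlap(itv1, itv2), overlap(itv3, itv4))

# Two intervals that share a start point intersect, but Allen's algebra
# classifies them as "starts", not "overlaps".
short = Interval(date(2000, 1, 1), date(2005, 1, 1))
long_ = Interval(date(2000, 1, 1), date(2012, 9, 9))
same_start = (interval_overlaps(short, long_), overlap(short, long_))
print(example, same_start)
```

For the example values all four checks hold. For the two intervals sharing a start point, `overlap` holds while `interval_overlaps` does not, which is the distinction the note above describes.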
+
+
+### interval-starts, interval-started-by ###
+
+ * Syntax:
+
+ interval-starts(interval1, interval2)
+ interval-started-by(interval1, interval2)
+
+ * These two functions check whether one interval starts with the other interval.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-starts(interval1, interval2)` returns true if and only if
+
+ interval1.start = interval2.start
+ AND interval1.end <= interval2.end
+
+ `interval-started-by(interval1, interval2)` returns true if and only if
+
+ interval1.start = interval2.start
+ AND interval2.end <= interval1.end
+
+ For both functions, if either of the two inputs is `null`, `null` is returned.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("2000-01-01", "2012-09-09")
+ let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+ let $itv4 := interval-from-date("2006-08-01", "2006-08-01")
+ return {"interval-starts": interval-starts($itv1, $itv2), "interval-started-by": interval-started-by($itv3, $itv4)}
+
+ * The expected result is:
+
+ { "interval-starts": true, "interval-started-by": true }
+
+
+### interval-covers, interval-covered-by ###
+
+ * Syntax:
+
+ interval-covers(interval1, interval2)
+ interval-covered-by(interval1, interval2)
+
+ * These two functions check whether one interval covers the other interval.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-covers(interval1, interval2)` is true if and only if
+
+ interval1.start <= interval2.start
+ AND interval1.end >= interval2.end
+
+ `interval-covered-by(interval1, interval2)` is true if and only if
+
+ interval2.start <= interval1.start
+ AND interval2.end >= interval1.end
+
+ For both functions, if either of the two inputs is `null`, `null` is returned.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("2000-03-01", "2004-09-09")
+ let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+ let $itv4 := interval-from-date("2004-09-10", "2012-08-01")
+ return {"interval-covers": interval-covers($itv1, $itv2), "interval-covered-by": interval-covered-by($itv3, $itv4)}
+
+ * The expected result is:
+
+ { "interval-covers": true, "interval-covered-by": true }
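The example values can be checked against a containment test written in Python. This sketch assumes that "covers" means `interval1` starts no later and ends no earlier than `interval2`, which is the reading the expected result requires. `Interval` and the function names are hypothetical, and null handling is omitted.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for an interval value; not an AsterixDB type.
Interval = namedtuple("Interval", ["start", "end"])


def interval_covers(a, b):
    # a covers b when b lies entirely within a (shared endpoints allowed).
    return a.start <= b.start and a.end >= b.end


def interval_covered_by(a, b):
    # The converse relation: b covers a.
    return interval_covers(b, a)


itv1 = Interval(date(2000, 1, 1), date(2005, 1, 1))
itv2 = Interval(date(2000, 3, 1), date(2004, 9, 9))
itv3 = Interval(date(2006, 8, 1), date(2007, 3, 1))
itv4 = Interval(date(2004, 9, 10), date(2012, 8, 1))
result = {
    "interval-covers": interval_covers(itv1, itv2),
    "interval-covered-by": interval_covered_by(itv3, itv4),
}
print(result)
```

Swapping the arguments of either call makes the check fail for these values, since the inner interval does not contain the outer one.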
+
+
+### interval-ends, interval-ended-by ###
+
+ * Syntax:
+
+ interval-ends(interval1, interval2)
+ interval-ended-by(interval1, interval2)
+
+ * These two functions check whether one interval ends with the other interval.
+ * Arguments:
+ * `interval1`, `interval2`: two intervals to be compared
+ * Return Value:
+
+ A `boolean` value. Specifically, `interval-ends(interval1, interval2)` returns true if and only if
+
+ interval1.end = interval2.end
+ AND interval1.start >= interval2.start
+
+ `interval-ended-by(interval1, interval2)` returns true if and only if
+
+ interval2.end = interval1.end
+ AND interval2.start >= interval1.start
+
+ For both functions, if either of the two inputs is `null`, `null` is returned.
+
+ * Examples:
+
+ let $itv1 := interval-from-date("2000-01-01", "2005-01-01")
+ let $itv2 := interval-from-date("1998-01-01", "2005-01-01")
+ let $itv3 := interval-from-date("2006-08-01", "2007-03-01")
+ let $itv4 := interval-from-date("2006-09-10", "2007-03-01")
+ return {"interval-ends": interval-ends($itv1, $itv2), "interval-ended-by": interval-ended-by($itv3, $itv4) }
+
+ * The expected result is:
+
+ { "interval-ends": true, "interval-ended-by": true }
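As a summary of the whole algebra, the 13 base relations can be told apart by comparing endpoints. The Python sketch below uses the relation names from Allen's interval algebra ("during", "finishes", and so on) rather than the AsterixDB function names. It assumes non-empty intervals with `start < end` and uses strict comparisons, whereas several of the functions above accept equal endpoints. `Interval` and `allen_relation` are hypothetical names.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for an interval value; not an AsterixDB type.
Interval = namedtuple("Interval", ["start", "end"])


def allen_relation(a, b):
    """Name the single Allen base relation that holds from a to b."""
    if a.end < b.start:
        return "before"
    if a.start > b.end:
        return "after"
    if a.end == b.start:
        return "meets"
    if a.start == b.end:
        return "met-by"
    # From here on the two intervals share some time.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only the two partial-overlap cases remain.
    return "overlaps" if a.start < b.start else "overlapped-by"


itv1 = Interval(date(2000, 1, 1), date(2005, 1, 1))
itv2 = Interval(date(1998, 1, 1), date(2005, 1, 1))
itv3 = Interval(date(2004, 5, 1), date(2012, 9, 9))
relations = [allen_relation(itv1, itv2), allen_relation(itv2, itv1),
             allen_relation(itv1, itv3), allen_relation(itv3, itv1)]
print(relations)
```

Exactly one relation holds for any pair of intervals, and swapping the arguments always yields the converse relation.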
diff --git a/asterix-doc/src/site/markdown/aql/datamodel.md b/asterix-doc/src/site/markdown/aql/datamodel.md
index 3e54d61..71a5cbb 100644
--- a/asterix-doc/src/site/markdown/aql/datamodel.md
+++ b/asterix-doc/src/site/markdown/aql/datamodel.md
@@ -1,11 +1,33 @@
# Asterix Data Model (ADM) #
+## <a id="toc">Table of Contents</a> ##
+
+* [Primitive Types](#PrimitiveTypes)
+ * [Boolean](#PrimitiveTypesBoolean)
+ * [Int8 / Int16 / Int32 / Int64](#PrimitiveTypesInt)
+ * [Float](#PrimitiveTypesFloat)
+ * [Double](#PrimitiveTypesDouble)
+ * [String](#PrimitiveTypesString)
+ * [Point](#PrimitiveTypesPoint)
+ * [Line](#PrimitiveTypesLine)
+ * [Rectangle](#PrimitiveTypesRectangle)
+ * [Circle](#PrimitiveTypesCircle)
+ * [Polygon](#PrimitiveTypesPolygon)
+ * [Date](#PrimitiveTypesDate)
+ * [Time](#PrimitiveTypesTime)
+ * [Datetime](#PrimitiveTypesDateTime)
+ * [Duration/Year-month-duration/Day-time-duration](#PrimitiveTypesDuration)
+ * [Interval](#PrimitiveTypesInterval)
+* [Derived Types](#DerivedTypes)
+ * [Record](#DerivedTypesRecord)
+ * [OrderedList](#DerivedTypesOrderedList)
+ * [UnorderedList](#DerivedTypesUnorderedList)
An instance of Asterix data model (ADM) can be a _*primitive type*_ (`int32`, `int64`, `string`, `float`, `double`, `date`, `time`, `datetime`, etc. or `null`) or a _*derived type*_.
-## Primitive Types ##
+## <a id="PrimitiveTypes">Primitive Types</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
-### Boolean ###
+### <a id="PrimitiveTypesBoolean">Boolean</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`boolean` data type can have one of the two values: _*true*_ or _*false*_.
* Example:
@@ -21,7 +43,7 @@
-### Int8 / Int16 / Int32 / Int64 ###
+### <a id="PrimitiveTypesInt">Int8 / Int16 / Int32 / Int64</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
Integer types using 8, 16, 32, or 64 bits. The ranges of these types are:
- `int8`: -127 to 127
@@ -43,7 +65,7 @@
{ "int8": 125i8, "int16": 32765i16, "int32": 294967295, "int64": 1700000000000000000i64 }
-### Float ###
+### <a id="PrimitiveTypesFloat">Float</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`float` represents approximate numeric data values using 4 bytes. The range of a float value can be from 2^(-149) to (2-2^(-23)·2^(127) for both positive and negative. Beyond these ranges will get `INF` or `-INF`.
* Example:
@@ -60,7 +82,7 @@
{ "v1": NaNf, "v2": Infinityf, "v3": -Infinityf, "v4": -2013.5f }
-### Double ###
+### <a id="PrimitiveTypesDouble">Double</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`double` represents approximate numeric data values using 8 bytes. The range of a double value can be from (2^(-1022)) to (2-2^(-52))·2^(1023) for both positive and negative. Beyond these ranges will get `INF` or `-INF`.
* Example:
@@ -77,7 +99,7 @@
{ "v1": NaNd, "v2": Infinityd, "v3": -Infinityd, "v4": -2013.5938237483274d }
-### String ###
+### <a id="PrimitiveTypesString">String</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`string` represents a sequence of characters.
* Example:
@@ -92,7 +114,7 @@
{ "v1": "This is a string.", "v2": "\"This is a quoted string\"" }
-### Point ###
+### <a id="PrimitiveTypesPoint">Point</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`point` is the fundamental two-dimensional building block for spatial types. It consists of two `double` coordinates x and y.
* Example:
@@ -107,7 +129,7 @@
{ "v1": point("80.1,-1000000.0"), "v2": point("5.1E-10,-1000000.0") }
-### Line ###
+### <a id="PrimitiveTypesLine">Line</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`line` consists of two points that represent the start and the end points of a line segment.
* Example:
@@ -122,7 +144,7 @@
{ "v1": line("10.1234,1.11 0.102,-11.22"), "v2": line("0.1234,-1.0E-10 0.105,-1.02") }
-### Rectangle ###
+### <a id="PrimitiveTypesRectangle">Rectangle</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`rectangle` consists of two points that represent the _*bottom left*_ and _*upper right*_ corners of a rectangle.
* Example:
@@ -137,7 +159,7 @@
{ "v1": rectangle("5.1,11.8 87.6,15.6548"), "v2": rectangle("0.1234,-1.0E-10 5.5487,0.48765") }
-### Circle ###
+### <a id="PrimitiveTypesCircle">Circle</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`circle` consists of one point that represents the center of the circle and a radius of type `double`.
* Example:
@@ -152,7 +174,7 @@
{ "v1": circle("10.1234,1.11 0.102"), "v2": circle("0.1234,-1.0E-10 0.105") }
-### Polygon ###
+### <a id="PrimitiveTypesPolygon">Polygon</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`polygon` consists of _*n*_ points that represent the vertices of a _*simple closed*_ polygon.
* Example:
@@ -167,7 +189,7 @@
{ "v1": polygon("-1.2,130.0 -214000.0,2.15 -350.0,3.6 -0.0046,4.81"), "v2": polygon("-1.0,1050.0 -2.15E50,2.5 -1.0,3300.0 -250000.0,20.15 350.0,3.6 -0.0046,4.75 -2.0,100.0 -200000.0,20.1 30.5,3.25 -0.00433,4.75") }
-### Date ###
+### <a id="PrimitiveTypesDate">Date</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`date` represents a time point along the Gregorian calendar system specified by the year, month and day. ASTERIX supports the date from `-9999-01-01` to `9999-12-31`.
A date value can be represented in two formats, extended format and basic format.
@@ -187,7 +209,7 @@
{ "v1": date("2013-01-01"), "v2": date("-1970-01-01") }
-### Time ###
+### <a id="PrimitiveTypesTime">Time</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`time` type describes the time within the range of a day. It is represented by three fields: hour, minute and second. Millisecond field is optional as the fraction of the second field. Its extended format is as `hh:mm:ss[.mmm]` and the basic format is `hhmmss[mmm]`. The value domain is from `00:00:00.000` to `23:59:59.999`.
Timezone field is optional for a time value. Timezone is represented as `[+|-]hh:mm` for extended format or `[+|-]hhmm` for basic format. Note that the sign designators cannot be omitted. `Z` can also be used to represent the UTC local time. If no timezone information is given, it is UTC by default.
@@ -204,7 +226,7 @@
{ "v1": time("12:12:12.039Z"), "v2": time("08:00:00.000Z") }
-### Datetime ###
+### <a id="PrimitiveTypesDateTime">Datetime</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
A `datetime` value is a combination of an `date` and `time`, representing a fixed time point along the Gregorian calendar system. The value is among `-9999-01-01 00:00:00.000` and `9999-12-31 23:59:59.999`.
A `datetime` value is represented as a combination of the representation of its `date` part and `time` part, separated by a separator `T`. Either extended or basic format can be used, and the two parts should be the same format.
@@ -223,13 +245,15 @@
{ "v1": datetime("2013-01-01T12:12:12.039Z"), "v2": datetime("-1970-01-01T08:00:00.000Z") }
-### Duration ###
+### <a id="PrimitiveTypesDuration">Duration/Year-month-duration/Day-time-duration</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`duration` represents a duration of time. A duration value is specified by integers on at least one of the following fields: year, month, day, hour, minute, second, and millisecond.
A duration value is in the format of `[-]PnYnMnDTnHnMn.mmmS`. The millisecond part (as the fraction of the second field) is optional, and when no millisecond field is used, the decimal point should also be absent.
Negative durations are also supported for the arithmetic operations between time instance types (`date`, `time` and `datetime`), and is used to roll the time back for the given duration. For example `date("2012-01-01") + duration("-P3D")` will return `date("2011-12-29")`.
+There are also two sub-duration types, namely `year-month-duration` and `day-time-duration`. `year-month-duration` represents only the years and months of a duration, while `day-time-duration` represents only the day to millisecond fields. Unlike the `duration` type, both subtypes are totally ordered, so they can be used for comparison and index construction.
+
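A general `duration` mixes months with days, and the length of a month depends on where it starts, so two durations such as `P1M` and `P30D` cannot be ranked without a reference date. The Python sketch below illustrates this with the standard library. The helper name is hypothetical, and the month and `timedelta` encodings are only a model of the two subtypes, not AsterixDB's internal representation.

```python
import calendar
from datetime import date, timedelta


def one_month_in_days(start: date) -> int:
    """Days spanned by one calendar month counted from the first day of start's month."""
    # monthrange returns (weekday of the first day, number of days in the month).
    return calendar.monthrange(start.year, start.month)[1]


from_january = one_month_in_days(date(2013, 1, 1))    # 31 days: P1M is longer than P30D
from_february = one_month_in_days(date(2013, 2, 1))   # 28 days: P1M is shorter than P30D

# Each subtype reduces to a single number, so ordering is unambiguous.
year_month_a = 1 * 12 + 2                   # P1Y2M as a month count
year_month_b = 15                           # P15M as a month count
day_time_a = timedelta(days=1, hours=2)     # P1DT2H
day_time_b = timedelta(hours=27)            # PT27H

print(from_january, from_february, year_month_a < year_month_b, day_time_a < day_time_b)
```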
Note that a canonical representation of the duration is always returned, regardless whether the duration is in the canonical representation or not from the user's input. More information about canonical representation can be found from [XPath dayTimeDuration Canonical Representation](http://www.w3.org/TR/xpath-functions/#canonical-dayTimeDuration) and [yearMonthDuration Canonical Representation](http://www.w3.org/TR/xpath-functions/#canonical-yearMonthDuration).
* Example:
@@ -244,7 +268,7 @@
{ "v1": duration("P101YT12M"), "v2": duration("-PT20.943S") }
-### Interval ###
+### <a id="PrimitiveTypesInterval">Interval</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
`interval` represents inclusive-exclusive ranges of time. It is defined by two time point values with the same temporal type(`date`, `time` or `datetime`).
* Example:
@@ -260,9 +284,9 @@
{ "v1": interval-date("2013-01-01, 2013-05-05"), "v2": interval-time("00:01:01.000Z, 13:39:01.049Z"), "v3": interval-datetime("2013-01-01T00:01:01.000Z, 2013-05-05T13:39:01.049Z") }
-## Derived Types ##
+## <a id="DerivedTypes">Derived Types</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
-### Record ###
+### <a id="DerivedTypesRecord">Record</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
A `record` contains a set of fields, where each field is described by its name and type. A record type is either open or closed. Open records can contain fields that are not part of the type definition, while closed records cannot. Syntactically, record constructors are surrounded by curly braces "{...}".
An example would be
@@ -271,7 +295,7 @@
{ "id": 213508, "name": "Alice Bob" }
-### OrderedList ###
+### <a id="DerivedTypesOrderedList">OrderedList</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
An `orderedList` is a sequence of values for which the order is determined by creation or insertion. OrderedList constructors are denoted by brackets: "[...]".
An example would be
@@ -280,7 +304,7 @@
["alice", 123, "bob", null]
-### UnorderedList ###
+### <a id="DerivedTypesUnorderedList">UnorderedList</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
An `unorderedList` is an unordered sequence of values, similar to bags in SQL. UnorderedList constructors are denoted by two opening flower braces followed by data and two closing flower braces, like "{{...}}".
An example would be
diff --git a/asterix-doc/src/site/markdown/aql/externaldata.md b/asterix-doc/src/site/markdown/aql/externaldata.md
index e603954..ca350b9 100644
--- a/asterix-doc/src/site/markdown/aql/externaldata.md
+++ b/asterix-doc/src/site/markdown/aql/externaldata.md
@@ -1,12 +1,19 @@
# Accessing External Data in AsterixDB #
-## Introduction ##
+## <a id="toc">Table of Contents</a> ##
+
+* [Introduction](#Introduction)
+ * [Adapter for an External Dataset](#IntroductionAdapterForAnExternalDataset)
+ * [Creating an External Dataset](#IntroductionCreatingAnExternalDataset)
+* [Writing Queries against an External Dataset](#WritingQueriesAgainstAnExternalDataset)
+
+## <a id="Introduction">Introduction</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
Data that needs to be processed by ASTERIX could be residing outside ASTERIX storage. Examples include data files on a distributed file system such as HDFS or on the local file system of a machine that is part of an ASTERIX cluster. For ASTERIX to process such data, end-user may create a regular dataset in ASTERIX (a.k.a. internal dataset) and load the dataset with the data. ASTERIX supports ''external datasets'' so that it is not necessary to “load” all data prior to using it. This also avoids creating multiple copies of data and the need to keep the copies in sync.
-### Adapter for an External Dataset ###
+### <a id="IntroductionAdapterForAnExternalDataset">Adapter for an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
External data is accessed using wrappers (adapters in ASTERIX) that abstract away the mechanism of connecting with an external service, receiving data and transforming the data into ADM records that are understood by ASTERIX. ASTERIX comes with built-in adapters for common storage systems such as HDFS or the local file system.
-### Creating an External Dataset ###
+### <a id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
As an example we consider the Lineitem dataset from [TPCH schema](http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSTPCHLinkedData/tpch.sql).
@@ -168,7 +175,7 @@
You may now run the sample query in next section.
-## Writing Queries against an External Dataset ##
+## <a id="WritingQueriesAgainstAnExternalDataset">Writing Queries against an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
You may write AQL queries against an external dataset. Following is an example AQL query that applies a filter and returns an ordered result.
diff --git a/asterix-doc/src/site/markdown/aql/functions.md b/asterix-doc/src/site/markdown/aql/functions.md
index 0c183a9..ca91581 100644
--- a/asterix-doc/src/site/markdown/aql/functions.md
+++ b/asterix-doc/src/site/markdown/aql/functions.md
@@ -1,7 +1,154 @@
# Asterix: Using Functions #
-Asterix provides rich support of various classes of functions to support operations on string, spatial, and temporal data. This document explains how to use these functions.
-## String Functions ##
+## <a id="toc">Table of Contents</a> ##
+
+* [Numeric Functions](#NumericFunctions)
+* [String Functions](#StringFunctions)
+* [Aggregate Functions](#AggregateFunctions)
+* [Spatial Functions](#SpatialFunctions)
+* [Similarity Functions](#SimilarityFunctions)
+* [Tokenizing Functions](#TokenizingFunctions)
+* [Temporal Functions](#TemporalFunctions)
+* [Other Functions](#OtherFunctions)
+
+Asterix provides various classes of functions to support operations on numeric, string, spatial, and temporal data. This document explains how to use these functions.
+
+## <a id="NumericFunctions">Numeric Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+### numeric-abs ###
+ * Syntax:
+
+ numeric-abs(numeric_expression)
+
+ * Computes the absolute value of the argument.
+ * Arguments:
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64`/`float`/`double` value.
+ * Return Value:
+ * The absolute value of the argument with the same type as the input argument, or `null` if the argument is a `null` value.
+
+ * Example:
+
+ let $v1 := numeric-abs(2013)
+ let $v2 := numeric-abs(-4036)
+ let $v3 := numeric-abs(0)
+ let $v4 := numeric-abs(float("-2013.5"))
+ let $v5 := numeric-abs(double("-2013.593823748327284"))
+ return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+
+
+ * The expected result is:
+
+ { "v1": 2013, "v2": 4036, "v3": 0, "v4": 2013.5f, "v5": 2013.5938237483274d }
+
+
+### numeric-ceiling ###
+ * Syntax:
+
+ numeric-ceiling(numeric_expression)
+
+ * Computes the smallest (closest to negative infinity) number with no fractional part that is not less than the value of the argument. If the argument is already equal to a mathematical integer, then the result is the same as the argument.
+ * Arguments:
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64`/`float`/`double` value.
+ * Return Value:
+ * The ceiling value for the given number in the same type as the input argument, or `null` if the input is `null`.
+
+ * Example:
+
+ let $v1 := numeric-ceiling(2013)
+ let $v2 := numeric-ceiling(-4036)
+ let $v3 := numeric-ceiling(0.3)
+ let $v4 := numeric-ceiling(float("-2013.2"))
+ let $v5 := numeric-ceiling(double("-2013.893823748327284"))
+ return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+
+
+ * The expected result is:
+
+ { "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2013.0d }
+
+
+### numeric-floor ###
+ * Syntax:
+
+ numeric-floor(numeric_expression)
+
+ * Computes the largest (closest to positive infinity) number with no fractional part that is not greater than the value of the argument. If the argument is already equal to a mathematical integer, then the result is the same as the argument.
+ * Arguments:
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64`/`float`/`double` value.
+ * Return Value:
+ * The floor value for the given number in the same type as the input argument, or `null` if the input is `null`.
+
+ * Example:
+
+ let $v1 := numeric-floor(2013)
+ let $v2 := numeric-floor(-4036)
+ let $v3 := numeric-floor(0.8)
+ let $v4 := numeric-floor(float("-2013.2"))
+ let $v5 := numeric-floor(double("-2013.893823748327284"))
+ return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+
+
+ * The expected result is:
+
+ { "v1": 2013, "v2": -4036, "v3": 0.0d, "v4": -2014.0f, "v5": -2014.0d }
+
+
+### numeric-round ###
+ * Syntax:
+
+ numeric-round(numeric_expression)
+
+ * Computes the number with no fractional part that is closest (and also closest to positive infinity) to the argument.
+ * Arguments:
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64`/`float`/`double` value.
+ * Return Value:
+ * The rounded value for the given number in the same type as the input argument, or `null` if the input is `null`.
+
+ * Example:
+
+ let $v1 := numeric-round(2013)
+ let $v2 := numeric-round(-4036)
+ let $v3 := numeric-round(0.8)
+ let $v4 := numeric-round(float("-2013.256"))
+ let $v5 := numeric-round(double("-2013.893823748327284"))
+ return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5 }
+
+
+ * The expected result is:
+
+ { "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2014.0d }
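The rounding rule described for `numeric-round` can be read as "round to the nearest integer, sending ties toward positive infinity". The Python sketch below models that reading with the standard `decimal` module. The function name is hypothetical, and the tie-breaking behavior is an interpretation of the description above, not a statement about AsterixDB's implementation.

```python
from decimal import Decimal, ROUND_FLOOR


def round_half_toward_positive(value: str) -> Decimal:
    """Round to the nearest integer; exact halves go toward positive infinity."""
    # Adding one half and taking the floor implements the rule directly.
    # Decimal arithmetic avoids binary floating-point surprises near .5.
    return (Decimal(value) + Decimal("0.5")).to_integral_value(rounding=ROUND_FLOOR)


samples = ["0.8", "-2013.256", "-2013.893823748327284", "2.5", "-2.5"]
rounded = {s: round_half_toward_positive(s) for s in samples}
print(rounded)
```

The first three inputs mirror the AQL example and give 1, -2013, and -2014. The two ties show the direction of the rule: 2.5 goes up to 3, and -2.5 also goes up, to -2.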
+
+
+### numeric-round-half-to-even ###
+ * Syntax:
+
+ numeric-round-half-to-even(numeric_expression, [precision])
+
+ * Computes the closest numeric value to `numeric_expression` that is a multiple of ten to the power of minus `precision`. `precision` is optional and defaults to `0`.
+ * Arguments:
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64`/`float`/`double` value.
+ * `precision`: An optional integer field representing the number of digits in the fraction of the result
+ * Return Value:
+ * The rounded value for the given number in the same type as the input argument, or `null` if the input is `null`.
+
+ * Example:
+
+ let $v1 := numeric-round-half-to-even(2013)
+ let $v2 := numeric-round-half-to-even(-4036)
+ let $v3 := numeric-round-half-to-even(0.8)
+ let $v4 := numeric-round-half-to-even(float("-2013.256"))
+ let $v5 := numeric-round-half-to-even(double("-2013.893823748327284"))
+ let $v6 := numeric-round-half-to-even(double("-2013.893823748327284"), 2)
+ let $v7 := numeric-round-half-to-even(2013, 4)
+ let $v8 := numeric-round-half-to-even(float("-2013.256"), 5)
+ return { "v1": $v1, "v2": $v2, "v3": $v3, "v4": $v4, "v5": $v5, "v6": $v6, "v7": $v7, "v8": $v8 }
+
+
+ * The expected result is:
+
+ { "v1": 2013, "v2": -4036, "v3": 1.0d, "v4": -2013.0f, "v5": -2014.0d, "v6": -2013.89d, "v7": 2013, "v8": -2013.256f }
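Half-to-even rounding differs from ordinary rounding only on exact ties, which none of the example inputs above hit. The Python sketch below uses the standard `decimal` module to show the tie behavior and the effect of `precision`. The function name is hypothetical, and this is a model of the rule, not AsterixDB code.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_to_even(value: str, precision: int = 0) -> Decimal:
    """Round to a multiple of 10**-precision; exact ties go to the even neighbor."""
    quantum = Decimal(10) ** -precision   # precision=2 gives a step of 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


ties = [round_half_to_even(v) for v in ("0.5", "1.5", "2.5", "3.5")]
with_precision = round_half_to_even("-2013.893823748327284", 2)
tie_with_precision = round_half_to_even("0.125", 2)
print(ties, with_precision, tie_with_precision)
```

The ties round to 0, 2, 2, and 4, so neighboring halves alternate direction instead of always rounding up. With a precision of 2, the long negative value becomes -2013.89, matching `v6` in the example, and 0.125 becomes 0.12.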
+
+
+## <a id="StringFunctions">String Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
### string-to-codepoint ###
* Syntax:
@@ -9,9 +156,9 @@
* Converts the string `string_expression` to its code-based representation.
* Arguments:
- * `string_expression` : A `string` that will be converted.
+ * `string_expression` : A `string` that will be converted.
* Return Value:
- * An `OrderedList` of the code points for the string `string_expression`.
+ * An `OrderedList` of the code points for the string `string_expression`.
### codepoint-to-string ###
* Syntax:
@@ -20,9 +167,9 @@
* Converts the ordered code-based representation `list_expression` to the corresponding string.
* Arguments:
- * `list_expression` : An `OrderedList` of code-points.
+ * `list_expression` : An `OrderedList` of code-points.
* Return Value:
- * A `string` representation of `list_expression`.
+ * A `string` representation of `list_expression`.
* Example:
@@ -46,10 +193,10 @@
* Checks whether the string `string_expression` contains the string `substring_to_contain`
* Arguments:
- * `string_expression` : A `string` that might contain the given substring.
- * `substring_to_contain` : A target `string` that might be contained.
+ * `string_expression` : A `string` that might contain the given substring.
+ * `substring_to_contain` : A target `string` that might be contained.
* Return Value:
- * A `boolean`, returns `true` if `string_expression` contains `substring_to_contain`, otherwise returns `false`.
+ * A `boolean` value, `true` if `string_expression` contains `substring_to_contain`, and `false` otherwise.
* Example:
@@ -67,41 +214,17 @@
{ "mid": 15, "message": " like iphone the voicemail-service is awesome" }
-### len ###
- * Syntax:
-
- len(list_expression)
-
- * Returns the length of the list `list_expression`.
- * Arguments:
- * `list_expression` : An `OrderedList`, `UnorderedList` or `null`, represents the list need to be checked.
- * Return Value:
- * An `int32` that represents the length of `list_expression`.
-
- * Example:
-
- use dataverse TinySocial;
-
- let $l := ["ASTERIX", "Hyracks"]
- return len($l)
-
-
- * The expected result is:
-
- 2
-
-
### like ###
* Syntax:
like(string_expression, string_pattern)
- * Checks whether the string `string_expression` contains the string pattern `string_pattern`. Compared with `contains` function, `like` function also supports regex keywords.
+ * Checks whether the string `string_expression` contains the string pattern `string_pattern`. Compared to the `contains` function, the `like` function also supports regular expressions.
* Arguments:
- * `string_expression` : A `string` that might contain the pattern or `null`.
- * `string_pattern` : A pattern `string` that might be contained or `null`.
+ * `string_expression` : A `string` that might contain the pattern or `null`.
+ * `string_pattern` : A pattern `string` that might be contained or `null`.
* Return Value:
- * A `boolean`, returns `true` if `string_expression` contains the pattern `string_pattern`, otherwise returns `false`.
+ * A `boolean` value, `true` if `string_expression` contains the pattern `string_pattern`, and `false` otherwise.
* Example:
@@ -126,10 +249,10 @@
* Checks whether the string `string_expression` starts with the string `substring_to_start_with`.
* Arguments:
- * `string_expression` : A `string` that might start with the given string.
- * `substring_to_start_with` : A `string` that might be contained as the starting substring.
+ * `string_expression` : A `string` that might start with the given string.
+ * `substring_to_start_with` : A `string` that might be contained as the starting substring.
* Return Value:
- * A `boolean`, returns `true` if `string_expression` starts with the string `substring_to_start_with`, otherwise returns `false`.
+ * A `boolean` value, `true` if `string_expression` starts with the string `substring_to_start_with`, and `false` otherwise.
* Example:
@@ -155,10 +278,10 @@
* Checks whether the string `string_expression` ends with the string `substring_to_end_with`.
* Arguments:
- * `string_expression` : A `string` that might end with the given string.
- * `substring_to_end_with` : A `string` that might be contained as the ending substring.
+ * `string_expression` : A `string` that might end with the given string.
+ * `substring_to_end_with` : A `string` that might be contained as the ending substring.
* Return Value:
- * A `boolean`, returns `true` if `string_expression` ends with the string `substring_to_end_with`, otherwise returns `false`.
+ * A `boolean` value, `true` if `string_expression` ends with the string `substring_to_end_with`, and `false` otherwise.
* Example:
@@ -183,9 +306,9 @@
* Concatenates a list of strings `list_expression` into a single string.
* Arguments:
- * `list_expression` : An `OrderedList` or `UnorderedList` of `string`s (could be `null`) to be concatenated.
+ * `list_expression` : An `OrderedList` or `UnorderedList` of `string`s (could be `null`) to be concatenated.
* Return Value:
- * Returns the concatenated `string` value.
+ * Returns the concatenated `string` value.
* Example:
@@ -200,31 +323,6 @@
"ASTERIX ROCKS!"
-### string-equal ###
- * Syntax:
-
- string-equal(string_expression1, string_expression2)
-
- * Checks whether the strings `string_expression1` and `string_expression2` are equal.
- * Arguments:
- * `string_expression1` : A `string` to be compared.
- * `string_expression2` : A `string` to be compared with.
- * Return Value:
- * A `boolean`, returns `true` if `string_expression1` and `string_expression2` are equal, otherwise returns `false`.
-
- * Example:
-
- use dataverse TinySocial;
-
- let $i := "Android"
- return {"Equal": string-equal($i, "Android"), "NotEqual": string-equal($i, "iphone")}
-
-
- * The expected result is:
-
- { "Equal": true, "NotEqual": false }
-
-
### string-join ###
* Syntax:
@@ -232,10 +330,10 @@
* Joins a list of strings `list_expression` with the given separator `string_expression` into a single string.
* Arguments:
- * `list_expression` : An `OrderedList` or `UnorderedList` of `string`s (could be `null`) to be joined.
- * `string_expression` : A `string` as the separator.
+ * `list_expression` : An `OrderedList` or `UnorderedList` of strings (could be `null`) to be joined.
+ * `string_expression` : A `string` as the separator.
* Return Value:
- * Returns the joined `String`.
+ * Returns the joined `string`.
* Example:
@@ -257,9 +355,9 @@
* Converts a given string `string_expression` to its lowercase form.
* Arguments:
- * `string_expression` : A `string` to be converted.
+ * `string_expression` : A `string` to be converted.
* Return Value:
- * Returns a `string` as the lowercase form of the given `string_expression`.
+ * Returns a `string` as the lowercase form of the given `string_expression`.
* Example:
@@ -281,10 +379,10 @@
* Checks whether the strings `string_expression` matches the given pattern `string_pattern`.
* Arguments:
- * `string_expression` : A `string` that might contain the pattern.
- * `string_pattern` : A pattern `string` to be matched.
+ * `string_expression` : A `string` that might contain the pattern.
+ * `string_pattern` : A pattern `string` to be matched.
* Return Value:
- * A `boolean`, returns `true` if `string_expression` matches the pattern `string_pattern`, otherwise returns `false`.
+ * A `boolean` value, `true` if `string_expression` matches the pattern `string_pattern`, and `false` otherwise.
* Example:
@@ -306,13 +404,13 @@
replace(string_expression, string_pattern, string_replacement)
- * Checks whether the strings `string_expression` matches the given pattern `string_pattern`, and replace the matched pattern `string_pattern` with the new pattern `string_replacement`.
+ * Checks whether the string `string_expression` matches the given pattern `string_pattern`, and replaces the matched pattern `string_pattern` with the new pattern `string_replacement`.
* Arguments:
- * `string_expression` : A `string` that might contain the pattern.
- * `string_pattern` : A pattern `string` to be matched.
- * `string_replacement` : A pattern `string` to be used as the replacement.
+ * `string_expression` : A `string` that might contain the pattern.
+ * `string_pattern` : A pattern `string` to be matched.
+ * `string_replacement` : A pattern `string` to be used as the replacement.
* Return Value:
- * Returns a `string` that is obtained after the replacements.
+ * Returns a `string` that is obtained after the replacements.
* Example:
@@ -335,9 +433,9 @@
* Returns the length of the string `string_expression`.
* Arguments:
- * `string_expression` : A `string` or `null`, represents the string to be checked.
+ * `string_expression` : A `string` or `null` that represents the string to be checked.
* Return Value:
- * An `int32` that represents the length of `string_expression`.
+ * An `int32` that represents the length of `string_expression`.
* Example:
@@ -373,11 +471,11 @@
* Returns the substring from the given string `string_expression` based on the given start offset `offset` with the optional `length`.
* Arguments:
- * `string_expression` : A `string` as the string to be extracted.
- * `offset` : An `int32` as the starting offset of the substring in `string_expression`.
- * `length` : (Optional) An `int32` as the length of the substring.
+ * `string_expression` : A `string` to be extracted.
+ * `offset` : An `int32` as the starting offset of the substring in `string_expression`.
+ * `length` : (Optional) An `int32` as the length of the substring.
* Return Value:
- * A `string` that represents the substring.
+ * A `string` that represents the substring.
* Example:
@@ -400,10 +498,10 @@
* Returns the substring from the given string `string_expression` before the given pattern `string_pattern`.
* Arguments:
- * `string_expression` : A `string` as the string to be extracted.
- * `string_pattern` : A `string` as the string pattern to be searched.
+ * `string_expression` : A `string` to be extracted.
+ * `string_pattern` : A `string` pattern to be searched.
* Return Value:
- * A `string` that represents the substring.
+ * A `string` that represents the substring.
* Example:
@@ -428,10 +526,10 @@
* Returns the substring from the given string `string_expression` after the given pattern `string_pattern`.
* Arguments:
- * `string_expression` : A `string` as the string to be extracted.
- * `string_pattern` : A `string` as the string pattern to be searched.
+ * `string_expression` : A `string` to be extracted.
+ * `string_pattern` : A `string` pattern to be searched.
* Return Value:
- * A `string` that represents the substring.
+ * A `string` that represents the substring.
* Example:
@@ -448,8 +546,97 @@
" the voice-command is bad:("
" the voicemail-service is awesome"
+## <a id="AggregateFunctions">Aggregate Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+### count ###
+ * Syntax:
+
+ count(list)
+
+ * Gets the number of items in the given list.
+ * Arguments:
+ * `list`: An `OrderedList` or `UnorderedList` containing the items to be counted, or a `null` value.
+ * Return Value:
+ * An `int64` value representing the number of items in the given list. `0i64` is returned if the input is `null`.
+
+ * Example:
+
+ use dataverse TinySocial;
-## Spatial Functions ##
+ let $l1 := ['hello', 'world', 1, 2, 3]
+ let $l2 := for $i in dataset TwitterUsers return $i
+ return {"count1": count($l1), "count2": count($l2)}
+
+ * The expected result is:
+
+ { "count1": 5i64, "count2": 4i64 }
+
+### avg ###
+ * Syntax:
+
+ avg(num_list)
+
+ * Gets the average value of the items in the given list.
+ * Arguments:
+ * `num_list`: An `OrderedList` or `UnorderedList` containing numeric or `null` values, or a `null` value.
+ * Return Value:
+ * A `double` value representing the average of the numbers in the given list. `null` is returned if the input is `null` or the input list contains `null`. Non-numeric types in the input list will cause an error.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ let $l := for $i in dataset TwitterUsers return $i.friends_count
+ return {"avg_friend_count": avg($l)}
+
+ * The expected result is:
+
+ { "avg_friend_count": 191.5d }
+
+### sum ###
+ * Syntax:
+
+ sum(num_list)
+
+ * Gets the sum of the items in the given list.
+ * Arguments:
+ * `num_list`: An `OrderedList` or `UnorderedList` containing numeric or `null` values, or a `null` value.
+ * Return Value:
+ * The sum of the numbers in the given list. The return type is determined by the item type with the highest order in the numeric type promotion order (`int8`->`int16`->`int32`->`float`->`double`, `int32`->`int64`->`double`) among the items. `null` is returned if the input is `null` or the input list contains `null`. Non-numeric types in the input list will cause an error.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ let $l := for $i in dataset TwitterUsers return $i.friends_count
+ return {"sum_friend_count": sum($l)}
+
+ * The expected result is:
+
+ { "sum_friend_count": 766 }
+
+### min/max ###
+ * Syntax:
+
+ min(num_list), max(num_list)
+
+ * Gets the min/max value of numeric items in the given list.
+ * Arguments:
+ * `num_list`: An `OrderedList` or `UnorderedList` containing the items to be compared, or a `null` value.
+ * Return Value:
+ * The min/max value of the given list. The return type is determined by the item type with the highest order in the numeric type promotion order (`int8`->`int16`->`int32`->`float`->`double`, `int32`->`int64`->`double`) among the items. `null` is returned if the input is `null` or the input list contains `null`. Non-numeric types in the input list will cause an error.
+
+ * Example:
+
+ use dataverse TinySocial;
+
+ let $l := for $i in dataset TwitterUsers return $i.friends_count
+ return {"min_friend_count": min($l), "max_friend_count": max($l)}
+
+ * The expected result is:
+
+ { "min_friend_count": 18, "max_friend_count": 445 }
+
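The null-handling rules stated above for `avg`, `sum`, `min`, and `max` can be sketched outside AsterixDB. The Python below is only an illustration, not AsterixDB code: the helper name `aql_aggregate` is hypothetical, Python's `None` stands in for `null`, the sample values are made up, and numeric type promotion is not modeled.

```python
from statistics import fmean


def aql_aggregate(kind, items):
    """Mimic the documented null rules (hypothetical helper, not an AsterixDB API)."""
    # A null input yields null.
    if items is None:
        return None
    items = list(items)
    # A list containing null yields null.
    if any(x is None for x in items):
        return None
    # Non-numeric items cause an error.
    for x in items:
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise TypeError(f"non-numeric item: {x!r}")
    # Behavior on an empty list is not specified above; this sketch
    # simply inherits whatever the Python built-ins do.
    functions = {"avg": fmean, "sum": sum, "min": min, "max": max}
    return functions[kind](items)


values = [1, 2, 3, 4]  # made-up sample values
print(aql_aggregate("avg", values))
print(aql_aggregate("sum", values))
print(aql_aggregate("min", [3, None, 1]))
```

The last call prints `None` because one item is null, which matches the rule that a list containing `null` produces `null`.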
+## <a id="SpatialFunctions">Spatial Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
### create-point ###
* Syntax:
@@ -482,10 +669,10 @@
* Creates the primitive type `line` using `point_expression1` and `point_expression2`.
* Arguments:
- * `point_expression1` : A `point` that represents the start point of the line.
- * `point_expression2` : A `point` that represents the end point of the line.
+ * `point_expression1` : A `point` that represents the start point of the line.
+ * `point_expression2` : A `point` that represents the end point of the line.
* Return Value:
- * A `line`, represents a spatial line created using the points provided in `point_expression1` and `point_expression2`.
+ * A spatial `line` created using the points provided in `point_expression1` and `point_expression2`.
* Example:
@@ -507,10 +694,10 @@
* Creates the primitive type `rectangle` using `point_expression1` and `point_expression2`.
* Arguments:
- * `point_expression1` : A `point` that represents the lower-left point of the rectangle.
- * `point_expression2` : A `point` that represents the upper-right point of the rectangle.
+ * `point_expression1` : A `point` that represents the lower-left point of the rectangle.
+ * `point_expression2` : A `point` that represents the upper-right point of the rectangle.
* Return Value:
- * A `rectangle`, represents a spatial rectangle created using the points provided in `point_expression1` and `point_expression2`.
+ * A spatial `rectangle` created using the points provided in `point_expression1` and `point_expression2`.
* Example:
@@ -532,10 +719,10 @@
* Creates the primitive type `circle` using `point_expression` and `radius`.
* Arguments:
- * `point_expression` : A `point` that represents the center of the circle.
- * `radius` : A `double` that represents the radius of the circle.
+ * `point_expression` : A `point` that represents the center of the circle.
+ * `radius` : A `double` that represents the radius of the circle.
* Return Value:
- * A `circle`, represents a spatial circle created using the center point and the radius provided in `point_expression` and `radius`.
+ * A spatial `circle` created using the center point and the radius provided in `point_expression` and `radius`.
* Example:
@@ -579,15 +766,14 @@
point(string_expression)
- * Constructor function for `point` type by parsing a point string `string_expression`
+ * Constructor function for the `point` type by parsing a point string `string_expression`.
* Arguments:
- * `string_expression` : The `string` value representing a point value.
+ * `string_expression` : The `string` value representing a point value.
* Return Value:
- * A `point` value represented by the given string.
+ * A `point` value represented by the given string.
* Example:
-
use dataverse TinySocial;
let $c := point("55.05,-138.04")
@@ -606,13 +792,12 @@
* Constructor function for `line` type by parsing a line string `string_expression`
* Arguments:
- * `string_expression` : The `string` value representing a line value.
+ * `string_expression` : The `string` value representing a line value.
* Return Value:
- * A `line` value represented by the given string.
+ * A `line` value represented by the given string.
* Example:
-
use dataverse TinySocial;
let $c := line("55.05,-138.04 13.54,-138.04")
@@ -631,13 +816,12 @@
* Constructor function for `rectangle` type by parsing a rectangle string `string_expression`
* Arguments:
- * `string_expression` : The `string` value representing a rectangle value.
+ * `string_expression` : The `string` value representing a rectangle value.
* Return Value:
- * A `rectangle` value represented by the given string.
+ * A `rectangle` value represented by the given string.
* Example:
-
use dataverse TinySocial;
let $c := rectangle("20.05,-125.0 40.67,-100.87")
@@ -656,13 +840,12 @@
* Constructor function for `circle` type by parsing a circle string `string_expression`
* Arguments:
- * `string_expression` : The `string` value representing a circle value.
+ * `string_expression` : The `string` value representing a circle value.
* Return Value:
* A `circle` value represented by the given string.
* Example:
-
use dataverse TinySocial;
let $c := circle("55.05,-138.04 10.0")
@@ -681,13 +864,12 @@
* Constructor function for `polygon` type by parsing a polygon string `string_expression`
* Arguments:
- * `string_expression` : The `string` value representing a polygon value.
+ * `string_expression` : The `string` value representing a polygon value.
* Return Value:
- * A `polygon` value represented by the given string.
+ * A `polygon` value represented by the given string.
* Example:
-
use dataverse TinySocial;
let $c := polygon("55.05,-138.04 13.54,-138.04 13.54,-53.31 55.05,-53.31")
@@ -706,9 +888,9 @@
* Returns the x or y coordinates of a point `point_expression`.
* Arguments:
- * `point_expression` : A `point`.
+ * `point_expression` : A `point`.
* Return Value:
- * A `double`, represents the x or y coordinates of the point `point_expression`.
+ * A `double` representing the x or y coordinates of the point `point_expression`.
* Example:
@@ -730,9 +912,9 @@
* Returns an ordered list of the points forming the spatial object `spatial_expression`.
* Arguments:
- * `spatial_expression` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
+ * `spatial_expression` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
* Return Value:
- * An `OrderedList` of the points forming the spatial object `spatial_expression`.
+ * An `OrderedList` of the points forming the spatial object `spatial_expression`.
* Example:
@@ -757,11 +939,11 @@
get-center(circle_expression) or get-radius(circle_expression)
- * Returns the center and the radius of a circle `circle_expression`.
+ * Returns the center and the radius of a circle `circle_expression`, respectively.
* Arguments:
- * `circle_expression` : A `circle`.
+ * `circle_expression` : A `circle`.
* Return Value:
- * A `point` or `double`, represent the center or radius of the circle `circle_expression`.
+ * A `point` or a `double`, representing the center or the radius of the circle `circle_expression`, respectively.
* Example:
@@ -783,12 +965,12 @@
spatial-distance(point_expression1, point_expression2)
- * Returns the euclidean distance between `point_expression1` and `point_expression2`.
+ * Returns the Euclidean distance between `point_expression1` and `point_expression2`.
* Arguments:
- * `point_expression1` : A `point`.
- * `point_expression2` : A `point`.
+ * `point_expression1` : A `point`.
+ * `point_expression2` : A `point`.
* Return Value:
- * A `double`, represents the euclidean distance between `point_expression1` and `point_expression2`.
+ * A `double` as the Euclidean distance between `point_expression1` and `point_expression2`.
* Example:
@@ -819,13 +1001,13 @@
### spatial-area ###
* Syntax:
- spatial-distance(spatial_2d_expression)
+ spatial-area(spatial_2d_expression)
* Returns the spatial area of `spatial_2d_expression`.
* Arguments:
- * `spatial_2d_expression` : A `rectangle`, `circle`, or `polygon`.
+ * `spatial_2d_expression` : A `rectangle`, `circle`, or `polygon`.
* Return Value:
- * A `double`, represents the area of `spatial_2d_expression`.
+ * A `double` representing the area of `spatial_2d_expression`.
* Example:
@@ -848,10 +1030,10 @@
* Checks whether `@arg1` and `@arg2` spatially intersect each other.
* Arguments:
- * `spatial_expression1` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
- * `spatial_expression2` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
+ * `spatial_expression1` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
+ * `spatial_expression2` : A `point`, `line`, `rectangle`, `circle`, or `polygon`.
* Return Value:
- * A `boolean`, represents whether `spatial_expression1` and `spatial_expression2` spatially intersect each other.
+ * A `boolean` representing whether `spatial_expression1` and `spatial_expression2` spatially overlap with each other.
* Example:
@@ -876,12 +1058,12 @@
* Returns the grid cell that `point_expression1` belongs to.
* Arguments:
- * `point_expression1` : A `point`, represents the point of interest that its grid cell will be returned.
- * `point_expression2` : A `point`, represents the origin of the grid.
- * `x_increment` : A `double`, represents X increments.
- * `y_increment` : A `double`, represents Y increments.
+ * `point_expression1` : A `point` representing the point of interest whose grid cell will be returned.
+ * `point_expression2` : A `point` representing the origin of the grid.
+ * `x_increment` : A `double` representing the X increment.
+ * `y_increment` : A `double` representing the Y increment.
* Return Value:
- * A `rectangle`, represents the grid cell that `point_expression1` belongs to.
+ * A `rectangle` representing the grid cell that `point_expression1` belongs to.
* Example:
@@ -895,21 +1077,21 @@
* The expected result is:
- { "cell": rectangle("20.0,92.0 25.5,98.0"), "count": 1 }
- { "cell": rectangle("25.5,74.0 31.0,80.0"), "count": 2 }
- { "cell": rectangle("31.0,62.0 36.5,68.0"), "count": 1 }
- { "cell": rectangle("31.0,68.0 36.5,74.0"), "count": 1 }
- { "cell": rectangle("36.5,68.0 42.0,74.0"), "count": 2 }
- { "cell": rectangle("36.5,74.0 42.0,80.0"), "count": 1 }
- { "cell": rectangle("36.5,92.0 42.0,98.0"), "count": 1 }
- { "cell": rectangle("42.0,80.0 47.5,86.0"), "count": 1 }
- { "cell": rectangle("42.0,92.0 47.5,98.0"), "count": 1 }
- { "cell": rectangle("47.5,80.0 53.0,86.0"), "count": 1 }
+ { "cell": rectangle("20.0,92.0 25.5,98.0"), "count": 1i64 }
+ { "cell": rectangle("25.5,74.0 31.0,80.0"), "count": 2i64 }
+ { "cell": rectangle("31.0,62.0 36.5,68.0"), "count": 1i64 }
+ { "cell": rectangle("31.0,68.0 36.5,74.0"), "count": 1i64 }
+ { "cell": rectangle("36.5,68.0 42.0,74.0"), "count": 2i64 }
+ { "cell": rectangle("36.5,74.0 42.0,80.0"), "count": 1i64 }
+ { "cell": rectangle("36.5,92.0 42.0,98.0"), "count": 1i64 }
+ { "cell": rectangle("42.0,80.0 47.5,86.0"), "count": 1i64 }
+ { "cell": rectangle("42.0,92.0 47.5,98.0"), "count": 1i64 }
+ { "cell": rectangle("47.5,80.0 53.0,86.0"), "count": 1i64 }
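The grid-cell computation described for `spatial-cell` can be sketched as follows. This is a hedged illustration rather than the AsterixDB implementation: the function name `spatial_cell` and the tuple representation are hypothetical, the origin `(20.0, 50.0)` and increments `5.5` and `6.0` are assumptions chosen to be consistent with the cell boundaries listed above, and the treatment of points that fall exactly on a cell boundary is also an assumption.

```python
import math


def spatial_cell(point, origin, x_increment, y_increment):
    """Return (x_min, y_min, x_max, y_max) of the grid cell containing point."""
    px, py = point
    ox, oy = origin
    # Index of the cell along each axis, counted from the grid origin.
    col = math.floor((px - ox) / x_increment)
    row = math.floor((py - oy) / y_increment)
    x_min = ox + col * x_increment
    y_min = oy + row * y_increment
    return (x_min, y_min, x_min + x_increment, y_min + y_increment)


# Assumed grid: origin (20.0, 50.0), 5.5 wide and 6.0 high cells.
print(spatial_cell((24.5, 95.0), (20.0, 50.0), 5.5, 6.0))
```

With these assumed parameters, the sample point falls in the cell from `20.0,92.0` to `25.5,98.0`, which is one of the cells in the expected result.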
-## Similarity Functions ##
+## <a id="SimilarityFunctions">Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
AsterixDB supports queries with different similarity functions, including edit distance and Jaccard.
@@ -920,10 +1102,10 @@
* Returns the [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) of `expression1` and `expression2`.
* Arguments:
- * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
- * `expression2` : The same type as `expression1`.
+ * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
+ * `expression2` : The same type as `expression1`.
* Return Value:
- * An `int32` that represents the edit-distance similarity between `expression1` and `expression2`.
+ * An `int32` that represents the edit distance between `expression1` and `expression2`.
* Example:
@@ -948,16 +1130,16 @@
edit-distance-check(expression1, expression2, threshold)
- * Checks whether `expression1` and `expression2` have a [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) `<= threshold`. The “check” version of edit distance is faster than the "non-check" version because the former can detect whether two items satisfy a given similarity threshold using early-termination techniques, as opposed to computing their real distance. Although possible, it is not necessary for the user to write queries using the “check” versions explicitly, since a rewrite rule can perform an appropriate transformation from a “non-check” version to a “check” version.
+ * Checks whether `expression1` and `expression2` have an [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) within a given threshold. The "check" version of edit distance is faster than the "non-check" version because the former can detect whether two items satisfy a given threshold using early-termination techniques, as opposed to computing their real distance. Although possible, it is not necessary for the user to write queries using the "check" versions explicitly, since a rewrite rule can perform an appropriate transformation from a "non-check" version to a "check" version.
* Arguments:
- * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
- * `expression2` : The same type as `expression1`.
- * `threshold` : An `int32` that represents the distance threshold.
+ * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
+ * `expression2` : The same type as `expression1`.
+ * `threshold` : An `int32` that represents the distance threshold.
* Return Value:
- * An `OrderedList` with two items:
- * The first item contains a `boolean` value representing whether `expression1` and `expression2` are similar.
- * The second item contains an `int32` that represents the edit distance of `expression1` and `expression2` if it is `<= `threshold`, or 0 otherwise.
+ * An `OrderedList` with two items:
+ * The first item contains a `boolean` value representing whether `expression1` and `expression2` are similar.
+ * The second item contains an `int32` that represents the edit distance of `expression1` and `expression2` if it is within the threshold, or 0 otherwise.
* Example:
@@ -981,10 +1163,10 @@
* Returns the [Jaccard similarity](http://en.wikipedia.org/wiki/Jaccard_index) of `list_expression1` and `list_expression2`.
* Arguments:
- * `list_expression1` : An `UnorderedList` or `OrderedList`.
- * `list_expression2` : An `UnorderedList` or `OrderedList`.
+ * `list_expression1` : An `UnorderedList` or `OrderedList`.
+ * `list_expression2` : An `UnorderedList` or `OrderedList`.
* Return Value:
- * A `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2`.
+ * A `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2`.
* Example:
@@ -1013,16 +1195,16 @@
similarity-jaccard-check(list_expression1, list_expression2, threshold)
- * Checks whether `list_expression1` and `list_expression2` have a [Jaccard similarity](http://en.wikipedia.org/wiki/Jaccard_index) `>= threshold`. Again, the “check” version of Jaccard is faster than the "non-check" version.
+ * Checks whether `list_expression1` and `list_expression2` have a [Jaccard similarity](http://en.wikipedia.org/wiki/Jaccard_index) greater than or equal to `threshold`. Again, the “check” version of Jaccard is faster than the "non-check" version.
* Arguments:
- * `list_expression1` : An `UnorderedList` or `OrderedList`.
- * `list_expression2` : An `UnorderedList` or `OrderedList`.
- * `threshold` : A `float` that represents the similarity threshold.
+ * `list_expression1` : An `UnorderedList` or `OrderedList`.
+ * `list_expression2` : An `UnorderedList` or `OrderedList`.
+ * `threshold` : A `float` that represents the similarity threshold.
* Return Value:
- * An `OrderedList` with two items:
+ * An `OrderedList` with two items:
* The first item contains a `boolean` value representing whether `list_expression1` and `list_expression2` are similar.
- * The second item contains a `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2` if it is >`= `threshold`, or 0 otherwise.
+ * The second item contains a `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2` if it is greater than or equal to the threshold, or 0 otherwise.
* Example:
@@ -1043,7 +1225,7 @@
### Similarity Operator ~# ###
* "`~=`" is syntactic sugar for expressing a similarity condition with a given similarity threshold.
* The similarity function and threshold for "`~=`" are controlled via "set" directives.
- * The "`~=`" operator returns a `boolean` that represents whether the operands are similar.
+ * The "`~=`" operator returns a `boolean` value that represents whether the operands are similar.
* Example for Jaccard similarity:
@@ -1089,7 +1271,7 @@
}
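The return convention of the two "check" functions in this section, a two-item list holding a flag and a value, can be sketched in Python. This is an assumption-laden illustration and not AsterixDB's implementation: the early-termination rule shown (stop once every entry of a dynamic-programming row exceeds the threshold) is one common technique and may differ from what the system does, the Jaccard sketch treats both lists as sets and does no early termination, and the result for two empty lists is an arbitrary choice.

```python
def edit_distance_check(a, b, threshold):
    """Return [within_threshold, distance_or_0] for two strings or lists."""
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > threshold:
        return [False, 0]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        # The final distance can never be smaller than the row minimum,
        # so the computation can stop early.
        if min(cur) > threshold:
            return [False, 0]
        prev = cur
    distance = prev[-1]
    return [True, distance] if distance <= threshold else [False, 0]


def similarity_jaccard_check(list1, list2, threshold):
    """Return [is_similar, similarity_or_0], treating both lists as sets."""
    s1, s2 = set(list1), set(list2)
    union = s1 | s2
    # Assumption: two empty lists are given similarity 0.0 here.
    similarity = len(s1 & s2) / len(union) if union else 0.0
    return [True, similarity] if similarity >= threshold else [False, 0.0]


print(edit_distance_check("kitten", "sitting", 3))
print(similarity_jaccard_check([1, 2, 3], [2, 3, 4], 0.5))
```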
-## Tokenizing Functions ##
+## <a id="TokenizingFunctions">Tokenizing Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
### word-tokens ###
* Syntax:
@@ -1097,9 +1279,9 @@
* Returns a list of word tokens of `string_expression`.
* Arguments:
- * `string_expression` : A `string` that will be tokenized.
+ * `string_expression` : A `string` that will be tokenized.
* Return Value:
- * An `OrderedList` of `string` word tokens.
+ * An `OrderedList` of `string` word tokens.
* Example:
@@ -1126,7 +1308,7 @@
* Returns a list of hashed word tokens of `string_expression`.
* Arguments:
- * `string_expression` : A `string` that will be tokenized.
+ * `string_expression` : A `string` that will be tokenized.
* Return Value:
* An `OrderedList` of `int32` hashed tokens.
@@ -1155,9 +1337,9 @@
* Returns a list of hashed word tokens of `string_expression`. The hashing mechanism gives duplicate tokens different hash values, based on the occurrence count of that token.
* Arguments:
- * `string_expression` : A `String` that will be tokenized.
+ * `string_expression` : A `String` that will be tokenized.
* Return Value:
- * An `OrderedList` of `Int32` hashed tokens.
+ * An `OrderedList` of `Int32` hashed tokens.
* Example:
use dataverse TinySocial;
@@ -1183,11 +1365,11 @@
* Returns a list of gram tokens of `string_expression`, which can be obtained by scanning the characters using a sliding window of a fixed length.
* Arguments:
- * `string_expression` : A `String` that will be tokenized.
- * `gram_length` : An `Int32` as the length of grams.
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32` as the length of grams.
* `boolean_expression` : A `Boolean` value to indicate whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
* Return Value:
- * An `OrderedList` of String gram tokens.
+ * An `OrderedList` of `string` gram tokens.
* Example:
@@ -1218,11 +1400,11 @@
* Returns a list of hashed gram tokens of `string_expression`.
* Arguments:
- * `string_expression` : A `String` that will be tokenized.
- * `gram_length` : An `Int32` as the length of grams.
- * `boolean_expression` : A `Boolean` to indicate whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32` as the length of grams.
+ * `boolean_expression` : A `Boolean` to indicate whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
* Return Value:
- * An `OrderedList` of `Int32` hashed gram tokens.
+ * An `OrderedList` of `Int32` hashed gram tokens.
* Example:
@@ -1255,11 +1437,11 @@
* Returns a list of hashed gram tokens of `string_expression`. The hashing mechanism gives duplicate tokens different hash values, based on the occurrence count of that token.
* Arguments:
- * `string_expression` : A `String` that will be tokenized.
- * `gram_length` : An `Int32`, length of grams to generate.
- * `boolean_expression` : A `Boolean`, whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
+ * `string_expression` : A `String` that will be tokenized.
+ * `gram_length` : An `Int32`, length of grams to generate.
+ * `boolean_expression` : A `Boolean`, whether to generate additional grams by pre- and postfixing `string_expression` with special characters.
* Return Value:
- * An `OrderedList` of `Int32` hashed gram tokens.
+ * An `OrderedList` of `Int32` hashed gram tokens.
* Example:
@@ -1285,18 +1467,18 @@
}
-->
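The sliding-window description of `gram-tokens` can be illustrated with a short Python sketch. The function below is hypothetical and makes several assumptions not stated above: the pad characters `#` (prefix) and `$` (postfix), the amount of padding (`gram_length - 1` on each side), and the absence of any case normalization.

```python
def gram_tokens(text, gram_length, pad, pre_char="#", post_char="$"):
    """Return the grams of text found by a sliding window of gram_length."""
    if pad:
        # Assumed padding scheme: gram_length - 1 special characters per side.
        text = (pre_char * (gram_length - 1)
                + text
                + post_char * (gram_length - 1))
    # One gram per window position; empty if the text is shorter than a gram.
    return [text[i:i + gram_length]
            for i in range(len(text) - gram_length + 1)]


print(gram_tokens("like", 3, False))
print(gram_tokens("hi", 2, True))
```

Without padding, a string of length `n` yields `n - gram_length + 1` grams; padding adds grams that mark the start and end of the string.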
-## Temporal Functions ##
+## <a id="TemporalFunctions">Temporal Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
### date ###
* Syntax:
date(string_expression)
- * Constructor function for `date` type by parsing a date string `string_expression`
+ * Constructor function for the `date` type by parsing a date string `string_expression`.
* Arguments:
- * `string_expression` : The `string` value representing a date value.
+ * `string_expression` : The `string` value representing a date value.
* Return Value:
- * A `date` value represented by the given string.
+ * A `date` value represented by the given string.
* Example:
@@ -1319,11 +1501,11 @@
time(string_expression)
- * Constructor function for `time` type by parsing a time string `string_expression`
+ * Constructor function for the `time` type by parsing a time string `string_expression`.
* Arguments:
- * `string_expression` : The `string` value representing a time value.
+ * `string_expression` : The `string` value representing a time value.
* Return Value:
- * A `time` value represented by the given string.
+ * A `time` value represented by the given string.
* Example:
@@ -1346,11 +1528,11 @@
datetime(string_expression)
- * Constructor function for `datetime` type by parsing a datetime string `string_expression`
+ * Constructor function for the `datetime` type by parsing a datetime string `string_expression`.
* Arguments:
- * `string_expression` : The `string` value representing a datetime value.
+ * `string_expression` : The `string` value representing a datetime value.
* Return Value:
- * A `datetime` value represented by the given string.
+ * A `datetime` value represented by the given string.
* Example:
@@ -1373,12 +1555,12 @@
interval-from-date(string_expression1, string_expression2)
- * Constructor function for `interval` type by parsing two date strings.
+ * Constructor function for the `interval` type by parsing two date strings.
* Arguments:
- * `string_expression1` : The `string` value representing the starting date.
- * `string_expression2` : The `string` value representing the ending date.
+ * `string_expression1` : The `string` value representing the starting date.
+ * `string_expression2` : The `string` value representing the ending date.
* Return Value:
- * An `interval` value between the two dates.
+ * An `interval` value between the two dates.
* Example:
@@ -1395,12 +1577,12 @@
interval-from-time(string_expression1, string_expression2)
- * Constructor function for `interval` type by parsing two time strings.
+ * Constructor function for the `interval` type by parsing two time strings.
* Arguments:
- * `string_expression1` : The `string` value representing the starting time.
- * `string_expression2` : The `string` value representing the ending time.
+ * `string_expression1` : The `string` value representing the starting time.
+ * `string_expression2` : The `string` value representing the ending time.
* Return Value:
- * An `interval` value between the two times.
+ * An `interval` value between the two times.
* Example:
@@ -1419,10 +1601,10 @@
* Constructor function for `interval` type by parsing two datetime strings.
* Arguments:
- * `string_expression1` : The `string` value representing the starting datetime.
- * `string_expression2` : The `string` value representing the ending datetime.
+ * `string_expression1` : The `string` value representing the starting datetime.
+ * `string_expression2` : The `string` value representing the ending datetime.
* Return Value:
- * An `interval` value between the two datetimes.
+ * An `interval` value between the two datetimes.
* Example:
@@ -1441,9 +1623,9 @@
* Accessors for accessing fields in a temporal value
* Arguments:
- * `temporal_expression` : a temporal value represented as one of the following types: `date`, `datetime`, `time`, `duration`.
+ * `temporal_expression` : a temporal value represented as one of the following types: `date`, `datetime`, `time`, or `duration`.
* Return Value:
- * An `int32` value representing the field to be extracted.
+ * An `int32` value representing the field to be extracted.
* Example:
@@ -1460,121 +1642,23 @@
{ "year": 2010, "month": 11, "day": 30, "hour": 5, "min": 28, "second": 23, "ms": 94 }
-
-### add-date-duration ###
- * Syntax:
-
- add-date-duration(date_expression, duration_expression)
-
- * Create a new date by adding the duration `duration_expression` to the given date `date_expression`.
- * Arguments:
- * `date_expression` : The `date` value to be added onto.
- * `duration_expression` : The `duration` to be added.
- * Return Value:
- * A `date` value represents the new date after being adjusted by the duration.
-
- * Example:
-
- use dataverse TinySocial;
-
- let $startdate := date('2011-03-01')
- for $i in dataset('TweetMessage')
- where date-from-datetime($i.send-time) > $startdate
- and date-from-datetime($i.send-time) < add-date-duration($startdate, duration('P2Y'))
- return {"send-time": $i.send-time, "message": $i.message-text}
-
-
- * The expected result is:
-
- { "send-time": datetime("2011-12-26T10:10:00.000Z"), "message": " like sprint the voice-command is mind-blowing:)" }
- { "send-time": datetime("2011-08-25T10:10:00.000Z"), "message": " like samsung the platform is good" }
- { "send-time": datetime("2012-07-21T10:10:00.000Z"), "message": " love verizon its voicemail-service is awesome" }
-
-
-### add-datetime-duration ###
- * Syntax:
-
- add-date-duration(datetime_expression, duration_expression)
-
- * Create a new datetime by adding the duration `duration_expression` to the given datetime `datetime_expression`.
- * Arguments:
- * `datetime_expression` : The `datetime` value to be added onto.
- * `duration_expression` : The `duration` to be added.
- * Return Value:
- * A `datetime` value represents the new datetime after being adjusted by the duration.
-
- * Example:
-
- use dataverse TinySocial;
-
- let $startdt := datetime('2011-03-01T00:00:00')
- for $i in dataset('TweetMessage')
- where $i.send-time > $startdt and $i.send-time < add-datetime-duration($startdt, duration('P2Y'))
- return {"send-time": $i.send-time, "message": $i.message-text}
-
-
- * The expected result is:
-
- { "send-time": datetime("2011-12-26T10:10:00.000Z"), "message": " like sprint the voice-command is mind-blowing:)" }
- { "send-time": datetime("2011-08-25T10:10:00.000Z"), "message": " like samsung the platform is good" }
- { "send-time": datetime("2012-07-21T10:10:00.000Z"), "message": " love verizon its voicemail-service is awesome" }
-
-
-### add-time-duration ###
- * Syntax:
-
- add-time-duration(time_expression, duration_expression)
-
- * Create a new time by adding the duration `duration_expression` to the given time `time_expression`.
- * Arguments:
- * `time_expression` : The `time` value to be added onto.
- * `duration_expression` : The `duration` to be added.
- * Return Value:
- * A `time` value represents the new time after being adjusted by the duration.
-
- * Example:
-
- use dataverse TinySocial;
-
- let $starttime := time('08:00:00')
- for $i in dataset('TweetMessage')
- where time-from-datetime($i.send-time) > $starttime and time-from-datetime($i.send-time) < add-time-duration($starttime, duration('PT5H'))
- return {"send-time": $i.send-time, "message": $i.message-text}
-
-
- * The expected result is:
-
- { "send-time": datetime("2008-04-26T10:10:00.000Z"), "message": " love t-mobile its customization is good:)" }
- { "send-time": datetime("2010-05-13T10:10:00.000Z"), "message": " like verizon its shortcut-menu is awesome:)" }
- { "send-time": datetime("2006-11-04T10:10:00.000Z"), "message": " like motorola the speed is good:)" }
- { "send-time": datetime("2011-12-26T10:10:00.000Z"), "message": " like sprint the voice-command is mind-blowing:)" }
- { "send-time": datetime("2006-08-04T10:10:00.000Z"), "message": " can't stand motorola its speed is terrible:(" }
- { "send-time": datetime("2010-05-07T10:10:00.000Z"), "message": " like iphone the voice-clarity is good:)" }
- { "send-time": datetime("2011-08-25T10:10:00.000Z"), "message": " like samsung the platform is good" }
- { "send-time": datetime("2005-10-14T10:10:00.000Z"), "message": " like t-mobile the shortcut-menu is awesome:)" }
- { "send-time": datetime("2012-07-21T10:10:00.000Z"), "message": " love verizon its voicemail-service is awesome" }
- { "send-time": datetime("2008-01-26T10:10:00.000Z"), "message": " hate verizon its voice-clarity is OMG:(" }
- { "send-time": datetime("2008-03-09T10:10:00.000Z"), "message": " can't stand iphone its platform is terrible" }
- { "send-time": datetime("2010-02-13T10:10:00.000Z"), "message": " like samsung the voice-command is amazing:)" }
-
-
### adjust-datetime-for-timezone ###
* Syntax:
adjust-datetime-for-timezone(datetime_expression, string_expression)
- * Adjust the given datetime `datetime_expression` by applying the timezone information `string_expression`
+ * Adjusts the given datetime `datetime_expression` by applying the timezone information `string_expression`.
* Arguments:
- * `datetime_expression` : A `datetime` value to be adjusted.
- * `string_expression` : A `string` representing the timezone information.
+ * `datetime_expression` : A `datetime` value to be adjusted.
+ * `string_expression` : A `string` representing the timezone information.
* Return Value:
- * A `string` value represents the new datetime after being adjusted by the timezone information.
+ * A `string` value representing the new datetime after being adjusted by the timezone information.
* Example:
use dataverse TinySocial;
- for $i in dataset('TweetMessage')
+ for $i in dataset('TweetMessages')
return {"adjusted-send-time": adjust-datetime-for-timezone($i.send-time, "+08:00"), "message": $i.message-text}
@@ -1599,18 +1683,18 @@
adjust-time-for-timezone(time_expression, string_expression)
- * Adjust the given time `time_expression` by applying the timezone information `string_expression`
+ * Adjusts the given time `time_expression` by applying the timezone information `string_expression`.
* Arguments:
- * `time_expression` : A `time` value to be adjusted.
- * `string_expression` : A `string` representing the timezone information.
+ * `time_expression` : A `time` value to be adjusted.
+ * `string_expression` : A `string` representing the timezone information.
* Return Value:
- * A `string` value represents the new time after being adjusted by the timezone information.
+ * A `string` value representing the new time after being adjusted by the timezone information.
* Example:
use dataverse TinySocial;
- for $i in dataset('TweetMessage')
+ for $i in dataset('TweetMessages')
return {"adjusted-send-time": adjust-time-for-timezone(time-from-datetime($i.send-time), "+08:00"), "message": $i.message-text}
@@ -1635,18 +1719,18 @@
calendar-duration-from-datetime(datetime_expression, duration_expression)
- * Get a user-friendly representation of the duration `duration_expression` based on the given datetime `datetime_expression`
+ * Gets a user-friendly representation of the duration `duration_expression` based on the given datetime `datetime_expression`.
* Arguments:
- * `datetime_expression` : A `datetime` value to be used as the reference time point.
- * `duration_expression` : A `duration` value to be converted
+ * `datetime_expression` : A `datetime` value to be used as the reference time point.
+ * `duration_expression` : A `duration` value to be converted.
* Return Value:
- * A `duration` value with the duration as `duration_expression` but with a user-friendly representation.
+ * A `duration` value with the same duration as `duration_expression` but with a user-friendly representation.
* Example:
use dataverse TinySocial;
- for $i in dataset('TweetMessage')
+ for $i in dataset('TweetMessages')
where $i.send-time > datetime("2011-01-01T00:00:00")
return {"since-2011": subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")), "since-2011-user-friendly": calendar-duration-from-datetime($i.send-time, subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")))}
@@ -1663,18 +1747,18 @@
calendar-duration-from-date(date_expression, duration_expression)
- * Get a user-friendly representation of the duration `duration_expression` based on the given date `date_expression`
+ * Gets a user-friendly representation of the duration `duration_expression` based on the given date `date_expression`.
* Arguments:
- * `date_expression` : A `date` value to be used as the reference time point.
- * `duration_expression` : A `duration` value to be converted
+ * `date_expression` : A `date` value to be used as the reference time point.
+ * `duration_expression` : A `duration` value to be converted.
* Return Value:
- * A `duration` value with the duration as `duration_expression` but with a user-friendly representation.
+ * A `duration` value with the same duration as `duration_expression` but with a user-friendly representation.
* Example:
use dataverse TinySocial;
- for $i in dataset('TweetMessage')
+ for $i in dataset('TweetMessages')
where $i.send-time > datetime("2011-01-01T00:00:00")
return {"since-2011": subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")),
"since-2011-user-friendly": calendar-duration-from-date(date-from-datetime($i.send-time), subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")))}
@@ -1692,10 +1776,10 @@
current-date()
- * Get the current date
- * Arguments:None
+ * Gets the current date.
+ * Arguments: None
* Return Value:
- * A `date` value of the date when the function is called.
+ * A `date` value of the date when the function is called.
### current-time ###
* Syntax:
@@ -1703,9 +1787,9 @@
current-time()
* Get the current time
- * Arguments:None
+ * Arguments: None
* Return Value:
- * A `time` value of the time when the function is called.
+ * A `time` value of the time when the function is called.
### current-datetime ###
* Syntax:
@@ -1713,9 +1797,9 @@
current-datetime()
* Get the current datetime
- * Arguments:None
+ * Arguments: None
* Return Value:
- * A `datetime` value of the datetime when the function is called.
+ * A `datetime` value of the datetime when the function is called.
* Example:
@@ -1738,11 +1822,11 @@
date-from-datetime(datetime_expression)
- * Get the date value from the given datetime value `datetime_expression`
+ * Gets the date value from the given datetime value `datetime_expression`.
* Arguments:
- * `datetime_expression`: A `datetime` value to be extracted from
+ * `datetime_expression`: A `datetime` value to be extracted from.
* Return Value:
- * A `date` value from the datetime.
+ * A `date` value from the datetime.
### time-from-datetime ###
* Syntax:
@@ -1751,15 +1835,15 @@
* Get the time value from the given datetime value `datetime_expression`
* Arguments:
- * `datetime_expression`: A `datetime` value to be extracted from
+ * `datetime_expression`: A `datetime` value to be extracted from
* Return Value:
- * A `time` value from the datetime.
+ * A `time` value from the datetime.
* Example:
use dataverse TinySocial;
- for $i in dataset('TweetMessage')
+ for $i in dataset('TweetMessages')
where $i.send-time > datetime("2011-01-01T00:00:00")
return {"send-date": date-from-datetime($i.send-time), "send-time": time-from-datetime($i.send-time)}
@@ -1776,33 +1860,33 @@
date-from-unix-time-in-days(numeric_expression)
- * Get date representing the time after `numeric_expression` days since 1970-01-01
+ * Gets a date representing the time after `numeric_expression` days since 1970-01-01.
* Arguments:
- * `numeric_expression`: A `int8`/`int16`/`int32` value representing the number of days
+ * `numeric_expression`: An `int8`/`int16`/`int32` value representing the number of days.
* Return Value:
- * A `date` value as the time after `numeric_expression` days since 1970-01-01
+ * A `date` value as the time after `numeric_expression` days since 1970-01-01.
### datetime-from-unix-time-in-ms ###
* Syntax:
datetime-from-unix-time-in-ms(numeric_expression)
- * Get datetime representing the time after `numeric_expression` milliseconds since 1970-01-01T00:00:00Z
+ * Gets a datetime representing the time after `numeric_expression` milliseconds since 1970-01-01T00:00:00Z.
* Arguments:
- * `numeric_expression`: A `int8`/`int16`/`int32`/`int64` value representing the number of milliseconds
+ * `numeric_expression`: An `int8`/`int16`/`int32`/`int64` value representing the number of milliseconds.
* Return Value:
- * A `datetime` value as the time after `numeric_expression` milliseconds since 1970-01-01T00:00:00Z
+ * A `datetime` value as the time after `numeric_expression` milliseconds since 1970-01-01T00:00:00Z.
### time-from-unix-time-in-ms ###
* Syntax:
time-from-unix-time-in-ms(numeric_expression)
- * Get time representing the time after `numeric_expression` milliseconds since 00:00:00.000Z
+ * Gets a time representing the time after `numeric_expression` milliseconds since 00:00:00.000Z.
* Arguments:
- * `numeric_expression`: A `int8`/`int16`/`int32` value representing the number of milliseconds
+ * `numeric_expression`: An `int8`/`int16`/`int32` value representing the number of milliseconds.
* Return Value:
- * A `time` value as the time after `numeric_expression` milliseconds since 00:00:00.000Z
+ * A `time` value as the time after `numeric_expression` milliseconds since 00:00:00.000Z.
* Example:
@@ -1818,7 +1902,6 @@
{ "date": date("2013-04-05"), "datetime": datetime("2013-04-05T05:28:20.000Z"), "time": time("00:00:03.748Z") }
-
### subtract-date ###
* Syntax:
@@ -1826,10 +1909,10 @@
* Get the duration between two dates `date_start` and `date_end`
* Arguments:
- * `date_start`: the starting `date`
- * `date_end`: the ending `date`
+ * `date_start`: the starting `date`
+ * `date_end`: the ending `date`
* Return Value:
- * A `duration` value between `date_start` and `date_end`
+ * A `duration` value between `date_start` and `date_end`
* Example:
@@ -1855,10 +1938,10 @@
* Get the duration between two times `time_start` and `time_end`
* Arguments:
- * `time_start`: the starting `time`
- * `time_end`: the ending `time`
+ * `time_start`: the starting `time`
+ * `time_end`: the ending `time`
* Return Value:
- * A `duration` value between `time_start` and `time_end`
+ * A `duration` value between `time_start` and `time_end`
* Example:
@@ -1884,10 +1967,10 @@
* Get the duration between two datetimes `datetime_start` and `datetime_end`
* Arguments:
- * `datetime_start`: the starting `datetime`
- * `datetime_end`: the ending `datetime`
+ * `datetime_start`: the starting `datetime`
+ * `datetime_end`: the ending `datetime`
* Return Value:
- * A `duration` value between `datetime_start` and `datetime_end`
+ * A `duration` value between `datetime_start` and `datetime_end`
* Example:
@@ -1908,16 +1991,39 @@
{ "id1": 3, "id2": 7, "diff": duration("P28D") }
{ "id1": 7, "id2": 1, "diff": duration("P13D") }
+### interval-start-from-date/time/datetime ###
+ * Syntax:
+
+ interval-start-from-date/time/datetime(date/time/datetime, duration)
+
+ * Constructs an `interval` value from the given starting `date`/`time`/`datetime` and the `duration` that the interval lasts.
+ * Arguments:
+ * `date/time/datetime`: a `string` representing a `date`, `time` or `datetime`, or a `date`/`time`/`datetime` value, representing the starting time point.
+ * `duration`: a `string` or `duration` value representing the duration of the interval. Note that the duration cannot be negative.
+ * Return Value:
+ * An `interval` value representing the interval starting from the given time point with the length of duration.
+
+ * Example:
+
+ let $itv1 := interval-start-from-date("1984-01-01", "P1Y")
+ let $itv2 := interval-start-from-time(time("02:23:28.394"), "PT3H24M")
+ let $itv3 := interval-start-from-datetime("1999-09-09T09:09:09.999", duration("P2M30D"))
+ return {"interval1": $itv1, "interval2": $itv2, "interval3": $itv3}
+
+ * The expected result is:
+
+ { "interval1": interval-date("1984-01-01, 1985-01-01"), "interval2": interval-time("02:23:28.394Z, 05:47:28.394Z"), "interval3": interval-datetime("1999-09-09T09:09:09.999Z, 1999-12-09T09:09:09.999Z") }
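
An interval built this way can be taken apart again with the accessors documented in the next section. The following is a minimal sketch that uses only functions described in this document; the variable name `$itv` is arbitrary, and the values in the comment are derived from the `interval2` result above rather than from a separate run.

```
/* Build a time interval from a start point and a duration, then read back both ends. */
let $itv := interval-start-from-time(time("02:23:28.394"), "PT3H24M")
return { "start": get-interval-start($itv), "end": get-interval-end($itv) }

/* Given the interval2 result above, this should yield
   time("02:23:28.394Z") and time("05:47:28.394Z"). */
```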
+
### get-interval-start, get-interval-end ###
* Syntax:
get-interval-start/get-interval-end(interval)
- * Get the start/end of the given interval
+ * Gets the start/end of the given interval.
* Arguments:
- * `interval`: the interval to be accessed
+ * `interval`: the interval to be accessed.
* Return Value:
- * A `time`, `date` or `datetime` (depending on the time instances of the interval) representing the starting or ending time.
+ * A `time`, `date`, or `datetime` (depending on the time instances of the interval) representing the starting or ending time.
* Example:
@@ -1928,3 +2034,65 @@
* The expected result is:
{ "start": date("1984-01-01"), "end": date("1985-01-01") }
+
+### interval-bin ###
+ * Syntax:
+
+ interval-bin(time-to-bin, time-bin-anchor, duration-bin-size)
+
+ * Returns the `interval` value representing the bin containing the `time-to-bin` value.
+ * Arguments:
+ * `time-to-bin`: a date/time/datetime value representing the time to be binned.
+ * `time-bin-anchor`: a date/time/datetime value representing an anchor at which a bin starts. The type of this argument should be the same as that of the first argument, `time-to-bin`.
+ * `duration-bin-size`: the duration value representing the size of the bin, of type `year-month-duration` or `day-time-duration`, or `null`. The arithmetic operation between the type of `time-to-bin` and the sub-duration type must be defined. Specifically, one of the following arithmetic operations should be used:
+ * `datetime` +|- `year-month-duration`
+ * `datetime` +|- `day-time-duration`
+ * `date` +|- `year-month-duration`
+ * `date` +|- `day-time-duration`
+ * `time` +|- `day-time-duration`
+ * Return Value:
+ * An `interval` value representing the bin containing the `time-to-bin` value. Note that the internal type of this interval value should be the same as the `time-to-bin` type.
+
+ * Example:
+
+ let $c1 := date("2010-10-30")
+ let $c2 := datetime("-1987-11-19T23:49:23.938")
+ let $c3 := time("12:23:34.930+07:00")
+
+ return { "bin1": interval-bin($c1, date("1990-01-01"), year-month-duration("P1Y")),
+ "bin2": interval-bin($c2, datetime("1990-01-01T00:00:00.000Z"), year-month-duration("P6M")),
+ "bin3": interval-bin($c3, time("00:00:00"), day-time-duration("PT1M")),
+ "bin4": interval-bin($c2, datetime("2013-01-01T00:00:00.000"), day-time-duration("PT24H"))
+ }
+
+ * The expected result is:
+
+ { "bin1": interval-date("2010-01-01, 2011-01-01"),
+ "bin2": interval-datetime("-1987-07-01T00:00:00.000Z, -1986-01-01T00:00:00.000Z"),
+ "bin3": interval-time("05:23:00.000Z, 05:24:00.000Z"),
+ "bin4": interval-datetime("-1987-11-19T00:00:00.000Z, -1987-11-20T00:00:00.000Z")}
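
A typical use of `interval-bin` is to bucket records by time before aggregating them. The query below is only a sketch under stated assumptions, not an example from the AsterixDB test suite: it assumes the `TweetMessages` dataset of the `TinySocial` dataverse used elsewhere in this document, with `send-time` being a `datetime` field, and the anchor and the one-year bin size are arbitrary choices.

```
use dataverse TinySocial;

/* Group tweets into one-year bins anchored at 1990-01-01 and count the tweets per bin.
   The dataset, the send-time field, the anchor, and the bin size are all assumptions. */
for $i in dataset('TweetMessages')
group by $bin := interval-bin($i.send-time, datetime("1990-01-01T00:00:00.000Z"), year-month-duration("P1Y")) with $i
order by get-interval-start($bin)
return { "bin": $bin, "count": count($i) }
```

Because both the anchor and `send-time` are `datetime` values and the bin size is a `year-month-duration`, this combination falls under the `datetime` +|- `year-month-duration` case listed above.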
+
+## <a id="OtherFunctions">Other Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+
+### is-null ###
+ * Syntax:
+
+ is-null(var)
+
+ * Checks whether the given variable is a `null` value.
+ * Arguments:
+ * `var` : A variable (any type is allowed).
+ * Return Value:
+ * A `boolean` indicating whether the variable is `null`.
+
+ * Example:
+
+ for $m in ['hello', 'world', null]
+ where not(is-null($m))
+ return $m
+
+
+ * The expected result is:
+
+ "hello"
+ "world"
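
The same check can be applied to a field of a record rather than to a list item. The following is a minimal sketch with made-up inline records; the field names `id` and `alias` are illustrative only.

```
/* Keep only the records whose alias field is not null.
   The inline records are hypothetical sample data. */
for $r in [ { "id": 1, "alias": "Margarita" }, { "id": 2, "alias": null } ]
where not(is-null($r.alias))
return $r.id
```

Only the first record passes the filter, so the query should return its `id`.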
diff --git a/asterix-doc/src/site/markdown/aql/manual.md b/asterix-doc/src/site/markdown/aql/manual.md
index f1c3fbd..882d331 100644
--- a/asterix-doc/src/site/markdown/aql/manual.md
+++ b/asterix-doc/src/site/markdown/aql/manual.md
@@ -1,5 +1,12 @@
# The Asterix Query Language, Version 1.0
-## 1. Introduction
+
+## <a id="toc">Table of Contents</a> ##
+
+* [1. Introduction](#Introduction)
+* [2. Expressions](#Expressions)
+* [3. Statements](#Statements)
+
+## <a id="Introduction">1. Introduction</a> <font size="4"><a href="#toc">[Back to TOC]</a></font>
This document is intended as a reference guide to the full syntax
and semantics of the Asterix Query Language (AQL), the language for talking to AsterixDB.
@@ -14,7 +21,7 @@
We list and briefly explain each of the productions in the AQL grammar, offering
examples for clarity in cases where doing so seems needed or helpful.
-## 2. Expressions
+## <a id="Expressions">2. Expressions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font>
Query ::= Expression
@@ -53,14 +60,25 @@
#### Literals
- Literal ::= StringLiteral
- | <INTEGER_LITERAL>
- | <FLOAT_LITERAL>
- | <DOUBLE_LITERAL>
- | "null"
- | "true"
- | "false"
- StringLiteral ::= <STRING_LITERAL>
+ Literal ::= StringLiteral
+ | IntegerLiteral
+ | FloatLiteral
+ | DoubleLiteral
+ | "null"
+ | "true"
+ | "false"
+ StringLiteral ::= ("\"" (<ESCAPE_QUOT> | ~["\""])* "\"")
+ | ("\'" (<ESCAPE_APOS> | ~["\'"])* "\'")
+ <ESCAPE_QUOT> ::= "\\\""
+ <ESCAPE_APOS> ::= "\\\'"
+ IntegerLiteral ::= <DIGITS>
+ <DIGITS> ::= ["0" - "9"]+
+ FloatLiteral ::= <DIGITS> ( "f" | "F" )
+ | <DIGITS> ( "." <DIGITS> ( "f" | "F" ) )?
+ | "." <DIGITS> ( "f" | "F" )
+ DoubleLiteral ::= <DIGITS>
+ | <DIGITS> ( "." <DIGITS> )?
+ | "." <DIGITS>
Literals (constants) in AQL can be strings, integers, floating point values,
double values, boolean constants, or the constant value null.
@@ -78,6 +96,8 @@
#### Variable References
VariableRef ::= <VARIABLE>
+ <VARIABLE> ::= "$" <LETTER> (<LETTER> | <DIGIT> | "_")*
+ <LETTER> ::= ["A" - "Z", "a" - "z"]
A variable in AQL can be bound to any legal ADM value.
A variable reference refers to the value to which an in-scope variable is bound.
@@ -125,6 +145,8 @@
DatasetAccessExpression ::= "dataset" ( ( Identifier ( "." Identifier )? )
| ( "(" Expression ")" ) )
Identifier ::= <IDENTIFIER> | StringLiteral
+ <IDENTIFIER> ::= <LETTER> (<LETTER> | <DIGIT> | <SPECIALCHARS>)*
+ <SPECIALCHARS> ::= ["$", "_", "-"]
Querying Big Data is the main point of AsterixDB and AQL.
Data in AsterixDB reside in datasets (collections of ADM records),
@@ -133,6 +155,8 @@
Dataset access expressions are most commonly used in FLWOR expressions, where variables
are bound to their contents.
+Note that the Identifier that identifies a dataset (or any other Identifier in AQL) can also be a StringLiteral.
+This is especially useful to avoid conflicts with AQL keywords (e.g. "dataset", "null", or "type").
The following are three examples of legal dataset access expressions.
The first one accesses a dataset called Customers in the dataverse called SalesDV.
@@ -179,6 +203,11 @@
"project members": {{ "vinayakb", "dtabass", "chenli" }}
}
+##### Note
+
+When constructing nested records there needs to be a space between the closing braces to avoid confusion with the `}}` token that ends an unordered list constructor:
+`{ "a" : { "b" : "c" }}` will fail to parse while `{ "a" : { "b" : "c" } }` will work.
+
### Path Expressions
ValueExpr ::= PrimaryExpr ( Field | Index )*
@@ -235,7 +264,7 @@
### Arithmetic Expressions
AddExpr ::= MultExpr ( ( "+" | "-" ) MultExpr )*
- MultExpr ::= UnaryExpr ( ( "*" | "/" | "%" | <CARET> | "idiv" ) UnaryExpr )*
+ MultExpr ::= UnaryExpr ( ( "*" | "/" | "%" | "^" | "idiv" ) UnaryExpr )*
UnaryExpr ::= ( ( "+" | "-" ) )? ValueExpr
AQL also supports the usual cast of characters for arithmetic expressions.
@@ -406,7 +435,7 @@
every $x in [ 1, 2, 3 ] satisfies $x < 3
some $x in [ 1, 2, 3 ] satisfies $x < 3
-## 3. Statements
+## <a id="Statements">3. Statements</a> <font size="4"><a href="#toc">[Back to TOC]</a></font>
Statement ::= ( SingleStatement ( ";" )? )* <EOF>
SingleStatement ::= DataverseDeclaration
@@ -523,12 +552,12 @@
##### Example
create type FacebookUserType as closed {
- id: int32,
- alias: string,
- name: string,
- user-since: datetime,
- friend-ids: {{ int32 }},
- employment: [ EmploymentType ]
+ "id" : int32,
+ "alias" : string,
+ "name" : string,
+ "user-since" : datetime,
+ "friend-ids" : {{ int32 }},
+ "employment" : [ EmploymentType ]
}
#### Datasets
@@ -541,8 +570,8 @@
Configuration ::= "(" ( KeyValuePair ( "," KeyValuePair )* )? ")"
KeyValuePair ::= "(" StringLiteral "=" StringLiteral ")"
Properties ::= ( "(" Property ( "," Property )* ")" )?
- Property ::= Identifier "=" ( StringLiteral | <INTEGER_LITERAL> )
- FunctionSignature ::= FunctionOrTypeName "@" <INTEGER_LITERAL>
+ Property ::= Identifier "=" ( StringLiteral | IntegerLiteral )
+ FunctionSignature ::= FunctionOrTypeName "@" IntegerLiteral
PrimaryKey ::= "primary" "key" Identifier ( "," Identifier )*
The create dataset statement is used to create a new dataset.
@@ -552,7 +581,7 @@
An Internal dataset (the default) is a dataset that is stored in and managed by AsterixDB.
It must have a specified unique primary key that can be used to partition data across nodes of an AsterixDB cluster.
The primary key is also used in secondary indexes to uniquely identify the indexed primary data records.
-An External dataset is stored outside of AsterixDB, e.g., in HDFS or in the local filesystem(s) of the cluster's nodes.
+An External dataset is stored outside of AsterixDB (currently datasets in HDFS or on the local filesystem(s) of the cluster's nodes are supported).
External dataset support allows AQL queries to treat external data as though it were stored in AsterixDB,
making it possible to query "legacy" file data (e.g., Hive data) without having to physically import it into AsterixDB.
For an external dataset, an appropriate adaptor must be selected to handle the nature of the desired external data.
@@ -565,14 +594,15 @@
create internal dataset FacebookUsers(FacebookUserType) primary key id;
The next example creates an external dataset for storing LineitemType records.
-The choice of the `localfs` adaptor means that its data will reside in the local filesystem of the cluster nodes.
-The create statement provides several parameters used by the localfs adaptor;
-e.g., the file format is delimited text with vertical bar being the field delimiter.
+The choice of the `hdfs` adaptor means that its data will reside in HDFS.
+The create statement provides parameters used by the hdfs adaptor:
+the URL and path needed to locate the data in HDFS and a description of the data format.
##### Example
-
- create external dataset Lineitem(LineitemType) using localfs (
- ("path"="127.0.0.1://SOURCE_PATH"),
+ create external dataset Lineitem(LineitemType) using hdfs (
+ ("hdfs"="hdfs://HOST:PORT"),
+ ("path"="HDFS_PATH"),
+ ("input-format"="text-input-format"),
("format"="delimited-text"),
("delimiter"="|"));
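
Once declared, an external dataset is queried like any other dataset. As a minimal sketch, assuming the statement above has been executed in the current dataverse, that `LineitemType` has already been declared, and that the `HOST`, `PORT` and `HDFS_PATH` placeholders point at real data, the following counts the records visible through it:

```
/* Count the records reachable through the external Lineitem dataset.
   Assumes the dataset above exists in the dataverse currently in use. */
count(for $l in dataset Lineitem return $l)
```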
@@ -583,7 +613,7 @@
IndexType ::= "btree"
| "rtree"
| "keyword"
- | "ngram" "(" <INTEGER_LITERAL> ")"
+ | "ngram" "(" IntegerLiteral ")"
The create index statement creates a secondary index on one or more fields of a specified dataset.
Supported index types include `btree` for totally ordered datatypes,
@@ -672,6 +702,7 @@
The load statement is used to initially populate a dataset via bulk loading of data from an external file.
An appropriate adaptor must be selected to handle the nature of the desired external data.
+The load statement accepts the same adaptors and the same parameters as external datasets.
(See the [guide to external data](externaldata.html) for more information on the available adaptors.)
The following example shows how to bulk load the FacebookUsers dataset from an external file containing
@@ -730,5 +761,6 @@
for $praise in {{ "great", "brilliant", "awesome" }}
return
- string-concat(["AsterixDB is ", $praise]
+ string-concat(["AsterixDB is ", $praise])
+
diff --git a/asterix-doc/src/site/markdown/aql/primer.md b/asterix-doc/src/site/markdown/aql/primer.md
index fc8ea9a..92fcb01 100644
--- a/asterix-doc/src/site/markdown/aql/primer.md
+++ b/asterix-doc/src/site/markdown/aql/primer.md
@@ -346,11 +346,9 @@
In this section we introduce AQL via a set of example queries, along with their expected results,
based on the data above, to help you get started.
Many of the most important features of AQL are presented in this set of representative queries.
-You can find a BNF description of the current AQL grammar at [wiki:AsterixDBGrammar], and someday
-in the not-too-distant future we will also provide a complete reference manual for the language.
-In the meantime, this will get you started down the path of using AsterixDB.
-A more complete list of the supported AsterixDB primitive types and built-in functions can be
-found at [Asterix Data Model (ADM)](datamodel.html) and [Asterix Functions](functions.html).
+You can find more details in the document on the [Asterix Data Model (ADM)](datamodel.html)
+and in the [AQL Reference Manual](manual.html); a complete list of built-in functions is available
+in the [Asterix Functions](functions.html) document.
AQL is an expression language.
Even the expression 1+1 is a valid AQL query that evaluates to 2.
diff --git a/asterix-doc/src/site/markdown/aql/similarity.md b/asterix-doc/src/site/markdown/aql/similarity.md
index 244103c..12cfb10 100644
--- a/asterix-doc/src/site/markdown/aql/similarity.md
+++ b/asterix-doc/src/site/markdown/aql/similarity.md
@@ -1,7 +1,15 @@
# AsterixDB Support of Similarity Queries #
-## Motivation ##
+## <a id="toc">Table of Contents</a> ##
+
+* [Motivation](#Motivation)
+* [Data Types and Similarity Functions](#DataTypesAndSimilarityFunctions)
+* [Similarity Selection Queries](#SimilaritySelectionQueries)
+* [Similarity Join Queries](#SimilarityJoinQueries)
+* [Using Indexes to Support Similarity Queries](#UsingIndexesToSupportSimilarityQueries)
+
+## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
Similarity queries are widely used in applications where users need to
find records that satisfy a similarity predicate, while exact matching
@@ -14,7 +22,7 @@
users who have similar friends. To meet this type of needs, AsterixDB
supports similarity queries using efficient indexes and algorithms.
-## Data Types and Similarity Functions ##
+## <a id="DataTypesAndSimilarityFunctions">Data Types and Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
AsterixDB supports [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) (on strings) and
[Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) (on sets). For
@@ -33,7 +41,7 @@
to convert strings to sets, and the
[similarity functions](functions.html#Similarity_Functions).
-## Similarity Selection Queries ##
+## <a id="SimilaritySelectionQueries">Similarity Selection Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
The following [query](functions.html#edit-distance)
asks for all the Facebook users whose name is similar to
@@ -78,7 +86,7 @@
using `simfunction` and then specify the threshold `0.6f` using
`simthreshold`.
-## Similarity Join Queries ##
+## <a id="SimilarityJoinQueries">Similarity Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
AsterixDB supports fuzzy joins between two sets. The following
[query](primer.html#Query_5_-_Fuzzy_Join)
@@ -103,7 +111,7 @@
}
};
-## Using Indexes to Support Similarity Queries ##
+## <a id="UsingIndexesToSupportSimilarityQueries">Using Indexes to Support Similarity Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
AsterixDB uses two types of indexes to support similarity queries, namely
"ngram index" and "keyword index".
diff --git a/asterix-doc/src/site/markdown/index.md b/asterix-doc/src/site/markdown/index.md
index 4ee2a5f..c435457 100644
--- a/asterix-doc/src/site/markdown/index.md
+++ b/asterix-doc/src/site/markdown/index.md
@@ -1,19 +1,12 @@
# AsterixDB: A Big Data Management System #
-## What Is AsterixDB? ##
+## <a id="toc">Table of Contents</a> ##
+* [What Is AsterixDB?](#WhatIsAsterixDB)
+* [Getting and Using AsterixDB](#GettingAndUsingAsterixDB)
-Welcome to the new home of the AsterixDB Big Data Management System (BDMS).
-The AsterixDB BDMS is the result of about 3.5 years of R&D involving researchers at UC Irvine, UC Riverside, and UC San Diego.
-The AsterixDB code base now consists of roughly 250K lines of Java code that has been co-developed at UC Irvine and UC Riverside.
+## <a id="WhatIsAsterixDB">What Is AsterixDB?</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
-Initiated in 2009, the NSF-sponsored ASTERIX project has been developing new technologies for ingesting, storing, managing, indexing, querying, and analyzing vast quantities of semi-structured information.
-The project has been combining ideas from three distinct areas---semi-structured data, parallel databases, and data-intensive computing (a.k.a. today's Big Data platforms)---in order to create a next-generation, open-source software platform that scales by running on large, shared-nothing commodity computing clusters.
-The ASTERIX effort has been targeting a wide range of semi-structured information, ranging from "data" use cases---where information is well-typed and highly regular---to "content" use cases---where data tends to be irregular, much of each datum may be textual, and the ultimate schema for the various data types involved may be hard to anticipate up front.
-The ASTERIX project has been addressing technical issues including highly scalable data storage and indexing, semi-structured query processing on very large clusters, and merging time-tested parallel database techniques with modern data-intensive computing techniques to support performant yet declarative solutions to the problem of storing and analyzing semi-structured information effectively.
-The first fruits of this labor have been captured in the AsterixDB system that is now being released in preliminary or "Beta" release form.
-We are hoping that the arrival of AsterixDB will mark the beginning of the "BDMS era", and we hope that both the Big Data community and the database community will find the AsterixDB system to be interesting and useful for a much broader class of problems than can be addressed with any one of today's current Big Data platforms and related technologies (e.g., Hadoop, Pig, Hive, HBase, MongoDB, and so on). One of our project mottos has been "one size fits a bunch"---at least that has been our aim. For more information about the research effort that led to the birth of AsterixDB, please refer to our NSF project web site: [http://asterix.ics.uci.edu/](http://asterix.ics.uci.edu/).
-
-In a nutshell, AsterixDB is a full-function BDMS with a rich feature set that distinguishes it from pretty much any other Big Data platform that's out and available today. We believe that its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. AsterixDB has:
+In a nutshell, AsterixDB is a full-function BDMS (Big Data Management System) with a rich feature set that distinguishes it from pretty much any other Big Data platform that's out and available today. We believe that its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. AsterixDB has:
* A semistructured NoSQL style data model (ADM) resulting from extending JSON with object database ideas
* An expressive and declarative query language (AQL) that supports a broad range of queries and analysis over semistructured data
@@ -25,16 +18,14 @@
* Support for fuzzy and spatial queries as well as for more traditional parametric queries
* Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store
-## Getting and Using AsterixDB ##
+## <a id="GettingAndUsingAsterixDB">Getting and Using AsterixDB</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
You are most likely here because you are interested in getting your hands on AsterixDB---so you would like to know how to get it, how to set it up, and how to use it.
-Someday our plan is to have comprehensive documentation for AsterixDB and its data model (ADM) and query language (AQL) here on this wiki.
-For the Beta release, we've got a start; for the Beta release a month or so from now, we will hopefully have much more.
-The following is a list of the wiki pages and supporting documents that we have available today:
+The following is a list of the supporting documents that we have available today:
1. [Installing AsterixDB using Managix](install.html) :
This is our installation guide, and it is where you should start.
-This document will tell you how to obtain, install, and manage instances of [AsterixDB](https://asterixdb.googlecode.com/files/asterix-installer-0.0.4-binary-assembly.zip), including both single-machine setup (for developers) as well as cluster installations (for deployment in its intended form).
+This document will tell you how to obtain, install, and manage instances of [AsterixDB](http://asterixdb.ics.uci.edu/download/asterix-installer-0.8.3-binary-assembly.zip), including both single-machine setup (for developers) as well as cluster installations (for deployment in its intended form).
2. [AsterixDB 101: An ADM and AQL Primer](aql/primer.html) :
This is a first-timers introduction to the user model of the AsterixDB BDMS, by which we mean the view of AsterixDB as seen from the perspective of an "average user" or Big Data application developer.
@@ -42,7 +33,7 @@
This document presents a tiny "social data warehousing" example and uses it as a backdrop for describing, by example, the key features of AsterixDB.
By working through this document, you will learn how to define the artifacts needed to manage data in AsterixDB, how to load data into the system, how to use most of the basic features of its query language, and how to insert and delete data dynamically.
-3. [Asterix Data Model (ADM)](aql/datamodel.html), [Asterix Functions](aql/functions.html), and [Asterix Query Language (AQL)](aql/manual.html) :
+3. [Asterix Data Model (ADM)](aql/datamodel.html), [Asterix Functions](aql/functions.html), [Asterix functions for Allen's Relations](aql/allens.html), and [Asterix Query Language (AQL)](aql/manual.html) :
These are reference documents that catalog the primitive data types and built-in functions available in AQL and the reference manual for AQL itself.
5. [REST API to AsterixDB](api.html) :
diff --git a/asterix-doc/src/site/markdown/install.md b/asterix-doc/src/site/markdown/install.md
index 9ba8ea3..4b90352 100644
--- a/asterix-doc/src/site/markdown/install.md
+++ b/asterix-doc/src/site/markdown/install.md
@@ -1,7 +1,17 @@
# Introduction #
-This is a quickstart guide for getting ASTERIX running in a distributed environment. This guide also introduces the ASTERIX installer (nicknamed _*Managix*_) and describes how it can be used to create/manage an ASTERIX instance. By following the simple steps described in this guide, you will get a running instance of ASTERIX. You shall be able to use ASTERIX from its Web interface and manage its lifecycle using Managix. This document assumes that you are running some version of _*Linux*_ or _*MacOS X*_.
-## Prerequisites for Installing ASTERIX ##
+## <a id="toc">Table of Contents</a> ##
+
+* [Prerequisites for Installing AsterixDB](#PrerequisitesForInstallingAsterixDB)
+* [Section 1: Single-Machine AsterixDB installation](#Section1SingleMachineAsterixDBInstallation)
+* [Section 2: Single-Machine AsterixDB installation (Advanced)](#Section2SingleMachineAsterixDBInstallationAdvanced)
+* [Section 3: Installing AsterixDB on a Cluster of Multiple Machines](#Section3InstallingAsterixDBOnAClusterOfMultipleMachines)
+* [Section 4: Managing the Lifecycle of an AsterixDB Instance](#Section4ManagingTheLifecycleOfAnAsterixDBInstance)
+* [Section 5: Frequently Asked Questions](#Section5FAQ)
+
+This is a quickstart guide for getting AsterixDB running in a distributed environment. This guide also introduces the AsterixDB installer (nicknamed _*Managix*_) and describes how it can be used to create and manage an AsterixDB instance. By following the simple steps described in this guide, you will get a running instance of AsterixDB. You shall be able to use AsterixDB from its Web interface and manage its lifecycle using Managix. This document assumes that you are running some version of _*Linux*_ or _*MacOS X*_.
+
+## <a id="PrerequisitesForInstallingAsterixDB">Prerequisites for Installing AsterixDB</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
Prerequisite:
* [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/index.html) (Otherwise known as JDK 1.7).
@@ -25,7 +35,7 @@
* For Mac: [JDK 7 Mac Install](http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html)
JDK would be installed at /Library/Java/JavaVirtualMachines/jdk-version/Contents/Home .
-The java installation directory is referred as JAVA_HOME. Since we upgraded/installed Java, we need to ensure JAVA_HOME points to the installation directory of JDK 7. Modify your ~/.bash_profile (or ~/.bashrc) and define JAVA_HOME accordingly. After modifying, execute the following:
+The Java installation directory is referred to as JAVA_HOME. Since we upgraded/installed Java, we need to ensure JAVA_HOME points to the installation directory of JDK 7. Modify your ~/.bash_profile (or ~/.bashrc) and define JAVA_HOME accordingly. After the modification, execute the following:
$ java -version
@@ -36,36 +46,27 @@
$ echo "PATH=$JAVA_HOME/bin:$PATH" >> ~/.bash_profile (or ~/.bashrc)
$ source ~/.bash_profile (or ~/.bashrc)
-We also need to ensure that $JAVA_HOME/bin is in the PATH. $JAVA_HOME/bin should be included in the PATH value. We need to change the if $JAVA_HOME/bin is already in the PATH, we shall simply execute the following:
-
- $ java
-
-If you get the following message, you need to alter the PATH variable in your ~/.bash_profile or ~/.bashrc (whichever you use).
-
-
- -bash: java: command not found
-
-## Section 1: Single-Machine ASTERIX installation ##
-We assume a user Joe with a home directory as /home/joe. Please note that on Mac, the home directory for user Joe would be /Users/joe.
+## <a id="Section1SingleMachineAsterixDBInstallation">Section 1: Single-Machine AsterixDB installation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+We assume a user called "Joe" whose home directory is /home/joe. On a Mac, the home directory for user Joe would be /Users/joe.
### Configuring Environment ###
-Ensure that JAVA_HOME variable is defined and points to the the java installation directory on your machine. To verify, execute the following.
+Ensure that the JAVA_HOME variable is defined and points to the Java installation directory on your machine. To verify, execute the following:
$ echo $JAVA_HOME
-If you do not see any output, JAVA_HOME is not defined. We need to add the following line to your profile located at /home/joe/.bash_profile or /home/joe/.bashrc, whichever you are using. If you do not any of these files, create a ~/.bash_profile.
+If you do not see any output, JAVA_HOME is not defined. We need to add the following line to your profile located at /home/joe/.bash_profile or /home/joe/.bashrc, whichever you are using. If you do not have either of these files, create a ~/.bash_profile file.
export JAVA_HOME=<Path to Java installation directory>
-After you have edited ~/.bash_profile (or ~/.bashrc), execute the following to make the changes effective in current shell.
+After you have edited ~/.bash_profile (or ~/.bashrc), execute the following to make the changes effective in the current shell:
$ source /home/joe/.bash_profile (or /home/joe/.bashrc)
-Before proceeding, verify that JAVA_HOME is defined by executing the following.
+Before proceeding, verify that JAVA_HOME is defined by executing the following:
$ echo $JAVA_HOME
@@ -74,7 +75,7 @@
If SSH is not enabled on your system, please follow the instruction below to enable/install it or else skip to the section [Configuring Password-less SSH](#Configuring_Password-less_SSH).
#### Enabling SSH on Mac ####
-The Apple Mac OS X operating system has SSH installed by default but the SSH daemon is not enabled. This means you can’t login remotely or do remote copies until you enable it. To enable it, go to ‘System Preferences’. Under ‘Internet & Networking’ there is a ‘Sharing’ icon. Run that. In the list that appears, check the ‘Remote Login’ option. Also check the "All users" radio button for "Allow access for". This starts the SSH daemon immediately and you can remotely login using your username. The ‘Sharing’ window shows at the bottom the name and IP address to use. You can also find this out using ‘whoami’ and ‘ifconfig’ from the Terminal application.
+The Apple Mac OS X operating system has SSH installed by default but the SSH daemon is not enabled. This means you can't log in remotely or do remote copies until you enable it. To enable it, go to 'System Preferences'. Under 'Internet & Networking' there is a 'Sharing' icon. Open it. In the list that appears, check the 'Remote Login' option. Also check the "All users" radio button for "Allow access for". This starts the SSH daemon immediately and you can log in remotely using your username. The 'Sharing' window shows at the bottom the name and IP address to use. You can also find this out using 'whoami' and 'ifconfig' from the Terminal application.
#### Enabling SSH on Linux ####
@@ -84,7 +85,7 @@
#### Configuring Password-less SSH ####
-For our single-machine setup of ASTERIX, we need to configure password-less SSH access to localhost. We assume that you are on the machine where you want to install ASTERIX. To verify if you already have password-less SSH configured, execute the following.
+For our single-machine setup of AsterixDB, we need to configure password-less SSH access to localhost. We assume that you are on the machine where you want to install AsterixDB. To verify whether you already have password-less SSH configured, execute the following:
$ ssh 127.0.0.1
@@ -103,7 +104,6 @@
$ ssh 127.0.0.1
Last login: Sat Mar 23 22:52:49 2013
-
[Important: Password-less SSH requires the use of a (public,private) key-pair. The key-pair is located as a pair of files under
$HOME/.ssh directory. It is required that the (public,private) key-pair files have default names (id_rsa.pub, id_rsa) respectively.
If you are using different names, please rename the files to use the default names]
@@ -129,7 +129,7 @@
/home/joe/.ssh/id_rsa already exists.
Overwrite (y/n)?
-You should see an output similar to one shown below.
+You should see output similar to that shown below:
The key fingerprint is:
@@ -158,14 +158,14 @@
$ ssh 127.0.0.1
-You may see an output similar to one shown below.
+You may see output similar to that shown below:
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
- RSA key fingerprint is aa:7b:51:90:74:39:c4:f6:28:a2:9d:47:c2:8d:33:31.
+ RSA key fingerprint is aa:7b:51:90:74:39:c4:f6:28:a2:9d:47:c2:8d:33:31.
Are you sure you want to continue connecting (yes/no)?
-Type 'yes' and press the enter key. You should see an output similar to one shown below.
+Type 'yes' and press the Enter key. You should see output similar to that shown below:
Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
@@ -185,9 +185,9 @@
Connection to 127.0.0.1 closed.
### Configuring Managix ###
-You will need the ASTERIX installer (a.k.a Managix). Download Managix from [here](https://asterixdb.googlecode.com/files/asterix-installer-0.0.5-binary-assembly.zip); this includes the bits for Managix as well as ASTERIX.
+You will need the AsterixDB installer (a.k.a. Managix). Download Managix from [here](http://asterixdb.ics.uci.edu/download/asterix-installer-0.8.3-binary-assembly.zip); this includes the bits for Managix as well as AsterixDB.
-Unzip the Managix zip bundle to an appropriate location. You may create a sub-directory: asterix-mgmt (short for asterix-management) under your home directory. We shall refer to this location as MANAGIX_HOME.
+Unzip the Managix zip bundle to an appropriate location. You may create a sub-directory called "asterix-mgmt" (short for asterix-management) under your home directory. We shall refer to this location as MANAGIX_HOME.
$ cd ~
@@ -197,7 +197,7 @@
/home/joe/asterix-mgmt> $ export MANAGIX_HOME=`pwd`
/home/joe/asterix-mgmt> $ export PATH=$PATH:$MANAGIX_HOME/bin
-It is recommended that you add $MANAGIX_HOME/bin to your PATH variable in your bash profile . This can be done by executing the following.
+It is recommended that you add $MANAGIX_HOME/bin to the PATH variable in your bash profile. This step can be done by executing the following:
currentDir=`pwd`
@@ -206,12 +206,12 @@
Above, use ~/.bashrc instead of ~/.bash_profile if you are using ~/.bashrc .
-To be able to create an ASTERIX instance and manage its lifecycle, the Managix requires you to configure a set of configuration files namely:
+To create an AsterixDB instance and manage its lifecycle, Managix requires you to set up the following configuration files:
* `conf/managix-conf.xml`: A configuration XML file that contains configuration settings for Managix.
* A configuration XML file that describes the nodes in the cluster, e.g., `$MANAGIX_HOME/clusters/local/local.xml`.
-Since we intend to run ASTERIX on a single node, Managix can auto-configure itself and populate the above mentioned configuration files. To auto-configure Managix, execute the following in the MANAGIX_HOME directory:
+Since we intend to run AsterixDB on a single node, Managix can auto-configure itself and populate the above configuration files. To auto-configure Managix, execute the following in the MANAGIX_HOME directory:
/home/joe/asterix-mgmt> $ managix configure
@@ -228,18 +228,18 @@
INFO: Environment [OK]
INFO: Cluster configuration [OK]
-### Creating an ASTERIX instance ###
-Now that we have configured Managix, we shall next create an ASTERIX instance. An ASTERIX instance is identified by a unique name and is created using the `create` command. The usage description for the `create` command can be obtained by executing the following.
+### Creating an AsterixDB instance ###
+Now that we have configured Managix, we shall next create an AsterixDB instance. An AsterixDB instance is identified by a unique name and is created using the `create` command. The usage description for the `create` command can be obtained by executing the following:
$ managix help -cmd create
- Creates an ASTERIX instance with a specified name. Post creation, the instance is in ACTIVE state,
+ Creates an AsterixDB instance with a specified name. Post creation, the instance is in ACTIVE state,
indicating its availability for executing statements/queries.
Usage arguments/options:
- -n Name of the ASTERIX instance.
+ -n Name of the AsterixDB instance.
-c Path to the cluster configuration file
-We shall now use the create command to create an ASTERIX instance by the name "my_asterix". In doing so, we shall use the cluster configuration file that was auto-generated by managix.
+We shall now use the `create` command to create an AsterixDB instance by the name "my_asterix". In doing so, we shall use the cluster configuration file that was auto-generated by Managix.
$ managix create -n my_asterix -c $MANAGIX_HOME/clusters/local/local.xml
@@ -252,7 +252,7 @@
Web-Url:http://127.0.0.1:19001
State:ACTIVE
-The third line above shows the web-url http://127.0.0.1:19001 for ASTERIX's web-interface. The ASTERIX instance is in the 'ACTIVE' state indicating that you may access the web-interface by navigating to the web-url.
+The third line above shows the web URL http://127.0.0.1:19001 for AsterixDB's web interface. The AsterixDB instance is in the 'ACTIVE' state, indicating that you may access the web interface by navigating to the web URL.
Type in the following "Hello World" query in the box:
@@ -260,23 +260,23 @@
let $message := 'Hello World!'
return $message
-Press the "Execute" button. If the query result shows on the output box, then Congratulations! You have successfully created an ASTERIX instance!
+Press the "Run" button. If the query result appears in the output box, congratulations! You have successfully created an AsterixDB instance!
-## Section 2: Single-Machine ASTERIX installation (Advanced) ##
-We assume that you have successfully completed the single-machine ASTERIX installation by following the instructions above in section [ASTERIX installation](#Section_1:_Single-Machine_ASTERIX_installation Single Machine). In this section, we shall cover advanced topics related to ASTERIX configuration. Before we proceed, it is imperative to go through some preliminary concepts related to ASTERIX runtime.
+## <a id="Section2SingleMachineAsterixDBInstallationAdvanced">Section 2: Single-Machine AsterixDB installation (Advanced)</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+We assume that you have successfully completed the single-machine AsterixDB installation by following the instructions above in section [AsterixDB installation](#Section1SingleMachineAsterixDBInstallation). In this section, we shall cover advanced topics related to AsterixDB configuration. Before we proceed, it is imperative to go through some preliminary concepts related to the AsterixDB runtime.
-### ASTERIX Runtime ###
-An ASTERIX runtime comprises of a ''master node'' and a set of ''worker nodes'', each identified by a unique id. The master node runs a ''Cluster Controller'' service (a.k.a. ''CC''), while each worker node runs a ''Node Controller'' service (a.k.a. ''NC''). Please note that a node in an ASTERIX cluster is a logical concept in the sense that multiple nodes may map to a single physical machine, which is the case for a single-machine ASTERIX installation. This association or mapping between an ASTERIX node and a physical machine is captured in a cluster configuration XML file. In addition, the XML file contains properties and parameters associated with each node.
+### AsterixDB Runtime ###
+An AsterixDB runtime comprises a *master node* and a set of *worker nodes*, each identified by a unique id. The master node runs a *Cluster Controller* service (a.k.a. *CC*), while each worker node runs a *Node Controller* service (a.k.a. *NC*). Please note that a node in an AsterixDB cluster is a logical concept in the sense that multiple nodes may map to a single physical machine, which is the case for a single-machine AsterixDB installation. This association or mapping between an AsterixDB node and a physical machine is captured in a cluster configuration XML file. In addition, the XML file contains properties and parameters associated with each node.
-#### ASTERIX Runtime Configuration ####
-As observed earlier, Managix can auto-configure itself for a single-machine setup. As part of auto-configuration, Managix generated the cluster XML file. Let us understand the components of the generated cluster XML file. If you have configured Managix (via the "configure" command), you can find a similar cluster XML file as $MANAGIX_HOME/clusters/local/local.xml. The following is a sample XML file generated on a Ubuntu (Linux) setup:
+#### AsterixDB Runtime Configuration ####
+As observed earlier, Managix can auto-configure itself for a single-machine setup. As part of auto-configuration, Managix generated the cluster XML file. Let us understand the components of the generated cluster XML file. If you have configured Managix (via the `configure` command), you can find a similar cluster XML file at $MANAGIX_HOME/clusters/local/local.xml. The following is a sample XML file generated on an Ubuntu (Linux) setup:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster xmlns="cluster">
<name>local</name>
<java_home>/usr/lib/jvm/jdk1.7.0</java_home>
- <log_dir>/home/joe/asterix-mgmt/clusters/local/working_dir/logs</logdir>
+ <log_dir>/home/joe/asterix-mgmt/clusters/local/working_dir/logs</log_dir>
<txn_log_dir>/home/joe/asterix-mgmt/clusters/local/working_dir/logs</txn_log_dir>
<iodevices>/home/joe/asterix-mgmt/clusters/local/working_dir</iodevices>
<store>storage</store>
@@ -286,8 +286,11 @@
</working_dir>
<master_node>
<id>master</id>
- <client_ip>127.0.0.1</client_ip>
- <cluster_ip>127.0.0.1</cluster_ip>
+ <client-ip>127.0.0.1</client-ip>
+ <cluster-ip>127.0.0.1</cluster-ip>
+ <client_port>1098</client_port>
+ <cluster_port>1099</cluster_port>
+ <http_port>8888</http_port>
</master_node>
<node>
<id>node1</id>
@@ -297,14 +300,17 @@
We shall next explain the components of the cluster configuration XML file.
-#### (1) Defining nodes in ASTERIX runtime ####
-The single-machine ASTERIX instance configuration that is auto-generated by Managix (using the "configure" command) involves a master node (CC) and a worker node (NC). Each node is assigned a unique id and provided with an ip address (called ''cluster_ip'') that maps a node to a physical machine. The following snippet from the above XML file captures the master/worker nodes in our ASTERIX installation.
+#### (1) Defining nodes in AsterixDB runtime ####
+The single-machine AsterixDB instance configuration that is auto-generated by Managix (using the `configure` command) involves a master node (CC) and a worker node (NC). Each node is assigned a unique id and provided with an IP address (called `cluster-ip`) that maps the node to a physical machine. The following snippet from the above XML file captures the master/worker nodes in our AsterixDB installation:
<master_node>
<id>master</id>
- <client_ip>127.0.0.1</client_ip>
- <cluster_ip>127.0.0.1</cluster_ip>
+ <client-ip>127.0.0.1</client-ip>
+ <cluster-ip>127.0.0.1</cluster-ip>
+ <client_port>1098</client_port>
+ <cluster_port>1099</cluster_port>
+ <http_port>8888</http_port>
</master_node>
<node>
<id>node1</id>
@@ -328,13 +334,26 @@
<td>IP address of the machine to which a node maps to. This address is used for all internal communication between the nodes.</td>
</tr>
<tr>
- <td>client_ip</td>
- <td>Provided for the master node. This IP should be reachable from clients that want to connect with ASTERIX via its web interface.</td>
+ <td>client-ip</td>
+ <td>Provided for the master node. This IP should be reachable from clients that want to connect with AsterixDB via its web interface.</td>
</tr>
+<tr>
+  <td>client_port</td>
+  <td>Provided for the master node. This is the port at which the Cluster Controller (CC) service listens for connections from clients.</td>
+</tr>
+<tr>
+  <td>cluster_port</td>
+  <td>Provided for the master node. This is the port used by the Cluster Controller (CC) service to listen for connections from Node Controllers (NCs).</td>
+</tr>
+<tr>
+  <td>http_port</td>
+  <td>Provided for the master node. This is the HTTP port used by the Cluster Controller (CC) service.</td>
+</tr>
+
</table>
-#### (2) Properties associated with a worker node (NC) in ASTERIX ####
-The following is a list of properties associated with each worker node in an ASTERIX configuration.
+#### (2) Properties associated with a worker node (NC) in AsterixDB ####
+The following is a list of properties associated with each worker node in an AsterixDB configuration.
<table>
<tr>
@@ -347,11 +366,11 @@
</tr>
<tr>
<td>log_dir</td>
- <td>A directory where worker node may write logs.</td>
+ <td>A directory where the worker node JVM may write logs.</td>
</tr>
<tr>
<td>txn_log_dir</td>
- <td>A directory where worker node may write transaction logs.</td>
+ <td>A directory where the worker node writes transaction logs.</td>
</tr>
<tr>
<td>iodevices</td>
@@ -359,13 +378,13 @@
</tr>
<tr>
<td>store</td>
- <td>A data directory (under each iodevice) that ASTERIX uses to store data belonging to dataset(s).</td>
+ <td>A data directory (under each iodevice) that AsterixDB uses to store data belonging to dataset(s).</td>
</tr>
</table>
-All the above properties can be defined at the global level or a local level. In the former case, these properties apply to all the nodes in an ASTERIX configuration. In the latter case, these properties apply only to the node(s) under which they are defined. A property defined at the local level overrides the definition at the global level.
+All the above properties can be defined at the global level or a local level. In the former case, these properties apply to all the nodes in an AsterixDB configuration. In the latter case, these properties apply only to the node(s) under which they are defined. A property defined at the local level overrides the definition at the global level.
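As a rough illustration of this override rule (and not of how Managix itself is implemented), the following Python sketch resolves the effective properties of each worker node. The sample XML, its paths, and the second node `node2` are made up for the example; only the element names are taken from the cluster configuration file shown earlier, and the function name `effective_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative cluster description; the paths and node2 are invented for this sketch.
CLUSTER_XML = """<cluster xmlns="cluster">
  <name>local</name>
  <log_dir>/tmp/asterix/logs</log_dir>
  <store>storage</store>
  <node>
    <id>node1</id>
  </node>
  <node>
    <id>node2</id>
    <log_dir>/mnt/fast/logs</log_dir>
  </node>
</cluster>
"""

# The sample file declares xmlns="cluster", so every lookup must be namespace-qualified.
NS = {"c": "cluster"}

# Worker-node properties named in the table above.
NODE_PROPERTIES = ("log_dir", "txn_log_dir", "iodevices", "store")


def effective_properties(cluster_xml):
    """Return {node id: {property: value}} with local values overriding global ones."""
    root = ET.fromstring(cluster_xml)

    # Global level: properties that are direct children of <cluster>.
    defaults = {}
    for name in NODE_PROPERTIES:
        value = root.findtext(f"c:{name}", namespaces=NS)
        if value is not None:
            defaults[name] = value

    resolved = {}
    for node in root.findall("c:node", NS):
        props = dict(defaults)  # start from the global definitions
        for name in NODE_PROPERTIES:
            value = node.findtext(f"c:{name}", namespaces=NS)
            if value is not None:
                props[name] = value  # a local definition wins
        resolved[node.findtext("c:id", namespaces=NS)] = props
    return resolved


if __name__ == "__main__":
    for node_id, props in effective_properties(CLUSTER_XML).items():
        print(node_id, props)
```

In this sample, `node1` inherits both global values, while `node2` keeps the global `store` but uses its own `log_dir`.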
-#### (3) Working directory of an ASTERIX instance ####
+#### (3) Working directory of an AsterixDB instance ####
Next we explain the following setting in the file $MANAGIX_HOME/clusters/local/local.xml.
@@ -375,10 +394,10 @@
</working_dir>
-Managix associates a working directory with an ASTERIX instance and uses this directory for transferring binaries to each node. If there exists a directory that is readable by each node, Managix can use it to place binaries that can be accessed and used by all the nodes in the ASTERIX set up. A network file system (NFS) provides such a functionality for a cluster of physical machines such that a path on NFS is accessible from each machine in the cluster. In the single-machine set up described above, all nodes correspond to a single physical machine. Each path on the local file system is accessible to all the nodes in the ASTERIX setup and the boolean value for NFS above is thus set to `true`.
+Managix associates a working directory with an AsterixDB instance and uses this directory for transferring binaries to each node. If there is a directory that is readable by each node, Managix can use it to place binaries that can be accessed and used by all the nodes in the AsterixDB setup. A network file system (NFS) provides this functionality for a cluster of physical machines, so that a path on NFS is accessible from each machine in the cluster. In the single-machine setup described above, all nodes correspond to a single physical machine. Each path on the local file system is therefore accessible to all the nodes in the AsterixDB setup, and the boolean value for NFS above is set to `true`.
### Managix Configuration ###
-Managix allows creation and management of multiple ASTERIX instances and uses Zookeeper as its back-end database to keep track of information related to each instance. We need to provide a set of one or more hosts that Managix can use to run a Zookeeper instance. Zookeeper runs as a daemon process on each of the specified hosts. At each host, Zookeeper stores data under the Zookeeper home directory specified as part of the configuration. The following is an example configuration `$MANAGIX_HOME/conf/managix-conf.xml` that has Zookeeper running on the localhost (127.0.0.1) :
+Managix allows creation and management of multiple AsterixDB instances and uses Zookeeper as its back-end database to keep track of information related to each instance. We need to provide a set of one or more hosts that Managix can use to run a Zookeeper instance. Zookeeper runs as a daemon process on each of the specified hosts. At each host, Zookeeper stores data under the Zookeeper home directory specified as part of the configuration. The following is an example configuration `$MANAGIX_HOME/conf/managix-conf.xml` that has Zookeeper running on the localhost (127.0.0.1):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
@@ -392,20 +411,20 @@
</zookeeper>
</configuration>
-It is possible to have a single host for Zookeeper. A larger number of hosts would use Zookeeper's replication and fault-tolerance feature such that a failure of a host running Zookeeper would not result in loss of information about existing ASTERIX instances.
+It is possible to have a single host for Zookeeper. A larger number of hosts would use Zookeeper's replication and fault-tolerance feature such that a failure of a host running Zookeeper would not result in loss of information about existing AsterixDB instances.
-## Section 3: Installing ASTERIX on a Cluster of Multiple Machines ##
-We assume that you have read the two sections above on single-machine ASTERIX setup. Next we explain how to install ASTERIX in a cluster of multiple machines. As an example, we assume we want to setup ASTERIX on a cluster of three machines, in which we use one machine (called machine A) as the master node and two other machines (called machine B and machine C) as the worker nodes, as shown in the following diagram:
+## <a id="Section3InstallingAsterixDBOnAClusterOfMultipleMachines">Section 3: Installing AsterixDB on a Cluster of Multiple Machines</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
+We assume that you have read the two sections above on single-machine AsterixDB setup. Next we explain how to install AsterixDB in a cluster of multiple machines. As an example, we assume we want to set up AsterixDB on a cluster of three machines, in which we use one machine (called machine A) as the master node and two other machines (called machine B and machine C) as the worker nodes, as shown in the following diagram:
![AsterixCluster](https://asterixdb.googlecode.com/files/AsterixCluster.png)
Notice that each machine has a ''cluster_ip'' address, which is used by these machines for their intra-cluster communication. Meanwhile, the master machine also has a ''client_ip'' address, using which an end-user outside the cluster can communicate with this machine. The reason we differentiate between these two types of IP addresses is that we can have a cluster of machines using a private network. In this case they have internal ip addresses that cannot be used outside the network. In the case all the machines are on a public network, the "client_ip" and "cluster_ip" of the master machine can share the same address.
-Next we describe how to set up ASTERIX in this cluster, assuming no Managix has been installed on these machines.
+Next we describe how to set up AsterixDB in this cluster, assuming no Managix has been installed on these machines.
-### Step (1): Define the ASTERIX cluster ###
+### Step (1): Define the AsterixDB cluster ###
-We first log into the master machine as the user "joe". On this machine, download Managix from [here](https://asterixdb.googlecode.com/files/asterix-installer-0.0.5-binary-assembly.zip) (same as above), then do the following steps similar to the single-machine case described above:
+We first log into the master machine as the user "joe". On this machine, download Managix from [here](http://asterixdb.ics.uci.edu/download/asterix-installer-0.8.3-binary-assembly.zip) (same as above), then do the following steps similar to the single-machine case described above:
machineA> cd ~
@@ -415,8 +434,10 @@
machineA> export MANAGIX_HOME=`pwd`
machineA> export PATH=$PATH:$MANAGIX_HOME/bin
+Note that we recommend that MANAGIX_HOME not be located on a network file system (NFS). Managix creates artifacts/logs that do not need to be shared, so the overhead
+of creating them on the NFS should be avoided.
-We also need an ASTERIX configuration XML file for the cluster. We give the name to the cluster, say, "rainbow". We create a folder for the configuration of this cluster:
+We also need an AsterixDB configuration XML file for the cluster. We give the cluster a name, say, "rainbow". We create a folder for the configuration of this cluster:
machineA> mkdir $MANAGIX_HOME/rainbow_cluster
@@ -432,30 +453,32 @@
<!-- username, which should be valid for all the three machines -->
<username>joe</username>
- <!-- The working directory of Managix. It should be on a network file system (NFS) that
- can accessed by all the machine. -->
+ <!-- The working directory of Managix. It is recommended for the working
+ directory to be on a network file system (NFS) that can be accessed by
+ all machines.
+ Managix creates the directory if it doesn't exist. -->
<working_dir>
<dir>/home/joe/managix-workingDir</dir>
<NFS>true</NFS>
</working_dir>
- <!-- Directory for Asterix to store log information for each node. Needs
- to be on the local file system. -->
- <log_dir>/mnt/joe/logs</log_dir>
+ <!-- Directory for Asterix to store log information for each machine.
+ Needs to be on the local file system of each machine.
+ Managix creates the directory if it doesn't exist.
+ This property can be overridden for a node by redefining it at the node level. -->
+ <logdir>/mnt/joe/logs</logdir>
- <!-- Directory for Asterix to store transaction logs information for each node. Needs
- to be on the local file system. -->
- <txn_log_dir>/mnt/joe/txn-logs</txn_log_dir>
-
+ <!-- Mount point of an iodevice. Use a comma-separated list for a machine that
+ has multiple iodevices (disks).
+ This property can be overridden for a node by redefining it at the node level. -->
<iodevices>/mnt/joe</iodevices>
- <!-- Directory named (under each iodevice) that used by each worker node to store data files. Needs
- to be on the local file system. -->
+ <!-- Path on each iodevice where Asterix will store its data -->
<store>storage</store>
- <!-- Java home for each node. Can be overriden at node level. -->
+ <!-- Java home for each machine -->
<java_home>/usr/lib/jvm/jdk1.7.0</java_home>
-
+
<!-- IP addresses of the master machine A -->
<master_node>
<id>master</id>
@@ -524,7 +547,6 @@
machineA> ssh-keygen -t rsa -P ""
machineA> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
machineA> chmod 700 $HOME/.ssh/authorized_keys
-
If $HOME is not on the NFS, copy the id_rsa.pub to the directory ~/.ssh (login with the same account) on each machine, and then do the following on each machine. (Notice that this step is not needed if the folder ".ssh" is on the NFS and can be accessed by all the nodes.)
@@ -533,7 +555,7 @@
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 700 $HOME/.ssh/authorized_keys
-
+
Then run the following step again and type "yes" if prompted:
@@ -542,13 +564,13 @@
### Step (3): Configuring Managix ###
-Managix is using a configuration XML file at `$MANAGIX_HOME/conf/managix-conf.xml` to configure its own properties, such as its Zookeeper service. We can use the `configure` command to auto-generate this configuration file:
+Managix uses a configuration XML file at `$MANAGIX_HOME/conf/managix-conf.xml` to configure its own properties, such as its Zookeeper service. We can use the `configure` command to auto-generate this configuration file:
machineA> managix configure
-We use the validate command to validate managix configuration. To do so, execute the following.
+We use the `validate` command to validate the Managix configuration. To do so, execute the following.
machineA> managix validate
INFO: Environment [OK]
@@ -557,49 +579,49 @@
Note that the `configure` command also generates a cluster configuration XML file at $MANAGIX_HOME/conf/clusters/local.xml. This file is not needed in the case of a cluster of machines.
-### Step (4): Creating an ASTERIX instance ###
+### Step (4): Creating an AsterixDB instance ###
-Now that we have configured Managix, we shall next create an ASTERIX instance. An ASTERIX instance is identified by a unique name and is created using the create command. The usage description for the create command can be obtained by executing the following:
+Now that we have configured Managix, we shall next create an AsterixDB instance, which is identified by a unique name and is created using the `create` command. The usage description for the `create` command can be obtained by executing the following:
machineA> managix help -cmd create
- Creates an ASTERIX instance with a specified name. Post creation, the instance is in ACTIVE state,
+ Creates an AsterixDB instance with a specified name. Post creation, the instance is in ACTIVE state,
indicating its availability for executing statements/queries.
Usage arguments/options:
- -n Name of the ASTERIX instance.
+ -n Name of the AsterixDB instance.
-c Path to the cluster configuration file
-We shall now use the `create` command to create an ASTERIX instance called "rainbow_asterix". In doing so, we shall use the cluster configuration file that was auto-generated by Managix.
+We shall now use the `create` command to create an AsterixDB instance called "rainbow_asterix". In doing so, we shall use the cluster configuration file that we defined in Step (1).
machineA> managix create -n rainbow_asterix -c $MANAGIX_HOME/clusters/rainbow.xml
-If the response message does not have warning, then Congratulations! You have successfully installed Asterix on this cluster of machines!
+If the response message does not contain any warnings, then congratulations! You have successfully installed AsterixDB on this cluster of machines!
-Please refer to the section [Managing the Lifecycle of an ASTERIX Instance](#Section_4:_Managing_the_Lifecycle_of_an_ASTERIX_Instance) for a detailed description on the set of available commands/operations that let you manage the lifecycle of an ASTERIX instance. Note that the output of the commands varies with the cluster definition and may not apply to the cluster specification you built above.
+Please refer to the section [Managing the Lifecycle of an AsterixDB Instance](#Section4ManagingTheLifecycleOfAnAsterixDBInstance) for a detailed description of the set of available commands/operations that let you manage the lifecycle of an AsterixDB instance. Note that the output of the commands varies with the cluster definition and may not apply to the cluster specification you built above.
-## Section 4: Managing the Lifecycle of an ASTERIX Instance ##
+## <a id="Section4ManagingTheLifecycleOfAnAsterixDBInstance">Section 4: Managing the Lifecycle of an AsterixDB Instance</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
-Now that we have an ASTERIX instance running, let us use Managix to manage the instance's lifecycle. Managix provides the following set of commands/operations:
+Now that we have an AsterixDB instance running, let us use Managix to manage the instance's lifecycle. Managix provides the following set of commands/operations:
#### Managix Commands ####
<table>
<tr><td>Command</td> <td>Description</td></tr>
-<tr><td><a href="#Creating_an_ASTERIX_instance">create</a></td> <td>Creates a new asterix instance.</td></tr>
+<tr><td><a href="#Creating_an_AsterixDB_instance">create</a></td> <td>Creates a new AsterixDB instance.</td></tr>
<tr><td><a href="#Describe_Command" >describe</a></td> <td>Describes an existing asterix instance.</td></tr>
<tr><td><a href="#Stop_Command" >stop</a></td> <td>Stops an asterix instance that is in the ACTIVE state.</td></tr>
-<tr><td><a href="#Start_Command" >start</a></td> <td>Starts an Asterix instance.</td></tr>
-<tr><td><a href="#Backup_Command" >backup</a></td> <td>Creates a backup for an existing Asterix instance.</td></tr>
-<tr><td><a href="#Restore_Command" >restore</a></td> <td>Restores an Asterix instance.</td></tr>
-<tr><td><a href="#Delete_Command" >delete</a></td> <td>Deletes an Asterix instance.</td></tr>
+<tr><td><a href="#Start_Command" >start</a></td> <td>Starts an AsterixDB instance.</td></tr>
+<tr><td><a href="#Backup_Command" >backup</a></td> <td>Creates a backup for an existing AsterixDB instance.</td></tr>
+<tr><td><a href="#Restore_Command" >restore</a></td> <td>Restores an AsterixDB instance.</td></tr>
+<tr><td><a href="#Delete_Command" >delete</a></td> <td>Deletes an AsterixDB instance.</td></tr>
<tr><td><a href="#Configuring_Managix" >validate</a></td> <td>Validates the installer/cluster configuration.</td></tr>
-<tr><td><a href="#Configuring_Managix" >configure</a></td><td>Auto generate configuration for an Asterix instance.</td></tr>
+<tr><td><a href="#Configuring_Managix" >configure</a></td><td>Auto-generates a configuration for an AsterixDB instance.</td></tr>
<tr><td><a href="#Log_Command" >log</a></td><td>Produces a zip archive containing log files from each node in an AsterixDB instance.</td></tr>
-<tr><td><a href="#Shutdown_Command" >shutdown</a></td> <td>Shutdown the installer service.</td></tr>
+<tr><td><a href="#Shutdown_Command" >shutdown</a></td> <td>Shuts down the installer service.</td></tr>
</table>
You may obtain the above listing by simply executing 'managix' :
@@ -607,22 +629,22 @@
$ managix
-We already talked about create and validate commands. We shall next explain the rest of the commands listed above. We also provide sample output messages of these commands assuming we are running an ASTERIX instance on a single machine.
+We have already discussed the `create` and `validate` commands. We shall next explain the rest of the commands listed above. We also provide sample output messages of these commands, assuming we are running an AsterixDB instance on a single machine.
##### Describe Command #####
-The `describe` command provides information about an ASTERIX instance. The usage can be looked up by executing the following:
+The `describe` command provides information about an AsterixDB instance. The usage can be looked up by executing the following:
$ managix help -cmd describe
- Provides information about an ASTERIX instance.
+ Provides information about an AsterixDB instance.
The following options are available:
- [-n] Name of the ASTERIX instance.
+ [-n] Name of the AsterixDB instance.
[-admin] Provides a detailed description
The brackets indicate optional flags.
-The output of the `describe` command when used without the `admin` flag contains minimal information and is similar to the output of the create command. Let us try running the describe command in "admin" mode.
+The output of the `describe` command when used without the `-admin` flag contains minimal information and is similar to the output of the `create` command. Let us try running the `describe` command in "admin" mode.
$ managix describe -n my_asterix -admin
@@ -640,26 +662,56 @@
Processes
NC at 127.0.0.1 [ 22195 ]
CC at 127.0.0.1 [ 22161 ]
+
+ Asterix Configuration
+ nc.java.opts :-Xmx1024m
+ cc.java.opts :-Xmx1024m
+ storage.buffercache.pagesize :32768
+ storage.buffercache.size :33554432
+ storage.buffercache.maxopenfiles :214748364
+ storage.memorycomponent.pagesize :32768
+ storage.memorycomponent.numpages :1024
+ storage.memorycomponent.globalbudget :536870192
+ storage.lsm.mergethreshold :3
+ storage.lsm.bloomfilter.falsepositiverate:0.01
+ txn.log.buffer.numpages :8
+ txn.log.buffer.pagesize :131072
+ txn.log.partitionsize :2147483648
+ txn.log.disksectorsize :4096
+ txn.log.groupcommitinterval :1
+ txn.log.checkpoint.lsnthreshold :67108864
+ txn.log.checkpoint.pollfrequency :120
+ txn.log.checkpoint.history :0
+ txn.lock.escalationthreshold :1000
+ txn.lock.shrinktimer :5000
+ txn.lock.timeout.waitthreshold :60000
+ txn.lock.timeout.sweepthreshold :10000
+ compiler.sortmemory :33554432
+ compiler.joinmemory :33554432
+ compiler.framesize :32768
+ web.port :19001
+ api.port :19002
+ log.level :INFO
As seen above, the instance 'my_asterix' is configured such that all processes running at the localhost (127.0.0.1). The process id for each process (JVM) is shown next to it.
##### Stop Command #####
-The `stop` command can be used for shutting down an ASTERIX instance. After that, the instance is unavailable for executing queries. The usage can be looked up by executing the following:
+The `stop` command can be used for shutting down an AsterixDB instance. After that, the instance is unavailable for executing queries. The usage can be looked up by executing the following:
$ managix help -cmd stop
- Shuts an ASTERIX instance that is in ACTIVE state. After executing the stop command, the ASTERIX instance transits
+ Shuts an AsterixDB instance that is in ACTIVE state. After executing the stop command, the AsterixDB instance transits
to the INACTIVE state, indicating that it is no longer available for executing queries.
Available arguments/options
- -n name of the ASTERIX instance.
+ -n name of the AsterixDB instance.
-To stop the ASTERIX instance.
+To stop the AsterixDB instance, execute the following:
$ managix stop -n my_asterix
- INFO: Stopped Asterix instance: my_asterix
+ INFO: Stopped AsterixDB instance: my_asterix
$ managix describe -n my_asterix
@@ -670,17 +722,17 @@
##### Start Command #####
-The `start` command starts an ASTERIX instance that is in the INACTIVE state. The usage can be looked up by executing the following:
+The `start` command starts an AsterixDB instance that is in the INACTIVE state. The usage can be looked up by executing the following:
$ managix help -cmd start
- Starts an ASTERIX instance that is in INACTIVE state. After executing the start command, the ASTERIX instance transits to the ACTIVE state, indicating that it is now available for executing statements/queries.
+ Starts an AsterixDB instance that is in INACTIVE state. After executing the start command, the AsterixDB instance transits to the ACTIVE state, indicating that it is now available for executing statements/queries.
Available arguments/options
- -n name of the ASTERIX instance.
+ -n name of the AsterixDB instance.
-Let us now start the ASTERIX instance.
+Let us now start the AsterixDB instance.
$ managix start -n my_asterix
@@ -692,22 +744,22 @@
##### Backup Command #####
-In an undesirable event of data loss either due to a disk/system failure or accidental execution of a DDL statement (drop dataverse/dataset), you may need to recover the lost data. The backup command allows you to take a backup of the data stored with an ASTERIX instance. The backup can be taken on the local file system or on an HDFS instance. In either case, the snapshots are stored under a backup directory. You need to make sure the backup directory has appropriate read/write permissions. Configuring settings for backup can be found inside the Managix's configuration file located at `$MANAGIX_HOME/conf/managix-conf.xml`.
+The `backup` command allows you to take a backup of the data stored with an AsterixDB instance. The backup can be taken on the local file system or on an HDFS instance. In either case, the snapshots are stored under a backup directory. You need to make sure the backup directory has appropriate read/write permissions. The configuration settings for backup can be found in the Managix configuration file located at `$MANAGIX_HOME/conf/managix-conf.xml`.
*Configuring backup on the local file system*
-We need to provide path to a backup directory on the local file system. The backup directory can be configured be editing the Managix configuration XML, found at `$MANAGIX_HOME/conf/managix-conf.xml`.
+We need to provide a path to a backup directory on the local file system. The backup directory can be configured by editing the Managix configuration XML, found at `$MANAGIX_HOME/conf/managix-conf.xml`.
<backup>
<backupDir>Provide path to the backup directory here</backupDir>
</backup>
-Prior to taking a backup of an ASTERIX instance, it is required for the instance to be in the INACTIVE state. We do so by using the `stop` command, as shown below:
+Prior to taking a backup of an AsterixDB instance, the instance must be in the INACTIVE state. We ensure this by using the `stop` command, as shown below:
$ managix stop -n my_asterix
- INFO: Stopped Asterix instance: my_asterix
+ INFO: Stopped AsterixDB instance: my_asterix
We can now take the backup by executing the following:
@@ -718,7 +770,7 @@
*Configuring backup on an HDFS instance*
-To configure a backups to be taken on an HDFS instance, we need to provide required information about the running HDFS instance. This information includes the HDFS version and the HDFS url. Simply edit the Managix configuration file and provide the required information.
+To configure a backup to be taken on an HDFS instance, we need to provide the required information about the running HDFS instance. This information includes the HDFS version and the HDFS URL. Simply edit the Managix configuration file and provide the required information.
<backup>
@@ -756,29 +808,29 @@
Processes
-The above output shows the available backup identified by it's id (0). We shall next describe the method for restoring an ASTERIX instance from a backup snapshot.
+The above output shows the available backup identified by its id (0). We shall next describe the method for restoring an AsterixDB instance from a backup snapshot.
##### Restore Command #####
-The `restore` command allows you to restore an ASTERIX instance's data from a previously taken backup. The usage description can be obtained as follows:
+The `restore` command allows you to restore an AsterixDB instance's data from a previously taken backup. The usage description can be obtained as follows:
$ managix help -cmd restore
- Restores an ASTERIX instance's data from a previously taken backup.
+ Restores an AsterixDB instance's data from a previously taken backup.
Available arguments/options
- -n name of the ASTERIX instance
+ -n name of the AsterixDB instance
-b id of the backup snapshot
-The following command restores our ASTERIX instance from the backup snapshot identified by the id (0). Prior to restoring an instance from a backup, it is required that the instance is in the INACTIVE state.
+The following command restores our AsterixDB instance from the backup snapshot identified by the id (0). Prior to restoring an instance from a backup, it is required that the instance is in the INACTIVE state.
$ managix restore -n my_asterix -b 0
- INFO: Asterix instance: my_asterix has been restored from backup
+ INFO: AsterixDB instance: my_asterix has been restored from backup
-You can start the ASTERIX instance by using the start command.
+You can start the AsterixDB instance by using the `start` command.
##### Log Command #####
@@ -800,22 +852,22 @@
##### Delete Command #####
-As the name suggests, the `delete` command permanently removes an ASTERIX instance by cleaning up all associated data/artifacts. The usage can be looked up by executing the following:
+As the name suggests, the `delete` command permanently removes an AsterixDB instance by cleaning up all associated data/artifacts. The usage can be looked up by executing the following:
$ managix help -cmd delete
- Permanently deletes an ASTERIX instance. The instance must be in the INACTIVE state.
+ Permanently deletes an AsterixDB instance. The instance must be in the INACTIVE state.
Available arguments/options
- -n name of the ASTERIX instance.
+ -n name of the AsterixDB instance.
$ managix delete -n my_asterix
- INFO: Asterix instance my_asterix deleted.
+ INFO: AsterixDB instance my_asterix deleted.
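The state requirements of the lifecycle commands described in this section can be summarized in a small Python model. It is a reading aid built only from the descriptions above (`stop` needs an ACTIVE instance; `start`, `backup`, `restore` and `delete` need an INACTIVE one); the table layout and the function name `apply_command` are hypothetical and do not reflect how Managix is implemented.

```python
# State each command requires, as described in this guide.
REQUIRED_STATE = {
    "stop": "ACTIVE",
    "start": "INACTIVE",
    "backup": "INACTIVE",
    "restore": "INACTIVE",
    "delete": "INACTIVE",
}

# State after the command succeeds (None means the instance no longer exists).
NEXT_STATE = {
    "stop": "INACTIVE",
    "start": "ACTIVE",
    "backup": "INACTIVE",
    "restore": "INACTIVE",
    "delete": None,
}


def apply_command(state, command):
    """Return the new state, or raise if the command is not allowed in `state`."""
    required = REQUIRED_STATE[command]
    if state != required:
        raise ValueError(
            f"'{command}' requires the instance to be {required}, but it is {state}"
        )
    return NEXT_STATE[command]


# A freshly created instance is ACTIVE; stop it, back it up, start it again.
state = "ACTIVE"
for command in ["stop", "backup", "start"]:
    state = apply_command(state, command)
print(state)
```

Trying `apply_command("ACTIVE", "backup")` in this model raises an error, mirroring the requirement that an instance be stopped before a backup is taken.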
##### Shutdown Command #####
-Managix uses Zookeeper service for storing all information about created ASTERIX instances. The Zookeeper service runs in the background and can be shut down using the `shutdown` command.
+Managix uses a Zookeeper service for storing all information about created AsterixDB instances. The Zookeeper service runs in the background and can be shut down using the `shutdown` command.
$ managix shutdown
@@ -832,11 +884,11 @@
$ managix help -cmd configure
- Auto-generates the ASTERIX installer configruation settings and ASTERIX cluster
+ Auto-generates the AsterixDB installer configuration settings and AsterixDB cluster
configuration settings for a single node setup.
-## Section 5: Frequently Asked Questions ##
+## <a id="Section5FAQ">Section 5: Frequently Asked Questions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
##### Question #####
@@ -868,29 +920,28 @@
##### Question #####
Do I need to create all the directories/paths I put into the cluster configuration XML ?
-##### Answer #####
+##### Answer #####
Managix will create a path if it is not existing. It does so using the user account mentioned in the cluster configuration xml.
Please ensure that the user account has appropriate permissions for creating the missing paths.
-##### Question #####
-
+##### Question #####
Should MANAGIX_HOME be on the network file system (NFS) ?
##### Answer #####
It is recommended that MANAGIX_HOME is not on the NFS. Managix produces artifacts/logs on disk which are not required to be shared.
As such an overhead in creating the artifacts/logs on the NFS should be avoided.
-##### Question #####
+##### Question #####
-Question: How do we change the underlying code (apply a code patch) for an 'active' asterix instance?
+How do we change the underlying code (apply a code patch) for an 'active' asterix instance?
##### Answer #####
At times, end-user (particularly asterix developer) may run into the need to altering the underlying code that is being run by an asterix instance. In the current version of managix, this can be achieved as follows:-
-Assume that you have an 'active' instance by the name a1 that is running version v1 of asterix.
-You have a revised version of asterix - v2 that fixes some bug(s).
+Assume that you have an 'active' instance by the name a1 that is running version v1 of asterix.
+You have a revised version of asterix - v2 that fixes some bug(s).
To upgrade asterix from v1 to v2:-
@@ -900,12 +951,12 @@
step 3) copy asterix-server zip (version v2) to $MANAGIX_HOME/asterix/
-step 4) managix start -n a1
+step 4) managix start -n a1
-a1 now is running on version v2.
+a1 now is running on version v2.
Limitations:-
-a) Obviously this wont work in a situation where v2 has made a change that is incompatible with earlier version, such altering schema.
+a) Obviously this won't work in a situation where v2 has made a change that is incompatible with the earlier version, such as altering the schema.
-b) A change in asterix zip applies to all existing instances (after a restart) and subsequent instances that user creates.
+b) A change in the asterix zip applies to all existing instances (after a restart) and to subsequent instances that the user creates.
diff --git a/asterix-doc/src/site/site.xml b/asterix-doc/src/site/site.xml
index 150544e..a9794ed 100644
--- a/asterix-doc/src/site/site.xml
+++ b/asterix-doc/src/site/site.xml
@@ -21,7 +21,7 @@
<bannerLeft>
<name>AsterixDB</name>
<src>images/asterixlogo.png</src>
- <href>/index.html</href>
+ <href>http://asterixdb.ics.uci.edu/</href>
</bannerLeft>
<version position="right"/>
@@ -54,16 +54,28 @@
</custom>
<body>
+ <head>
+ <script>
+ (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-41536543-1', 'uci.edu');
+ ga('send', 'pageview');
+ </script>
+ </head>
<links>
- <item name="Home" href="index.html"/>
+ <item name="Documentation Home" href="index.html"/>
</links>
<menu name="Documentation">
<item name="Installing and Managing AsterixDB using Managix" href="install.html"/>
<item name="AsterixDB 101: An ADM and AQL Primer" href="aql/primer.html"/>
<item name="Asterix Data Model (ADM)" href="aql/datamodel.html"/>
- <item name="Asterix Functions" href="aql/functions.html"/>
<item name="Asterix Query Language (AQL)" href="aql/manual.html"/>
+ <item name="AQL Functions" href="aql/functions.html"/>
+ <item name="AQL Allen's Relations Functions" href="aql/allens.html"/>
<item name="AQL Support of Similarity Queries" href="aql/similarity.html"/>
<item name="Accessing External Data" href="aql/externaldata.html"/>
<item name="REST API to AsterixDB" href="api.html"/>