Asterix provides various classes of functions to support operations on string, spatial, and temporal data. This document explains how to use these functions.
Syntax:
string-to-codepoint(string_expression)
Converts the string string_expression
to its code-based representation.
Arguments:
string_expression
: A string
that will be converted.Return Value:
OrderedList
of the code points for the string string_expression
.Syntax:
codepoint-to-string(list_expression)
Converts the ordered code-based representation list_expression
to the corresponding string.
Arguments:
list_expression
: An OrderedList
of code-points.Return Value:
string
representation of list_expression
.Example:
use dataverse TinySocial; let $s := "Hello ASTERIX!" let $l := string-to-codepoint($s) let $ss := codepoint-to-string($l) return {"codes": $l, "string": $ss}
The expected result is:
{ "codes": [ 72, 101, 108, 108, 111, 32, 65, 83, 84, 69, 82, 73, 88, 33 ], "string": "Hello ASTERIX!" }
Syntax:
contains(string_expression, substring_to_contain)
Checks whether the string string_expression
contains the string substring_to_contain
Arguments:
string_expression
: A string
that might contain the given substring.substring_to_contain
: A target string
that might be contained.Return Value:
boolean
value, true
if string_expression
contains substring_to_contain
, and false
otherwise.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where contains($i.message, "phone") return {"mid": $i.message-id, "message": $i.message}
The expected result is:
{ "mid": 2, "message": " dislike iphone its touch-screen is horrible" } { "mid": 13, "message": " dislike iphone the voice-command is bad:(" } { "mid": 15, "message": " like iphone the voicemail-service is awesome" }
Syntax:
like(string_expression, string_pattern)
Checks whether the string string_expression
contains the string pattern string_pattern
. Compared to the contains
function, the like
function also supports regular expressions.
Arguments:
string_expression
: A string
that might contain the pattern or null
.string_pattern
: A pattern string
that might be contained or null
.Return Value:
boolean
value, true
if string_expression
contains the pattern string_pattern
, and false
otherwise.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where like($i.message, "%at&t%") return $i.message
The expected result is:
" can't stand at&t the network is horrible:(" " can't stand at&t its plan is terrible" " love at&t its 3G is good:)"
Syntax:
starts-with(string_expression, substring_to_start_with)
Checks whether the string string_expression
starts with the string substring_to_start_with
.
Arguments:
string_expression
: A string
that might start with the given string.substring_to_start_with
: A string
that might be contained as the starting substring.Return Value:
boolean
, returns true
if string_expression
starts with the string substring_to_start_with
, and false
otherwise.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where starts-with($i.message, " like") return $i.message
The expected result is:
" like samsung the plan is amazing" " like t-mobile its platform is mind-blowing" " like verizon the 3G is awesome:)" " like iphone the voicemail-service is awesome"
Syntax:
ends-with(string_expression, substring_to_end_with)
Checks whether the string string_expression
ends with the string substring_to_end_with
.
Arguments:
string_expression
: A string
that might end with the given string.substring_to_end_with
: A string
that might be contained as the ending substring.Return Value:
boolean
, returns true
if string_expression
ends with the string substring_to_end_with
, and false
otherwise.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where ends-with($i.message, ":)") return $i.message
The expected result is:
" love sprint its shortcut-menu is awesome:)" " like verizon the 3G is awesome:)" " love at&t its 3G is good:)"
Syntax:
string-concat(list_expression)
Concatenates a list of strings list_expression
into a single string.
Arguments:
list_expression
: An OrderedList
or UnorderedList
of string
s (could be null
) to be concatenated.Return Value:
string
value.Example:
let $i := "ASTERIX" let $j := " " let $k := "ROCKS!" return string-concat([$i, $j, $k])
The expected result is:
"ASTERIX ROCKS!"
Syntax:
string-join(list_expression, string_expression)
Joins a list of strings list_expression
with the given separator string_expression
into a single string.
Arguments:
list_expression
: An OrderedList
or UnorderedList
of strings (could be null
) to be joined.string_expression
: A string
as the separator.Return Value:
String
.Example:
use dataverse TinySocial; let $i := ["ASTERIX", "ROCKS~"] return string-join($i, "!! ")
The expected result is:
"ASTERIX!! ROCKS~"
Syntax:
lowercase(string_expression)
Converts a given string string_expression
to its lowercase form.
Arguments:
string_expression
: A string
to be converted.Return Value:
string
as the lowercase form of the given string_expression
.Example:
use dataverse TinySocial; let $i := "ASTERIX" return lowercase($i)
The expected result is:
asterix
Syntax:
matches(string_expression, string_pattern)
Checks whether the strings string_expression
matches the given pattern string_pattern
.
Arguments:
string_expression
: A string
that might contain the pattern.string_pattern
: A pattern string
to be matched.Return Value:
boolean
, returns true
if string_expression
matches the pattern string_pattern
, and false
otherwise.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where matches($i.message, "dislike iphone") return $i.message
The expected result is:
" dislike iphone its touch-screen is horrible" " dislike iphone the voice-command is bad:("
Syntax:
replace(string_expression, string_pattern, string_replacement)
Checks whether the string string_expression
matches the given pattern string_pattern
, and replace the matched pattern string_pattern
with the new pattern string_replacement
.
Arguments:
string_expression
: A string
that might contain the pattern.string_pattern
: A pattern string
to be matched.string_replacement
: A pattern string
to be used as the replacement.Return Value:
string
that is obtained after the replacements.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where matches($i.message, " like iphone") return replace($i.message, " like iphone", "like android")
The expected result is:
"like android the voicemail-service is awesome"
Syntax:
string-length(string_expression)
Returns the length of the string string_expression
.
Arguments:
string_expression
: A string
or null
that represents the string to be checked.Return Value:
int32
that represents the length of string_expression
.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') return {"mid": $i.message-id, "message-len": string-length($i.message)}
The expected result is:
{ "mid": 1, "message-len": 43 } { "mid": 2, "message-len": 44 } { "mid": 3, "message-len": 33 } { "mid": 4, "message-len": 43 } { "mid": 5, "message-len": 46 } { "mid": 6, "message-len": 43 } { "mid": 7, "message-len": 37 } { "mid": 8, "message-len": 33 } { "mid": 9, "message-len": 34 } { "mid": 10, "message-len": 50 } { "mid": 11, "message-len": 38 } { "mid": 12, "message-len": 52 } { "mid": 13, "message-len": 42 } { "mid": 14, "message-len": 27 } { "mid": 15, "message-len": 45 }
Syntax:
substring(string_expression, offset[, length])
Returns the substring from the given string string_expression
based on the given start offset offset
with the optional length
.
Arguments:
string_expression
: A string
to be extracted.offset
: An int32
as the starting offset of the substring in string_expression
.length
: (Optional) An int32
as the length of the substring.Return Value:
string
that represents the substring.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where string-length($i.message) > 50 return substring($i.message, 50)
The expected result is:
"G:("
Syntax:
substring-before(string_expression, string_pattern)
Returns the substring from the given string string_expression
before the given pattern string_pattern
.
Arguments:
string_expression
: A string
to be extracted.string_pattern
: A string
pattern to be searched.Return Value:
string
that represents the substring.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where contains($i.message, "iphone") return substring-before($i.message, "iphone")
The expected result is:
" dislike " " dislike " " like "
Syntax:
substring-after(string_expression, string_pattern)
Returns the substring from the given string string_expression
after the given pattern string_pattern
.
Arguments:
string_expression
: A string
to be extracted.string_pattern
: A string
pattern to be searched.Return Value:
string
that represents the substring.Example:
use dataverse TinySocial; for $i in dataset('FacebookMessages') where contains($i.message, "iphone") return substring-after($i.message, "iphone")
The expected result is:
" its touch-screen is horrible" " the voice-command is bad:(" " the voicemail-service is awesome"
Syntax:
create-point(latitude, longitude)
Creates the primitive type point
using latitude
and longitude
.
Arguments:
latitude
: A double
that represents the latitude.longitude
: A double
that represents the longitude.Return Value:
point
created using the latitude and longitude provided in latitude
and longitude
.Example:
use dataverse TinySocial; let $c := create-point(30.0,70.0) return {"point": $c}
The expected result is:
{ "point": point("30.0,70.0") }
Syntax:
create-line(point_expression1, point_expression2)
Creates the primitive type line
using point_expression1
and point_expression2
.
Arguments:
point_expression1
: A point
that represents the start point of the line.point_expression2
: A point
that represents the end point of the line.Return Value:
line
created using the points provided in point_expression1
and point_expression2
.Example:
use dataverse TinySocial; let $c := create-line(create-point(30.0,70.0), create-point(50.0,90.0)) return {"line": $c}
The expected result is:
{ "line": line("30.0,70.0 50.0,90.0") }
Syntax:
create-rectangle(point_expression1, point_expression2)
Creates the primitive type rectangle
using point_expression1
and point_expression2
.
Arguments:
point_expression1
: A point
that represents the lower-left point of the rectangle.point_expression2
: A point
that represents the upper-right point of the rectangle.Return Value:
rectangle
created using the points provided in point_expression1
and point_expression2
.Example:
use dataverse TinySocial; let $c := create-rectangle(create-point(30.0,70.0), create-point(50.0,90.0)) return {"rectangle": $c}
The expected result is:
{ "rectangle": rectangle("30.0,70.0 50.0,90.0") }
Syntax:
create-circle(point_expression, radius)
Creates the primitive type circle
using point_expression
and radius
.
Arguments:
point_expression
: A point
that represents the center of the circle.radius
: A double
that represents the radius of the circle.Return Value:
circle
created using the center point and the radius provided in point_expression
and radius
.Example:
use dataverse TinySocial; let $c := create-circle(create-point(30.0,70.0), 5.0) return {"circle": $c}
The expected result is:
{ "circle": circle("30.0,70.0 5.0") }
Syntax:
create-polygon(point_expression1, point_expression2, point_expression3, […, point_expressionn])
Creates the primitive type polygon
using an unlimited number of arguments point_expression1
, point_expression2
, ..., point_expressionn
. Note that at least three points should be specified.
Arguments:
point_expression1
/.../point_expressionn
: A point
that represents a vertex of the polygon.Return Value:
polygon
created using the points provided in point_expression1
, point_expression2
, ..., point_expressionn
.Example:
use dataverse TinySocial; let $c := create-polygon(create-point(1.0,1.0), create-point(2.0,2.0), create-point(3.0,3.0), create-point(4.0,4.0)) return {"polygon": $c}
The expected result is:
{ "polygon": polygon("1.0,1.0 2.0,2.0 3.0,3.0 4.0,4.0") }
Syntax:
point(string_expression)
Constructor function for the point
type by parsing a point string string_expression
Arguments:
string_expression
: The string
value representing a point value.Return Value:
point
value represented by the given string.Example:
use dataverse TinySocial; let $c := point("55.05,-138.04") return {"point": $c}
The expected result is:
{ "point": point("55.05,-138.04") }
Syntax:
line(string_expression)
Constructor function for line
type by parsing a line string string_expression
Arguments:
string_expression
: The string
value representing a line value.Return Value:
line
value represented by the given string.Example:
use dataverse TinySocial; let $c := line("55.05,-138.04 13.54,-138.04") return {"line": $c}
The expected result is:
{ "line": line("55.05,-138.04 13.54,-138.04") }
Syntax:
rectangle(string_expression)
Constructor function for rectangle
type by parsing a rectangle string string_expression
Arguments:
string_expression
: The string
value representing a rectangle value.Return Value:
rectangle
value represented by the given string.Example:
use dataverse TinySocial; let $c := rectangle("20.05,-125.0 40.67,-100.87") return {"rectangle": $c}
The expected result is:
{ "rectangle": rectangle("20.05,-125.0 40.67,-100.87") }
Syntax:
circle(string_expression)
Constructor function for circle
type by parsing a circle string string_expression
Arguments:
string_expression
: The string
value representing a circle value.Return Value:
circle
value represented by the given string.Example:
use dataverse TinySocial; let $c := circle("55.05,-138.04 10.0") return {"circle": $c}
The expected result is:
{ "circle": circle("55.05,-138.04 10.0") }
Syntax:
polygon(string_expression)
Constructor function for polygon
type by parsing a polygon string string_expression
Arguments:
string_expression
: The string
value representing a polygon value.Return Value:
polygon
value represented by the given string.Example:
use dataverse TinySocial; let $c := polygon("55.05,-138.04 13.54,-138.04 13.54,-53.31 55.05,-53.31") return {"polygon": $c}
The expected result is:
{ "polygon": polygon("55.05,-138.04 13.54,-138.04 13.54,-53.31 55.05,-53.31") }
Syntax:
get-x(point_expression) or get-y(point_expression)
Returns the x or y coordinates of a point point_expression
.
Arguments:
point_expression
: A point
.Return Value:
double
representing the x or y coordinates of the point point_expression
.Example:
use dataverse TinySocial; let $point := create-point(2.3,5.0) return {"x-coordinate": get-x($point), "y-coordinate": get-y($point)}
The expected result is:
{ "x-coordinate": 2.3d, "y-coordinate": 5.0d }
Syntax:
get-points(spatial_expression)
Returns an ordered list of the points forming the spatial object spatial_expression
.
Arguments:
spatial_expression
: A point
, line
, rectangle
, circle
, or polygon
.Return Value:
OrderedList
of the points forming the spatial object spatial_expression
.Example:
use dataverse TinySocial; let $line := create-line(create-point(100.6,99.4), create-point(-72.0,-76.9)) let $rectangle := create-rectangle(create-point(9.2,49.0), create-point(77.8,111.1)) let $polygon := create-polygon(create-point(1.0,1.0), create-point(2.0,2.0), create-point(3.0,3.0), create-point(4.0,4.0)) let $line_list := get-points($line) let $rectangle_list := get-points($rectangle) let $polygon_list := get-points($polygon) return {"line-first-point": $line_list[0], "line-second-point": $line_list[1], "rectangle-left-bottom-point": $rectangle_list[0], "rectangle-top-upper-point": $rectangle_list[1], "polygon-first-point": $polygon_list[0], "polygon-second-point": $polygon_list[1], "polygon-third-point": $polygon_list[2], "polygon-forth-point": $polygon_list[3]}
The expected result is:
{ "line-first-point": point("100.6,99.4"), "line-second-point": point("-72.0,-76.9"), "rectangle-left-bottom-point": point("9.2,49.0"), "rectangle-top-upper-point": point("77.8,111.1"), "polygon-first-point": point("1.0,1.0"), "polygon-second-point": point("2.0,2.0"), "polygon-third-point": point("3.0,3.0"), "polygon-forth-point": point("4.0,4.0") }
Syntax:
get-center(circle_expression) or get-radius(circle_expression)
Returns the center and the radius of a circle circle_expression
, respectively.
Arguments:
circle_expression
: A circle
.Return Value:
point
or double
, represent the center or radius of the circle circle_expression
.Example:
use dataverse TinySocial; let $circle := create-circle(create-point(6.0,3.0), 1.0) return {"circle-radius": get-radius($circle), "circle-center": get-center($circle)}
The expected result is:
{ "circle-radius": 1.0d, "circle-center": point("6.0,3.0") }
Syntax:
spatial-distance(point_expression1, point_expression2)
Returns the Euclidean distance between point_expression1
and point_expression2
.
Arguments:
point_expression1
: A point
.point_expression2
: A point
.Return Value:
double
as the Euclidean distance between point_expression1
and point_expression2
.Example:
use dataverse TinySocial; for $t in dataset('TweetMessages') let $d := spatial-distance($t.sender-location, create-point(30.0,70.0)) return {"point": $t.sender-location, "distance": $d}
The expected result is:
{ "point": point("47.44,80.65"), "distance": 20.434678857275934d } { "point": point("29.15,76.53"), "distance": 6.585089217315132d } { "point": point("37.59,68.42"), "distance": 7.752709203884797d } { "point": point("24.82,94.63"), "distance": 25.168816023007512d } { "point": point("32.84,67.14"), "distance": 4.030533463451212d } { "point": point("29.72,75.8"), "distance": 5.806754687430835d } { "point": point("39.28,70.48"), "distance": 9.292405501268227d } { "point": point("40.09,92.69"), "distance": 24.832321679617472d } { "point": point("47.51,83.99"), "distance": 22.41250097601782d } { "point": point("36.21,72.6"), "distance": 6.73231758015024d } { "point": point("46.05,93.34"), "distance": 28.325926286707734d } { "point": point("36.86,74.62"), "distance": 8.270671073135482d }
Syntax:
spatial-area(spatial_2d_expression)
Returns the spatial area of spatial_2d_expression
.
Arguments:
spatial_2d_expression
: A rectangle
, circle
, or polygon
.Return Value:
double
representing the area of spatial_2d_expression
.Example:
use dataverse TinySocial; let $circleArea := spatial-area(create-circle(create-point(0.0,0.0), 5.0)) return {"Area":$circleArea}
The expected result is:
{ "Area": 78.53981625d }
Syntax:
spatial-intersect(spatial_expression1, spatial_expression2)
Checks whether @arg1
and @arg2
spatially intersect each other.
Arguments:
spatial_expression1
: A point
, line
, rectangle
, circle
, or polygon
.spatial_expression2
: A point
, line
, rectangle
, circle
, or polygon
.Return Value:
boolean
representing whether spatial_expression1
and spatial_expression2
spatially overlap with each other.Example:
use dataverse TinySocial; for $t in dataset('TweetMessages') where spatial-intersect($t.sender-location, create-rectangle(create-point(30.0,70.0), create-point(40.0,80.0))) return $t
The expected result is:
{ "tweetid": "4", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("39.28,70.48"), "send-time": datetime("2011-12-26T10:10:00.000Z"), "referred-topics": {{ "sprint", "voice-command" }}, "message-text": " like sprint the voice-command is mind-blowing:)" } { "tweetid": "7", "user": { "screen-name": "ChangEwing_573", "lang": "en", "friends_count": 182, "statuses_count": 394, "name": "Chang Ewing", "followers_count": 32136 }, "sender-location": point("36.21,72.6"), "send-time": datetime("2011-08-25T10:10:00.000Z"), "referred-topics": {{ "samsung", "platform" }}, "message-text": " like samsung the platform is good" } { "tweetid": "9", "user": { "screen-name": "NathanGiesen@211", "lang": "en", "friends_count": 39339, "statuses_count": 473, "name": "Nathan Giesen", "followers_count": 49416 }, "sender-location": point("36.86,74.62"), "send-time": datetime("2012-07-21T10:10:00.000Z"), "referred-topics": {{ "verizon", "voicemail-service" }}, "message-text": " love verizon its voicemail-service is awesome" }
Syntax:
spatial-cell(point_expression1, point_expression2, x_increment, y_increment)
Returns the grid cell that point_expression1
belongs to.
Arguments:
point_expression1
: A point
representing the point of interest that its grid cell will be returned.point_expression2
: A point
representing the origin of the grid.x_increment
: A double
, represents X increments.y_increment
: A double
, represents Y increments.Return Value:
rectangle
representing the grid cell that point_expression1
belongs to.Example:
use dataverse TinySocial; for $t in dataset('TweetMessages') group by $c := spatial-cell($t.sender-location, create-point(20.0,50.0), 5.5, 6.0) with $t let $num := count($t) return { "cell": $c, "count": $num}
The expected result is:
{ "cell": rectangle("20.0,92.0 25.5,98.0"), "count": 1i64 } { "cell": rectangle("25.5,74.0 31.0,80.0"), "count": 2i64 } { "cell": rectangle("31.0,62.0 36.5,68.0"), "count": 1i64 } { "cell": rectangle("31.0,68.0 36.5,74.0"), "count": 1i64 } { "cell": rectangle("36.5,68.0 42.0,74.0"), "count": 2i64 } { "cell": rectangle("36.5,74.0 42.0,80.0"), "count": 1i64 } { "cell": rectangle("36.5,92.0 42.0,98.0"), "count": 1i64 } { "cell": rectangle("42.0,80.0 47.5,86.0"), "count": 1i64 } { "cell": rectangle("42.0,92.0 47.5,98.0"), "count": 1i64 } { "cell": rectangle("47.5,80.0 53.0,86.0"), "count": 1i64 }
AsterixDB supports queries with different similarity functions, including edit distance and Jaccard.
Syntax:
edit-distance(expression1, expression2)
Returns the edit distance of expression1
and expression2
.
Arguments:
expression1
: A string
or a homogeneous OrderedList
of a comparable item type.expression2
: The same type as expression1
.Return Value:
int32
that represents the edit distance between expression1
and expression2
.Example:
use dataverse TinySocial; for $user in dataset('FacebookUsers') let $ed := edit-distance($user.name, "Suzanna Tilson") where $ed <= 2 return $user
The expected result is:
{ "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }}, "employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ] }
Syntax:
edit-distance-check(expression1, expression2, threshold)
Checks whether expression1
and expression2
have an edit distance within a given threshold. The “check” version of edit distance is faster than the "non-check" version because the former can detect whether two items satisfy a given threshold using early-termination techniques, as opposed to computing their real distance. Although possible, it is not necessary for the user to write queries using the “check” versions explicitly, since a rewrite rule can perform an appropriate transformation from a “non-check” version to a “check” version.
Arguments:
expression1
: A string
or a homogeneous OrderedList
of a comparable item type.expression2
: The same type as expression1
.threshold
: An int32
that represents the distance threshold.Return Value:
OrderedList
with two items:boolean
value representing whether expression1
and expression2
are similar.int32
that represents the edit distance of expression1
and expression2
if it is within the threshold, or 0 otherwise.Example:
use dataverse TinySocial; for $user in dataset('FacebookUsers') let $ed := edit-distance-check($user.name, "Suzanna Tilson", 2) where $ed[0] return $ed[1]
The expected result is:
2
Syntax:
similarity-jaccard(list_expression1, list_expression2)
Returns the Jaccard similarity of list_expression1
and list_expression2
.
Arguments:
list_expression1
: An UnorderedList
or OrderedList
.list_expression2
: An UnorderedList
or OrderedList
.Return Value:
float
that represents the Jaccard similarity of list_expression1
and list_expression2
.Example:
use dataverse TinySocial; for $user in dataset('FacebookUsers') let $sim := similarity-jaccard($user.friend-ids, [1,5,9]) where $sim >= 0.6f return $user
The expected result is:
{ "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] } { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
Syntax:
similarity-jaccard-check(list_expression1, list_expression2, threshold)
Checks whether list_expression1
and list_expression2
have a Jaccard similarity greater than or equal to threshold. Again, the “check” version of Jaccard is faster than the "non-check" version.
Arguments:
list_expression1
: An UnorderedList
or OrderedList
.list_expression2
: An UnorderedList
or OrderedList
.threshold
: A float
that represents the similarity threshold.Return Value:
OrderedList
with two items:boolean
value representing whether list_expression1
and list_expression2
are similar.float
that represents the Jaccard similarity of list_expression1
and list_expression2
if it is greater than or equal to the threshold, or 0 otherwise.Example:
use dataverse TinySocial; for $user in dataset('FacebookUsers') let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9], 0.6f) where $sim[0] return $sim[1]
The expected result is:
0.75f 1.0f
"~=
" is syntactic sugar for expressing a similarity condition with a given similarity threshold.
The similarity function and threshold for "~=
" are controlled via "set" directives.
The "~=
" operator returns a boolean
value that represents whether the operands are similar.
Example for Jaccard similarity:
use dataverse TinySocial; set simfunction "jaccard"; set simthreshold "0.6f"; for $user in dataset('FacebookUsers') where $user.friend-ids ~= [1,5,9] return $user
The expected result is:
{ "id": 3, "alias": "Emory", "name": "EmoryUnk", "user-since": datetime("2012-07-10T10:10:00.000Z"), "friend-ids": {{ 1, 5, 8, 9 }}, "employment": [ { "organization-name": "geomedia", "start-date": date("2010-06-17"), "end-date": date("2010-01-26") } ] } { "id": 10, "alias": "Bram", "name": "BramHatch", "user-since": datetime("2010-10-16T10:10:00.000Z"), "friend-ids": {{ 1, 5, 9 }}, "employment": [ { "organization-name": "physcane", "start-date": date("2007-06-05"), "end-date": date("2011-11-05") } ] }
Example for edit-distance similarity:
use dataverse TinySocial; set simfunction "edit-distance"; set simthreshold "2"; for $user in dataset('FacebookUsers') where $user.name ~= "Suzanna Tilson" return $user
The expected output is:
{ "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "user-since": datetime("2012-08-07T10:10:00.000Z"), "friend-ids": {{ 6 }}, "employment": [ { "organization-name": "Labzatron", "start-date": date("2011-04-19"), "end-date": null } ] }
Syntax:
word-tokens(string_expression)
Returns a list of word tokens of string_expression
.
Arguments:
string_expression
: A string
that will be tokenized.Return Value:
OrderedList
of string
word tokens.Example:
use dataverse TinySocial; for $t in dataset('TweetMessages') let $tokens := word-tokens($t.message-text) where $t.send-time >= datetime('2012-01-01T00:00:00') return { "tweetid": $t.tweetid, "word-tokens": $tokens }
The expected result is:
{ "tweetid": "9", "word-tokens": [ "love", "verizon", "its", "voicemail", "service", "is", "awesome" ] }
Syntax:
date(string_expression)
Constructor function for date
type by parsing a date string string_expression
.
Arguments:
string_expression
: The string
value representing a date value.Return Value:
date
value represented by the given string.Example:
{ "date-extended": date("2013-04-01"), "date-basic": date("20130401") }
The expected result is:
{ "date-extended": date("2013-04-01"), "date-basic": date("2013-04-01") }
Syntax:
time(string_expression)
Constructor function for time
type by parsing a time string string_expression
.
Arguments:
string_expression
: The string
value representing a time value.Return Value:
time
value represented by the given string.Example:
{ "time-extended": time("12:30:45.678+08:00"), "time-basic": time("123045678+0800") }
The expected result is:
{ "time-extended": time("04:30:45.678Z"), "time-basic": time("04:30:45.678Z") }
Syntax:
datetime(string_expression)
Constructor function for the datetime
type by parsing a datetime string string_expression
.
Arguments:
string_expression
: The string
value representing a datetime value.Return Value:
datetime
value represented by the given string.Example:
{ "datetime-extended": datetime("2013-04-01T12:30:45.678+08:00"), "datetime-basic": datetime("20130401T123045678+0800") }
The expected result is:
{ "datetime-extended": datetime("2013-04-01T04:30:45.678Z"), "datetime-basic": datetime("2013-04-01T04:30:45.678Z") }
Syntax:
interval-from-date(string_expression1, string_expression2)
Constructor function for the interval
type by parsing two date strings.
Arguments:
string_expression1
: The string
value representing the starting date.string_expression2
: The string
value representing the ending date.Return Value:
interval
value between the two dates.Example:
{"date-interval": interval-from-date("2012-01-01", "2013-04-01")}
The expected result is:
{ "date-interval": interval-date("2012-01-01, 2013-04-01") }
Syntax:
interval-from-time(string_expression1, string_expression2)
Constructor function for the interval
type by parsing two time strings.
Arguments:
string_expression1
: The string
value representing the starting time.string_expression2
: The string
value representing the ending time.Return Value:
interval
value between the two times.Example:
{"time-interval": interval-from-time("12:23:34.456Z", "233445567+0800")}
The expected result is:
{ "time-interval": interval-time("12:23:34.456Z, 15:34:45.567Z") }
Syntax:
interval-from-datetime(string_expression1, string_expression2)
Constructor function for interval
type by parsing two datetime strings.
Arguments:
string_expression1
: The string
value representing the starting datetime.string_expression2
: The string
value representing the ending datetime.Return Value:
interval
value between the two datetimes.Example:
{"datetime-interval": interval-from-datetime("2012-01-01T12:23:34.456+08:00", "20130401T153445567Z")}
The expected result is:
{ "datetime-interval": interval-datetime("2012-01-01T04:23:34.456Z, 2013-04-01T15:34:45.567Z") }
Syntax:
year/month/day/hour/minute/second/millisecond(temporal_expression)
Accessors for accessing fields in a temporal value
Arguments:
temporal_expression
: a temporal value represented as one of the following types: date
, datetime
, time
, and duration
.Return Value:
int32
value representing the field to be extracted.Example:
let $c1 := date("2010-10-30") let $c2 := datetime("1987-11-19T23:49:23.938") let $c3 := time("12:23:34.930+07:00") let $c4 := duration("P3Y73M632DT49H743M3948.94S") return {"year": year($c1), "month": month($c2), "day": day($c1), "hour": hour($c3), "min": minute($c4), "second": second($c2), "ms": millisecond($c4)}
The expected result is:
{ "year": 2010, "month": 11, "day": 30, "hour": 5, "min": 28, "second": 23, "ms": 94 }
Syntax:
adjust-datetime-for-timezone(datetime_expression, string_expression)
Adjusts the given datetime datetime_expression
by applying the timezone information string_expression
.
Arguments:
datetime_expression
: A datetime
value to be adjusted.string_expression
: A string
representing the timezone information.Return Value:
string
value representing the new datetime after being adjusted by the timezone information.Example:
use dataverse TinySocial; for $i in dataset('TweetMessages') return {"adjusted-send-time": adjust-datetime-for-timezone($i.send-time, "+08:00"), "message": $i.message-text}
The expected result is:
{ "adjusted-send-time": "2008-04-26T18:10:00.000+08:00", "message": " love t-mobile its customization is good:)" } { "adjusted-send-time": "2010-05-13T18:10:00.000+08:00", "message": " like verizon its shortcut-menu is awesome:)" } { "adjusted-send-time": "2006-11-04T18:10:00.000+08:00", "message": " like motorola the speed is good:)" } { "adjusted-send-time": "2011-12-26T18:10:00.000+08:00", "message": " like sprint the voice-command is mind-blowing:)" } { "adjusted-send-time": "2006-08-04T18:10:00.000+08:00", "message": " can't stand motorola its speed is terrible:(" } { "adjusted-send-time": "2010-05-07T18:10:00.000+08:00", "message": " like iphone the voice-clarity is good:)" } { "adjusted-send-time": "2011-08-25T18:10:00.000+08:00", "message": " like samsung the platform is good" } { "adjusted-send-time": "2005-10-14T18:10:00.000+08:00", "message": " like t-mobile the shortcut-menu is awesome:)" } { "adjusted-send-time": "2012-07-21T18:10:00.000+08:00", "message": " love verizon its voicemail-service is awesome" } { "adjusted-send-time": "2008-01-26T18:10:00.000+08:00", "message": " hate verizon its voice-clarity is OMG:(" } { "adjusted-send-time": "2008-03-09T18:10:00.000+08:00", "message": " can't stand iphone its platform is terrible" } { "adjusted-send-time": "2010-02-13T18:10:00.000+08:00", "message": " like samsung the voice-command is amazing:)" }
Syntax:
adjust-time-for-timezone(time_expression, string_expression)
Adjusts the given time time_expression
by applying the timezone information string_expression
.
Arguments:
time_expression
: A time
value to be adjusted.string_expression
: A string
representing the timezone information.Return Value:
string
value representing the new time after being adjusted by the timezone information.Example:
use dataverse TinySocial; for $i in dataset('TweetMessages') return {"adjusted-send-time": adjust-time-for-timezone(time-from-datetime($i.send-time), "+08:00"), "message": $i.message-text}
The expected result is:
{ "adjusted-send-time": "18:10:00.000+08:00", "message": " love t-mobile its customization is good:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like verizon its shortcut-menu is awesome:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like motorola the speed is good:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like sprint the voice-command is mind-blowing:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " can't stand motorola its speed is terrible:(" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like iphone the voice-clarity is good:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like samsung the platform is good" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like t-mobile the shortcut-menu is awesome:)" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " love verizon its voicemail-service is awesome" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " hate verizon its voice-clarity is OMG:(" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " can't stand iphone its platform is terrible" } { "adjusted-send-time": "18:10:00.000+08:00", "message": " like samsung the voice-command is amazing:)" }
Syntax:
calendar-duration-from-datetime(datetime_expression, duration_expression)
Gets a user-friendly representation of the duration duration_expression
based on the given datetime datetime_expression
.
Arguments:
datetime_expression
: A datetime
value to be used as the reference time point.duration_expression
: A duration
value to be converted.Return Value:
duration
value with the duration as duration_expression
but with a user-friendly representation.Example:
use dataverse TinySocial; for $i in dataset('TweetMessages') where $i.send-time > datetime("2011-01-01T00:00:00") return {"since-2011": subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")), "since-2011-user-friendly": calendar-duration-from-datetime($i.send-time, subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")))}
The expected result is:
{ "since-2011": duration("P359DT10H10M"), "since-2011-user-friendly": duration("P11M23DT10H10M") } { "since-2011": duration("P236DT10H10M"), "since-2011-user-friendly": duration("P7M23DT10H10M") } { "since-2011": duration("P567DT10H10M"), "since-2011-user-friendly": duration("P1Y6M18DT10H10M") }
Syntax:
calendar-duration-from-date(date_expression, duration_expression)
Gets a user-friendly representation of the duration duration_expression
based on the given date date_expression
.
Arguments:
date_expression
: A date
value to be used as the reference time point.duration_expression
: A duration
value to be converted.Return Value:
duration
value with the duration as duration_expression
but with a user-friendly representation.Example:
use dataverse TinySocial; for $i in dataset('TweetMessages') where $i.send-time > datetime("2011-01-01T00:00:00") return {"since-2011": subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")), "since-2011-user-friendly": calendar-duration-from-date(date-from-datetime($i.send-time), subtract-datetime($i.send-time, datetime("2011-01-01T00:00:00")))}
The expected result is:
{ "since-2011": duration("P359DT10H10M"), "since-2011-user-friendly": duration("P11M23DT10H10M") } { "since-2011": duration("P236DT10H10M"), "since-2011-user-friendly": duration("P7M23DT10H10M") } { "since-2011": duration("P567DT10H10M"), "since-2011-user-friendly": duration("P1Y6M18DT10H10M") }
Syntax:
current-date()
Gets the current date.
Arguments: None
Return Value:
date
value of the date when the function is called.Syntax:
current-time()
Get the current time
Arguments: None
Return Value:
time
value of the time when the function is called.Syntax:
current-datetime()
Get the current datetime
Arguments: None
Return Value:
datetime
value of the datetime when the function is called.Example:
use dataverse TinySocial; {"current-date": current-date(), "current-time": current-time(), "current-datetime": current-datetime()}
The expected result is:
{ "current-date": date("2013-04-06"), "current-time": time("00:48:44.093Z"), "current-datetime": datetime("2013-04-06T00:48:44.093Z") }
Syntax:
date-from-datetime(datetime_expression)
Gets the date value from the given datetime value datetime_expression
.
Arguments:
datetime_expression
: A datetime
value to be extracted from.Return Value:
date
value from the datetime.Syntax:
time-from-datetime(datetime_expression)
Get the time value from the given datetime value datetime_expression
Arguments:
datetime_expression
: A datetime
value to be extracted fromReturn Value:
time
value from the datetime.Example:
use dataverse TinySocial; for $i in dataset('TweetMessages') where $i.send-time > datetime("2011-01-01T00:00:00") return {"send-date": date-from-datetime($i.send-time), "send-time": time-from-datetime($i.send-time)}
The expected result is:
{ "send-date": date("2011-12-26"), "send-time": time("10:10:00.000Z") } { "send-date": date("2011-08-25"), "send-time": time("10:10:00.000Z") } { "send-date": date("2012-07-21"), "send-time": time("10:10:00.000Z") }
Syntax:
date-from-unix-time-in-days(numeric_expression)
Gets a date representing the time after numeric_expression
days since 1970-01-01.
Arguments:
numeric_expression
: A int8
/int16
/int32
value representing the number of days.Return Value:
date
value as the time after numeric_expression
days since 1970-01-01.Syntax:
datetime-from-unix-time-in-ms(numeric_expression)
Gets a datetime representing the time after numeric_expression
milliseconds since 1970-01-01T00:00:00Z.
Arguments:
numeric_expression
: A int8
/int16
/int32
/int64
value representing the number of milliseconds.Return Value:
datetime
value as the time after numeric_expression
milliseconds since 1970-01-01T00:00:00Z.Syntax:
time-from-unix-time-in-ms(numeric_expression)
Gets a time representing the time after numeric_expression
milliseconds since 00:00:00.000Z.
Arguments:
numeric_expression
: A int8
/int16
/int32
value representing the number of milliseconds.Return Value:
time
value as the time after numeric_expression
milliseconds since 00:00:00.000Z.Example:
use dataverse TinySocial; let $d := date-from-unix-time-in-days(15800) let $dt := datetime-from-unix-time-in-ms(1365139700000) let $t := time-from-unix-time-in-ms(3748) return {"date": $d, "datetime": $dt, "time": $t}
The expected result is:
{ "date": date("2013-04-05"), "datetime": datetime("2013-04-05T05:28:20.000Z"), "time": time("00:00:03.748Z") }
Syntax:
get-interval-start/get-interval-end(interval)
Gets the start/end of the given interval.
Arguments:
interval
: the interval to be accessed.Return Value:
time
, date
, or datetime
(depending on the time instances of the interval) representing the starting or ending time.Example:
let $itv := interval-start-from-date("1984-01-01", "P1Y") return {"start": get-interval-start($itv), "end": get-interval-end($itv)}
The expected result is:
{ "start": date("1984-01-01"), "end": date("1985-01-01") }