Additional modifications to the AsterixSimilarityQueries page, based on the page reformatted by chen
diff --git a/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md b/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md
index a57db43..dde478c 100644
--- a/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md
+++ b/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md
@@ -16,12 +16,11 @@
## Data Types and Similarity Functions ##
-AsterixDB supports [http://en.wikipedia.org/wiki/Levenshtein_distance
-edit distance] (on strings) and
-[http://en.wikipedia.org/wiki/Jaccard_index Jaccard] (on sets). For
+AsterixDB supports [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) (on strings) and
+[Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) (on sets). For
instance, in our
-[https://code.google.com/p/asterixdb/wiki/AdmAql101#ADM:_Modeling_Semistructed_Data_in_AsterixDB
-TinySocial] example, the `friend-ids` of a Facebook user forms a set
+[TinySocial](AdmAql101.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB)
+example, the `friend-ids` of a Facebook user forms a set
of friends, and we can define a similarity between two sets of
friends. We can also convert a string to a set of grams of a length q
(called "n-grams") and define the Jaccard similarity between the two
@@ -30,15 +29,14 @@
`schwarzenegger` are `sch`, `chw`, `hwa`, ..., `ger`.
AsterixDB provides
-[https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#Tokenizing_Functions
-tokenization functions] to convert strings to sets, and the
-[https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#Similarity_Functions
-similarity functions].
+[tokenization functions](AsterixDataTypesAndFunctions.html#Tokenizing_Functions)
+to convert strings to sets, and the
+[similarity functions](AsterixDataTypesAndFunctions.html#Similarity_Functions).
-## Selection Queries ##
+## Similarity Selection Queries ##
-The following [https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#edit-distance
-query] asks for all the Facebook users whose name is similar to
+The following [query](AsterixDataTypesAndFunctions.html#edit-distance)
+asks for all the Facebook users whose name is similar to
`Suzanna Tilson`, i.e., their edit distance is at most 2.
use dataverse TinySocial;
@@ -49,8 +47,8 @@
return $user
-The following [https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#similarity-jaccard
-query] asks for all the Facebook users whose set of friend ids is
+The following [query](AsterixDataTypesAndFunctions.html#similarity-jaccard)
+asks for all the Facebook users whose set of friend ids is
similar to `[1,5,9]`, i.e., their Jaccard similarity is at least 0.6.
use dataverse TinySocial;
@@ -83,8 +81,8 @@
## Similarity Join Queries ##
AsterixDB supports fuzzy joins between two data sets. The following
-[https://code.google.com/p/asterixdb/wiki/AdmAql101#Query_5_-_Fuzzy_Join
-query] finds, for each Facebook user, all Twitter users with names
+[query](AdmAql101.html#Query_5_-_Fuzzy_Join)
+finds, for each Facebook user, all Twitter users with names
"similar" to their name based on the edit distance.
use dataverse TinySocial;
@@ -107,7 +105,7 @@
## Using Indexes to Support Queries ##
-AsterixDB uses a gram-based inverted index (called "n-gram") and
+AsterixDB uses a gram-based inverted index (called "ngram") and
efficient algorithms to support similarity queries. For a set of
strings, we generate n-grams for each string, and build an inverted
list for each n-gram that includes the ids of the strings with this
@@ -116,23 +114,23 @@
occurrences of the string ids on these inverted lists. The similar
idea can be used to answer queries with Jaccard similarity. A
detailed description of these techniques is available at this
-[http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf](paper).
+[paper](http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf).
For instance, the following DDL statement creates such an index on the
-`FacebookUser.name` attribute using an inverted index of 3-grams.
+`FacebookUsers.name` attribute using an inverted index of 3-grams.
After the index is created, similarity queries with an edit distance
condition on this attribute can be answered efficiently.
use dataverse TinySocial;
- create index fbUserFuzzyIdx on FacebookUsers(name) type n-gram(3);
+ create index fbUserIdx on FacebookUsers(name) type ngram(3);
-The number "3" in "n-gram(3)" is the length "n" in the grams. This
+The number "3" in "ngram(3)" is the length "n" in the grams. This
index can be used to optimize similarity queries on this attribute
-using edit distance, or Jaccard queries on this attribute where the
+using [edit distance](AsterixDataTypesAndFunctions.html#edit-distance), or [Jaccard](AsterixDataTypesAndFunctions.html#similarity-jaccard) queries on this attribute where the
similarity is defined on sets of 3-grams. This index can also be used
-to optimize queries with the "contains()" predicate (i.e., substring
+to optimize queries with the "[contains()](AsterixDataTypesAndFunctions.html#contains)" predicate (i.e., substring
matching) since it can be also be solved by counting on the inverted
list of the grams in the query string.
@@ -144,4 +142,4 @@
use dataverse TinySocial;
- create index fbUserFuzzyIdx on FacebookUsers(name) type partitioned n-gram(3);
+ create index fbUserFuzzyIdx on FacebookUsers(name) type fuzzy ngram(3);