additional modification on AsterixSimilarityQueries Page based on reformatted page by chen

commit: ec9816c46c48c61dcda3fc75ca1f27885db38fc0
author: JIMAHN <jimahnok@gmail.com> Wed May 22 15:39:00 2013 -0700
committer: JIMAHN <jimahnok@gmail.com> Wed May 22 15:39:00 2013 -0700
tree: 4f5e7d175fff9cb36374f1e15010dc614d8cb765
parent: 601d2fbb47100e5cfc701830dfc82c9f11c86489
diff --git a/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md b/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md
index a57db43..dde478c 100644
--- a/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md
+++ b/asterix-doc/src/site/markdown/AsterixSimilarityQueries.md

@@ -16,12 +16,11 @@
 
 ## Data Types and Similarity Functions ## 
 
-AsterixDB supports [http://en.wikipedia.org/wiki/Levenshtein_distance
-edit distance]  (on strings) and
-[http://en.wikipedia.org/wiki/Jaccard_index Jaccard]  (on sets).  For
+AsterixDB supports [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) (on strings) and
+[Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) (on sets).  For
 instance, in our
-[https://code.google.com/p/asterixdb/wiki/AdmAql101#ADM:_Modeling_Semistructed_Data_in_AsterixDB
-TinySocial] example, the `friend-ids` of a Facebook user forms a set
+[TinySocial](AdmAql101.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB)
+example, the `friend-ids` of a Facebook user forms a set
 of friends, and we can define a similarity between two sets of
 friends. We can also convert a string to a set of grams of a length q
 (called "n-grams") and define the Jaccard similarity between the two
@@ -30,15 +29,14 @@
 `schwarzenegger` are `sch`, `chw`, `hwa`, ..., `ger`.
 
 AsterixDB provides
-[https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#Tokenizing_Functions
-tokenization functions] to convert strings to sets, and the
-[https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#Similarity_Functions
-similarity functions].
+[tokenization functions](AsterixDataTypesAndFunctions.html#Tokenizing_Functions)
+to convert strings to sets, and the
+[similarity functions](AsterixDataTypesAndFunctions.html#Similarity_Functions).
 
-## Selection Queries ## 
+## Similarity Selection Queries ## 
 
-The following [https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#edit-distance
-query] asks for all the Facebook users whose name is similar to
+The following [query](AsterixDataTypesAndFunctions.html#edit-distance)
+asks for all the Facebook users whose name is similar to
 `Suzanna Tilson`, i.e., their edit distance is at most 2.
 
         use dataverse TinySocial;
@@ -49,8 +47,8 @@
         return $user
 
 
-The following [https://code.google.com/p/asterixdb/wiki/AsterixDataTypesAndFunctions#similarity-jaccard
-query] asks for all the Facebook users whose set of friend ids is
+The following [query](AsterixDataTypesAndFunctions.html#similarity-jaccard)
+asks for all the Facebook users whose set of friend ids is
 similar to `[1,5,9]`, i.e., their Jaccard similarity is at least 0.6.
 
         use dataverse TinySocial;
@@ -83,8 +81,8 @@
 ## Similarity Join Queries ## 
 
 AsterixDB supports fuzzy joins between two data sets. The following
-[https://code.google.com/p/asterixdb/wiki/AdmAql101#Query_5_-_Fuzzy_Join
-query] finds, for each Facebook user, all Twitter users with names
+[query](AdmAql101.html#Query_5_-_Fuzzy_Join)
+finds, for each Facebook user, all Twitter users with names
 "similar" to their name based on the edit distance.
 
         use dataverse TinySocial;
@@ -107,7 +105,7 @@
 
 ## Using Indexes to Support Queries ## 
 
-AsterixDB uses a gram-based inverted index (called "n-gram") and
+AsterixDB uses a gram-based inverted index (called "ngram") and
 efficient algorithms to support similarity queries.  For a set of
 strings, we generate n-grams for each string, and build an inverted
 list for each n-gram that includes the ids of the strings with this
@@ -116,23 +114,23 @@
 occurrences of the string ids on these inverted lists.  The similar
 idea can be used to answer queries with Jaccard similarity.  A
 detailed description of these techniques is available at this
-[http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf](paper).
+[paper](http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf).
 
 For instance, the following DDL statement creates such an index on the
-`FacebookUser.name` attribute using an inverted index of 3-grams.
+`FacebookUsers.name` attribute using an inverted index of 3-grams.
 After the index is created, similarity queries with an edit distance
 condition on this attribute can be answered efficiently.
 
         use dataverse TinySocial;
         
-        create index fbUserFuzzyIdx on FacebookUsers(name) type n-gram(3);
+        create index fbUserIdx on FacebookUsers(name) type ngram(3);
 
 
-The number "3" in "n-gram(3)" is the length "n" in the grams. This
+The number "3" in "ngram(3)" is the length "n" in the grams. This
 index can be used to optimize similarity queries on this attribute
-using edit distance, or Jaccard queries on this attribute where the
+using [edit distance](AsterixDataTypesAndFunctions.html#edit-distance), or [Jaccard](AsterixDataTypesAndFunctions.html#similarity-jaccard) queries on this attribute where the
 similarity is defined on sets of 3-grams.  This index can also be used
-to optimize queries with the "contains()" predicate (i.e., substring
+to optimize queries with the "[contains()]((AsterixDataTypesAndFunctions.html#contains))" predicate (i.e., substring
 matching) since it can be also be solved by counting on the inverted
 list of the grams in the query string.
 
@@ -144,4 +142,4 @@
 
         use dataverse TinySocial;
         
-        create index fbUserFuzzyIdx on FacebookUsers(name) type partitioned n-gram(3);
+        create index fbUserFuzzyIdx on FacebookUsers(name) type fuzzy ngram(3);
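
The indexing section touched by this change describes building an inverted list per n-gram and counting string-id occurrences on those lists to answer edit-distance queries. The following Python is a minimal sketch of that idea, not AsterixDB's implementation. All function names (`ngrams`, `jaccard`, `edit_distance`, `build_index`, `search`) are hypothetical, and the sketch assumes unpadded grams and the usual count filter, which relies on one edit operation changing at most n grams.

```python
from collections import defaultdict


def ngrams(s, n=3):
    """Return the overlapping n-grams of s, without padding."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def build_index(strings, n=3):
    """Map each n-gram to the set of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in ngrams(s, n):
            index[g].add(sid)
    return index


def search(index, strings, query, k, n=3):
    """Return the strings within edit distance k of query."""
    grams = ngrams(query, n)
    # Count filter: k edits change at most k * n of the query's grams,
    # so a match must contain at least this many of them.
    threshold = len(grams) - k * n
    if threshold <= 0:
        # The filter cannot prune anything; verify every string.
        candidates = range(len(strings))
    else:
        counts = defaultdict(int)
        for g in grams:
            for sid in index.get(g, ()):
                counts[sid] += 1
        candidates = sorted(sid for sid, c in counts.items() if c >= threshold)
    # Verification step: the filter admits false positives, never false negatives.
    return [strings[sid] for sid in candidates
            if edit_distance(strings[sid], query) <= k]


names = ["Suzanna Tilson", "Suzanna Tillson", "Susanna Tilson", "Margarita Stoddard"]
idx = build_index(names, n=3)
matches = search(idx, names, "Suzanna Tilson", k=2)
```

Here `matches` holds the first three names; `Margarita Stoddard` shares no 3-gram with the query and is pruned before any edit distance is computed. The same `ngrams` helper combined with `jaccard` gives the gram-based Jaccard similarity that the page describes for strings.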