ASTERIXDB-1778: Optimize the edit-distance-check function
- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation early when it becomes obvious that
the edit distance must be greater than the given threshold.
There is no reason to compute all cells in the two-dimensional array.
- Move IListIterator to Hyracks since we now have
a CharacterIterator for a String. Rename it to ISequenceIterator.
- Add a section for the function to the manual.
- Remove the letter-counting filter since it is only applicable to
strings in the ASCII range (0 ~ 127).
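
For illustration only, the first two points can be sketched as below. This is
not the code in this change: the class name EditDistanceCheckSketch, the
method name check, and the -1 "over threshold" return value are assumptions
made for the example. The band here covers the cells within `threshold` of the
diagonal plus one guard cell on each side, so the exact band width may differ
from the one used in the change.

```java
final class EditDistanceCheckSketch {
    private EditDistanceCheckSketch() {
    }

    /**
     * Returns the edit distance between a and b if it is at most threshold,
     * otherwise -1 (hypothetical convention for this sketch).
     */
    static int check(CharSequence a, CharSequence b, int threshold) {
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (threshold < 0 || Math.abs(n - m) > threshold) {
            return -1;
        }
        // Every value above the threshold is capped to this sentinel.
        final int tooFar = threshold + 1;
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = Math.min(j, tooFar);
        }
        for (int i = 1; i <= n; i++) {
            // Only columns within `threshold` of the diagonal can matter.
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Guard cell left of the band: column 0 holds i, others are out of reach.
            curr[lo - 1] = (lo == 1) ? Math.min(i, tooFar) : tooFar;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int best = prev[j - 1] + cost;          // substitute or match
                best = Math.min(best, curr[j - 1] + 1); // insert
                best = Math.min(best, prev[j] + 1);     // delete
                curr[j] = Math.min(best, tooFar);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Guard cell right of the band, read by the next row.
            if (hi < m) {
                curr[hi + 1] = tooFar;
            }
            // Early termination: no cell in this row can lead to a result
            // within the threshold.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

With this sketch, check("kitten", "sitting", 3) returns 3, while the same
pair with a threshold of 2 returns -1 without filling the whole table.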
Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1481
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>
diff --git a/asterixdb/asterix-fuzzyjoin/pom.xml b/asterixdb/asterix-fuzzyjoin/pom.xml
index 0539782..9485852 100644
--- a/asterixdb/asterix-fuzzyjoin/pom.xml
+++ b/asterixdb/asterix-fuzzyjoin/pom.xml
@@ -82,6 +82,10 @@
<groupId>org.apache.hyracks</groupId>
<artifactId>hyracks-util</artifactId>
</dependency>
+ <dependency>
+ <groupId>org.apache.hyracks</groupId>
+ <artifactId>hyracks-data-std</artifactId>
+ </dependency>
</dependencies>
</project>