ASTERIXDB-1556, ASTERIXDB-1733: Hash Group By and Hash Join conform to the memory budget

 - External Hash Group By and Hash Join now conform to the memory budget (compiler.groupmemory and compiler.joinmemory)
 - For Optimized Hybrid Hash Join, we calculate the expected hash table size when the build phase is done and
   try to spill one or more partitions if the free space can't hold a hash table of that size.
 - For External Hash Group By, the number of hash entries (hash table size) is calculated based on
   an estimation of the aggregated tuple size and possible hash values for the given field size in that tuple.
 - A garbage collection (GC) feature has been added to SerializableHashTable. For external hash group-by,
   whenever we spill a data partition to disk, we also check the ratio of garbage in the hash table.
   If it's greater than the given threshold, we run a GC on the hash table.
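
The three checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in this change: the class name, method names, parameters, the "largest partition first" spill policy, and the key-space cap of 2^(8n) for an n-byte field are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every name here is illustrative, not a Hyracks/AsterixDB API.
final class MemoryBudgetSketch {

    // Hash join: after the build phase, spill resident partitions until the free
    // space can hold the expected hash table. Largest-first is an assumed policy.
    static List<Integer> choosePartitionsToSpill(long[] partitionBytes, long[] partitionTuples,
            long freeBytes, long bytesPerHashEntry) {
        boolean[] spilled = new boolean[partitionBytes.length];
        long residentTuples = 0;
        for (long t : partitionTuples) {
            residentTuples += t;
        }
        List<Integer> victims = new ArrayList<>();
        // Expected table size = one entry per tuple still resident in memory.
        while (residentTuples * bytesPerHashEntry > freeBytes) {
            int victim = -1;
            for (int i = 0; i < partitionBytes.length; i++) {
                if (!spilled[i] && (victim < 0 || partitionBytes[i] > partitionBytes[victim])) {
                    victim = i;
                }
            }
            if (victim < 0) {
                break; // nothing left to spill
            }
            spilled[victim] = true;
            freeBytes += partitionBytes[victim];          // spilling frees its frames
            residentTuples -= partitionTuples[victim];    // and shrinks the table needed
            victims.add(victim);
        }
        return victims;
    }

    // Group-by: bound the entry count by what fits in the budget and by how many
    // distinct values a key field of the given width can take (assumed formula).
    static long estimateHashEntries(long memoryBudgetBytes, int estimatedTupleBytes, int keyFieldBytes) {
        long byBudget = memoryBudgetBytes / estimatedTupleBytes;
        long byKeySpace = keyFieldBytes >= 8 ? Long.MAX_VALUE : 1L << (8 * keyFieldBytes);
        return Math.min(byBudget, byKeySpace);
    }

    // GC: collect only when the garbage ratio is strictly greater than the threshold.
    static boolean shouldCollectGarbage(long garbageBytes, long totalBytes, double threshold) {
        return totalBytes > 0 && (double) garbageBytes / totalBytes > threshold;
    }

    public static void main(String[] args) {
        System.out.println(choosePartitionsToSpill(
                new long[] {400, 100, 300}, new long[] {40, 10, 30}, 200, 8)); // [0]
        System.out.println(estimateHashEntries(1_000_000, 100, 1));           // 256
        System.out.println(shouldCollectGarbage(30, 100, 0.1));               // true
    }
}
```

In the first call, 80 resident tuples at 8 bytes per entry need 640 bytes but only 200 are free; spilling partition 0 frees 400 bytes and leaves 40 tuples (320 bytes), which fits, so a single partition is spilled.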

Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1056
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Yingyi Bu <buyingyi@gmail.com>
42 files changed
tree: cba219e154ed7760cc89d798777cc728eb9836f2
README.md

#AsterixDB

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. AsterixDB has:

  • A semistructured NoSQL-style data model (ADM) resulting from extending JSON with object database ideas
  • Two expressive and declarative query languages (SQL++ and AQL) that support a broad range of queries and analysis over semistructured data
  • A parallel runtime query execution engine, Apache Hyracks, that has been scale-tested on up to 1000+ cores and 500+ disks
  • Partitioned LSM-based data storage and indexing to support efficient ingestion and management of semistructured data
  • Support for query access to externally stored data (e.g., data in HDFS) as well as to data stored natively by AsterixDB
  • A rich set of primitive data types, including spatial and temporal data in addition to integer, floating point, and textual data
  • Secondary indexing options that include B+ trees, R trees, and inverted keyword (exact and fuzzy) index types
  • Support for fuzzy and spatial queries as well as for more traditional parametric queries
  • Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store

Learn more about AsterixDB at [http://asterixdb.apache.org](http://asterixdb.apache.org)

##Building AsterixDB

To build AsterixDB from source, you should have a platform with the following:

  • A Unix-ish environment (Linux or OS X will do).
  • git
  • Maven 3.3.9 or newer.
  • Java 8 or newer.

Instructions for building the master:

  • Check out AsterixDB master:

      $ git clone https://github.com/apache/asterixdb.git
    
  • Build AsterixDB master:

      $ cd asterixdb
      $ mvn clean package -DskipTests
    

##Running AsterixDB (on your machine from your build)

Here are steps to get AsterixDB running on your local machine:

##Documentation

AsterixDB's official documentation resides at [https://ci.apache.org/projects/asterixdb/index.html](https://ci.apache.org/projects/asterixdb/index.html). It is built as a Maven site from the Maven project under asterix-doc/. The documentation on the official website describes the most recent stable release, so for pre-release versions you should refer to the documentation compiled from source.

##Support/Contact

If you have any questions, please feel free to ask on our mailing list, users@asterixdb.apache.org. Join the list by sending an email to users-subscribe@asterixdb.apache.org. If you are interested in the internals or development of AsterixDB, please also feel free to subscribe to our developer mailing list, dev@asterixdb.apache.org, by sending an email to dev-subscribe@asterixdb.apache.org.