[NO ISSUE][STO] Misc Storage Fixes and Improvements

- user model changes: no
- storage format changes: no
- interface changes: yes

Details:
- This change introduces some improvements to storage
  operations.
- Local RecoveryManager is now extensible.
- Bulk loaders now call the IO callback the same way
  flushes do, making them less special and creating a
  unified lifecycle for adding an index component.
- As a result, the IndexCheckpointManager doesn't need
  special treatment for components loaded
  through the bulk load operation.
- Component IDs have been added to the index checkpoint
  files.
- Cleaned up the local recovery code for failed flush
  operations.
- Ensure that after local recovery of flushes, primary
  and secondary indexes have the same index for the mutable
  memory component.
- The use of WAIT logs to ensure in-flight flushes
  are scheduled didn't work as expected. A new log type,
  WAIT_FOR_FLUSHES, was introduced to achieve the expected
  behavior.
- The local test framework was made extensible to support
  more use cases.
- Test cases were added for component ids in checkpoint files.
  The following scenarios were covered:
  - Primary and secondary both have values when a flush is
    scheduled.
  - Primary has values but secondary does not when a flush is
    scheduled.
  - Primary is empty and an index is created through bulk
    load.
  - Primary has a single component and secondary is created
    through bulk load.
  - Primary has multiple components and secondary is created
    through bulk load.
- Each primary opTracker now keeps a list of ongoing flushes.
- FlushDataset now waits only for flushes and
  not for all IO operations.
- Previously, many flushes could be scheduled on open datasets
  without being detected. After this change, a failure
  is thrown in such cases.
- Flush operations don't need to extend the Comparable
  interface anymore since they are FIFO per index.
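
The flush-tracking items above (a per-primary list of ongoing flushes, FIFO
ordering per index, and waiting only on flushes) can be illustrated with a
minimal sketch. This is not the code in this change: the class
FlushTrackerSketch and all of its method names are hypothetical, and flushes
are represented by plain string ids purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; names are illustrative and are not AsterixDB's API.
class FlushTrackerSketch {
    // Ongoing flushes for one index, oldest first (flushes are FIFO per index).
    private final Deque<String> ongoingFlushes = new ArrayDeque<>();
    // Other IO operations (for example merges) are counted separately
    // and are never waited on by waitForFlushes().
    private int otherIoOps = 0;

    public synchronized void flushScheduled(String flushId) {
        ongoingFlushes.addLast(flushId);
    }

    public synchronized void flushCompleted(String flushId) {
        // FIFO: only the oldest scheduled flush may complete.
        if (!flushId.equals(ongoingFlushes.peekFirst())) {
            throw new IllegalStateException("Flush completed out of order: " + flushId);
        }
        ongoingFlushes.removeFirst();
        notifyAll(); // wake up any thread blocked in waitForFlushes()
    }

    public synchronized void otherIoScheduled() {
        otherIoOps++;
    }

    public synchronized int ongoingFlushCount() {
        return ongoingFlushes.size();
    }

    // Blocks until no flush is ongoing; other IO operations are ignored.
    public synchronized void waitForFlushes() throws InterruptedException {
        while (!ongoingFlushes.isEmpty()) {
            wait();
        }
    }
}
```

Because completion is only accepted for the head of the queue, no ordering
comparison between flush operations is needed in this sketch.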

Change-Id: If24c9baaac2b79e7d1acf47fa2601767388ce988
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2632
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Murtadha Hubail <mhubail@apache.org>
89 files changed
tree: cf6eef9dbb3eb14793df6861d5af8a310f57e319
  1. .gitattributes
  2. .gitignore
  3. README.md
  4. asterixdb/
  5. build.xml
  6. hyracks-fullstack/
  7. pom.xml
README.md

What is AsterixDB?

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. AsterixDB has:

  • Data model
    A semistructured NoSQL style data model (ADM) resulting from extending JSON with object database ideas

  • Query languages
    Two expressive and declarative query languages (SQL++ and AQL) that support a broad range of queries and analysis over semistructured data

  • Scalability
    A parallel runtime query execution engine, Apache Hyracks, that has been scale-tested on up to 1000+ cores and 500+ disks

  • Native storage
    Partitioned LSM-based data storage and indexing to support efficient ingestion and management of semistructured data

  • External storage
    Support for query access to externally stored data (e.g., data in HDFS) as well as to data stored natively by AsterixDB

  • Data types
    A rich set of primitive data types, including spatial and temporal data in addition to integer, floating point, and textual data

  • Indexing
    Secondary indexing options that include B+ trees, R trees, and inverted keyword (exact and fuzzy) index types

  • Transactions
    Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store

Learn more about AsterixDB at its website.

Build from source

To build AsterixDB from source, you should have a platform with the following:

  • A Unix-ish environment (Linux or OS X will do).
  • git
  • Maven 3.3.9 or newer.
  • Oracle JDK 8 or newer.

Instructions for building the master:

  • Checkout AsterixDB master:

      $ git clone https://github.com/apache/asterixdb.git
    
  • Build AsterixDB master:

      $ cd asterixdb
      $ mvn clean package -DskipTests
    

Run the build on your machine

Here are steps to get AsterixDB running on your local machine:

  • Start a single-machine AsterixDB instance:

      $ cd asterixdb/asterix-server/target/asterix-server-*-binary-assembly/
      $ ./opt/local/bin/start-sample-cluster.sh
    
  • You are now good to go. Run queries in your browser at:

      http://localhost:19001
    
  • Read the documentation to learn about the data model, the query languages, and how to create a cluster instance.

Documentation

Community support