Cleanup FileSplit and FileReference

This change gives FileSplit and FileReference specific meanings to
avoid confusion over absolute vs. relative, local vs. global, and
inside an IO device vs. outside IO devices.

In addition, it enables better abstraction of global partitions and
delegates the responsibility of choosing which partition goes to which
IO device to the IO Manager through the introduction of FileDeviceComputer.

In detail:
Previously, the LocalResource in Hyracks had a partition (storage partition),
but there is no such concept in Hyracks. This scope leak is bad. In addition,
the local resource had a name and a path. They were always the same, so
the name was removed.
The storage partition was instead moved to the AsterixDB implementation of the
serialized object in the local resource.

With all of these changes, the cluster controller (compiler) only needs to
know about partitions and relative paths. It doesn't need to worry about
heterogeneous node setups and different IO device configurations. For file
assignment to IO devices, a new interface (IFileDeviceComputer) was
introduced, which can be overridden by applications to have their own
strategy for distributing files among IO devices.
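
The idea of a pluggable file-to-device strategy can be illustrated with a
minimal sketch. Everything below is hypothetical: FileDeviceComputerSketch is
a simplified stand-in, not the real IFileDeviceComputer (whose signature may
differ), devices are represented by plain mount-point strings, and the hashing
strategy is just one possible choice.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the interface named in the commit message.
// The real Hyracks interface may use different types and method names.
interface FileDeviceComputerSketch {
    // Maps a relative file path to the IO device (here: a mount point) that should hold it.
    String compute(String relativePath);
}

// One possible strategy: hash the relative path so that the same path
// is always assigned to the same device.
final class HashingDeviceComputer implements FileDeviceComputerSketch {
    private final List<String> devices;

    HashingDeviceComputer(List<String> devices) {
        if (devices.isEmpty()) {
            throw new IllegalArgumentException("at least one IO device is required");
        }
        this.devices = new ArrayList<>(devices);
    }

    @Override
    public String compute(String relativePath) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(relativePath.hashCode(), devices.size());
        return devices.get(index);
    }
}

final class FileDeviceComputerDemo {
    public static void main(String[] args) {
        // Assumed mount points and path layout, for illustration only.
        FileDeviceComputerSketch computer =
                new HashingDeviceComputer(Arrays.asList("/mnt/io0", "/mnt/io1"));
        String relativePath = "storage/partition_0/dataset/index";
        String device = computer.compute(relativePath);
        // The caller only knows the relative path; the absolute location
        // is resolved against whichever device the strategy picked.
        System.out.println(Paths.get(device).resolve(relativePath));
    }
}
```

An application wanting a different policy (for example, pinning each
partition to a fixed device) would supply another implementation of the
same interface, leaving callers that only know relative paths unchanged.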

Change-Id: I4fac508bf9af5a3bed41a3cf4464d2cbfecf2f61
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1352
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>
286 files changed
tree: 9a03931f408b713940a008236611d10a20dd8857
  1. .gitattributes
  2. .gitignore
  3. README.md
  4. asterixdb/
  5. build.xml
  6. hyracks-fullstack/
  7. pom.xml
README.md

#AsterixDB

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. AsterixDB has:

  • A semistructured NoSQL style data model (ADM) resulting from extending JSON with object database ideas
  • Two expressive and declarative query languages (SQL++ and AQL) that support a broad range of queries and analysis over semistructured data
  • A parallel runtime query execution engine, Apache Hyracks, that has been scale-tested on up to 1000+ cores and 500+ disks
  • Partitioned LSM-based data storage and indexing to support efficient ingestion and management of semistructured data
  • Support for query access to externally stored data (e.g., data in HDFS) as well as to data stored natively by AsterixDB
  • A rich set of primitive data types, including spatial and temporal data in addition to integer, floating point, and textual data
  • Secondary indexing options that include B+ trees, R trees, and inverted keyword (exact and fuzzy) index types
  • Support for fuzzy and spatial queries as well as for more traditional parametric queries
  • Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store

Learn more about AsterixDB at [http://asterixdb.apache.org](http://asterixdb.apache.org)

##Building AsterixDB

To build AsterixDB from source, you should have a platform with the following:

  • A Unix-like environment (Linux or OS X will do).
  • git
  • Maven 3.3.9 or newer.
  • Java 8 or newer.

Instructions for building the master:

  • Check out AsterixDB master:

      $ git clone https://github.com/apache/asterixdb.git
    
  • Build AsterixDB master:

      $ cd asterixdb
      $ mvn clean package -DskipTests
    

##Running AsterixDB (on your machine from your build)

Here are steps to get AsterixDB running on your local machine:

##Documentation

AsterixDB's official documentation resides at [https://ci.apache.org/projects/asterixdb/index.html](https://ci.apache.org/projects/asterixdb/index.html). It is built from the Maven project under asterix-doc/ as a Maven site. The documentation on the official website covers the most recent stable release, so for pre-release versions one should refer to the documentation compiled from source.

##Support/Contact

If you have any questions, please feel free to ask on our mailing list, users@asterixdb.apache.org. Join the list by sending an email to users-subscribe@asterixdb.apache.org. If you are interested in the internals or development of AsterixDB, please also feel free to subscribe to our developer mailing list, dev@asterixdb.apache.org, by sending an email to dev-subscribe@asterixdb.apache.org.