Table of Contents

Quick Start

The fastest way to get a single-machine sample instance of AsterixDB up and running is to use the included sample helper scripts. To do so, navigate to opt/local/bin/ within the extracted asterix-server directory:

user@localhost:~/
$cd asterix-server/
user@localhost:~/asterix-server
$cd opt/local/bin

This folder contains 4 scripts: two pairs of .sh and .bat files. start-sample-cluster.sh starts a basic sample cluster using the configuration files located in opt/local/conf/.

user@localhost:~/a/o/l/bin
$./start-sample-cluster.sh
CLUSTERDIR=/home/user/asterix-server/opt/local
INSTALLDIR=/home/user/asterix-server
LOGSDIR=/home/user/asterix-server/samples/opt/logs

INFO: Starting sample cluster...
INFO: Waiting up to 30 seconds for cluster 127.0.0.1:19002 to be available.
INFO: Cluster started and is ACTIVE.
user@localhost:~/a/o/l/bin
$

An AsterixDB cluster should now be running on the machine. To access the Web Interface, visit http://localhost:19001

Starting a small single-machine cluster using NCService

The above cluster was started using a script; below is a detailed description of how precisely this is achieved. The config files used here are analogous to the ones within samples/local/conf.

When running a cluster using the NCService there are 3 different kinds of processes involved:

  • NCDriver, also known as the Node Controller or NC for short. This is the process that does the actual work of queries and data management within the AsterixDB cluster.
  • NCService configures and starts the NCDriver process. It is a simple daemon whose only purpose is to wait for the CCDriver process to call upon it to initiate cluster bootup.
  • CCDriver, which is the Cluster Controller process, also known as the CC. This process manages the distribution of tasks to all NCs, as well as providing the parameters of each NC to the NCService upon cluster startup. It also hosts the Web interface and query compiler and optimizer.

Cluster startup follows a particular sequence:

  1. On each host where an NC is desired and mentioned in the configuration, the NCService daemon is started first. It will listen and wait for the CC to contact it.
  2. The one host on which the CC is to be placed is started with an appropriate configuration file.
  3. The CC contacts all NCService daemons, and each NCService subsequently starts an NCDriver process with the configuration supplied by the CC.
  4. Each NCDriver then contacts the CC to register itself as started.

This process is briefly illustrated in the diagram below:

To start a small cluster consisting of 2 NodeControllers (red and blue) and 1 ClusterController (cc) on a single machine, only 2 configuration files are required. The first one is

blue.conf:

[ncservice]
port=9091

It is a configuration file for the second NCService. It contains only the port that the NCService of the second NodeController listens on, as it is non-standard. The first NCService does not need a configuration file, as it only uses default parameters. In a distributed environment with 1 NodeController per machine, no NCService needs a configuration file.
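
Following the same pattern, a hypothetical third NodeController on this machine (call it green; the name, file name, and port below are illustrative, not part of the sample) would need its own NCService configuration with yet another non-standard port:

```ini
green.conf:

[ncservice]
port=9092
```

The cluster configuration file would then also need a matching [nc/green] section that specifies port=9092, so the CC knows which NCService to contact for that NC.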

The second configuration file is

cc.conf:

[nc/red]
txn.log.dir=/tmp/asterix/red/txnlog
core.dump.dir=/tmp/asterix/red/coredump
iodevices=/tmp/asterix/red

[nc/blue]
port=9091
txn.log.dir=/tmp/asterix/blue/txnlog
core.dump.dir=/tmp/asterix/blue/coredump
iodevices=/tmp/asterix/blue

[nc]
app.class=org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint
address=127.0.0.1
command=asterixnc

[cc]
address = 127.0.0.1
console.listen.port = 12345

This is the configuration file for the cluster and it contains information that each NCService will use when starting the corresponding NCDriver as well as information for the CCDriver.
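
The sectioned layout of this file encodes a simple precedence rule: values under [nc] apply to every NC, while values under an [nc/<name>] section apply to that NC only. A minimal sketch of that rule (stripped down from the file above for illustration):

```ini
[nc]
address=127.0.0.1

[nc/blue]
port=9091
```

With this layout, red's NCService is contacted on every default (including the default NCService port, 9090), while blue's is contacted on 9091, matching the port set in blue.conf.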

To start the cluster, simply follow these steps:

  1. Change directory into the asterix-server binary folder

    user@localhost:~/
    $cd asterix-server/
    user@localhost:~/asterix-server
    $cd samples/local/bin
    
  2. Start the 2 NCServices for red and blue.

    user@localhost:~/asterix-server
    $bin/asterixncservice -config-file blue.conf > blue-service.log 2>&1 &
    user@localhost:~/asterix-server
    $bin/asterixncservice >red-service.log 2>&1 &
    
  3. Start the CCDriver.

    user@localhost:~/asterix-server
    $bin/asterixcc -config-file cc.conf > cc.log 2>&1 &
    

The CCDriver will connect to the NCServices and thus initiate the configuration and the start of the NCDrivers. After running these commands, jps should show a result similar to this:

user@localhost:~/asterix-server
$jps
13184 NCService
13200 NCDriver
13185 NCService
13186 CCDriver
13533 Jps
13198 NCDriver

The logs for the NCDrivers will be in $BASEDIR/logs.

To stop the cluster again simply run

$ kill `jps | egrep '(CDriver|NCService)' | awk '{print $1}'`

to kill all processes.
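
The filtering step of that one-liner can be seen in isolation by running it against captured jps output. This sketch uses the sample PIDs from above, so it runs without a live cluster:

```shell
# Captured sample output of jps (assumption: one "PID Name" pair per line).
jps_output='13184 NCService
13200 NCDriver
13185 NCService
13186 CCDriver
13533 Jps
13198 NCDriver'

# Keep only AsterixDB processes (NCDriver and CCDriver both match "CDriver")
# and print their PIDs (column 1) -- exactly what gets passed to kill.
pids=$(printf '%s\n' "$jps_output" | egrep '(CDriver|NCService)' | awk '{print $1}')
echo $pids
```

Note that the jps process itself never matches the pattern, so the one-liner does not kill its own pipeline.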

Deploying AsterixDB via NCService in a multi-machine setup

Deploying on multiple machines only differs in the configuration file and in where each process actually resides. Take for example a deployment on 3 machines: cacofonix-1, cacofonix-2, and cacofonix-3. cacofonix-1 will be the CC, and cacofonix-2 and cacofonix-3 will be the two NCs, respectively. The configuration would be as follows:

cc.conf:

[nc/red]
txn.log.dir=/lv_scratch/asterix/red/txnlog
core.dump.dir=/lv_scratch/asterix/red/coredump
iodevices=/lv_scratch/asterix/red
address=cacofonix-2

[nc/blue]
txn.log.dir=/lv_scratch/asterix/blue/txnlog
core.dump.dir=/lv_scratch/asterix/blue/coredump
iodevices=/lv_scratch/asterix/blue
address=cacofonix-3

[nc]
app.class=org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint
storagedir=storage
command=asterixnc

[cc]
address = cacofonix-1
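
Growing this deployment by one NC only requires one more section in cc.conf. A hypothetical third NC (green, on a machine cacofonix-4; both names are illustrative) would follow the same pattern as red and blue:

```ini
[nc/green]
txn.log.dir=/lv_scratch/asterix/green/txnlog
core.dump.dir=/lv_scratch/asterix/green/coredump
iodevices=/lv_scratch/asterix/green
address=cacofonix-4
```

As with the other machines, cacofonix-4 would additionally need the asterix-server binary unpacked and an NCService started before the CC is (re)started.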

To deploy, first the asterix-server binary must be present on each machine. Any method to transfer the archive to each machine will work, but here scp will be used for simplicity's sake.

user@localhost:~
$ for f in 1 2 3; do scp asterix-server.zip cacofonix-$f:~/; done

Then unzip the archive on each machine. Next, start the NCService processes on each of the slave machines. Any way of getting a shell on the machine is fine, be it physical or via ssh.

user@cacofonix-2 12:41:42 ~/asterix-server/
$ bin/asterixncservice > red-service.log 2>&1 &


user@cacofonix-3 12:41:42 ~/asterix-server/
$ bin/asterixncservice > blue-service.log 2>&1 &

Now that each NCService is waiting, the CC can be started.

user@cacofonix-1 12:41:42 ~/asterix-server/
$ bin/asterixcc -config-file cc.conf > cc.log 2>&1 &

The cluster should now be started and the Web UI available on the CC host at the default port.

Available Configuration Parameters

The following parameters are for the master process, under the "[cc]" section.

| Section | Parameter | Meaning | Default |
|---------|-----------|---------|---------|
| cc | active.port | The listen port of the active server | 19003 |
| cc | address | Default bind address for all services on this cluster controller | 127.0.0.1 |
| cc | api.port | The listen port of the API server | 19002 |
| cc | app.class | Application CC main class | org.apache.asterix.hyracks.bootstrap.CCApplication |
| cc | client.listen.address | Sets the IP Address to listen for connections from clients | same as address |
| cc | client.listen.port | Sets the port to listen for connections from clients | 1098 |
| cc | cluster.listen.address | Sets the IP Address to listen for connections from NCs | same as address |
| cc | cluster.listen.port | Sets the port to listen for connections from node controllers | 1099 |
| cc | cluster.public.address | Address that NCs should use to contact this CC | same as cluster.listen.address |
| cc | cluster.public.port | Port that NCs should use to contact this CC | same as cluster.listen.port |
| cc | cluster.topology | Sets the XML file that defines the cluster topology | <undefined> |
| cc | console.listen.address | Sets the listen address for the Cluster Controller | same as address |
| cc | console.listen.port | Sets the http port for the Cluster Controller | 16001 |
| cc | cores.multiplier | The factor to multiply by the number of cores to determine maximum query concurrent execution level | 3 |
| cc | heartbeat.max.misses | Sets the maximum number of missed heartbeats before a node is marked as dead | 5 |
| cc | heartbeat.period | Sets the time duration between two heartbeats from each node controller in milliseconds | 10000 |
| cc | job.history.size | Limits the number of historical jobs remembered by the system to the specified value | 10 |
| cc | job.manager.class | Specify the implementation class name for the job manager | org.apache.hyracks.control.cc.job.JobManager |
| cc | job.queue.capacity | The maximum number of jobs to queue before rejecting new jobs | 4096 |
| cc | job.queue.class | Specify the implementation class name for the job queue | org.apache.hyracks.control.cc.scheduler.FIFOJobQueue |
| cc | profile.dump.period | Sets the time duration between two profile dumps from each node controller in milliseconds; 0 to disable | 0 |
| cc | result.sweep.threshold | The duration within which an instance of the result cleanup should be invoked in milliseconds | 60000 |
| cc | result.ttl | Limits the amount of time results for asynchronous jobs should be retained by the system in milliseconds | 86400000 |
| cc | root.dir | Sets the root folder used for file operations | ${java.io.tmpdir}/asterixdb/ClusterControllerService |
| cc | web.port | The listen port of the legacy query interface | 19001 |
| cc | web.queryinterface.port | The listen port of the query web interface | 19006 |
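
Any of these parameters can be set in the [cc] section of cc.conf, exactly like address and console.listen.port in the example earlier. For instance, moving the legacy query interface off its default port might look like this (the chosen port value is illustrative):

```ini
[cc]
address = 127.0.0.1
web.port = 19101
```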

The following parameters are for slave processes, under "[nc]" sections.

| Section | Parameter | Meaning | Default |
|---------|-----------|---------|---------|
| nc | address | Default IP Address to bind listeners on this NC. All services will bind on this address unless a service-specific listen address is supplied. | 127.0.0.1 |
| nc | app.class | Application NC Main Class | org.apache.asterix.hyracks.bootstrap.NCApplication |
| nc | cluster.address | Cluster Controller address (required unless specified in config file) | <undefined> |
| nc | cluster.connect.retries | Number of attempts to contact CC before giving up | 5 |
| nc | cluster.listen.address | IP Address to bind cluster listener on this NC | same as address |
| nc | cluster.listen.port | IP port to bind cluster listener | 0 |
| nc | cluster.port | Cluster Controller port | 1099 |
| nc | cluster.public.address | Public IP Address to announce cluster listener | same as public.address |
| nc | cluster.public.port | Public IP port to announce cluster listener | same as cluster.listen.port |
| nc | command | Command NCService should invoke to start the NCDriver | hyracksnc |
| nc | core.dump.dir | The directory where node core dumps should be written | ${java.io.tmpdir}/asterixdb/coredump |
| nc | data.listen.address | IP Address to bind data listener | same as address |
| nc | data.listen.port | IP port to bind data listener | 0 |
| nc | data.public.address | Public IP Address to announce data listener | same as public.address |
| nc | data.public.port | Public IP port to announce data listener | same as data.listen.port |
| nc | iodevices | Comma separated list of IO Device mount points | ${java.io.tmpdir}/asterixdb/iodevice |
| nc | jvm.args | JVM args to pass to the NCDriver | <undefined> |
| nc | messaging.listen.address | IP Address to bind messaging listener | same as address |
| nc | messaging.listen.port | IP port to bind messaging listener | 0 |
| nc | messaging.public.address | Public IP Address to announce messaging listener | same as public.address |
| nc | messaging.public.port | Public IP port to announce messaging listener | same as messaging.listen.port |
| nc | ncservice.address | Address the CC should use to contact the NCService associated with this NC | same as public.address |
| nc | ncservice.pid | PID of the NCService which launched this NCDriver | -1 |
| nc | ncservice.port | Port the CC should use to contact the NCService associated with this NC | 9090 |
| nc | net.buffer.count | Number of network buffers per input/output channel | 1 |
| nc | net.thread.count | Number of threads to use for Network I/O | 1 |
| nc | public.address | Default public address that other processes should use to contact this NC. All services will advertise this address unless a service-specific public address is supplied. | same as address |
| nc | result.listen.address | IP Address to bind dataset result distribution listener | same as address |
| nc | result.listen.port | IP port to bind dataset result distribution listener | 0 |
| nc | result.manager.memory | Memory usable for result caching at this Node Controller in bytes | -1 (-1 B) |
| nc | result.public.address | Public IP Address to announce dataset result distribution listener | same as public.address |
| nc | result.public.port | Public IP port to announce dataset result distribution listener | same as result.listen.port |
| nc | result.sweep.threshold | The duration within which an instance of the result cleanup should be invoked in milliseconds | 60000 |
| nc | result.ttl | Limits the amount of time results for asynchronous jobs should be retained by the system in milliseconds | 86400000 |
| nc | storage.buffercache.maxopenfiles | The maximum number of open files in the buffer cache | 2147483647 |
| nc | storage.buffercache.pagesize | The page size in bytes for pages in the buffer cache | 131072 (128 kB) |
| nc | storage.buffercache.size | The size of memory allocated to the disk buffer cache. The value should be a multiple of the buffer cache page size. | 1/4 of the JVM allocated memory |
| nc | storage.lsm.bloomfilter.falsepositiverate | The maximum acceptable false positive rate for bloom filters associated with LSM indexes | 0.01 |
| nc | storage.memorycomponent.globalbudget | The size of memory allocated to the memory components. The value should be a multiple of the memory component page size | 1/4 of the JVM allocated memory |
| nc | storage.memorycomponent.numcomponents | The number of memory components to be used per lsm index | 2 |
| nc | storage.memorycomponent.pagesize | The page size in bytes for pages allocated to memory components | 131072 (128 kB) |
| nc | storage.metadata.memorycomponent.numpages | The number of pages to allocate for a metadata memory component | 8 |
| nc | txn.log.dir | The directory where transaction logs should be stored | ${java.io.tmpdir}/asterixdb/txn-log |

The following parameters are configured under the "[common]" section.

| Section | Parameter | Meaning | Default |
|---------|-----------|---------|---------|
| common | active.memory.global.budget | The memory budget (in bytes) for the active runtime | 67108864 (64 MB) |
| common | compiler.framesize | The page size (in bytes) for computation | 32768 (32 kB) |
| common | compiler.groupmemory | The memory budget (in bytes) for a group by operator instance in a partition | 33554432 (32 MB) |
| common | compiler.joinmemory | The memory budget (in bytes) for a join operator instance in a partition | 33554432 (32 MB) |
| common | compiler.parallelism | The degree of parallelism for query execution. Zero means to use the storage parallelism as the query execution parallelism, while other integer values dictate the number of query execution parallel partitions. The system will fall back to use the number of all available CPU cores in the cluster as the degree of parallelism if the number set by a user is too large or too small | 0 |
| common | compiler.sortmemory | The memory budget (in bytes) for a sort operator instance in a partition | 33554432 (32 MB) |
| common | compiler.textsearchmemory | The memory budget (in bytes) for an inverted-index-search operator instance in a partition | 33554432 (32 MB) |
| common | log.level | The logging level for master and slave processes | WARNING |
| common | max.wait.active.cluster | The max pending time (in seconds) for cluster startup. After the threshold, if the cluster still is not up and running, it is considered unavailable | 60 |
| common | messaging.frame.count | Number of reusable frames for NC to NC messaging | 512 |
| common | messaging.frame.size | The frame size to be used for NC to NC messaging | 4096 (4 kB) |
| common | metadata.callback.port | IP port to bind metadata callback listener (0 = random port) | 0 |
| common | metadata.listen.port | IP port to bind metadata listener (0 = random port) | 0 |
| common | metadata.node | The node which should serve as the metadata node | <undefined> |
| common | metadata.registration.timeout.secs | How long in seconds to wait for the metadata node to register with the CC | 60 |
| common | replication.log.batchsize | The size in bytes to replicate in each batch | 4096 (4 kB) |
| common | replication.log.buffer.numpages | The number of log buffer pages | 8 |
| common | replication.log.buffer.pagesize | The size in bytes of each log buffer page | 131072 (128 kB) |
| common | replication.max.remote.recovery.attempts | The maximum number of times to attempt to recover from a replica on failure before giving up | 5 |
| common | replication.timeout | The time in seconds to timeout when trying to contact a replica, before assuming it is dead | 15 |
| common | storage.max.active.writable.datasets | The maximum number of datasets that can be concurrently modified | 8 |
| common | txn.commitprofiler.enabled | Enable output of commit profiler logs | false |
| common | txn.commitprofiler.reportinterval | Interval (in seconds) to report commit profiler logs | 5 |
| common | txn.job.recovery.memorysize | The memory budget (in bytes) used for recovery | 67108864 (64 MB) |
| common | txn.lock.escalationthreshold | The maximum number of entity locks to obtain before upgrading to a dataset lock | 1000 |
| common | txn.lock.shrinktimer | The time (in milliseconds) where under utilization of resources will trigger a shrink phase | 5000 |
| common | txn.lock.timeout.sweepthreshold | Interval (in milliseconds) for checking lock timeout | 10000 |
| common | txn.lock.timeout.waitthreshold | Time out (in milliseconds) of waiting for a lock | 60000 |
| common | txn.log.buffer.numpages | The number of pages in the transaction log tail | 8 |
| common | txn.log.buffer.pagesize | The page size (in bytes) for transaction log buffer | 131072 (128 kB) |
| common | txn.log.checkpoint.history | The number of checkpoints to keep in the transaction log | 0 |
| common | txn.log.checkpoint.lsnthreshold | The checkpoint threshold (in terms of LSNs (log sequence numbers) that have been written to the transaction log, i.e., the length of the transaction log) for transaction logs | 67108864 (64 MB) |
| common | txn.log.checkpoint.pollfrequency | The frequency (in seconds) the checkpoint thread should check to see if a checkpoint should be written | 120 |
| common | txn.log.partitionsize | The maximum size (in bytes) of each transaction log file | 268435456 (256 MB) |
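
These parameters can be placed in a [common] section of the cluster configuration file, alongside the [cc] and [nc] sections shown earlier. For example, raising the sort memory budget and making logging more verbose might look like this (the values are illustrative, not recommendations):

```ini
[common]
compiler.sortmemory = 67108864
log.level = INFO
```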

The following parameters are for the optional NCService process configuration file, under its "[ncservice]" section.

| Parameter | Meaning | Default |
|-----------|---------|---------|
| address | The address the NCService listens on for commands from the CC | (all addresses) |
| port | The port the NCService listens on for commands from the CC | 9090 |
| logdir | Directory where NCService logs should be written ('-' indicates that output should go to stdout) | ${app.home}/logs (${user.home} if 'app.home' is not present in the NCService Java system properties) |
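
Putting these together, a variant of blue.conf that also pins the listen address and redirects the NCService logs might look like this (the address and log path are illustrative):

```ini
[ncservice]
address=127.0.0.1
port=9091
logdir=/tmp/asterix/blue/ncservice-logs
```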