1. Fix the node-failure scenario in the job scheduler; 2. Add fault-tolerance support and tests in Pregelix
19 files changed