fix fault tolerance and error reporting to handle disk failures
24 files changed