Failure Handling

Because TSS uses RAID storage in all three tiers, it can tolerate a single disk failure. If an I/O on any one of the underlying devices fails, the TSS device should switch to degraded mode, avoid all further I/O to that device, and use the redundancy in the TSS device to handle I/Os instead. If more than one of the underlying disks fails, the TSS device should exit gracefully.
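The mode transitions described above can be sketched as a small state machine. This is only an illustrative model, not the TSS implementation: the names `Mode`, `TssDevice`, `report_io_failure` and `usable_devices` are hypothetical, and a real driver would track this state inside the kernel rather than in Python.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # all underlying devices healthy
    DEGRADED = "degraded"  # exactly one device failed; redundancy in use
    STOPPED = "stopped"    # more than one device failed; device has exited


class TssDevice:
    """Hypothetical model of a TSS device's failure state (assumption, not TSS code)."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.failed = set()   # indices of devices on which an I/O has failed
        self.mode = Mode.NORMAL

    def report_io_failure(self, dev):
        """Record a failed I/O on device `dev` and return the resulting mode."""
        if not 0 <= dev < self.num_devices:
            raise ValueError("unknown device index")
        self.failed.add(dev)
        if len(self.failed) == 1:
            # Single failure: keep serving I/O from the remaining devices.
            self.mode = Mode.DEGRADED
        else:
            # A second distinct device failed: redundancy is exhausted.
            self.mode = Mode.STOPPED
        return self.mode

    def usable_devices(self):
        """Devices that may still receive I/O in the current mode."""
        if self.mode is Mode.STOPPED:
            return []
        return [d for d in range(self.num_devices) if d not in self.failed]
```

Repeated failures reported for the same device leave the model in degraded mode; only a failure on a second, distinct device stops it.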

Further, when the failed disk is replaced by a working disk, the new disk has to be synchronized with the other disks.
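As a rough illustration of what synchronizing a replacement disk involves, the sketch below rebuilds the missing contents stripe by stripe. It assumes XOR parity in the style of RAID 5, which the text does not specify; TSS may use a different RAID level in some tiers. The functions `rebuild_block` and `resync` are hypothetical, and disks are modelled simply as lists of byte blocks.

```python
def rebuild_block(surviving_blocks):
    """XOR equal-length blocks from the surviving disks to recover the missing one."""
    if not surviving_blocks:
        raise ValueError("need at least one surviving block")
    size = len(surviving_blocks[0])
    if any(len(b) != size for b in surviving_blocks):
        raise ValueError("blocks must have equal length")
    out = bytearray(size)
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def resync(surviving_disks):
    """Rebuild every stripe of the replacement disk from the surviving disks.

    Each disk is modelled as a list of byte blocks, one block per stripe
    (an assumption made for this sketch only).
    """
    stripes = zip(*surviving_disks)
    return [rebuild_block(list(stripe)) for stripe in stripes]
```

Under this assumption, the same routine serves both to compute parity from the data disks and to recover a lost data disk from the remaining data and parity.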

In TSS, when an I/O on a particular device returns a soft error, that device should be noted. If soft errors recur regularly on the same device, the device has to be marked as bad and all I/O to it avoided until the device is reconfigured.
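One possible way to decide when soft errors happen "regularly" is to count them within a sliding time window. The sketch below is an assumption for illustration only: the class `SoftErrorTracker`, the threshold of 3 errors, and the 60-second window are hypothetical values and are not taken from TSS.

```python
from collections import deque


class SoftErrorTracker:
    """Hypothetical soft-error bookkeeping; threshold and window are assumptions."""

    def __init__(self, threshold=3, window=60.0):
        self.threshold = threshold  # errors within the window before marking bad
        self.window = window        # window length in seconds
        self.errors = {}            # device name -> deque of error timestamps
        self.bad = set()            # devices currently marked as bad

    def note_soft_error(self, dev, now):
        """Record a soft error at time `now`; return True if `dev` is marked bad."""
        times = self.errors.setdefault(dev, deque())
        times.append(now)
        # Forget errors that fall outside the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            self.bad.add(dev)
        return dev in self.bad

    def may_issue_io(self, dev):
        """I/O is avoided on devices marked bad."""
        return dev not in self.bad

    def reconfigure(self, dev):
        """Clear the bad mark and error history once the device is reconfigured."""
        self.bad.discard(dev)
        self.errors.pop(dev, None)
```

Isolated soft errors that age out of the window do not count toward the threshold, so only a device that fails repeatedly in a short period is taken out of service.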

Work is currently in progress on providing suitable failure handling mechanisms for TSS.

2001-09-13