The goal of the framework for fault tolerance is to offer a generic and
flexible architecture for developing reliable distributed applications. It is
designed to interoperate seamlessly with the environment for distributed,
adaptive services and to provide adequate mechanisms matched to the
application's needs.
This means that the fault tolerance mechanisms have to support dynamic
reconfiguration of a service regarding migration, partitioning, or replication.
Furthermore, the provided mechanisms should themselves be adaptable, for example,
regarding the type and number of tolerable faults. In addition to crash
failures, the system should be able to tolerate Byzantine failures of parts of
the system. This allows tolerating unanticipated erroneous behavior of
software and hardware as well as malicious intrusions.
The core part is AGC, a fault-tolerant group communication system that uses a
generic consensus algorithm interface to provide totally ordered
communication.
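
The relationship between a generic consensus interface and totally ordered
communication can be illustrated with a minimal single-process sketch. It does
not show AGC's actual interface: the names Consensus, FirstProposalWins, and
GroupMember are hypothetical, the consensus implementation is a toy stand-in
that decides the first proposal per instance, and messages are assumed to be
unique. Each group member runs one consensus instance per delivery slot and
delivers the decided values in instance order, so all members deliver the same
sequence.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Consensus(ABC):
    """Hypothetical generic consensus interface: one decision per instance."""

    @abstractmethod
    def propose(self, instance: int, value: str) -> str:
        """Propose a value for an instance and return the decided value."""

    @abstractmethod
    def decision(self, instance: int) -> Optional[str]:
        """Return the decided value of an instance, or None if undecided."""


class FirstProposalWins(Consensus):
    """Toy stand-in (assumption): the first proposal for an instance wins."""

    def __init__(self) -> None:
        self._decided: Dict[int, str] = {}

    def propose(self, instance: int, value: str) -> str:
        return self._decided.setdefault(instance, value)

    def decision(self, instance: int) -> Optional[str]:
        return self._decided.get(instance)


class GroupMember:
    """Delivers messages in the order of consecutive consensus instances."""

    def __init__(self, consensus: Consensus) -> None:
        self.consensus = consensus
        self.next_instance = 0
        self.delivered: List[str] = []

    def _deliver(self, message: str) -> None:
        self.delivered.append(message)
        self.next_instance += 1

    def broadcast(self, message: str) -> None:
        # Propose the message in successive instances until it is decided.
        # Decisions of other members' messages are delivered along the way.
        while True:
            decided = self.consensus.propose(self.next_instance, message)
            self._deliver(decided)
            if decided == message:
                return

    def sync(self) -> None:
        # Deliver all decisions this member has not yet seen.
        while True:
            decided = self.consensus.decision(self.next_instance)
            if decided is None:
                return
            self._deliver(decided)


shared = FirstProposalWins()
a, b = GroupMember(shared), GroupMember(shared)
a.broadcast("x")
b.broadcast("y")
a.broadcast("z")
b.sync()
```

Because the ordering logic depends only on the two interface methods, the toy
implementation could be exchanged for a crash-tolerant or Byzantine-tolerant
consensus algorithm without touching GroupMember, which is the kind of
flexibility a generic interface is meant to provide.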
This group communication layer is used to implement a passive and an active
replication mechanism. For Byzantine fault tolerance, only active replication
is supported.
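
The difference between the two replication styles can be sketched as follows.
This is an illustration under stated assumptions, not the framework's code:
CounterService, active_invoke, passive_invoke, and vote are hypothetical names,
requests are assumed to arrive in the same total order at all replicas, and the
service is assumed to be deterministic. With active replication every replica
executes each request, so a client can compare the replies and mask a faulty
replica by voting. With passive replication only the primary executes, so the
backups have no independent result to compare against, which is one common
reason for restricting Byzantine fault tolerance to active replication.

```python
from collections import Counter


class CounterService:
    """Hypothetical deterministic service used only for illustration."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


def active_invoke(replicas, amount):
    """Active replication: every replica executes the request itself."""
    return [replica.add(amount) for replica in replicas]


def passive_invoke(primary, backups, amount):
    """Passive replication: the primary executes, backups copy its state."""
    reply = primary.add(amount)
    for backup in backups:
        backup.value = primary.value
    return reply


def vote(replies, f):
    """Accept a reply reported by at least f + 1 replicas, else None."""
    reply, count = Counter(replies).most_common(1)[0]
    return reply if count >= f + 1 else None


replicas = [CounterService() for _ in range(4)]
replies = active_invoke(replicas, 5)
replies[2] = 99  # simulate one replica returning a wrong result
accepted = vote(replies, f=1)
```

The threshold of f + 1 matching replies ensures that at least one correct
replica stands behind an accepted result when at most f replicas are faulty.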
To simplify application development, the necessary additional code for
replicated services is created automatically. The service only has to provide
one generic interface that allows transferring its state. The code generation
is supported by composable source-code transformations, described in the next
section.
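
A state-transfer interface of this kind might look like the following minimal
sketch. The names StateTransferable, get_state, set_state, KeyValueService, and
add_replica are assumptions chosen for illustration, as is the JSON encoding.
The sketch only shows the idea that a single pair of operations is enough for
generated replication code to initialize a new or migrated replica.

```python
import json
from abc import ABC, abstractmethod


class StateTransferable(ABC):
    """Hypothetical generic interface a replicated service has to provide."""

    @abstractmethod
    def get_state(self) -> bytes:
        """Serialize the complete service state."""

    @abstractmethod
    def set_state(self, state: bytes) -> None:
        """Replace the service state with a previously serialized one."""


class KeyValueService(StateTransferable):
    """Example service; its business logic knows nothing about replication."""

    def __init__(self) -> None:
        self.entries = {}

    def put(self, key: str, value: int) -> None:
        self.entries[key] = value

    def get_state(self) -> bytes:
        return json.dumps(self.entries, sort_keys=True).encode("utf-8")

    def set_state(self, state: bytes) -> None:
        self.entries = json.loads(state.decode("utf-8"))


def add_replica(existing: StateTransferable, fresh: StateTransferable):
    """Stand-in for generated code: bring a fresh replica up to date."""
    fresh.set_state(existing.get_state())
    return fresh


old = KeyValueService()
old.put("a", 1)
new = add_replica(old, KeyValueService())
```

Serializing to bytes keeps the two replicas independent after the transfer, so
later updates on one copy do not leak into the other.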