University of Ulm
Computer Science
Verteilte Systeme/Distributed Systems

Fault-Tolerant Distributed Services


The fault tolerance framework aims to offer a generic and flexible architecture for developing reliable distributed applications. It is designed to interoperate seamlessly with the environment for distributed, adaptive services and to provide mechanisms matched to each application's needs.

This means that the fault tolerance mechanisms have to support dynamic reconfiguration of a service through migration, partitioning, or replication. Furthermore, the mechanisms themselves should be adaptable, for example in the type and number of faults they tolerate. In addition to crash failures, the system should tolerate Byzantine failures of parts of the system, that is, arbitrary behavior of the affected components. This covers unanticipated erroneous behavior of software and hardware as well as malicious intrusions.
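To make "type and number of tolerable faults" concrete, the following minimal sketch computes how many replicas are needed for a given configuration. It assumes the textbook bounds for consensus-based replication in asynchronous systems (2f + 1 replicas for f crash faults, 3f + 1 for f Byzantine faults). The page does not state which bounds the framework itself uses, and the function name `min_replicas` is hypothetical.

```python
def min_replicas(f: int, fault_model: str) -> int:
    """Return the minimum group size needed to tolerate f faulty replicas.

    Assumption: textbook bounds for consensus-based replication,
    not values taken from the framework described on this page.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if fault_model == "crash":
        # A majority of correct replicas must survive f crashes.
        return 2 * f + 1
    if fault_model == "byzantine":
        # Arbitrary (possibly malicious) behavior needs a larger group.
        return 3 * f + 1
    raise ValueError(f"unknown fault model: {fault_model!r}")


print(min_replicas(1, "crash"))      # 3
print(min_replicas(1, "byzantine"))  # 4
```

Under these assumptions, switching a service from crash tolerance to Byzantine tolerance with the same f means adding f replicas, which is one reason the fault model is worth keeping configurable.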

The core part is AGC, a fault-tolerant group communication system that provides totally ordered communication on top of a generic consensus algorithm interface. Totally ordered means that all group members deliver the same messages in the same order.
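The idea of building total order on a pluggable consensus interface can be sketched as follows. This is not AGC's actual API: the names `Consensus`, `FirstProposalWins`, and `GroupMember` are hypothetical, and the consensus implementation is a trivial in-process stand-in rather than a distributed algorithm. Each numbered consensus instance decides one batch of messages, and every member delivers the decided batches in instance order.

```python
from abc import ABC, abstractmethod


class Consensus(ABC):
    """Hypothetical generic consensus interface (an assumption, not AGC's API)."""

    @abstractmethod
    def propose(self, instance, value):
        """Propose a value for a numbered instance; return the decided value."""


class FirstProposalWins(Consensus):
    """In-process stand-in: the first proposal for an instance is the decision."""

    def __init__(self):
        self._decided = {}

    def propose(self, instance, value):
        return self._decided.setdefault(instance, value)


class GroupMember:
    def __init__(self, consensus):
        self.consensus = consensus
        self.pending = []      # received, not yet ordered
        self.delivered = []    # totally ordered output
        self.next_instance = 0

    def receive(self, message):
        if message not in self.pending and message not in self.delivered:
            self.pending.append(message)

    def order_pending(self):
        # Run consensus instances until all locally pending messages are ordered.
        while self.pending:
            batch = self.consensus.propose(self.next_instance, tuple(self.pending))
            self.next_instance += 1
            for message in batch:
                if message not in self.delivered:
                    self.delivered.append(message)
            self.pending = [m for m in self.pending if m not in self.delivered]


shared = FirstProposalWins()
a, b = GroupMember(shared), GroupMember(shared)
# The two members receive the same messages in different network order.
a.receive("m1"); a.receive("m2")
b.receive("m2"); b.receive("m1")
a.order_pending()
b.order_pending()
print(a.delivered == b.delivered)  # True
```

Because the ordering logic only calls `propose`, a crash-tolerant or a Byzantine-tolerant consensus algorithm could be substituted without changing `GroupMember`, which is the benefit a generic consensus interface is meant to provide.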

This group communication layer is used to implement a passive and an active replication mechanism. For Byzantine fault tolerance, only active replication is supported.
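The difference between the two mechanisms can be illustrated with a minimal sketch. All names here (`CounterService`, `LyingReplica`, `active_invoke`, `passive_invoke`) are hypothetical, and the requests are assumed to arrive already totally ordered. In active replication every replica executes the request and the client accepts a reply confirmed by at least f + 1 replicas; in passive replication only the primary executes and pushes the resulting state to the backups.

```python
from collections import Counter


class CounterService:
    """Deterministic example service (hypothetical)."""

    def __init__(self):
        self.value = 0

    def execute(self, request):
        op, amount = request
        if op == "add":
            self.value += amount
        return self.value


class LyingReplica(CounterService):
    """Simulates a Byzantine replica that returns a wrong reply."""

    def execute(self, request):
        super().execute(request)
        return -1


def active_invoke(replicas, request, f):
    # Every replica executes the request; accept a reply seen at least f + 1 times,
    # so at least one correct replica stands behind it.
    replies = [replica.execute(request) for replica in replicas]
    reply, count = Counter(replies).most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no reply confirmed by f + 1 replicas")
    return reply


def passive_invoke(primary, backups, request):
    # Only the primary executes; backups receive the resulting state.
    reply = primary.execute(request)
    for backup in backups:
        backup.value = primary.value  # state update, here just the counter value
    return reply


group = [CounterService(), CounterService(), CounterService(), LyingReplica()]
print(active_invoke(group, ("add", 5), f=1))  # 5, the wrong reply is outvoted

primary, backup = CounterService(), CounterService()
print(passive_invoke(primary, [backup], ("add", 5)))  # 5
```

The sketch also suggests one plausible reason for the restriction to active replication: in the passive scheme the backups accept whatever the primary sends, so a Byzantine primary could corrupt all of them, whereas voting over independently computed replies masks a faulty replica.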

To simplify application development, the additional code needed for replicated services is generated automatically. The service only has to provide one generic interface that allows its state to be transferred. Code generation relies on composable source-code transformations, described in the next section.
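A generic state transfer interface of this kind might look like the sketch below. The interface name `StateTransferable`, its two methods, the example `KeyValueStore`, and the helper `add_replica` are all assumptions made for illustration; the page does not show the framework's real interface. The state is exchanged as opaque bytes so that the replication layer needs no knowledge of the service's internals.

```python
import json
from abc import ABC, abstractmethod


class StateTransferable(ABC):
    """Hypothetical generic interface a service would implement."""

    @abstractmethod
    def get_state(self) -> bytes:
        """Serialize the complete service state."""

    @abstractmethod
    def set_state(self, state: bytes) -> None:
        """Replace the service state with a previously serialized one."""


class KeyValueStore(StateTransferable):
    """Example service; only the two state methods concern replication."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def get_state(self) -> bytes:
        return json.dumps(self._data, sort_keys=True).encode("utf-8")

    def set_state(self, state: bytes) -> None:
        self._data = json.loads(state.decode("utf-8"))


def add_replica(existing: StateTransferable, new: StateTransferable) -> None:
    # Bring a freshly started replica up to date, e.g. after migration
    # or when the replication degree is increased.
    new.set_state(existing.get_state())


old, fresh = KeyValueStore(), KeyValueStore()
old.put("x", 1)
add_replica(old, fresh)
print(fresh.get("x"))  # 1
```

With such an interface in place, everything else (forwarding invocations through the ordered group communication layer, voting, checkpointing) can live in generated wrapper code, which matches the division of labor described above.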

Copyright © 2007 Distributed Systems Lab · Uni Ulm