Implicit replication in a network file server
Abstract
The design and implementation of a highly available network file server (HA-NFS) is reported. It is implemented on a network of workstations from the IBM RISC System/6000 family. HA-NFS servers preserve the semantics of the NFS protocol and can be used by existing NFS clients without modification. Therefore, existing application programs can benefit from high availability without alteration. HA-NFS achieves storage reliability by (optionally) replicating files on different disks. However, all copies of the same file are controlled by a single server, reducing the cost of ensuring consistency. To achieve server reliability, each server is implicitly replicated by a backup that can access the server's disks if the server fails. During normal operation, the backup monitors the liveness of the server but does not maintain information about the server's internal state. Each server maintains a disk log that records state information normally kept in memory. The disk log also records the changes to file-system structures. If the server fails, the backup will take over the server's disks, use the disk log to restore the server's file system to a consistent state, and reconstruct the server's prefailure volatile state. The backup will then impersonate the failed server and service requests on its behalf. The failure of the server is transparent to the clients, which continue to send their requests to the failed server's address. Since the backup itself is a server for a different set of disks, operation continues with reduced performance. If two networks are available, the network can be implicitly replicated.
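The takeover sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HA-NFS implementation: the names `DiskLog`, `Volume`, `Server`, `replay`, and `monitor_and_take_over` are hypothetical, the `alive` flag stands in for liveness monitoring, an in-memory list stands in for the log on the shared disks, and the addresses `server-a` and `server-b` are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DiskLog:
    """Hypothetical stand-in for the log kept on the shared disks."""
    records: list = field(default_factory=list)

    def append(self, kind, key, value):
        self.records.append((kind, key, value))


@dataclass
class Volume:
    """State one server holds for one served address (toy model)."""
    log: DiskLog
    files: dict = field(default_factory=dict)     # file-system structures
    volatile: dict = field(default_factory=dict)  # state normally kept in memory


class Server:
    def __init__(self, address, log):
        self.alive = True
        # A server may later serve more than one address (its own and a peer's).
        self.volumes = {address: Volume(log)}

    def write(self, address, request_id, path, data):
        volume = self.volumes[address]
        # Record the change and the request state in the disk log first,
        # then update the in-memory copies.
        volume.log.append("fs", path, data)
        volume.log.append("state", request_id, "done")
        volume.files[path] = data
        volume.volatile[request_id] = "done"

    def crash(self):
        # Assumption: everything in memory is lost; only the disk log survives.
        self.alive = False
        self.volumes.clear()


def replay(log):
    """Rebuild file-system and volatile state from the surviving disk log."""
    volume = Volume(log)
    for kind, key, value in log.records:
        if kind == "fs":
            volume.files[key] = value
        elif kind == "state":
            volume.volatile[key] = value
    return volume


def monitor_and_take_over(backup, primary, primary_log, primary_address):
    """One monitoring step: if the primary is down, the backup adopts its
    disks (here: its log) and starts answering on the primary's address."""
    if primary.alive:
        return False
    backup.volumes[primary_address] = replay(primary_log)
    return True
```

In this model the backup holds no copy of the primary's state during normal operation; it touches the primary's log only after a failure is detected, and afterwards it serves both its own address and the failed server's address, which corresponds to the reduced-performance mode noted in the abstract.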