Software exploitation of a fault-tolerant computer with a large memory
Abstract
The DM/6000 hardware (a prototype, fault-tolerant RS/6000 built at the T J Watson Research Center) provides fault tolerance and a large, nonvolatile main memory. Running a commercial, general-purpose operating system on it, of itself, does nothing to increase software availability. In fact, the time to rebuild the contents of a large memory may decrease availability.We describe our techniques for hiding most of the main memory, which requires the operating system to access it only by way of services separate from the operating system. This can allow the memory and those access services to achieve much higher availability, which, in turn, increases the availability of the system as a whole. We also performed simulation studies to determine those conditions where this system organization can lead to improved performance for recoverable database applications.