Shared disk I/O cache
Original Publication Date: 2004-Jan-29
Included in the Prior Art Database: 2004-Jan-29
This paper presents a disk I/O cache that can be shared among multiple operating systems running in a virtual machine environment. The cache is designed to efficiently cache shared read/write data accessible in a storage area network.
In future computing environments we expect many operating system images running in a virtual machine environment to share data in a storage area network (san). We use the abbreviation san (all lower case) to indicate that we consider any kind of storage network, not only FCP-based SANs. The virtual machine environment may be based on either HW partitioning or SW partitioning (e.g., z/VM or VMware). Virtualizing hundreds or thousands of images on a single piece of hardware puts a particular burden on shared resources such as memory, network capacity, or the storage I/O capacity of the adapters and their attached sans. In this paper we present a solution that reduces the required storage I/O capacity to a minimum. As a side effect, the average access time to shared data is reduced as well. As discussed in the conclusion, our shared I/O cache may also serve as a foundation for reducing the physical memory usage of a virtual system.
The architecture of our solution allows performance-critical parts (e.g., LUN hashing) to be implemented in a very thin virtualization layer, possibly in HW, and moves the more elaborate control of the shared cache into the guest operating systems. Because cache control is distributed, our design is robust and resilient to failures of single components.
We have implemented a prototype of the shared I/O cache using Linux on zSeries running under z/VM. The cache was implemented using shared memory available to all z/VM guests (called DCSS) and a new block device driver to control cache accesses.
Previous solutions to this problem include the shared cache of VMware, which supports data sharing in read-only mode but implements a copy-on-write scheme when data is to be modified. The minidisk cache available in z/VM uses a write-through mechanism. It has the virtualization engine as its central point of control, which may become a performance bottleneck and a single point of failure. In addition, this kind of server functionality is too heavy to be implemented in HW.
Using Shared Memory for a Shared I/O Cache
To implement our shared I/O cache efficiently, the virtualization layer must provide main memory that can be mapped into the virtual address space of each guest operating system. Once the memory is mapped into its address space, a guest operating system must be able to read from this shared memory, to write into it, and to perform at least one basic synchronization operation such as test-and-set.
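The role of the test-and-set primitive can be illustrated with a minimal sketch in C11. It is not the prototype's code: the structure `shared_cache_header`, its `generation` field, and all function names are hypothetical, and the sketch assumes that the header lives in the shared segment and that every guest can execute an atomic test-and-set on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header placed at the start of the shared memory segment. */
typedef struct {
    atomic_flag lock;       /* clear = free, set = held by some guest */
    uint64_t    generation; /* example field protected by the lock */
} shared_cache_header;

/* Called once by whichever guest sets up the segment. */
static void cache_header_init(shared_cache_header *h) {
    assert(h != NULL);
    atomic_flag_clear(&h->lock);
    h->generation = 0;
}

/* Test-and-set returns the previous value: false means we now hold the lock. */
static bool cache_try_lock(shared_cache_header *h) {
    return !atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void cache_lock(shared_cache_header *h) {
    while (!cache_try_lock(h)) {
        /* busy-wait; a real driver would likely back off or yield here */
    }
}

static void cache_unlock(shared_cache_header *h) {
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Example critical section: update shared state under the lock. */
static uint64_t cache_bump_generation(shared_cache_header *h) {
    cache_lock(h);
    uint64_t g = ++h->generation;
    cache_unlock(h);
    return g;
}
```

A spin lock is only one way to build on test-and-set. It is used here because it needs no support from the virtualization layer beyond the atomic operation itself.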
Referencing Blocks of Storage in a Storage Area Network
We assume that a virtual machine running many guest operating systems is connected to some kind of san. To associate a cache entry efficiently with a block of data stored in the san, each cache entry must be identified by a tag that is unique in the connected I/O network. This tag has to be associated with a network-wide unique definition of a single storage block or a number of storage blocks.
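How such a tag might select and validate a cache entry can be sketched in C as follows. The sketch treats the tag as an opaque, fixed-size byte string and makes no assumption about how it is composed for a particular storage network. The tag size, the table size, the direct-mapped layout, the use of the FNV-1a hash, and all names are assumptions made for illustration only, not the design of the prototype.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_BYTES   32u   /* assumed size of the opaque tag */
#define CACHE_SLOTS 1024u /* assumed number of cache entries */

/* Opaque, network-unique identifier of a storage block (contents assumed). */
typedef struct { uint8_t bytes[TAG_BYTES]; } block_tag;

typedef struct {
    bool      valid; /* entry currently holds cached data */
    block_tag tag;   /* tag of the block cached in this entry */
} cache_entry;

/* 64-bit FNV-1a hash over the tag bytes. */
static uint64_t tag_hash(const block_tag *t) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < TAG_BYTES; i++) {
        h ^= t->bytes[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Map a tag to a slot of a direct-mapped table. */
static size_t tag_slot(const block_tag *t) {
    return (size_t)(tag_hash(t) % CACHE_SLOTS);
}

/* Hit only if the slot is valid and holds exactly this tag. */
static bool cache_lookup(const cache_entry *table, const block_tag *t) {
    assert(table != NULL && t != NULL);
    const cache_entry *e = &table[tag_slot(t)];
    return e->valid && memcmp(e->tag.bytes, t->bytes, TAG_BYTES) == 0;
}

/* Claim the slot for this tag, evicting whatever was there before. */
static void cache_insert(cache_entry *table, const block_tag *t) {
    assert(table != NULL && t != NULL);
    cache_entry *e = &table[tag_slot(t)];
    e->tag = *t;
    e->valid = true;
}
```

Comparing the full tag on lookup matters because two different blocks can hash to the same slot; the hash only narrows the search, while the tag decides whether the entry really belongs to the requested block.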
For FCP this is a combination of: Port/N...