Cache Management using SLA Policies and QoS Criteria
Original Publication Date: 2002-Sep-15
Included in the Prior Art Database: 2003-Jun-21
Cache management techniques for web caches and host memory attempt to increase the benefit of cached content by maximizing the hit rate of object requests. Traditional methods for achieving better hit rates use a Least Recently Used (LRU) or Least Frequently Used (LFU) algorithm to determine which pages to evict from the cache. These techniques consider cache entries from all services together when choosing entries to evict, without distinguishing the service each entry belongs to.

We propose a cache management enhancement that incorporates the notion of multiple services hosted on a single server. Using a web server as an example, the cache management system can be improved in two distinct areas to provide an increased ability to meet the demands of multiple customers. The first improvement is to partition the cache by service, such that each service is guaranteed enough cache memory to meet the minimum requirements of its Service Level Agreement (SLA).

Using customer response time as the SLA metric, it can be shown that the amount of memory allocated to a particular service affects the cache hit/miss rates for that service's request stream. The cache miss rate directly determines the number of requests that flow from the cache to the storage system, which increases the time necessary to generate a response. By adjusting the cache memory allocated to a particular service it is therefore possible to control the average response time of client requests. Note that this adjustment can be performed dynamically based on changing arrival rates, and its effect also depends on the popularity distribution characteristics of a given web request stream.
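The per-service partitioning described above can be sketched as follows. This is an illustrative example only: the service names, byte sizes, the even distribution of spare memory, and the use of LRU within each partition are all assumptions, not details taken from the disclosure.

```python
from collections import OrderedDict

class PartitionedCache:
    """Cache whose total memory is divided into per-service partitions,
    each guaranteed at least its SLA minimum allocation."""

    def __init__(self, total_bytes, sla_minimums):
        # sla_minimums maps service -> guaranteed bytes.  Spare memory is
        # split evenly here for simplicity; the disclosure instead suggests
        # assigning it by the perceived response-time benefit per service.
        assert sum(sla_minimums.values()) <= total_bytes
        spare = total_bytes - sum(sla_minimums.values())
        share = spare // len(sla_minimums)
        self.limit = {s: m + share for s, m in sla_minimums.items()}
        self.used = {s: 0 for s in sla_minimums}
        self.entries = {s: OrderedDict() for s in sla_minimums}  # key -> size

    def get(self, service, key):
        entries = self.entries[service]
        if key in entries:
            entries.move_to_end(key)  # LRU bookkeeping within the partition
            return True               # cache hit
        return False                  # cache miss

    def put(self, service, key, size):
        entries = self.entries[service]
        if key in entries:
            self.used[service] -= entries.pop(key)
        # Evict only within this service's partition, so one service's
        # traffic can never push another service below its SLA minimum.
        while self.used[service] + size > self.limit[service] and entries:
            _, evicted_size = entries.popitem(last=False)
            self.used[service] -= evicted_size
        if self.used[service] + size <= self.limit[service]:
            entries[key] = size
            self.used[service] += size
```

Because each service evicts only its own entries, a burst of requests for one service cannot degrade another service's hit rate below what its SLA allocation supports.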
After memory is allocated to meet the minimum SLA requirements of all hosted services, additional memory allocation decisions can be made based on the perceived response-time benefit for each service. Again, this decision can be based on the observed request arrival rate, popularity distribution, and average object size.

The second improvement involves using a Quality of Service (QoS) notion when deciding which pages to evict from the cache. Just as load balancing decisions are made based on quality of service, the cache can use these QoS characterizations to determine which of a set of objects should be removed. Note that this idea can be combined with the SLA notion so that the candidate objects all come from a single service, or it can be implemented independently of SLAs so that the entire cache replacement algorithm incorporates QoS. In the preferred embodiment, the QoS input is used as a weight in addition to existing criteria such as LFU in making the final eviction decision. By using this QoS notion we are able to provide an increased ability to offer differentiated services to clients.
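A minimal sketch of QoS-weighted eviction might look like the following. The scoring formula (access frequency multiplied by a QoS weight) and the specific weight values are hypothetical; the disclosure states only that QoS is used as a weight alongside existing criteria such as LFU.

```python
def choose_victim(candidates):
    """Pick the eviction victim from a list of (key, access_count, qos_weight)
    tuples.  A higher QoS weight protects an object, so the entry with the
    lowest weighted score (count * weight) is evicted, blending LFU with QoS."""
    return min(candidates, key=lambda c: c[1] * c[2])[0]
```

For example, given two objects with the same access count, the one belonging to a lower-QoS service is evicted first, while a higher QoS weight lets a less frequently used object from a premium service survive.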