Timestamp Method for Ensuring that Content Matches Notification in a Pull-Based Web Content Distribution System
Original Publication Date: 2002-Apr-26
Included in the Prior Art Database: 2003-Jun-20
Presented is a mechanism for validating the freshness of web content in a push-pull web Content Distribution (CD) system. In a push-pull CD system, CD servers push notifications to web servers and caches in the Content Distribution Network (CDN). (For the rest of the document, web servers and caches are generically referred to as CD nodes.) Each notification specifies a list of Uniform Resource Locators (URLs) that must be updated at each CD node that receives the notification. Upon receiving a notification, each CD node pulls web content from one or more CD nodes in an earlier wave [*]. CD nodes are partitioned into many waves. A CD node in wave n that receives a notification pulls content from one of the nodes in waves 1 through n-1 if n > 1, or from the origin server if n = 1. A key problem to be solved in this design is to ensure that a CD node pulls the most recent version of the content from the CD node in an earlier wave.

This problem is illustrated in the following example. Consider a web cache C in the third wave attempting to pull URL u from its preferred web server S in the second wave, as illustrated in Figure 1. During correct operation, S is expected to have the current version of URL u before C does. But, because of the distributed nature of the network, S may not have the correct version of the content that C is attempting to pull. When the notification was distributed to the servers in the second wave, S may have been momentarily disconnected from the CDN without yet having realized it (i.e., the appropriate timeouts may not have expired). Until S realizes that it has been disconnected from the CDN, S cannot prevent C from requesting and pulling an out-of-date version of URL u. (The order of the events leading to this scenario is illustrated in Figure 1.) The key problem is that, in this scenario, C has no way to determine that the content is out of date.
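The scenario above can be reproduced in a few lines of code. The following Python sketch is illustrative only and is not taken from the disclosure: the `Node` class, the `receives_notifications` flag, the use of wave 0 for the origin server, the extra first-wave node `W1`, and the version strings `"v1"` and `"v2"` are all assumptions made for the example. It models the wave rule for choosing a pull source and shows C ending up with stale content after S silently misses a notification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

URL = "/u"  # hypothetical URL standing in for "u" in the text


@dataclass
class Node:
    """A CD node, or the origin server when wave == 0 (an assumption of this sketch)."""
    name: str
    wave: int
    receives_notifications: bool = True  # False models a silent disconnect from the CDN
    store: Dict[str, str] = field(default_factory=dict)  # URL -> content version


def eligible_sources(node: Node, nodes: List[Node]) -> List[Node]:
    """Wave rule: wave 1 pulls from the origin; wave n > 1 pulls from waves 1..n-1."""
    if node.wave == 1:
        return [n for n in nodes if n.wave == 0]
    return [n for n in nodes if 1 <= n.wave < node.wave]


def notify(node: Node, urls: List[str], source: Node) -> bool:
    """Deliver a notification; the node pulls each URL from `source` with no validation."""
    if not node.receives_notifications:
        return False  # the notification is lost and the node keeps its old content
    for url in urls:
        node.store[url] = source.store[url]
    return True


def run_scenario() -> Dict[str, str]:
    origin = Node("origin", 0, store={URL: "v2"})  # new version published
    w1 = Node("W1", 1, store={URL: "v1"})
    s = Node("S", 2, receives_notifications=False, store={URL: "v1"})
    c = Node("C", 3, store={URL: "v1"})
    nodes = [origin, w1, s, c]

    notify(w1, [URL], eligible_sources(w1, nodes)[0])  # wave 1 pulls from the origin
    notify(s, [URL], w1)                               # lost: S is disconnected
    assert s in eligible_sources(c, nodes)             # S is a legal source for C
    notify(c, [URL], s)                                # C pulls stale content from S
    return {n.name: n.store[URL] for n in nodes}


if __name__ == "__main__":
    print(run_scenario())
```

In this sketch, C completes its pull successfully and has nothing to compare the result against, which is why the stale copy goes unnoticed.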
C is out of sync with the rest of the CDN, but it will continue to serve stale data to its clients. Downstream caches and browsers may then see invalid content for a long time, much longer than the failure window at S. Since S can be any HTTP server or proxy cache, an HTTP-based mechanism is needed that can serve as a validation check, so that C can confirm that the pulled content matches the content named in the notification.

[*] A CDN is organized as an n-ary tree with the Master Content Distribution Server (MCDS) at the root of the tree.