Web analytics: Correlating FTP traffic to HTTP sessions
Disclosure Number: IPCOM000016063D
Original Publication Date: 2002-Jun-15
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue



In Web analytics, HTTP traffic is often sessionized to yield visitor or session counts. When a web client requests an HTTP resource from a web server, the server typically records the request. Software can then analyze these records and group the requests into visits (or sessions). Sessionization is the calculation of the number of visits to a web site.

When a web site delivers downloadable content, that content can be served by an FTP server, whereas the site's pages are usually served by an HTTP server. The HTTP server therefore records only HTTP traffic, not FTP traffic (that is, not the requests served by the FTP server). The FTP server records its own request activity independently of HTTP activity, in its own way.

For web analytics purposes, it is valuable to correlate FTP requests to the sessions derived from the HTTP requests. Without correlation, analysis of a site's FTP activity is limited to broad FTP metrics, such as number of downloads, bytes downloaded, and return codes. You can report these broad metrics alongside broad HTTP metrics, such as number of HTTP hits or visits. For example, you could see that during a given time period your web site had X visitors and Y FTP download requests. You could even divide Y by X to estimate download requests per session. However, these are only broad metrics.

What if you wanted exact numbers instead of approximations? If you could correlate each FTP hit to an HTTP session, you could see exactly how many FTP download requests, bytes downloaded, and so on occurred within a target set of sessions. Furthermore, you could pinpoint a subset of sessions (such as those of user 'John Doe') or a specific session (e.g. John Doe's session in which he bought $1000 worth of product) and determine exactly which FTP download requests were made during those sessions, along with information about those requests, such as bytes downloaded, time, and return code.

These are just a few examples, but it should be easy to see how powerful this concept is, and how much better it is than the broad or estimated metrics available without the ability to correlate each FTP request to an HTTP session. The correlation can take place as follows: