
Web analytics: Correlating FTP traffic to HTTP sessions

IP.com Disclosure Number: IPCOM000016063D
Original Publication Date: 2002-Jun-15
Included in the Prior Art Database: 2003-Jun-21
Document File: 2 page(s) / 49K

Publishing Venue

IBM


This is the abbreviated version, containing approximately 36% of the total text.


In the arena of Web Analytics, HTTP web traffic is often sessionized to yield visitor or session numbers. When a web client makes an HTTP request to a web server for an HTTP resource, the web server typically records the request. These recorded requests can then be analyzed by software that groups them into visits (or sessions). Sessionization is this grouping of requests into visits, from which the number of visits to a web site is calculated.
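
The grouping step described above can be sketched as follows. This is only an illustration, not the method of any particular analytics product: it assumes each recorded request is reduced to a hypothetical (client_id, timestamp, url) tuple, and that a visit ends after 30 minutes of inactivity from the same client. Both the record layout and the timeout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes without a request from the client.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests):
    """Group (client_id, timestamp, url) records into sessions.

    Returns a list of sessions; each session is a time-ordered list of
    records that all come from the same client.
    """
    by_client = defaultdict(list)
    for record in requests:
        by_client[record[0]].append(record)

    sessions = []
    for records in by_client.values():
        records.sort(key=lambda r: r[1])
        current = [records[0]]
        for record in records[1:]:
            # A gap longer than the timeout closes the current session.
            if record[1] - current[-1][1] > SESSION_TIMEOUT:
                sessions.append(current)
                current = [record]
            else:
                current.append(record)
        sessions.append(current)
    return sessions


# Hypothetical request records, as they might be parsed from an HTTP log.
requests = [
    ("10.0.0.1", datetime(2002, 6, 15, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2002, 6, 15, 9, 5), "/products.html"),
    ("10.0.0.1", datetime(2002, 6, 15, 11, 0), "/index.html"),
    ("10.0.0.2", datetime(2002, 6, 15, 9, 2), "/index.html"),
]
sessions = sessionize(requests)
print(len(sessions))  # number of visits found in the sample records
```

With these sample records, the first client produces two visits (the 11:00 request falls outside the timeout) and the second client produces one.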

When a web site delivers downloadable content, this downloadable content can be served by an FTP server, whereas the web site's pages are usually served by an HTTP server. Therefore, when the HTTP server records client activity, it is recording HTTP traffic, not FTP traffic (or requests served by an FTP server). The FTP server records its own request activity independently of HTTP activity, in its own way.

It is valuable for web analytics purposes to be able to correlate FTP requests to the sessions 'realized' by analyzing the HTTP requests. Without correlation, analysis of FTP web site activity is limited to broad FTP metrics, such as number of downloads, bytes downloaded, and return codes. You can report these broad metrics relative to the broad HTTP metrics, such as number of HTTP hits or visits. For example, you could see that during a given time period, your web site had X visitors and Y FTP download requests. You could even divide the totals to estimate download requests per session. However, these are only broad metrics. What if you wanted exact numbers instead of approximations? If you could correlate each FTP hit to an HTTP session, you could see exactly how many FTP download requests, bytes downloaded, etc. occurred in the target set of sessions. Furthermore, you could pinpoint a subset of sessions (such as those of user 'John Doe'), or a specific session (e.g. John Doe's session where he bought $1000 worth of product), and determine exactly which FTP download requests were made during those sessions, along with information about these requests, such as bytes downloaded, time, and return code. These are just a few examples, but it should be easy to see how powerful this concept is, and how much better it is than the broad, estimated metrics available without the ability to correlate each FTP request to HTTP sessions.
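
The difference between the broad estimate and the exact, per-session numbers can be illustrated with a small sketch. It assumes the correlation has already been done, so that each FTP record carries the identifier of the HTTP session it belongs to. The record layout, the session identifiers, and the function names are hypothetical and serve only to show what correlated data makes possible; the sketch does not show how the correlation itself is performed.

```python
# Hypothetical correlated FTP records:
# (session_id, file path, bytes transferred, FTP return code)
ftp_hits = [
    ("s1", "/pub/driver.zip", 1000000, 226),
    ("s1", "/pub/manual.pdf", 250000, 226),
    ("s3", "/pub/driver.zip", 1000000, 226),
]
# Hypothetical sessions found by sessionizing the HTTP traffic.
session_ids = ["s1", "s2", "s3", "s4"]


def broad_estimate(num_sessions, num_downloads):
    """Average download requests per session, using only the two totals."""
    return num_downloads / num_sessions


def per_session_metrics(hits, target_sessions):
    """Exact download count and byte total for each targeted session."""
    metrics = {sid: {"downloads": 0, "bytes": 0} for sid in target_sessions}
    for sid, _path, size, _code in hits:
        if sid in metrics:
            metrics[sid]["downloads"] += 1
            metrics[sid]["bytes"] += size
    return metrics


# Without correlation: only an average over all sessions is available.
estimate = broad_estimate(len(session_ids), len(ftp_hits))
print(estimate)  # 0.75

# With correlation: exact figures for a chosen subset of sessions.
exact = per_session_metrics(ftp_hits, ["s1", "s2"])
print(exact)
```

The average suggests under one download per session, while the correlated view shows that session s1 made two downloads and session s2 made none.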

The correlation can take place as follows:

Given separate servers, (1) HTTP servers with well-defined request recording capabilities and limitations and (2) FTP servers (also with well-defined request recording), a correlation mechanism is needed to associate the FTP download requests analyzed from the FTP servers' output with the HTTP requests or HTTP sessions analyzed from the HTTP servers' output. This mechanism must fit within the well-defined request recording capabilities (or, more appropriately, limitations) of each type of server.

Request recording capabilities of industry leading HTTP servers con...