Collection System for Rapidly Rotated Webserver Logs
Original Publication Date: 2002-Oct-12
Included in the Prior Art Database: 2003-Jun-20
A solution is disclosed for collecting logs from webservers when those logs are rotated on the order of minutes. The solution introduces a set of log collection clients and a set of log collection servers that can be configured into a "tree" for transferring log files to a central location for later processing. The system guarantees reliable end-to-end transfer of log files from the clients to the central processing location.

For a cluster of webservers operating at high volume, it can be advantageous for log processing if the webserver rapidly rotates its access logs. When log rotation occurs on the order of minutes, a high-volume website quickly produces tens of thousands of log files per day. Because processing these logs is a CPU- and I/O-intensive activity, it is desirable to transfer the logging data from the webserver node to a log processing node. Current systems addressing this have several problems:

- Current systems have difficulty determining whether a log is ready to be transferred, and thus cannot process the logs rapidly. This can slow log transfer and subsequent log processing, or it can cause a log to be transferred before the webserver has finished writing it, resulting in lost data.
- Current systems do not guarantee end-to-end transfer of the log file.
- Current systems that perform log file compression generally do so on the webserver node, which can degrade the performance of the webserver itself.

This solution addresses these shortcomings by introducing a log collection client and a log collection server.
The log collection client is responsible for determining when a log file is available to be transferred, computing a checksum for the log file, transferring the log file to the log collection server using TCP/IP, receiving acknowledgements from the log collection server, and removing the log file from the webserver node. The client determines whether the log file is available by forcing the webserver to use timestamped log files, by being configured with a log rotation interval calculation, and by adhering to the webserver-held lock on the log file. The client does not delete the log file until the acknowledgement from the log collection server comes back as successful.

The log collection server is responsible for receiving the log file, verifying the log file checksum, optionally compressing the log file, optionally storing the log file, optionally transferring the log file to yet another collector, and then either issuing a successful acknowledgement to the client (or downstream collector) if all actions succeeded or issuing a failure message to the client if any action failed.

Note that multiple log collection clients can be "connected" to one log collection server, and log collection servers can be chained with other log collection servers to build a hierarchical log delivery structure. If the file is to be stored, it is stored in a directory structure corresponding to the "site" that the log was assigned to, and the storage name contains the hostname of the log collection client.
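The client and server responsibilities described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation, and several details are assumptions made for illustration: the file naming scheme `access.YYYYMMDDHHMM.log`, the use of SHA-256 as the checksum (the disclosure does not name an algorithm), the `"ACK"`/`"FAIL"` reply strings, the stored name `<client hostname>.<log name>`, and all function names. The TCP/IP transfer is replaced by a plain callable named `send`, and the check of the webserver-held lock is omitted.

```python
import gzip
import hashlib
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical naming scheme for timestamped logs: access.YYYYMMDDHHMM.log
STAMP_FORMAT = "access.%Y%m%d%H%M.log"


def rotation_complete(log_name: str, now: datetime, interval: timedelta) -> bool:
    """True once the rotation interval that began at the file's timestamp has elapsed."""
    started = datetime.strptime(log_name, STAMP_FORMAT)
    return now >= started + interval


def checksum(data: bytes) -> str:
    """SHA-256 is an assumption; the disclosure does not specify the checksum."""
    return hashlib.sha256(data).hexdigest()


def server_receive(data, claimed, site, client_host, log_name, root, compress=True) -> str:
    """Verify the checksum, optionally compress, store; reply "ACK" or "FAIL"."""
    if checksum(data) != claimed:
        return "FAIL"
    try:
        # One directory per site; the stored name carries the client hostname.
        site_dir = Path(root) / site
        site_dir.mkdir(parents=True, exist_ok=True)
        stored_name = f"{client_host}.{log_name}"
        if compress:
            # Compression happens on the collector, not on the webserver node.
            (site_dir / (stored_name + ".gz")).write_bytes(gzip.compress(data))
        else:
            (site_dir / stored_name).write_bytes(data)
    except OSError:
        return "FAIL"
    return "ACK"


def client_send(log_path: Path, now: datetime, interval: timedelta, send) -> bool:
    """Transfer one log; delete the local copy only after a successful reply."""
    if not rotation_complete(log_path.name, now, interval):
        return False  # the webserver may still be writing this file
    data = log_path.read_bytes()
    if send(data, checksum(data)) != "ACK":
        return False  # keep the file so the transfer can be retried
    log_path.unlink()
    return True
```

In this sketch the client never removes a file unless the collector has confirmed every step, which mirrors the end-to-end guarantee described above. A chained collector could be modeled by having `server_receive` call a further `send` callable before replying, so that a failure anywhere in the chain propagates back to the originating client.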