
Method and System for Automatic Log Discovery, Identification, Collection and Parsing

IP.com Disclosure Number: IPCOM000250171D
Publication Date: 2017-Jun-07
Document File: 4 page(s) / 201K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a method and system for automatic log discovery, identification, collection, and parsing in real time for log file management.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.



In the information technology industry, log management is essential for tracking and analyzing a system's status in real time. Its core components are the efficient discovery, identification, collection, and parsing of logged contents and files. Log management also matters for identifying sensitive log files. Log collection, classification, storage, and management are important for all IT services, and log management is an expensive endeavor for most Global 2000 companies. Sensitive data may be structured or unstructured, but neither users nor regulations distinguish between the two.

Log management and server monitoring are required, essential components of the modern health cloud business because of:

• Health Insurance Portability and Accountability Act/Good Practices (HIPAA/GxP) compliance

• Operation Optimization

• Performance Optimization

• Auditing

However, many problems in this area remain unsolved. For instance, regulations (such as HIPAA) require all logs to be collected for auditing. With the current approach, service/component developers provide a list of known logs, and a logging/monitoring/alerting team has a list of default logs to be collected (for the operating system (OS), applications, middleware, etc.). The current approach cannot guarantee that all logs are collected and is slow to respond to changes in the server software stack. A third-party middleware or application may generate logs that are unknown to developers. Servers are constantly repurposed, with new logs being identified; therefore, it is necessary to focus on how to automatically identify logs on the server and add them for collection.

The novel solution is a method and system for automatic log discovery, identification, collection, and parsing in real time for log file management.

The logging system implementation offers the following features:

• Monitor all files being created/modified/deleted by active processes

• Identify a new candidate log file accordingly

• Construct two levels (character and word) of representations for input files for better feature generation


• Define a deep learning-based natural language processing (NLP) module for identifying whether a file is a log file (the model is called hierarchical long short-term memory (LSTM))

• Provi...
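
The monitoring and two-level representation features listed above can be illustrated with a minimal Python sketch. It is not the disclosed implementation, which is not given in this abbreviated text. The sketch assumes a simple polling approach that compares directory snapshots (the disclosure speaks of monitoring files touched by active processes, which this stand-in does not track), and it assumes that "word level" means whitespace-separated tokens per line and "character level" means character codes per word. The names snapshot, diff_snapshots, and two_level_representation are hypothetical.

```python
import os
import tempfile


def snapshot(root):
    """Map each file path under root to its (mtime_ns, size) signature."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (created, modified, deleted) path sets between two snapshots."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return created, modified, deleted


def two_level_representation(text, max_lines=50):
    """Assumed encoding: word tokens per line, character codes per word."""
    lines = text.splitlines()[:max_lines]
    words = [line.split() for line in lines]
    chars = [[[ord(c) for c in word] for word in line] for line in words]
    return words, chars
```

In this sketch, files reported as created or modified would become candidate log files, and their two-level representations would be the input to a classifier such as the hierarchical LSTM mentioned above. The classifier itself is outside the scope of the sketch.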