
Automatic Categorization of IT Infrastructure Service Management Data using Natural Language Processing and Machine Learning
Disclosure Number: IPCOM000245200D
Publication Date: 2016-Feb-18
Document File: 6 page(s) / 70K

Publishing Venue

The Prior Art Database



The proposed solution is an integrated, automated system that analyzes the textual description fields of service management tickets. It transforms the data by combining rows and columns based on patterns; cleanses it by removing unwanted fields and phrases; extracts keywords from the relevant transformed but still unstructured fields using natural language processing; categorizes defects into groups using machine learning on extracted features; auto-generates rules from the machine learning output; combines the extracted information and rules for future reuse; and produces an integrated metric report covering every step above. All of this runs without human intervention, reducing effort, time, and cost. The system is intended to enable cognitive computing in the IT infrastructure domain. It can gather and process historical and real-time unstructured ticketing data specific to each business domain. It can understand the data, learn on its own, and provide quick multi-level classification insights with a responsive model for future incoming streaming data. It can also provide high accuracy on unstructured text data entered through ticketing tools.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 32% of the total text.


Automatic Categorization of IT Infrastructure Service Management Data using Natural Language Processing and Machine Learning


• Every organization needs to use unstructured IPC (Incident | Problem | Change) or IT infrastructure service management data to:
  • understand major problem drivers,
  • forecast call volumes,
  • predict resource requirements,
  • provide problem insights that help the team stay aware of recurring problems and fix them permanently.
• Analyzing unstructured data in incidents and service requests (e.g. summary, description, notes, symptoms) helps to:
  • identify hotspots and trends to proactively plan for staffing and automation,
  • get deeper automated analysis of tickets to improve end-user satisfaction,
  • ensure faster problem diagnosis using ticket context to reduce mean time to resolution.
• Current tools and technologies do not cater to the high volume of IPC data or to the need for real-time predictions, self-learning, insights, and decisions.

Drawback of Existing Solutions | Need for New Solution
Limited number of patterns to classify the problem | Many rules are framed to give better results and accuracy with confidence
A problem can be classified into only one particular category | Multiple categories are assigned at the initial stage; as the process proceeds, each conflict is reduced based on the case and the reduce algorithm process
No automated way to update the rule engine | Automated rule engine update and re-training to get better results
The algorithms are only pattern matching (ordinary regular expressions) | A sequence of rule-based processes pinpoints the problem by classifying it appropriately
No hierarchy-based categorization | Hierarchy-based categorization of the problem
No dictionary is maintained to reduce noise in the data | A dictionary is maintained to reduce noise in the data
No user interface for end users to manage the system | A user interface is provided so that end users can manage the system
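Several of the needs listed above (a noise dictionary, multiple initial categories that are later reduced, and hierarchy-based categorization) can be illustrated with a small sketch. This is not the disclosed system: the noise words, the two-level rule table, and the "most keyword hits wins" reduce step are all assumptions made up for the example.

```python
import re

# Hypothetical noise dictionary: words dropped before matching.
NOISE_WORDS = {"please", "hi", "team", "thanks", "the", "is", "on", "a", "to"}

# Hypothetical hierarchical rules: (level 1, level 2) -> trigger keywords.
RULES = {
    ("Storage", "Disk Space"): {"disk", "space", "full"},
    ("Storage", "Backup"): {"backup", "restore"},
    ("Network", "Connectivity"): {"ping", "unreachable", "timeout"},
}


def cleanse(text):
    """Lowercase, tokenize, and drop words found in the noise dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in NOISE_WORDS]


def candidate_categories(tokens):
    """Initial stage: every rule with at least one keyword hit is a candidate."""
    token_set = set(tokens)
    hits = {}
    for category, keywords in RULES.items():
        score = len(token_set & keywords)
        if score:
            hits[category] = score
    return hits


def reduce_conflicts(hits):
    """Assumed reduce step: keep the candidate with the most keyword hits.

    Ties are broken alphabetically so the result is deterministic.
    """
    if not hits:
        return ("Unclassified", "Unclassified")
    return max(sorted(hits), key=lambda category: hits[category])


def categorize(text):
    """Run cleanse -> candidate assignment -> conflict reduction."""
    return reduce_conflicts(candidate_categories(cleanse(text)))


ticket = "Hi team, the disk on server01 is full, backup failed"
print(candidate_categories(cleanse(ticket)))
print(categorize(ticket))
```

The ticket first matches two candidate categories under "Storage"; the reduce step then keeps the one with more keyword evidence. A real system would learn or auto-generate the rule table rather than hard-code it.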

Information Retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing.
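As a rough illustration of full-text indexing, the sketch below builds an inverted index over a few ticket summaries and answers AND queries. The ticket ids and texts are invented for the example, and the disclosure does not say which indexing method it uses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical ticket summaries.
TICKETS = {
    "INC001": "Disk space full on database server",
    "INC002": "Database connection timeout",
    "INC003": "Printer out of paper",
}

index = build_index(TICKETS)
print(sorted(search(index, "database")))
print(sorted(search(index, "database timeout")))
```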

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction.

Statistical Modeling is a method for formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related.
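A simple linear regression is one familiar instance of such a model. The symbols here are illustrative assumptions, not taken from the disclosure: y could be weekly ticket volume and x the number of active users.

```latex
y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
```

The term \beta_0 + \beta_1 x is the deterministic part of the relationship; the random error \varepsilon is what makes y stochastically, rather than deterministically, related to x.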

n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, s...
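As a minimal sketch of the n-gram definition, the function below slides a window of length n over any sequence. The function name and sample ticket text are assumptions chosen for illustration.

```python
def ngrams(items, n):
    """Return all contiguous runs of n items from a sequence, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "server not responding to ping".split()
print(ngrams(words, 2))   # word bigrams
print(ngrams("disk", 2))  # character bigrams
```

The same function serves word-level and character-level n-grams because it only relies on slicing.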