Smart Fault Handling and Recovery System
Publication Date: 2010-Nov-18
The IP.com Prior Art Database
Fault handling is important in software systems. The common mechanism is that the software predefines the possible fault types and their corresponding fault handling actions in source code during the development phase. The fault types and handling actions cannot be changed after software delivery, so whenever customers have new fault types or new fault handling requirements, the software has to be patched with a source code update. This is inconvenient and time-consuming, and may lead to extra expense for customers. This disclosure proposes a new framework and complete system, the "Smart Fault Handling and Recovery System". It gives customers the ability to define their own fault types and the corresponding fault handling rules, and to provide their own fault handling and recovery modules. The new framework identifies the fault types while the system is running and picks the corresponding customer fault handling rule to run. With the new framework, even after the system is delivered, customers can customize the system's fault handling behavior at any time to meet new requirements, without changes to the system source code.
Fault handling has become increasingly common in recent application software. The common mechanism is that the software predefines a set of known fault or exception types, and whenever one of those faults occurs, the related module is triggered to handle it. Some software also provides functions to recover from the fault or retry the failed process. This mechanism solves some problems; however, it has the limitations listed below:
(1) It is very hard to define the huge number of possible fault types, especially in distributed or middleware software. Such systems do not run alone; they cooperate with other systems to exchange data, perform transactions, and so on. The faults coming from other systems usually cannot be predefined.
(2) The process for handling a fault is predefined, which loses the flexibility to handle complex situations, particularly when the situation keeps changing.
(3) It is very hard to recover from cross-system faults, because each system's fault handling is predefined and cannot be changed.
For example, consider two systems working together. On one side is business data monitoring and analysis software, which connects to various databases to collect business data and analyze it. On the other side are the databases themselves, which could be of any type. Because databases come from many vendors and in many types, it is very hard to predefine the fault types in the monitoring and analysis system. As a result, when a fault happens it is very hard to handle, and automatic recovery is not possible. In most cases, a human has to intervene to resolve the fault and restart the process.
Because of these limitations of the current fault handling mechanism, this disclosure proposes a new approach and complete system, the "Smart Fault Handling and Recovery System". It gives customers the ability to define their own fault types, the corresponding fault handling rules, and their own fault handling and recovery modules. In addition, as their business runs, customers can update the rule repository as needed and share it with other systems.
The "Smart Fault Handling and Recovery System" is a complete system that can:
Provide a development toolkit that automatically inserts the smart fault handling code into the software code during the development phase.
Let customers configure the fault types and fault handling rules at any time after system delivery, based on their business requirements.
Perform smart fault analysis to identify the fault types, and handle faults according to the customized rules at runtime.
Integrate the customer's fault handling or recovery modules at runtime.
Adjust the fault handling rules dynamically at runtime, based on the fault handling results.
The fault handling rules are stored in a repository. The rules can be shared across systems. The rule repository can take the form of an XML file, a database, or another format.
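To make the idea of a customer-editable rule repository concrete, the following is a minimal Python sketch, not the disclosed implementation. It assumes an XML repository whose element and attribute names (rules, rule, faultType, pattern, action) are invented for illustration, and it assumes that faults are identified by matching a regular expression against the fault message. The handler functions stand in for customer-supplied fault handling and recovery modules and are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule repository. The schema is an assumption made for this
# sketch; the disclosure only says the repository may be an XML file.
RULES_XML = """
<rules>
  <rule faultType="DbConnectionLost" pattern="connection (reset|refused|lost)" action="retry"/>
  <rule faultType="DbStorageFull" pattern="(disk|tablespace) full" action="notify"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the repository into (fault_type, compiled_pattern, action) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (
            rule.get("faultType"),
            re.compile(rule.get("pattern"), re.IGNORECASE),
            rule.get("action"),
        )
        for rule in root.findall("rule")
    ]


def identify_fault(message, rules):
    """Return (fault_type, action) for the first matching rule, or None."""
    for fault_type, pattern, action in rules:
        if pattern.search(message):
            return fault_type, action
    return None


# Stand-ins for customer-supplied handling and recovery modules,
# keyed by the action name used in the repository (hypothetical).
HANDLERS = {
    "retry": lambda message: f"retrying after: {message}",
    "notify": lambda message: f"operator notified: {message}",
}


def handle_fault(message, rules, handlers=HANDLERS):
    """Identify the fault type and run the handler named by the matching rule."""
    match = identify_fault(message, rules)
    if match is None:
        return "unhandled"
    _fault_type, action = match
    return handlers[action](message)


if __name__ == "__main__":
    rules = load_rules(RULES_XML)
    print(handle_fault("Connection refused by database host", rules))
    print(handle_fault("Tablespace full on node 3", rules))
```

In this sketch, supporting a new fault type means adding a rule element to the repository and, if needed, registering another handler; the dispatch code itself does not change, which mirrors the goal of customizing fault handling after delivery without touching the system source code.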
The "Smart Fault Handling and Recovery System" is composed of three parts: