A method to recover administrative agents with memory in distributed computing environment

IP.com Disclosure Number: IPCOM000235522D
Publication Date: 2014-Mar-06
Document File: 4 page(s) / 126K

Publishing Venue

The IP.com Prior Art Database

Abstract

This invention provides a method for recovering administrative agents with memory after an outage. After an agent returns to service, it automatically fetches the status of the tasks that were started before the outage. When the central controller asks for the status of those tasks, the agent can then provide the correct result.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

In a distributed computing environment, the central controller usually controls the whole system through administrative agents installed on the distributed nodes. The central controller communicates with the agents, sends them requests to run tasks on the nodes, and gets the task results back from the agents.

Let's look into this common process:


1. If nothing abnormal happens, the whole process is as follows:


1) The central controller sends a request to the agent to run a task on the node.


2) The agent receives the request and has the node start the task.


3) The node runs the task.


4) The task finishes with a success or failure status, and the agent obtains that status.


5) The central controller asks the agent for the task result and gets the correct result.
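
The five steps above can be sketched as a small simulation. This is only an illustration: the `Node` and `Agent` classes, their method names, and the `TaskStatus` values are hypothetical and do not come from the disclosure, and the agent is assumed to keep task status purely in memory.

```python
from enum import Enum


class TaskStatus(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


class Node:
    """Hypothetical node that runs tasks and holds their real status."""

    def __init__(self):
        self.tasks = {}

    def start(self, task_id):
        self.tasks[task_id] = TaskStatus.RUNNING

    def finish(self, task_id, ok):
        self.tasks[task_id] = TaskStatus.SUCCEEDED if ok else TaskStatus.FAILED


class Agent:
    """Hypothetical agent that tracks task status only in memory."""

    def __init__(self, node):
        self.node = node
        self.known = {}  # task_id -> last status seen by the agent

    def run_task(self, task_id):
        # Step 2: the agent has the node start the task.
        self.node.start(task_id)
        self.known[task_id] = TaskStatus.RUNNING

    def on_task_finished(self, task_id):
        # Step 4: the agent obtains the final status from the node.
        self.known[task_id] = self.node.tasks[task_id]

    def get_result(self, task_id):
        # Step 5: the controller asks the agent for the result.
        return self.known.get(task_id, TaskStatus.UNKNOWN)


node = Node()
agent = Agent(node)
agent.run_task("t1")             # steps 1-2: request arrives, task starts
node.finish("t1", ok=True)       # step 3: the node runs the task to completion
agent.on_task_finished("t1")     # step 4: the agent records the status
result = agent.get_result("t1")  # step 5: the controller reads the result
```

In this sketch the controller sees the correct result only because the agent was alive when the task finished.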

2. If something goes wrong with the agent, the whole process is as follows:


1) The central controller sends a request to the agent to run a task on the node.


2) The agent receives the request and has the node start the task.


3) The node runs the task.


4) For some reason (planned or unplanned), the agent goes down while the node is still working on the task.


5) The agent recovers itself.


6) The central controller asks the agent for the task result, but the result it gets is wrong (for example, the task may always be reported as failed).

The controller cannot get the correct task result because, when the agent recovers from the outage, it does not know the status of the tasks, or even that any task was running before its outage.
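
A minimal sketch of this failure mode follows. It assumes a hypothetical `InMemoryAgent` whose only record of tasks is a dictionary that is lost on restart, and that falls back to reporting `"failed"` for tasks it does not know; neither the class nor the status strings come from the disclosure.

```python
class InMemoryAgent:
    """Hypothetical agent whose task records live only in process memory."""

    def __init__(self):
        self.tasks = {}  # lost whenever the agent process restarts

    def start_task(self, task_id):
        self.tasks[task_id] = "running"

    def get_result(self, task_id):
        # Assumption: with no record of the task, the agent reports a failure.
        return self.tasks.get(task_id, "failed")


agent = InMemoryAgent()
agent.start_task("t1")

# Outage: the agent goes down while the node keeps working on the task.
node_truth = "succeeded"  # what actually happened on the node

# Recovery: a fresh agent instance starts with empty memory.
agent = InMemoryAgent()
reported = agent.get_result("t1")
```

Here `reported` disagrees with `node_truth`, which is the wrong result described in step 6.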

This invention provides a method for recovering administrative agents with memory after an outage. After an agent returns to service, it automatically fetches the status of the tasks that were started before the outage. When the central controller asks for the status of those tasks, the agent can then provide the correct result.
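
One possible way to get this behavior is sketched below. It is an assumption for illustration, not necessarily the mechanism of this disclosure: the hypothetical `RecoverableAgent` writes its task records to a JSON file and, on startup, asks the node (through a hypothetical `query_node` callable) for the current status of every task that was still marked as running.

```python
import json
import os
import tempfile


class RecoverableAgent:
    """Hypothetical agent that persists task records and refreshes them on start."""

    def __init__(self, journal_path, query_node):
        self.journal_path = journal_path
        self.query_node = query_node  # callable: task_id -> status string
        self.tasks = {}
        self._recover()

    def _save(self):
        with open(self.journal_path, "w") as f:
            json.dump(self.tasks, f)

    def _recover(self):
        # Nothing to recover on the very first start.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path) as f:
            self.tasks = json.load(f)
        # Fetch the current status of tasks started before the outage.
        for task_id, status in list(self.tasks.items()):
            if status == "running":
                self.tasks[task_id] = self.query_node(task_id)
        self._save()

    def start_task(self, task_id):
        self.tasks[task_id] = "running"
        self._save()

    def get_result(self, task_id):
        return self.tasks.get(task_id, "unknown")


node_state = {}  # stands in for the node's own view of task status


def query_node(task_id):
    return node_state.get(task_id, "unknown")


journal = os.path.join(tempfile.mkdtemp(), "tasks.json")
agent = RecoverableAgent(journal, query_node)
agent.start_task("t1")
node_state["t1"] = "running"

# Outage: the task finishes on the node while the agent is down.
node_state["t1"] = "succeeded"

# Recovery: the new agent instance reloads the journal and queries the node.
agent = RecoverableAgent(journal, query_node)
recovered = agent.get_result("t1")
```

The design choice in this sketch is that the agent treats the node as the source of truth after a restart, so the persisted file only needs to remember which tasks exist.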

The following describes how this invention works (in...