A Method to Execute Interact Operation in a High Performance Computing Cluster
Disclosure Number: IPCOM000247570D
Publication Date: 2016-Sep-18
Document File: 8 page(s) / 210K

In a high performance computing environment, some users need to execute interactive operations, which require a stable, long-lived connection between server and client. In an unstable environment, such as a DHCP (Dynamic Host Configuration Protocol) network, the connection may occasionally break and disrupt the user's operation. This paper provides a method to resolve this problem: it converts the user's operations into REST (Representational State Transfer) web service calls and uses the HTTP (Hypertext Transfer Protocol) / HTTPS (Hypertext Transfer Protocol Secure) protocol for communication between client and server, which helps users execute interactive operations smoothly over an unstable network.


In a high performance computing cluster, some customers want to execute interactive operations, which is a challenging problem that requires special tools and services. Current workload management products offer two methods to execute interactive operations in a high performance computing environment. Method (A) is to install the cluster client program on the user's machine, making it a client node in the high performance computing cluster; this client only handles the user's interactive operations and does not take part in computing tasks. Method (B) is to connect to the high performance computing cluster via an SSH (Secure Shell) tool and execute the user's interactive operations directly on a computing node.

Method (A): make the client machine a client node in the high performance computing cluster

Method (B): connect to the high performance computing cluster via an SSH tool

However, the above methods have some problems in practice. With method (A), in addition to the computing nodes, the high performance computing cluster must spend extra resources to monitor and manage the client nodes, which affects the performance of the computing cluster. This is especially true in some enterprise networks, where the client machine uses a dynamic IP (Internet Protocol) address. When the IP address of a client machine changes, the connection is broken, and the client machine must reconnect to the high performance computing cluster and force it to refresh the node list. Another problem is that any machine with the cluster client program installed actually becomes a cluster node. Such machines usually have weaker security protection, which raises the risk of the whole cluster being infected by a malicious program.

Method (B) has the same problem as method (A) if client machines use dynamic IP addresses: the connection is broken when a client machine's IP address changes. Users may lose their work and need to log in to the high performance computing cluster again. Another problem is that the cluster cannot control users' behavior after they log in to a computing node, and the computing work may be affected if a user executes unexpected commands.

For both method (A) and method (B), the most serious problem is that the remote interaction information is lost when the network connection breaks, so users lose their work and must restart the interactive operation.
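The failure described above comes from tying the interactive session to one long-lived connection. As a rough illustration of the alternative named in the abstract, where each user operation is sent as an independent REST call, the following Python sketch keeps session state on the server under an opaque token, so a request arriving from a new client IP address still finds its session. This is only a sketch under stated assumptions: the class names, the routes mentioned in the comments, and the IP addresses are hypothetical, and the HTTP transport is replaced by plain method calls so the example runs without a network.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """Server-side state for one interactive session (hypothetical structure)."""
    history: list = field(default_factory=list)  # commands received so far
    last_ip: str = ""                            # informational only, never used for lookup


class InteractiveGateway:
    """Stands in for the REST service; each method models one HTTP request."""

    def __init__(self):
        self._sessions = {}

    def open_session(self, client_ip: str) -> str:
        # Models something like POST /sessions (hypothetical route).
        # Returns an opaque token that the client sends with every later request.
        token = uuid.uuid4().hex
        self._sessions[token] = Session(last_ip=client_ip)
        return token

    def run(self, token: str, client_ip: str, command: str) -> dict:
        # Models something like POST /sessions/<token>/commands (hypothetical route).
        # The session is looked up by token only, so a changed client IP is harmless.
        session = self._sessions.get(token)
        if session is None:
            return {"status": 404, "error": "unknown session"}
        session.history.append(command)
        session.last_ip = client_ip
        # A real gateway would forward the command to the cluster here;
        # this sketch only echoes it back with its position in the session.
        return {"status": 200, "seq": len(session.history), "echo": command}


gateway = InteractiveGateway()
token = gateway.open_session("10.0.0.15")
first = gateway.run(token, "10.0.0.15", "cd /scratch/job42")
# DHCP hands the client a new address; the next request still finds the session.
second = gateway.run(token, "10.0.0.87", "ls")
print(first["seq"], second["seq"])
```

Because no request depends on the previous connection still being open, a dropped connection or a changed address costs at most one retried request rather than the whole session.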

The new method executes interactive operations in a high performance computing cluster through collaboration between multiple components. On the client side, an emulated command console is created, which acts as an external virtual node of the high performance computing cluster. It allows client machines to execute interactive operations directly, as if executing those operations on a computing node in the high performance computing cluster, and it can reduce the impa...