A method to detect and handle performance deterioration in virtual machines
Publication Date: 2015-Aug-04
The IP.com Prior Art Database
When a logical partition (LPAR) undergoes a resource-intensive maintenance activity, it can become temporarily unresponsive. In a virtualized IO server-client environment, the Virtual IO (VIO) server can be made aware of the situation so that it can handle new connection requests to the temporarily hung LPAR. This method also keeps administrators from misinterpreting the scenario and rebooting the LPAR, which would affect both the maintenance activity running on the LPAR and other critical customer applications. Any maintenance activity, such as dynamic logical partitioning (DLPAR) of memory, processors, or IO adapters, or migration of a logical partition between two servers, should be initiated through a management console. The management console is aware of the estimated completion time of the operation. As soon as such an operation starts, the management console informs the LPAR's VIO server. The VIO server then monitors the LPAR periodically, checking whether it has become unresponsive with a simple ping/telnet test. If the LPAR is hung, the VIO server collects the current status of the operation running on the LPAR from the management console and provides that information to the user who is trying to reach the LPAR. Because the VIO server provides external connectivity for its clients, it can intercept traffic going to the hung LPAR and obtain the sender's IP address, which it can use to send out information about the hang condition and when the LPAR might be responsive again. With this information, users can wait rather than reboot the LPAR.
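The monitoring step described above could be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the `MaintenanceOperation` record, the `tcp_probe` and `hang_notice` helpers, the choice of port 23, and the message wording are all assumptions introduced here. A TCP connection attempt stands in for the ping/telnet test, and the traffic interception that yields the sender's IP address is left out.

```python
import socket
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class MaintenanceOperation:
    """Hypothetical record the management console passes to the VIO server."""
    lpar: str                      # host name or address of the client LPAR
    kind: str                      # e.g. "DLPAR memory add"
    started_at: datetime
    estimated_duration: timedelta  # the console's completion estimate

    def expected_end(self) -> datetime:
        return self.started_at + self.estimated_duration


def tcp_probe(host: str, port: int = 23, timeout: float = 3.0) -> bool:
    """Telnet-style check: True if a TCP connection to the host can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hang_notice(
    op: MaintenanceOperation,
    now: datetime,
    probe: Callable[[str], bool] = tcp_probe,
) -> Optional[str]:
    """Return a message for the sender if the LPAR is unresponsive, else None."""
    if probe(op.lpar):
        return None  # LPAR answered, so nothing needs to be reported
    remaining = op.expected_end() - now
    minutes = max(0, int(remaining.total_seconds() // 60))
    return (
        f"{op.lpar} is temporarily unresponsive: {op.kind} is in progress "
        f"and is expected to finish in about {minutes} minutes. "
        "Please do not reboot."
    )
```

In this sketch the VIO server would call `hang_notice` on each monitoring interval and, when it returns a message, send that message to the intercepted sender address. Passing the probe as a parameter keeps the reachability test replaceable, for example by an ICMP ping.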
A virtual machine or logical partition can go into a hung state when one or more of its applications is blocked by a CPU- or memory-intensive operation. This is not necessarily a permanent deadlock or infinite loop; it could instead be due to resource starvation or heavy paging activity, which are usually temporary phenomena.
Quite often, applications experience a temporary hang when the admin or user performs activities such as DLPAR, mobility, or optimization on the server. These system management operations, despite being dynamic and transparent to end users and applications, still cause some perceivable performance deterioration while the operation is in progress. For example, a DLPAR operation involving several hundred gigabytes of memory can block the system for hours. Similarly, activities such as Live Partition Mobility or hibernation can block the LPAR for several minutes. In such cases, even though the system is up and running slowly, client applications or remote users can easily misinterpret it as being in a hung state.
Many times, these types of system management operations are started automatically by tools such as a load balancer or management console for purposes such as workload balancing and optimization. This may cause the server to hang temporarily without the knowledge of the administrator or user of the machine.
Some of the problems caused by this temporary sy...