
A smart pagesize optimization method for large scale parallel application

IP.com Disclosure Number: IPCOM000245680D
Publication Date: 2016-Mar-30
Document File: 7 page(s) / 256K

Publishing Venue

The IP.com Prior Art Database

Abstract

Currently, most operating systems use 4K as the default page size for a running process. However, it is not suitable for the system to always use such a small page size for all parallel-computing applications in a large-scale HPC system. This article describes a method that can smartly use the proper page size for parallel computing in a large-scale HPC cluster, which can effectively improve the performance of the user's job.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 41% of the total text.


With increasing scale, the demand for large memory from high performance computing applications keeps growing in a large scale cluster. Conventionally, the OS uses 4K as the default page size for a running process. However, it is not suitable for the system to always use such a small page size for memory-bound processes doing parallel computing in a large scale HPC system. With the default 4K page size, TLB misses and page faults increase, which consequently hurts the performance of the application.

The core idea of this invention, illustrated in Fig.1, can be summarized as below:


a) When a task runs for the first time on any node in the cluster, it is run under the performance tool, which tracks its page faults and performance events. From these traces, a recommended page size is automatically produced by a set of methods in our invention, and it can be used to set the page size for the same task later.

b) After that, when the same task is submitted to run on the same cluster, the previously cached page-size policy is used to launch the task with the specific page size. This is highly beneficial to the performance of the entire job.

c) Assuming the hardware of each computing node in the cluster is symmetric, and the tasks of a parallel job are symmetric as well, the policy generated on node A can be applied on node B accordingly. In other words, a page-size policy is applicable to the same task across the whole cluster.

Start Job

Is this the first time the job runs on this cluster?

YES:
  1. Run the task with the performance tool.
  2. Gather the most proper page size from the performance tool.
  3. Generate the policy and load the policy into the local table.
  4. Load the local table into the global table on the resource manager.

NO:
  1. Get the page-size policy from the global table.
  2. Use the policy to back the related memory regions (bss/data/text/heap).
  3. Run the task.

End

Fig.1 General work flow
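The two branches of Fig.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ResourceManager`, `run_job`, `fake_profile`, and the 64 KiB value are hypothetical names and numbers introduced here, the performance tool and the launcher are passed in as plain callables, and the way a page size is actually chosen is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Memory regions named in Fig.1.
REGIONS = ("bss", "data", "text", "heap")

Policy = Dict[str, int]  # region name -> page size in bytes


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the global policy table on the resource manager."""
    global_table: Dict[str, Policy] = field(default_factory=dict)


def fake_profile(job_id: str) -> Policy:
    """Placeholder for the performance tool; the 64 KiB result is illustrative only."""
    return {region: 64 * 1024 for region in REGIONS}


def run_job(job_id: str,
            rm: ResourceManager,
            profile: Callable[[str], Policy],
            launch: Callable[[str, Policy], None]) -> str:
    """Follow Fig.1 and report which branch was taken."""
    policy: Optional[Policy] = rm.global_table.get(job_id)
    if policy is None:
        # YES branch: first run, executed under the performance tool.
        local_table = {job_id: profile(job_id)}
        # Publish the node-local table so later runs on any node can reuse it.
        rm.global_table.update(local_table)
        return "profiled"
    # NO branch: back the memory regions with the cached page sizes, then run.
    launch(job_id, policy)
    return "cached"
```

Calling `run_job` twice with the same `job_id` takes the YES branch first and the NO branch afterwards, which is the caching behaviour the figure describes.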

Let's analyze the potential performance improvement gained by the workflow above.

Assume that, with the default page-size policy, each job takes time $T_j$ to run. With the optimized page-size policy, the job normally takes less time, $T'_j$. However, the first time the job runs, it spends extra time running together with the performance tool to analyze the page size, which costs $\Delta T_p$. Suppose the user needs to run the job $n$ times in total. $P$ is then the ratio of the total job running time with the optimized page size to the total time without optimization.

$$P = \frac{(T_j + \Delta T_p) + (n - 1)\,T'_j}{n\,T_j} \tag{1}$$

When $n = 1$, $P = \frac{T_j + \Delta T_p}{T_j} = 1 + \frac{\Delta T_p}{T_j} > 1$. This means the new method costs more time than running the job without a page-size policy, so there is no benefit in using the new solution at all.
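Equation (1) is easy to evaluate numerically. The sketch below computes $P$ for given timings and, by rearranging equation (1), the smallest $n$ at which $P$ first drops below one. The function names are hypothetical, and the timings in the usage note are made up purely for illustration, not taken from measurements.

```python
import math
from typing import Optional


def time_ratio(t_default: float, t_optimized: float,
               t_profile_overhead: float, n: int) -> float:
    """Equation (1): total time with the method divided by total time without it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first run pays the profiling overhead; the other n - 1 runs are optimized.
    with_policy = (t_default + t_profile_overhead) + (n - 1) * t_optimized
    return with_policy / (n * t_default)


def break_even_runs(t_default: float, t_optimized: float,
                    t_profile_overhead: float) -> Optional[int]:
    """Smallest n with P < 1, i.e. n > 1 + overhead / (t_default - t_optimized)."""
    gain_per_run = t_default - t_optimized
    if gain_per_run <= 0:
        return None  # the optimized run is no faster, so P never drops below 1
    return math.floor(t_profile_overhead / gain_per_run) + 2
```

With illustrative values $T_j = 100$, $T'_j = 80$ and $\Delta T_p = 30$, `time_ratio(100, 80, 30, 1)` is above one, in line with the $n = 1$ case, and `break_even_runs(100, 80, 30)` reports that three runs are needed before the profiling cost is recovered.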

In the case $n \ge 2$, if $P$ is less than one, then the total job running time could potentially be improved by introducin...