
A smart pagesize optimization method for large scale parallel application
Disclosure Number: IPCOM000245680D
Publication Date: 2016-Mar-30
Document File: 7 page(s) / 257K

Publishing Venue

The Prior Art Database


Currently, most operating systems use 4K as the default page size for a running process. However, always using such a small page size is not suitable for every parallel computing application in a large-scale HPC system. This article describes a method that selects a proper page size for parallel computing in a large-scale HPC cluster, which can effectively improve the performance of a user's job.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 41% of the total text.


A smart pagesize optimization method for large scale parallel application

As clusters scale up, the memory demand of high performance computing applications in a large-scale cluster keeps growing. Conventionally, the OS uses 4K as the default page size for a running process. However, it is not suitable for the system to always use such a small page size for memory-bound processes in parallel computing on a large-scale HPC system. With the default 4K page size, TLB misses and page faults increase, which in turn degrades application performance.

The core idea of this invention, illustrated in Fig.1, can be summarized as below:

a) When a task of a parallel job runs for the first time on any node in the cluster, it is run under the performance tool, which tracks page faults and performance events. From these traces, a page-size policy is automatically produced by a set of methods in our invention; the policy can then be used to set the page size for the same task later.

b) After that, when the same task is launched again on the same cluster, the previously cached page-size policy is used to launch the task with the specific page size. This is highly beneficial to the performance of the entire job.

c) Assuming the hardware of each computing node in the cluster is symmetric, a policy generated on node A can be applied on node B accordingly. In other words, a page-size policy is applicable to the same task across the whole cluster.


Start Job
Is this the first time the job runs on this cluster?
  Yes:
    1. Run the task with the performance tool
    2. Gather the most proper page size with the performance tool
    3. Generate the policy and load it into the local table
    4. Load the local table into the global table on the resource manager
  No:
    1. Get the page-size policy from the global table
    2. Use the policy to back the related memory regions (bss/data/text/heap)
    3. Run the task

Fig.1 General workflow
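The workflow in Fig.1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the disclosure's implementation: the names `ResourceManager`, `run_job`, `fake_profile` and `fake_execute` are hypothetical, the policy is assumed to be a simple mapping from memory region to page size in bytes, and the page sizes returned by the stand-in profiler are made up. The actual methods that derive a policy from the performance traces are not part of this abbreviated text, so the profiler is passed in as a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Memory regions named in Fig.1 ("text" is the figure's "txt").
REGIONS = ("bss", "data", "text", "heap")

# Assumed representation: a policy maps each region to a page size in bytes.
Policy = Dict[str, int]


@dataclass
class ResourceManager:
    """Hypothetical holder of the cluster-wide (global) policy table."""
    global_table: Dict[str, Policy] = field(default_factory=dict)


def run_job(task_id: str,
            manager: ResourceManager,
            profile: Callable[[str], Policy],
            execute: Callable[[str, Policy], None]) -> str:
    """Follow the Fig.1 flow for one launch and report which branch ran."""
    policy = manager.global_table.get(task_id)
    if policy is None:
        # First run on this cluster: run under the performance tool,
        # stage the resulting policy locally, then publish it globally.
        local_table = {task_id: profile(task_id)}
        manager.global_table.update(local_table)
        return "profiled"
    # Later run: back the memory regions with the cached policy and run.
    execute(task_id, policy)
    return "cached"


def fake_profile(task_id: str) -> Policy:
    """Stand-in for the performance tool; the sizes are made up."""
    policy = {region: 4 * 1024 for region in REGIONS}
    policy["heap"] = 2 * 1024 * 1024
    return policy


def fake_execute(task_id: str, policy: Policy) -> None:
    """Stand-in launcher; a real one would back each region as requested."""
    print(task_id, policy)


manager = ResourceManager()
print(run_job("taskA", manager, fake_profile, fake_execute))  # first launch
print(run_job("taskA", manager, fake_profile, fake_execute))  # later launch
```

Keying the global table by task alone reflects assumption c): if the nodes are symmetric, the node that produced the policy does not need to be recorded.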

Let's analyze the potential performance improvement gained by the workflow above.

Assume that, with the default page-size policy, each job takes time T_j to run. Normally, the job takes less time, T'_j, when it takes advantage of the optimized page-size policy. However, on its first run the job must run together with the performance tool to analyze the page size, which costs extra time ΔT_p. Suppose the user runs the job n times in total. P is then the ratio of the total running time with the optimized page size to the total running time without optimization:

$$P = \frac{(T_j + \Delta T_p) + (n - 1)\,T'_j}{n\,T_j}$$

When n = 1,

$$P = \frac{T_j + \Delta T_p}{T_j} = 1 + \frac{\Delta T_p}{T_j} > 1$$

This means the new method costs more time than running the job without a page-size policy; therefore, it is not meaningful to use this new solution at all in that case.
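To make the ratio concrete, the sketch below evaluates P for a given number of runs and finds the smallest n for which P drops below one. It assumes the form of P in which the first run costs the default time plus the profiling overhead and every later run costs the optimized time. The function names and the sample timings are illustrative assumptions, not measurements from the disclosure.

```python
import math
from typing import Optional


def ratio_p(t_default: float, t_optimized: float,
            t_profile: float, n: int) -> float:
    """P = ((T_j + dT_p) + (n - 1) * T'_j) / (n * T_j)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total_with_policy = (t_default + t_profile) + (n - 1) * t_optimized
    return total_with_policy / (n * t_default)


def break_even_runs(t_default: float, t_optimized: float,
                    t_profile: float) -> Optional[int]:
    """Smallest n with P < 1, or None if the optimized run is not faster."""
    if t_optimized >= t_default:
        return None
    # P < 1  <=>  dT_p < (n - 1) * (T_j - T'_j)
    #        <=>  n > 1 + dT_p / (T_j - T'_j)
    return math.floor(t_profile / (t_default - t_optimized)) + 2


# Made-up timings: T_j = 100, T'_j = 80, dT_p = 50 (same arbitrary unit).
for n in (1, 2, 3, 4):
    print(n, ratio_p(100, 80, 50, n))
print("break-even n:", break_even_runs(100, 80, 50))
```

With these sample numbers the profiling overhead is repaid once the saving per run, T_j - T'_j, has accumulated past ΔT_p, which is the condition spelled out in the comment inside `break_even_runs`.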

In the case n ≥ 2, if P is less than one, the total job running time can potentially be improved by introducin...