
Mechanism to Decrease Cold Cache Startup Effects in Instruction Cache Prefetching

IP.com Disclosure Number: IPCOM000248538D
Publication Date: 2016-Dec-14
Document File: 6 page(s) / 97K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for decreasing cold cache startup effects in instruction cache prefetching.



Instruction cache misses are a significant performance problem for commercial applications. The estimated improvement from an infinite L1 instruction cache (ICache) is about 30% for many commercial applications. However, enlarging the L1 ICache is not effective for commercial applications, because the instruction footprint is typically very large compared to the L1 ICache, and many L2 instruction misses are served from large caches (L3) or from memory. Instruction prefetching is therefore a more cost-effective solution. However, predicting the correct instructions to prefetch takes training time, and performance is reduced during that period: while training, the predicting mechanism, hereafter called "the predictor", gives no recommendations, and the L3 cache warms up one cache line at a time for each miss. This causes a significant performance loss, because accessing memory has very high latency compared to an L3/L2/L1 hit, so the cycles lost to these initial memory accesses significantly reduce the predictor's benefit. Context switching has a similar effect: it tends to flush cache contents when a new application is brought in, after which it takes time to retrain the predictor and rebuild the working set in L3. For these reasons, the L3 cache must be warmed up quickly, without training.

This is accomplished with a simple predictor that quickly brings several cache lines into L3 in parallel, avoiding the time it takes to bring in one cache line for every L3 miss. The predictor is modified to detect a cold startup effect based on which cache level is accessed and which conditions trigger a miss. A special bit lets the predictor detect startup conditions: the predictor is initialized to an “initial” cold state, and it can be returned to that state on a context switch or in other situations. If the predictor cannot give any recommendation when a demand instruction cache miss occurs, Next-N line prefetching, which brings in several cache lines in parallel, may be triggered. Other simple predictors can be used in place of Next-N line prefetching.
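The logic above can be sketched in a few lines. This is an illustrative model only, not the disclosed hardware: the class name, the table-based recommendation lookup, and the constants N_PARALLEL and LINE are assumptions chosen to make the cold-state bit and the Next-N fallback concrete.

```python
N_PARALLEL = 4   # assumed number of lines fetched in parallel while cold
LINE = 128       # assumed cache-line size in bytes


class ColdStartPredictor:
    """Sketch of a predictor with a special "initial" cold-state bit."""

    def __init__(self):
        self.cold = True   # special bit: starts in the "initial" cold state
        self.table = {}    # trained prefetch recommendations (assumed form)

    def context_switch(self):
        # A context switch returns the predictor to the cold state.
        self.cold = True

    def train(self, miss_addr, targets):
        # Simplifying assumption: any training clears the cold bit.
        self.table[miss_addr] = targets
        self.cold = False

    def on_demand_miss(self, miss_addr):
        rec = self.table.get(miss_addr)
        if rec is not None:
            return rec     # trained recommendation available
        if self.cold:
            # No recommendation while cold: warm L3 with Next-N lines,
            # i.e. the N line-aligned addresses following the miss.
            base = miss_addr // LINE * LINE
            return [base + i * LINE for i in range(1, N_PARALLEL + 1)]
        return []          # trained, but no entry: no prefetch
```

For example, a demand miss at a line-aligned address 0x2000 while cold would request the four following lines, whereas the same miss after training would return the trained targets instead.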

An embodiment of the disclosed method detects the conditions of a cold access. This mechanism may not require an instruction prefetching predictor at all: in some situations, a cold start can be detected from information about the demand instruction cache miss alone, and a prefetch can be triggered under certain conditions, warming up the L3 cache much faster. Next-N line prefetching starts when there is an instruction demand miss and brings in the N extra lines located immediately after the demand miss.
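The address generation for Next-N line prefetching reduces to aligning the miss address to a line boundary and stepping forward N lines. A minimal sketch, assuming a hypothetical helper name and a 128-byte line size (neither is specified in the disclosure):

```python
LINE_SIZE = 128  # assumed cache-line size in bytes


def next_n_lines(miss_addr, n, line_size=LINE_SIZE):
    """Return the line-aligned addresses of the n lines immediately
    following the line that contains the demand miss."""
    base = miss_addr // line_size * line_size  # align to line boundary
    return [base + i * line_size for i in range(1, n + 1)]
```

For instance, a miss anywhere in the line containing 0x1005 with n = 2 yields the two following lines, 0x1080 and 0x1100.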

Currently, it shows very good improvements i...