Browse Prior Art Database

Method for detecting caching opportunities in software.

IP.com Disclosure Number: IPCOM000011999D
Original Publication Date: 2003-Apr-01
Included in the Prior Art Database: 2003-Apr-01
Document File: 3 page(s) / 46K

Publishing Venue

IBM

Abstract

This article describes heuristics for the identification of caching opportunities in software. These heuristics are based on examining performance data collected from a running application. The data includes method call and return events (e.g., A calls B at time T), calling argument values and method return values. The heuristics can then be applied to automate the injection of caching logic for the purpose of assessing the benefits of applying caching to a particular executable (under a particular workload).

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 43% of the total text.

Page 1 of 3

Method for detecting caching opportunities in software.

Using techniques described in other articles, argument values, return values, invocation times and call flow information can be obtained for parts of a running application. This data can then presented to the user to assist in the identification of caching and other optimisation opportunities. However, this step can - to some extent - be automated. This article describes some of the automated mechanisms for identifying and characterising/parameterising caching opportunities.

Step 1. Identify a set of candidate methods X. This can be done using frequency counts
(e.g, methods called most frequently), or some other profiling technique (e.g., a tool like tprof to assess time spent in methods).

Step 2. For each method, X[i] in X, do:
a. Determine presence of a many-to-one mapping from all or some argument values to a return value. This is implemented by checking that duplicate sets of arguments all have the same return value. This is a clustering exercise. The existence of such a pattern is a strong indicator for caching. As a trivial example, suppose the sequence of input arguments to a method were:

1.111 1.2 1.3 1.5 1.8 1.9 2.1

with corresponding return values

1 1 1 1 1 1 2

    There is a strong likelihood that the computation within the method can be improved by not computing this function over and over again.

    Note that methods with multiple input arguments can be handled simplistically by considering the cross-product of input arguments (e.g., you can concatenate all arguments for a single call together into one "big" argument).

    An automated caching mechanism could be generated based on historical values, and used in-place to predict the value of employing this optimisation.

    If this is the case, then report this behaviour, and the benefit, to the user for manual intervention.
b. Determine whether or not the execution 'cost' of a method depends on the value of incoming arguments. E.g. the method takes much longer with some arguments than with others, execution time increases as some measurable value of an argument increases, etc. Identify the arguments that have most impact on the execution time of the method. Detecting this can be automated by performing a linear regression of method response time over the input argument value (for a simple 1 input argument, return value method. If the method has multiple input arguments, then a multiple linear regression can be performed).

    If a positive identification is made, again, the same mechanism indicated in a. can be applied to estimate the improvement in either caching, or recoding the computation in method X[i] for these more expensive arguments.

    If there are improvements, then report this behaviour, and the benefit, to the user for manual intervention.

Page 2 of 3

    c. Determine if the uses of a method are clustered around a few argument values and return values. This is not quite the same thing as a). It deals with identifying high freque...