Technique for Halving Vector Load Times
Original Publication Date: 1985-Apr-01
Included in the Prior Art Database: 2005-Feb-18
Current vector machines process many operands in pipelined fashion to achieve high hardware utilization and, consequently, very high performance. Asymptotically, each vector arithmetic pipeline can produce one result per cycle, and since many of the newer vector processors have multiple pipes, the machine as a whole can produce more than one result per cycle. This leads to very high memory bandwidth requirements, because the operands for every result must come from memory and, for efficient operation, the pipelines must be kept full. Most vector machines can perform arithmetic in several precisions, typically single (short) precision (32 bits), double precision (64 bits), and extended precision (128 bits). In most of these machines, single-precision and double-precision vector loads require the same number of cycles to load a vector from memory into the vector registers, even though a single-precision vector occupies only half as many bytes.
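The bandwidth argument and the equal load times can be made concrete with a small model. The sketch below is illustrative only: the 64-bit memory data-path width, the assumption of two input operands per result, the vector length, and the helper names `operands_per_cycle` and `load_cycles` are all hypothetical and are not taken from this disclosure. The `packed=False` mode models the behavior described above (one element per cycle regardless of width); the `packed=True` mode is a hypothetical alternative that moves as many whole elements as fit in one path-width access.

```python
import math


def operands_per_cycle(pipes: int, inputs_per_result: int = 2) -> int:
    """Operands memory must supply per cycle to keep every pipe full.

    Assumes (hypothetically) each pipe delivers one result per cycle
    from a dyadic operation, i.e. two input operands per result.
    """
    return pipes * inputs_per_result


def load_cycles(n_elements: int, element_bits: int,
                path_bits: int = 64, packed: bool = False) -> int:
    """Cycles to load an n-element vector into a vector register.

    path_bits is a hypothetical memory data-path width per cycle.
    packed=False: one element per cycle, whether 32 or 64 bits wide.
    packed=True: hypothetical mode moving as many whole elements as
    fit in one path-width access per cycle.
    """
    if element_bits > path_bits:
        # An element wider than the path (e.g. 128-bit extended
        # precision on a 64-bit path) needs several accesses each.
        return n_elements * math.ceil(element_bits / path_bits)
    if not packed:
        return n_elements
    per_access = path_bits // element_bits
    return math.ceil(n_elements / per_access)


if __name__ == "__main__":
    n = 64  # hypothetical vector length
    print("operands/cycle for 2 pipes:", operands_per_cycle(2))
    for bits in (32, 64, 128):
        print(f"{bits}-bit: unpacked={load_cycles(n, bits)} "
              f"packed={load_cycles(n, bits, packed=True)}")
```

Under these assumptions, a 64-element vector takes 64 cycles to load in the unpacked model whether its elements are 32 or 64 bits wide, while the packed model needs only 32 cycles for 32-bit elements, a factor-of-two reduction consistent with the title. Double- and extended-precision loads are unaffected because no more than one such element fits in a 64-bit access.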