Browse Prior Art Database

Cache friendly layout for Finite State Transducers in Linguistic Analysis Applications

IP.com Disclosure Number: IPCOM000016252D
Original Publication Date: 2002-Sep-16
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue

IBM

Abstract

A program is disclosed that reorganises the layout of a language dictionary which is implemented as a finite-state transducer in such a way that optimal use is made of cache memory when executing the reorganised finite state machine. Finite-state processing typically has simple access code, so it is the speed of the memory access which is crucial for the performance. Modern computers contain several storage devices which are arranged in a hierarchy starting with disk storage which has a very large capacity but slow access speed and ending with processor cache memory (and registers) which has very small capacity but extremely fast access speed. As you move up each level in the storage hierarchy the memory gets faster, but the capacity decreases. Finite-state transducers (FSTs) are implemented by mapping the nodes in the transducer's network into linear memory locations. It is apparent, that the choice of the mapping will have impact on the performance of finite-state processing, because it directly affects the performance of caching. This mapping is typically done by the internal logic of the builder, which may be good or bad in terms of cache friendliness, but could hardly expected to be the optimal unless this factor is explicitly considered during the builder design. In modern computers, the hardware and operating system attempt to provide optimal cache usage by loading an entire block of memory into the cache at the same time. This strategy only works well because most programs show a pattern of accessing memory in a sequential way. However, if we are executing a finite-state machine whose nodes have been randomly mapped to memory locations, this will require frequent reloading of the cache. On the other hand, if the next visited node in the finite-state machine is stored close to the node currently being visited, there is an increased chance that the memory location containing this next node will already be in the cache. Description of the Optimisation Algorithm