
Cache friendly layout for Finite State Transducers in Linguistic Analysis Applications

IP.com Disclosure Number: IPCOM000016252D
Original Publication Date: 2002-Sep-16
Included in the Prior Art Database: 2003-Jun-21
Document File: 4 page(s) / 62K

Publishing Venue

IBM

Abstract

A program is disclosed that reorganises the layout of a language dictionary which is implemented as a finite-state transducer in such a way that optimal use is made of cache memory when executing the reorganised finite state machine.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 33% of the total text.


A program is disclosed that reorganises the layout of a language dictionary which is implemented as a finite-state transducer in such a way that optimal use is made of cache memory when executing the reorganised finite state machine.

Finite-state processing typically involves very simple access code, so it is the speed of memory access that is crucial for performance. Modern computers contain several storage devices arranged in a hierarchy, starting with disk storage, which has a very large capacity but slow access speed, and ending with processor cache memory (and registers), which has a very small capacity but extremely fast access speed. As you move up each level in the storage hierarchy, the memory gets faster but its capacity decreases.

Finite-state transducers (FSTs) are implemented by mapping the nodes in the transducer's network into linear memory locations. It is apparent that the choice of mapping has an impact on the performance of finite-state processing, because it directly affects the effectiveness of caching. This mapping is typically done by the internal logic of the builder, which may be good or bad in terms of cache friendliness, but it can hardly be expected to be optimal unless this factor is explicitly considered when the builder is designed.
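Since the disclosure describes this mapping only in prose, the following is a minimal sketch, in Python, of one way an FST can be laid out in linear memory: each node's outgoing transitions occupy a contiguous run of records in a single array, and following an arc is simply an index jump within that array. The class and field names (FlatFST, node_offset, node_size, etc.) are illustrative assumptions, not the disclosed implementation; the visit_count field anticipates the profiling step described later.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Transition:
    symbol: str           # input symbol labelling the arc
    output: str           # output string emitted when the arc is taken
    target: int           # id of the target node
    visit_count: int = 0  # profiling counter used later for the reordering


class FlatFST:
    # FST stored as one linear array of transition records.
    # node_offset[n] is the position in `transitions` where node n's outgoing
    # arcs start, and node_size[n] is how many arcs node n has, so a node's
    # arcs always occupy a contiguous block of memory.

    def __init__(self) -> None:
        self.transitions: List[Transition] = []
        self.node_offset: List[int] = []
        self.node_size: List[int] = []

    def add_node(self, arcs: List[Tuple[str, str, int]]) -> int:
        # Append a node's arcs contiguously; returns the new node's id.
        node_id = len(self.node_offset)
        self.node_offset.append(len(self.transitions))
        self.node_size.append(len(arcs))
        for symbol, output, target in arcs:
            self.transitions.append(Transition(symbol, output, target))
        return node_id

    def step(self, node: int, symbol: str) -> Optional[Transition]:
        # Find the arc of `node` labelled `symbol` (a linear scan, for brevity).
        start = self.node_offset[node]
        for t in self.transitions[start:start + self.node_size[node]]:
            if t.symbol == symbol:
                return t
        return None

With such a layout, which nodes end up adjacent in the transitions array determines how often a lookup stays within cache lines that are already loaded, which is exactly what the reordering described below tries to exploit.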

In modern computers, the hardware and operating system attempt to provide optimal cache usage by loading an entire block of memory into the cache at the same time. This strategy works well only because most programs tend to access memory sequentially. However, if we execute a finite-state machine whose nodes have been mapped to memory locations at random, frequent reloading of the cache is required. On the other hand, if the next node to be visited is stored close to the node currently being visited, there is an increased chance that the memory location containing that next node is already in the cache.

Description of the Optimisation Algorithm

We tested four optimisation algorithms, which are briefly described below. The following is a detailed description of the algorithm that gave the best results in our experiments; it can be described as a depth-first reordering that uses traversal frequency as the basis for ordering. (An illustrative code sketch follows the numbered steps.)
1. Construct an original FST with an arbitrary placement of nodes. However, each transition in the FST should have an additional field, called visit_count, which is initially set to 0.
2. Use this original FST to perform dictionary look-ups on a large corpus that is statistically representative of the text which will be analysed with this dictionary in the future. Each time an arc in the FST is traversed, increment the value of its visit_count field.
3. Now you start the optimisation phase. To do this, you need to create two integer arrays, forward_map and backward_map. Each of these arrays should contain as many elements as ther...
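Because only part of the numbered description survives in this abbreviated text, the following Python sketch illustrates no more than steps 1 and 2 above plus the depth-first, frequency-driven renumbering summarised earlier; it is not the disclosed implementation. In particular, the exact roles of forward_map and backward_map are assumed here (old-to-new and new-to-old node numbering, respectively), and transducer output symbols are omitted for brevity.

from typing import Dict, List, Tuple

# A node is a list of arcs; each arc is [input symbol, target node, visit_count].
Fst = Dict[int, List[List]]


def count_traversals(fst: Fst, start: int, corpus: List[str]) -> None:
    # Step 2: look up every corpus word and increment visit_count on each arc traversed.
    for word in corpus:
        node = start
        for ch in word:
            for arc in fst[node]:
                if arc[0] == ch:
                    arc[2] += 1        # this arc was traversed once more
                    node = arc[1]
                    break
            else:
                break                  # symbol not found: the word is not in the dictionary


def depth_first_reorder(fst: Fst, start: int) -> Tuple[List[int], List[int]]:
    # Depth-first renumbering of nodes, expanding the most frequently
    # traversed arcs first, so that hot paths become contiguous in memory.
    n = len(fst)
    forward_map = [-1] * n             # assumed meaning: old node id -> new node id
    backward_map = [-1] * n            # assumed meaning: new node id -> old node id
    next_new = 0
    stack = [start]
    while stack:
        node = stack.pop()
        if forward_map[node] != -1:
            continue                   # already renumbered
        forward_map[node] = next_new
        backward_map[next_new] = node
        next_new += 1
        # Push the least visited targets first so the most visited are expanded first.
        for arc in sorted(fst[node], key=lambda a: a[2]):
            if forward_map[arc[1]] == -1:
                stack.append(arc[1])
    return forward_map, backward_map   # unreachable nodes, if any, stay at -1


if __name__ == "__main__":
    # Toy dictionary {"ab", "ac"}: node 0 --a--> 1, node 1 --b--> 2 and --c--> 3.
    fst: Fst = {0: [["a", 1, 0]], 1: [["b", 2, 0], ["c", 3, 0]], 2: [], 3: []}
    count_traversals(fst, 0, ["ac", "ac", "ab"])
    fwd, bwd = depth_first_reorder(fst, 0)
    print(fwd, bwd)  # the more frequent "c" branch (old node 3) gets the lower new number

Presumably the remaining steps of the disclosed algorithm then rebuild the FST so that the node backward_map[i] is stored at position i, placing frequently co-visited nodes next to one another in memory, but that part of the text is not reproduced here.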