A Speedup Technique for High Performance Foster-Montgomery Multipliers used in Security Processors
Original Publication Date: 2002-Nov-22
Included in the Prior Art Database: 2002-Nov-22
Montgomery multiplication is a modular multiplication technique which does not require a division step after each multiplication. This division step, required to compute a remainder for modulo reduction, is a time consuming step specially for cryptographic applications where the size of the modulus of the underlying field may exceed 4096 bits. The Montgomery multiplication method transforms this costly modulo reduction step into a simpler scheme based on division by multiples of a word-size constant (simply by truncation). This is achieved at the expense of pre-computing an auxiliary correction term, called m, which may require not only a pre-computation but also the storage of the m term which could be as large as the modulus itself. Foster-Montgomery multiplication method demonstrates that the same multiplication architecture can be used to compute m and more importantly, this term can be computed word-wise and as the first cycle of a normal block multiplication process. Hence, no requirement for an extra storage space. However, the data swapping associated with using the same execution unit for different output computations limits the use of pipelining for high performance. The innovation presented here is a solution to speed up a Foster-Montgomery multiplier by using pipelining without associated timing overheads due to the m computation.