Surety is performing system maintenance this weekend. Electronic date stamps on new Prior Art Database disclosures may be delayed.
Browse Prior Art Database

Optimal PowerPC branch-free code sequences for common operations

IP.com Disclosure Number: IPCOM000132493D
Original Publication Date: 2005-Dec-19
Included in the Prior Art Database: 2005-Dec-19

Publishing Venue



Certain common operations such as comparisons can be performed by carefully chosen short branch-free sequences of IBM* PowerPC* instructions. The PowerPC Compiler Writer's Guide documents a good selection of these sequences but here are described sequences for additional operations or more efficient sequences or sequences extendable to both 64-bit or 32-bit modes. The operations include signed and unsigned comparisons returning -1 or +1 for true, signed and unsigned maximum and minimum operations, sign operation (+1,-1,0) and 3 value result (+1, -1, 0) signed and unsigned comparisons. These sequences can in many cases be used with little adaptation on other RISC processors.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 13% of the total text.

Page 1 of 14

Optimal PowerPC branch-free code sequences for common operations

Compilers generate code by translating source code into machine instructions. Certain common operations can be recognised and translated into precalculated efficient sequences of machine code. Certain of these sequences suitable for IBM* PowerPC* computers are documented in the 'PowerPC Compiler Writer's Guide' (ISBN 0-9649654-0-2) and the book 'Hacker's Delight' (ISBN: 0201914654 ), but the known solutions may not be optimal and it may be possible to provide sequences for some operations which have fewer instructions or take less time to execute.

    Here are disclosed techniques for forumlating sequences of machine instructions which perform certain operations in fewer instructions or machine cycles than previously published. These sequences are expressed as PowerPC instructions, but many could be translated to other machines, particularly sequences which do not use the carry flag or only use the carry flag as input to or output from addition or subtraction operations.

    For Java** and C code on PowerPC hardware there are 4 common modes in which these operations can be carried out. These are considered in turn below. 32-bit operations in 32-bit mode

    The carry flag is set from the carry from the 32-bit operation, so the carry flag can be used. If use of the carry flag can be avoided then this is better as it could allow two independent operations to be carried out in parallel without conflicts on the carry flag. 32-bit operations in 64-bit mode

    The carry bit is set as a result of a carry out of the 64 bit place, so in general the carry cannot be used for this situation. One exception is the carry bit is set after a srawi instruction from the result of the 32-bit shift. Also, if the values in registers have been or are sign extended to 64 bits then the carry will be appropriately set after operations on the values. It may be better to sign or zero extend the inputs to 64 bits, then either use 64/64 sequences or sequences especially designed for this 32/64 mode. 64-bit operations in 64-bit mode

    The carry flag is set from the carry from the 64-bit operation, so the carry flag can be used. Code sequences from the 32/32 mode can be used with very little modification. Some shifts may need to be extended to dword operands and 63-bit, rather than 31-bit shifts. 64-bit operations in 32-bit mode

    Two 32-bit registers are needed to represent each 64-bit quantity. Code sequences for this mode can be extended from the 32/32 or 64/64 bit operations, but with adaptations for multiword adds and subtracts using the carry flag, shifts need adaptation, so a sradi RB, RA, 63 could become srawi RBh, RAh, 31. Logical rather than arithmetic operations on 64-bit values are to be preferred as there does not need to be a carry between the upper and lower halves of the 64-bit quantity.

Page 2 of 14

    The sequences below are expressed for 32-bit operations in 32-bit mode, but are easily extendable to...