
64bit Swap CPU operation Disclosure Number: IPCOM000022509D
Original Publication Date: 2004-Mar-18
Included in the Prior Art Database: 2004-Mar-18
Document File: 3 page(s) / 49K

Publishing Venue



Disclosed is a 64bit Swap CPU operation that swaps all 64 bits in one instruction to reduce the CPU operations' latency.

This is the abbreviated version, containing approximately 58% of the total text.


64bit Swap CPU operation

Currently, there is a mismatch in the memory model between a
big-endian processor and a little-endian IO bus. Data that comes
across the IO bus is not in the byte order the CPU expects. To
work around this, software normally arranges its data structures
so that the data does not need to be byte swapped. The examples
below show two versions of a data structure for the two byte
orders.

Structure descriptor {          Structure descriptor {
    Char interrupt_event;           Char
    Char Rx_event;                  Char
    Char Tx_event;                  Char Rx_event;
    Char hw_status;                 Char
}                               }

The rearranged data structure works fine for status/event data.
If the data is an address pointer or offset and is longer than
one byte, however, it must be swapped before it can be used.
Below is an example of the operation needed to perform a 32-bit
data swap. It takes more instructions when the data is 64 bits
wide.

((((x) & 0xFF)<<24) | (((x) & 0xFF00)<<8) | (((x) & 0xFF0000)>>8)
| (((x) & 0xFF000000)>>24))

This breaks down into the following assembler instructions (for
the POWER architecture):

addi 38010048 1 AI gr0=gr1,72,ca"
lwz 807F01A8 1 L4A
addi 38810068 1 AI gr4=gr1,104,ca"
addi 38A00001 1 LI gr5=1
stw 93810068 1 ST4A <a4:d104:l4>(gr1,104)=gr28
rlwinm 5466421E 1 RN4 gr6=gr3,8,0xFF0000
rlwinm 5467463E 1 SRL4 gr7=gr3,24
rlwimi 5066C00E 1 RI4 gr6=gr3,24,gr6,0xFF000000
stw 90010074 1 ST4A <a4:d116:l4>(gr1,116)=gr0
rlwimi 5066C42E 1 RI4 gr6=gr3,24,gr6,0xFF00
or 7CC03B78 1 O gr0=gr6,gr7
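
To make the growth in work concrete, the 32-bit expression above can be
wrapped in a function and compared with a 64-bit counterpart. This is a
minimal sketch, not code from the disclosure: the function names swap32
and swap64 are hypothetical, and the 64-bit version simply extends the
same mask-and-shift pattern from four terms to eight.

```c
#include <stdint.h>

/* 32-bit byte swap: the same mask-and-shift expression as in the text.
 * The name swap32 is hypothetical. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* 64-bit byte swap written the same way (hypothetical name swap64).
 * Eight mask/shift terms instead of four, so a compiler without a
 * dedicated instruction has roughly twice the work to do. */
uint64_t swap64(uint64_t x)
{
    return ((x & 0x00000000000000FFull) << 56) |
           ((x & 0x000000000000FF00ull) << 40) |
           ((x & 0x0000000000FF0000ull) << 24) |
           ((x & 0x00000000FF000000ull) << 8)  |
           ((x & 0x000000FF00000000ull) >> 8)  |
           ((x & 0x0000FF0000000000ull) >> 24) |
           ((x & 0x00FF000000000000ull) >> 40) |
           ((x & 0xFF00000000000000ull) >> 56);
}
```

For example, swap32(0x11223344) yields 0x44332211, and
swap64(0x1122334455667788) yields 0x8877665544332211.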

The swapping operation occurs very often in IO adapter drivers
in an environment with a big-endian processor and a little-endian
IO bus, and the listing above shows that the performance cost of
this operation is high on today's systems.

The purpose of this invention is to add a new CPU machine
instruction. The new instruction has hardware assist logic that
swaps all 64 bits in one instruction, thereby reducing the
latency of the operation. The new instruction also allows
swapping any number of bits starting at any position within the
64 bits.

The invention creates a new machine instruction with a new
hardware assist. The new instruction returns any number of bits,
swapped, from anywhere in the 64-bit register. The hardware
logic first shifts the bits into the right position and then
performs the swap.
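
The shift-then-swap behavior described above can be modeled in software
as a rough sketch. Everything specific here is an assumption, since the
disclosure does not define it: the function name swap_field, counting
start_bit from the least significant bit, restricting the field to whole
bytes, reading "swap" as byte-order reversal, returning the result
right-justified, and returning 0 for a field that does not fit in the
register.

```c
#include <stdint.h>

/* Software model of the proposed operation (a sketch under assumptions,
 * not the disclosed hardware design).
 * Assumptions: start_bit counts from the least significant bit,
 * num_bytes is 1..8, and "swap" means reversing the byte order of the
 * selected field. An out-of-range field returns 0 (hypothetical choice). */
uint64_t swap_field(uint64_t reg, unsigned start_bit, unsigned num_bytes)
{
    if (num_bytes == 0 || num_bytes > 8 || start_bit > 64 - 8 * num_bytes)
        return 0; /* field does not fit in the 64-bit register */

    /* Step 1: shift the field down so it starts at bit 0. */
    uint64_t field = reg >> start_bit;

    /* Step 2: reverse the byte order of the lowest num_bytes bytes. */
    uint64_t out = 0;
    for (unsigned i = 0; i < num_bytes; i++) {
        out = (out << 8) | (field & 0xFFu);
        field >>= 8;
    }
    return out;
}
```

Under these assumptions, swap_field(0x1122334455667788, 0, 8) swaps the
whole register and yields 0x8877665544332211, while
swap_field(0x1122334455667788, 16, 4) selects the bytes 0x33445566 and
yields 0x66554433.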

The new instruction may have the format below:

    Swap (memory_address, position_of_start_bits, and