Bridging raw TCP binary data to HTTP
Publication Date: 2010-Jul-23
The IP.com Prior Art Database
DataPower appliances can process XML as well as binary data received through many protocols (HTTP, MQ, FTP, ...). They can also process XML received over a raw TCP connection without any protocol. Many specific binary data formats are transmitted over raw TCP; examples are SML (Smart Metering Language, Energy and Utilities), ISO 8583, and others. There are even combined formats, like Microsoft Exchange, which transfer XML data padded with binary data. The binary data prevents "normal" XML processing. While it is easy to process such binary data on DataPower if it is received over a protocol (HTTP, MQ, ...), it cannot be processed when it is sent over raw TCP.
A TCP proxy receives raw TCP data on the frontside and just forwards it to a backend; responses received from the backend are forwarded to the frontend. This solution describes a modification of a TCP proxy (which is an available DataPower service). It bridges arbitrary binary data received on the frontend to HTTP chunked encoding (which can carry binary data) towards the backend. The backend sends its responses back with HTTP chunked encoding, and the modified TCP proxy bridges each response back to raw TCP as the response to the frontend.
While this solves a problem for DataPower appliances, it is a general solution. Considering the millions of devices in the world that are only able to send raw TCP data, and the applications and hardware that are only able to process data received through a protocol (most are able to process HTTP), this bridging of raw TCP to HTTP is very valuable. The solution could be the basis of a microcontroller that has just two Ethernet ports, for frontend and backend, and provides only the bridging functionality. Such small boxes would allow millions of existing devices to communicate with existing hardware/software solutions that are able to deal with HTTP.
This solution can be used to process the (raw TCP) readouts of millions of power meters every few minutes through DataPower, which is currently not possible. Another application is the processing of Microsoft Exchange traffic in DataPower, in order to have DataPower as the single entry point into the customer network (DataPower is a security device). This way, the XML data inside the raw TCP binary data as well as the binary data itself can be processed.
Detailed description of the modified TCP proxy steps:
This is the dummy header sent to the backend on connection creation: "POST / HTTP/1.1\r\nHost: 126.96.36.199\r\nUser-Agent: none\r\nTransfer-Encoding: chunked\r\n\r\n". The POST request line specifies the HTTP/1.1 protocol, which provides chunked transfer encoding. The Host entry is necessary but not used, therefore the IP address is set to 188.8.131.52. Alternatively, the more correct IP address of the backend could be supplied. The User-Agent entry is necessary but unimportant. It could be any string, but "none" describes the situation best since there is no user agent on the frontside. Last but not least, the Transfer-Encoding entry specifies chunked transfer, which is necessary for this solution.
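The dummy header can be assembled from its four entries. The following is a minimal sketch, not the DataPower implementation; the function name `build_dummy_header` and the placeholder host value are assumptions made for illustration.

```python
def build_dummy_header(host: str = "backend.example", user_agent: str = "none") -> bytes:
    """Build the artificial HTTP/1.1 request header sent once per backend connection.

    `host` is a hypothetical placeholder; the text notes the Host value is
    required but not used, so any address could be supplied here.
    """
    lines = [
        "POST / HTTP/1.1",             # HTTP/1.1 is required for chunked transfer
        f"Host: {host}",               # necessary, but not evaluated
        f"User-Agent: {user_agent}",   # "none": there is no user agent on the frontside
        "Transfer-Encoding: chunked",  # the body follows as a sequence of chunks
    ]
    # Header lines are separated by CRLF; an empty line ends the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

The header is sent once, directly after the backend connection is opened; everything that follows on that connection is chunk data.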
When any data packet is received from the frontend, it is simply wrapped in a few bytes that make it a chunked data packet for submission to the backend. The TCP proxy that is the basis of this solution typically has two buffers, in which the data received from the frontside and the backside is buffered before being sent to the backside and frontside, respectively. Each buffer has a specific length; take 32KB as an example. This size is an upper bound on the largest packet sent to the backend, and in this case 16 bits, or 4 hex digits, are enough to signal the data length. Chunked packets are of the form "hex-length(DATA)\r\nDATA\r\n".
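The wrapping of one frontend packet can be sketched as follows. This is an illustrative sketch under the 32KB buffer assumption from the text; the names `wrap_chunk`, `BUFFER_SIZE` and `LAST_CHUNK` are hypothetical and not taken from the original.

```python
BUFFER_SIZE = 32 * 1024  # example buffer size from the text (assumption)

# Zero-size chunk that signals end of transmission to the backend.
LAST_CHUNK = b"0\r\n\r\n"


def wrap_chunk(data: bytes) -> bytes:
    """Wrap one raw TCP packet as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    if not data:
        # An empty chunk would be read by the backend as end of transmission.
        raise ValueError("empty packets must not be sent as chunks")
    if len(data) > BUFFER_SIZE:
        raise ValueError("packet exceeds the proxy buffer size")
    # Four hex digits are enough for any length up to 32KB (0x8000).
    return b"%04x\r\n" % len(data) + data + b"\r\n"
```

With this framing, a 291-byte packet is prefixed with `0123\r\n`, since 291 is 0x123.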
"0\r\n\r\n" will be sent to the backend to signal end of transmission (0 size packet).
The proxy simply receives the data from the backend and, because of the request it sent, knows that the response will be HTTP/1.1 chunk encoded. It locates the HTTP response header and discards it before sending any further packets to the frontside. The end of the HTTP response header is marked by the sequence "\r\n\r\n", which is easy to detect.
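Discarding the response header amounts to searching the buffered backend data for the first "\r\n\r\n". The sketch below assumes the header may arrive split across several TCP packets, so the caller keeps buffering until the marker is seen; the function name `strip_response_header` is hypothetical.

```python
HEADER_END = b"\r\n\r\n"


def strip_response_header(buffered: bytes):
    """Return (header_done, rest).

    If the end-of-header marker has not arrived yet, nothing is consumed and
    the caller should append the next backend packet and try again.
    """
    end = buffered.find(HEADER_END)
    if end == -1:
        return False, buffered
    # Everything after the marker already belongs to the chunked body.
    return True, buffered[end + len(HEADER_END):]
```

This step only has to run once per backend connection; after the header is gone, all further data is chunk data.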
For each chunk-encoded data packet received from the backend, the preceding length with its "\r\n" and the "\r\n" at the end of the packet are removed, and only the remaining data is passed to the frontside. This is just the reverse of the operation under 2).
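The unwrapping of backend chunks can be sketched as a function over the buffered body data. This is a simplified illustration, not the original implementation: the name `unwrap_chunks` is hypothetical, chunk extensions are ignored, trailers after the last chunk are not handled, and the CRLF after each chunk is skipped without being verified.

```python
def unwrap_chunks(buffered: bytes):
    """Return (payloads, rest, finished) for the chunked data buffered so far.

    payloads: raw data of every complete chunk, ready for the frontside
    rest:     bytes of an incomplete chunk, to be kept for the next call
    finished: True once the zero-size chunk has been seen
    """
    payloads = []
    pos = 0
    while True:
        line_end = buffered.find(b"\r\n", pos)
        if line_end == -1:
            break  # the length line has not fully arrived yet
        # Drop an optional chunk extension (";...") before parsing the hex length.
        size_field = buffered[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return payloads, b"", True  # zero-size chunk: end of transmission
        data_start = line_end + 2
        data_end = data_start + size
        if len(buffered) < data_end + 2:
            break  # chunk data or its trailing CRLF is still incomplete
        payloads.append(buffered[data_start:data_end])
        pos = data_end + 2  # skip the CRLF that closes the chunk
    return payloads, buffered[pos:], False
```

Because a chunk can be split across TCP packets, the caller would prepend `rest` to the next packet received from the backend before calling the function again.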
The last packet received from the backend is always "0\r\n\r\n" [...] data in the backside buffer. A separate process is responsible for transmitting that buffer data to the frontside; the trick is that the buffer does not receive the data as it arrived from the backside, but the data already modified by steps 4-6.
An important aspect of this system solution (the microcontroller solution with 2 Ethernet ports) is the ability to add missing HTTP header data to the (artificial) HTTP header in step 1. Additionally, simple transformations might be specified here. A simple example would be to send the binary data received from the frontside hex-encoded to the backend.
For a 291 byte DATA packet the hex length is 0123, therefore the packet "0123\r\nDATA\r\n" gets sent to the backend.
As described above, if the client signals shutdown of the socket connection, "0\r\n\r\n", which is just a zero-size packet, is sent to the backend to signal end of transmission.
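The hex-encoding transformation mentioned above could look like the following sketch. It is an assumption about how such a transformation might be combined with the chunk framing, not something the original specifies; the function name `wrap_chunk_hex` is hypothetical.

```python
import binascii


def wrap_chunk_hex(data: bytes) -> bytes:
    """Hex-encode a frontend packet, then wrap the encoded text as one HTTP chunk."""
    encoded = binascii.hexlify(data)  # two ASCII hex characters per input byte
    # The chunk length must describe the encoded payload, not the raw packet.
    return b"%x\r\n" % len(encoded) + encoded + b"\r\n"
```

Hex encoding doubles the payload size, so a full 32KB buffer would produce a 64KB chunk whose length needs five hex digits; the length field here is therefore not padded to four digits.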
While the previous description states that the received packets are directly sent, th...