CRiSP - Content Rewriting Service Provider

IP.com Disclosure Number: IPCOM000126283D
Original Publication Date: 2005-Jul-12
Included in the Prior Art Database: 2005-Jul-12
Document File: 3 page(s) / 90K

Publishing Venue

IBM

Abstract

Disclosed is a flexible content rewriting service that uses input and output rules based on regular expressions.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

At present, a number of portlets parse and rewrite HTML in order to reverse proxy content from one server and make it appear to come from another. For example:

The Domino Application Portlet (hereafter referred to as DAP) intercepts content generated by Domino* and rewrites it to point all URLs back to the portlet. This is achieved by using a set of pattern matching rules (implemented as two distinct mechanisms: Apache** regular expression rules and a bespoke HTML parser) to identify and rewrite URLs within the content.

The Web Clipping Portlet works in a similar manner but uses DOM to locate URLs that are subsequently normalised and rewritten to point back to the portlet.
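
The regular-expression style of URL rewriting described for these portlets can be illustrated with a short Python sketch. It is not the DAP or Web Clipping implementation: the host names, the `target` query parameter and the function name are all hypothetical, and only double-quoted `href` and `src` attributes are handled.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical values: neither URL appears in the original disclosure.
ORIGIN_BASE = "http://domino.example.com/app/view.nsf/"
PORTLET_URL = "http://portal.example.com/portlet/dap?target="

# Match href/src attributes whose value is enclosed in double quotes.
URL_ATTR = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]*)"', re.IGNORECASE)


def rewrite_urls(html: str) -> str:
    """Normalise each URL against the origin and point it back at the portlet."""

    def replace(match: re.Match) -> str:
        # Resolve relative references against the origin server's base URL.
        absolute = urljoin(ORIGIN_BASE, match.group("url"))
        # Carry the original target as an encoded query parameter.
        return f'{match.group("attr")}="{PORTLET_URL}{quote(absolute, safe="")}"'

    return URL_ATTR.sub(replace, html)


print(rewrite_urls('<a href="doc1?OpenDocument">x</a>'))
```

Every matched link now addresses the portlet, which can decode the `target` parameter to find the original resource. A pattern this simple misses unquoted or single-quoted attributes, which may be one reason a parser is used alongside the regular expression rules.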

    Parsing and rewriting the content places a heavy load on the portal server, which results in poor performance from the portlets and general sluggishness from the portal.

    The proposal is to remove the content rewriting process entirely from DAP (i.e., the portlet) and implement it as an independent service on a dedicated server. This would improve scalability and performance, and may open the way for other applications to benefit from a fast content parser/rewriter.

    At first glance, it might appear that this proposal is simply another reverse proxy server (much like Apache Server, Edge Server or Microsoft ISA Server***); however, there are two distinctions:

Proxy servers render the entire content of a response as if it were coming from the proxy server itself. They do not facilitate the embedding of the rewritten content into a portlet, servlet or other appropriate container.

Proxy servers require that the servers that are supplying the content (to be reverse proxied) be configured for reverse proxying; no such configuration is required by this service.

    In a nutshell, the service provides a programmable mechanism for content rewriting. Any application (including existing reverse proxy servers) that requires content to be rewritten, whether it be HTML or otherwise, may avail of the service.
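
One way to picture a programmable rewriter that is not tied to HTML is to treat the rules as data supplied by the calling application. The sketch below makes that assumption; the `Rule` structure, the `apply_rules` function and the `/proxy` prefix are hypothetical, and the actual rule format used by the service is not reproduced in this text.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    pattern: str      # regular expression to search for
    replacement: str  # replacement text; may use back-references such as \1


def apply_rules(content: str, rules: Iterable[Rule]) -> str:
    """Apply each caller-supplied rule, in order, to the whole content."""
    for rule in rules:
        content = re.sub(rule.pattern, rule.replacement, content)
    return content


# Hypothetical rule for a stylesheet: route url(...) references through /proxy.
css_rules = [Rule(r"url\((/[^)]*)\)", r"url(/proxy\1)")]
print(apply_rules("body { background: url(/img/bg.png); }", css_rules))
```

Because the rewriter only sees text and rules, the same function could serve a stylesheet, a script or an HTML page, depending on the rules the caller sends.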

    Currently, DAP itself has an embedded rewriting engine. See Figure 1 below. Each instance of the portlet is responsible for rewriting the content it has retrieved from Domino. As mentioned previously, parsing and rewriting Domino content places a huge processing burden on the portlet and, therefore, on the portal server.

    By extracting the rewriting engine and hosting it on another, possibly dedicated, server, the processor-intensive rewriting task is removed from the portal server. See Figure 2 below. Note that the rewriting engine is not responsible for retrieving the content to be rewritten; it is purely a rewriter. The Domino request/response connection is still handled by DAP; the service is then used to rewrite the content before DAP responds to the original client request.
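
The division of responsibilities described above might be sketched as follows. This is an illustration only: `RewriteService` is an in-process stand-in for what would be a remote service, and `handle_request`, `fake_domino` and the `/portal/dap/` prefix are hypothetical names chosen for the example.

```python
import re
from typing import Callable


class RewriteService:
    """Stand-in for the rewriting service: it transforms the content it is
    given and never retrieves anything itself."""

    def __init__(self, pattern: str, replacement: str) -> None:
        self._pattern = re.compile(pattern)
        self._replacement = replacement

    def rewrite(self, content: str) -> str:
        return self._pattern.sub(self._replacement, content)


def handle_request(path: str,
                   fetch: Callable[[str], str],
                   service: RewriteService) -> str:
    """Portlet-side flow: fetch from the origin, delegate rewriting, respond."""
    raw = fetch(path)            # the portlet keeps the origin connection
    return service.rewrite(raw)  # the service sees content only


def fake_domino(path: str) -> str:
    """Fake origin server used in place of a real Domino connection."""
    return f'<a href="/mail.nsf{path}">Inbox</a>'


service = RewriteService(r'href="/', 'href="/portal/dap/')
print(handle_request("/inbox", fake_domino, service))
```

Keeping the fetch step in the caller means the rewriter needs no credentials for, or network access to, the origin server.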

    A plethora of technologies could be used to wrap t...