Method of the Use of XML to Hierarchically Arrange Regular Expressions for the Purpose of Applying Text Transformations to Source Code.

IP.com Disclosure Number: IPCOM000012242D
Original Publication Date: 2003-Apr-22
Included in the Prior Art Database: 2003-Apr-22
Document File: 7 page(s) / 43K

Publishing Venue

IBM

Abstract

Disclosed is a method for combining XML and regular expressions to build an easily extensible, configurable text transformation engine. In general, the engine can perform any transformation that can be expressed as a regular expression substitution. Specifically, it can add color syntax highlighting and HTML transformations to raw source code so that the source code can be viewed in a web browser.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 34% of the total text.

The Syntax Highlighter is a Java-based tool that transforms raw source code in several different languages into a format that can be viewed in a web browser. The transformation tool performs tasks such as the following:

- Replaces HTML entity characters with viewable equivalents (e.g. [&]-->[&amp;], [<]-->[&lt;], [>]-->[&gt;], ...).
- Replaces newline characters with HTML breaks (<br>).
- Replaces whitespace with &nbsp; characters.
- Adds color highlighting to source code by insertion of <span> attribute tags.
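The tasks listed above can be sketched in a few lines. The sketch is written in Python for brevity; the tool itself is Java-based and its code is not shown in this disclosure. The function names `to_html` and `highlight`, the keyword list, and the decision to handle only space characters (not tabs) are assumptions made for illustration.

```python
import re


def to_html(source: str) -> str:
    """Escape raw source text so a browser displays it literally."""
    # Escape '&' first so the entities produced below are not escaped again.
    html = source.replace("&", "&amp;")
    html = html.replace("<", "&lt;").replace(">", "&gt;")
    # Preserve layout: spaces become non-breaking spaces (tabs are left alone
    # in this sketch), and newlines become HTML breaks.
    html = html.replace(" ", "&nbsp;")
    html = html.replace("\n", "<br>")
    return html


def highlight(html: str, words, color: str) -> str:
    """Wrap each whole-word occurrence of `words` in a colored <span>."""
    pattern = r"\b(" + "|".join(map(re.escape, words)) + r")\b"
    return re.sub(pattern, rf'<span style="color:{color}">\1</span>', html)


escaped = to_html("int x = a < b;\n")
print(highlight(escaped, ["int"], "blue"))
```

Highlighting runs after escaping here so that the inserted <span> markup is not itself escaped or filled with &nbsp; characters.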

Figure 1 shows a web browser displaying JSP source code that has been transformed using the Syntax Highlighter tool.

Fig 1. Sample display of JSP source code that has been transformed and has had color syntax highlighting added using the Syntax Highlighter tool.

The Syntax Highlighter uses regular expressions both to find patterns in raw source code and to format it. These regular expressions are stored as 'rules' in an XML configuration file, which defines a hierarchically arranged list of regular expression rules for each language type. Each language section of the configuration file contains a series of Apply, Extract, and Join rules.
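One way such a rule hierarchy could be interpreted is sketched below, in Python rather than the tool's Java. The semantics are assumptions, since this abbreviated text does not define them: an Apply rule is taken to be a Perl-style substitution (`s/pattern/replacement/flags`) on the current text, and an Extract rule is taken to pull capture group 1 out of every match, run its child rules on that fragment only, and protect the result from later rules until the end. Join rules are not modeled. The `Demo` language, the `[kw:...]` and `[comment:...]` markers, and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration; it is not the tool's real rule set.
CONFIG = r"""
<SyntaxHighlighterRules>
  <Language name="Demo">
    <Extract pattern="/(\/\/.*)/m">
      <Apply pattern="s/^(.*)$/[comment:$1]/s" />
    </Extract>
    <Apply pattern="s/\b(int|return)\b/[kw:$1]/g" />
  </Language>
</SyntaxHighlighterRules>
"""

FLAGS = {"s": re.DOTALL, "m": re.MULTILINE, "i": re.IGNORECASE}
TOKEN = re.compile("\x00(\\d+)\x00")


def parse_rule(rule):
    """Split 's/pat/repl/flags' or '/pat/flags' on unescaped slashes."""
    parts = re.split(r"(?<!\\)/", rule)
    letters = parts[-1]
    flags = 0
    for letter in letters:
        flags |= FLAGS.get(letter, 0)
    regex = re.compile(parts[1], flags)
    if parts[0] == "s":
        # Perl writes group references as $1; Python expects \g<1>.
        repl = re.sub(r"\$(\d+)", r"\\g<\1>", parts[2])
        return regex, repl, 0 if "g" in letters else 1
    return regex, None, 0


def run(node, text, stash):
    """Apply the child rules of `node` to `text` in document order."""
    for rule in node:
        regex, repl, count = parse_rule(rule.get("pattern"))
        if rule.tag == "Apply":
            text = regex.sub(repl, text, count=count)
        elif rule.tag == "Extract":
            def pull(m, rule=rule):
                # Transform group 1 with the nested rules, then hide it
                # behind a token so later rules cannot touch it.
                stash.append(run(rule, m.group(1), stash))
                start, end = m.span(1)
                whole, off = m.group(0), m.start()
                token = f"\x00{len(stash) - 1}\x00"
                return whole[:start - off] + token + whole[end - off:]
            text = regex.sub(pull, text)
    return text


def transform(language, text):
    stash = []
    text = run(language, text, stash)
    # Put the protected fragments back, innermost tokens included.
    while TOKEN.search(text):
        text = TOKEN.sub(lambda m: stash[int(m.group(1))], text)
    return text


demo = ET.fromstring(CONFIG).find("Language[@name='Demo']")
print(transform(demo, "int x; // return int\nreturn x;"))
# prints:
# [kw:int] x; [comment:// return int]
# [kw:return] x;
```

Because the comment is extracted before the keyword rule runs, the words `return` and `int` inside the comment are not marked as keywords. This is the kind of ordering control that a hierarchical rule list makes possible.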

Figure 2 shows a section of the XML configuration file, showing the regular expression rules defined for the Java code type.

Fig 2. Java rules section of the Syntax Highlighter XML configuration file.

<SyntaxHighlighterRules>
  <Language name="Java">
    <Apply pattern="s/</&lt;/sg" /> <!-- less thans -->
    <Apply pattern="s/>/&gt;/sg" /> <!-- greater thans -->
    <Apply pattern='s/\\"/\\&quot;/sg' /> <!-- escaped quotes -->
    <Extract pattern='/("[^"\n]*(?:\/\/|\/\*|\*\/)+[^"\n]*")/m'> <!-- quotes containing comments -->
      <Apply pattern="s/^(.*)$/\900color:blue\901$1\902/s" />
    </Extract>
    <Extract pattern="/[^\/](\/\*\*.*?\*\/)/s"> <!-- javadoc comments -->
      <Apply pattern="s/^(.*)$/\900color:mediumblue\901$1\902/s" />
      <Apply pattern="s/(@(author|docRoot|deprecated|exception|inheritDoc|link|linkplain|param|return|see|serial|serialData|serialField|since|throws|value|version))/\900color:steelblue\901$1\902/sg" />
    </Extract>
    <Extract pattern="/[^\/](\/\*.*?\*\/)/s"> <!-- c-style comments -->
      <Apply pattern="s/^(.*)$/\900color:green\901$1\902/s" />
    </Extract>
    <Extract pattern="/(\/\/.*)/m"> <!-- c++ style comments -->
      <Apply pattern="s/^(.*)$/\900color:green\901$1\902/s" />
    </Extract>
    <Extract pattern="/('[^'\n]*'|&dq[^&dq\n]*&dq)/m"> <!-- quotes -->
      <Apply pattern="s/^(.*)$/\900color:blue\901$1\902/s" />
    </Extract>
    <Apply pattern="s/(\W)(import|package|boolean|byte|char|double|float|final|static|transient|synchronized|private|protected|public|int|long|short|abstract|class|interface|extends|implements|null|true|fal...