
Efficient solution for providing XML Schema "(nested) character class subtraction"

IP.com Disclosure Number: IPCOM000179548D
Original Publication Date: 2009-Feb-17
Included in the Prior Art Database: 2009-Feb-17
Document File: 2 page(s) / 26K

This invention describes an efficient solution for providing "(nested) character class subtraction" in XML Schema.


XML Schema Datatypes (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/) include so-called character classes, which are used mainly for … like restrictions of …'s. For example, the character class "letter or digit" is written "[A-Za-z0-9]".
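XML Schema regular expressions are implicitly anchored at both ends, so a pattern must match the entire value. A minimal sketch in Python, assuming the standard re module as a stand-in for a schema validator and a hypothetical facet value "[A-Za-z0-9]+": re.fullmatch reproduces that anchoring, whereas re.search would not. The name is_valid is hypothetical.

```python
import re

# Hypothetical facet value "[A-Za-z0-9]+": one or more ASCII letters or digits.
# XML Schema patterns are implicitly anchored at both ends, so fullmatch()
# (not search()) is the matching Python equivalent.
FACET = re.compile(r"[A-Za-z0-9]+")


def is_valid(value: str) -> bool:
    """Return True if the entire value satisfies the facet pattern (hypothetical helper)."""
    return FACET.fullmatch(value) is not None


print(is_valid("SW8"))      # True
print(is_valid("SW8 2LP"))  # False: the space is not in the class
```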

Sometimes individual characters or ranges must be removed from a character class; this is called "character class subtraction" (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#nt-negCharGroup). The XML specification uses the same idea in its notation under the term "A - B", meaning anything that matches A but not B (http://www.w3.org/TR/REC-xml/#sec-notation).
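Semantically, "A - B" is a set difference. For small ASCII ranges, one straightforward alternative (not the approach of this disclosure) is to compute the difference eagerly and emit an explicit class that any regex engine accepts. A minimal sketch with hypothetical helpers expand and subtract_class, handling only literals and X-Y ranges (no escapes, negation, or nesting):

```python
import re


def expand(body: str) -> set[str]:
    """Expand a simple class body such as 'a-z' or 'aeiou' into a set of characters.

    Hypothetical helper: handles literals and X-Y ranges only.
    """
    chars: set[str] = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            chars.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


def subtract_class(a: str, b: str) -> str:
    """Build an explicit class equivalent to the XML Schema class [a-[b]] (hypothetical helper)."""
    diff = sorted(expand(a) - expand(b))
    return "[" + "".join(re.escape(c) for c in diff) + "]"


consonant = subtract_class("a-z", "aeiou")
print(consonant)  # [bcdfghjklmnpqrstvwxyz]
```

Eager expansion is simple, but the resulting class grows with the size of the ranges involved, which can become unwieldy for large Unicode ranges; a rewrite that keeps both classes intact avoids this.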

For example, the term "[a-z-[aeiou]]" matches all lowercase consonants. A more complex and realistic example is this description of British postcodes (http://en.wikipedia.org/wiki/UK):

"(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][ A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})"

This matches, for example, "SW8 2LP".
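Python's re module, like PCRE, has no character class subtraction. A minimal sketch, assuming only non-nested subtractions whose bodies contain no brackets or escapes: each "[A-[B]]" is rewritten as "(?:(?![B])[A])". The negative lookahead rejects characters in B before class A consumes one, and the non-capturing group keeps a following quantifier such as {2} applied to the whole construct. The name translate_flat is hypothetical.

```python
import re

# Hypothetical helper: matches a non-nested subtraction "[base-[minus]]";
# neither body may contain '[' or ']'.
FLAT_SUBTRACTION = re.compile(r"\[([^\[\]]*)-\[([^\[\]]*)\]\]")


def translate_flat(xsd_pattern: str) -> str:
    """Rewrite each [base-[minus]] as (?:(?![minus])[base])."""
    return FLAT_SUBTRACTION.sub(r"(?:(?![\2])[\1])", xsd_pattern)


UK_POSTCODE = (
    "(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|"
    "(([A-Z-[QVX]][0-9][ A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) "
    "[0-9][A-Z-[CIKMOV]]{2})"
)

# fullmatch() mirrors the implicit anchoring of XML Schema patterns.
regex = re.compile(translate_flat(UK_POSTCODE))
print(bool(regex.fullmatch("SW8 2LP")))  # True
print(bool(regex.fullmatch("QW8 2LP")))  # False: Q is excluded as the first letter
```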

The article at http://www.regular-expressions.info/xmlcharclass.html#subtract discusses character class subtraction and states that only two regex engines (JGsoft and .NET) support this XML Schema feature. The same page also describes the "nested character class subtraction" aspect of the specification, which is not implemented anywhere: for example, [0-9-[0-6-[0-3]]] is the same as [0-37-9].
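Reading each "-[...]" as a set difference, and resolving the inner subtraction first because it is part of the subtracted class, reproduces the stated equivalence:

```latex
\begin{aligned}
\{0,\dots,9\} \setminus \bigl(\{0,\dots,6\} \setminus \{0,\dots,3\}\bigr)
  &= \{0,\dots,9\} \setminus \{4,5,6\} \\
  &= \{0,1,2,3\} \cup \{7,8,9\}
\end{aligned}
```

which is exactly the class [0-37-9].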


A typical option for implementing XML Schema pattern handling is to translate the patterns into the syntax of the PCRE library (http://www.pcre.org/pcre.txt). Beyond this normal use of the PCRE library, the basic idea of the invention is:
- for each subtracted character class, prepending 7 fill-in characters (#######) to the surrounding character class
- locally replacing the fill-in by means of "negative lookahead (?!...)" and "non-capturing groups (?:...)"...
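The text breaks off before the mechanism is fully described, so the following is only a minimal sketch under stated assumptions. It rewrites each "[A-B]" (B being a subtracted class, possibly nested) recursively into "(?:(?!B')[A])", where B' is the already rewritten B. It does not reproduce the in-place fill-in technique; it only produces the same lookahead and non-capturing-group target form. The names translate and parse_class are hypothetical, and escapes are not handled. One observation: each rewrite turns |A| + |B'| + 3 characters into |A| + |B'| + 10, so it grows the pattern by exactly 7 characters. This is consistent with reserving 7 fill-in characters per subtraction so that the replacement can be done locally, though that is an interpretation, not something the text states.

```python
import re

# Hypothetical helpers: not the disclosed fill-in mechanism, same target form only.


def translate(pattern: str) -> str:
    """Rewrite XML Schema class subtraction into lookahead form.

    Sketch only: assumes class bodies hold literals and ranges, with no
    escapes and no literal '[' or ']' outside of class syntax.
    """
    out: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            text, i = parse_class(pattern, i)
            out.append(text)
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


def parse_class(p: str, i: int) -> tuple[str, int]:
    """Parse the class starting at p[i] == '['; return (regex, index after it)."""
    j = i + 1
    # Scan the base body up to ']' or the start of a subtraction "-[".
    while p[j] != "]" and not (p[j] == "-" and j + 1 < len(p) and p[j + 1] == "["):
        j += 1
    base = p[i + 1 : j]
    if p[j] == "]":
        return "[" + base + "]", j + 1  # plain class: copy unchanged
    # Recursively translate the subtracted class, which may itself be nested.
    minus, k = parse_class(p, j + 1)
    if k >= len(p) or p[k] != "]":
        raise ValueError("expected ']' closing the outer class")
    # The lookahead rejects characters of the subtrahend; the non-capturing
    # group lets a following quantifier apply to the whole construct.
    return "(?:(?!" + minus + ")[" + base + "])", k + 1


nested = translate("[0-9-[0-6-[0-3]]]")
print(nested)  # (?:(?!(?:(?![0-3])[0-6]))[0-9])
print([d for d in "0123456789" if re.fullmatch(nested, d)])  # ['0', '1', '2', '3', '7', '8', '9']
```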