
Efficient solution for providing XML Schema "(nested) character class subtraction"

IP.com Disclosure Number: IPCOM000179548D
Original Publication Date: 2009-Feb-17
Included in the Prior Art Database: 2009-Feb-17
Document File: 2 page(s) / 26K

Publishing Venue

IBM

Abstract

This disclosure describes an efficient way to provide "(nested) character class subtraction" for XML Schema.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.


Background:

XML Schema Datatypes (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/) comprise so-called character classes, used mainly for pattern-like restrictions of simple types. For example, the character class "letter or digit" is given by "[A-Za-z0-9]".

Sometimes values or ranges need to be filtered out of a character class, which is called "character class subtraction": http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#nt-negCharGroup. The same idea appears in another spec as the notation "A - B": http://www.w3.org/TR/REC-xml/#sec-notation

For example, the term "[a-z-[aeiou]]" matches all consonants. A more complicated and realistic example is this description of British postcodes (http://en.wikipedia.org/wiki/UK_postcodes):

"(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})"

This matches, e.g., SW8 2LP.

The following article discusses "character class subtraction" and states that only two engines (JGsoft and .NET) support this XML Schema feature: http://www.regular-expressions.info/xmlcharclass.html#subtract. The same page also describes the "nested character class subtraction" aspect of the spec, which is not implemented anywhere: [0-9-[0-6-[0-3]]] is the same as [0-37-9].
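
The equivalence stated for the nested example can be checked with plain set arithmetic. The following is a minimal sketch in Python; the helper name char_range is an assumption made for illustration and is not taken from the disclosure or from any of the cited specifications.

```python
def char_range(first: str, last: str) -> set[str]:
    """Return the set of characters from first to last, inclusive (hypothetical helper)."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}


# [0-6-[0-3]]: the digits 0..6 with 0..3 removed
inner = char_range("0", "6") - char_range("0", "3")

# [0-9-[0-6-[0-3]]]: the digits 0..9 with the inner result removed
nested = char_range("0", "9") - inner

# [0-37-9]: the two ranges 0..3 and 7..9 in one class
flat = char_range("0", "3") | char_range("7", "9")

# Both classes contain exactly the same characters
assert nested == flat
```

Subtraction is evaluated from the innermost class outwards, so the characters removed by the inner subtraction (0 to 3) reappear in the final class.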

Summary:

A typical option for implementing XML Schema pattern handling is translation to the PCRE library (http://www.pcre.org/pcre.txt). The basic idea of the invention, in addition to the normal use of the PCRE library, is:
prepending of each subtracted character class with 7 characters of fill-in (#######) to the surrounding character class
local replacement of the fill-in by making use of the "negative lookahead (?!...)" and "non-capturing groups (?:...)"...
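
The abbreviated text does not show how the fill-in characters are processed, so the following is not the disclosed algorithm. It is only a minimal sketch of the general effect the two constructs make possible: a class "[A-[B]]" can be expressed as "(?:(?![B])[A])" in an engine without native subtraction. Python's re module is used here in place of PCRE, the function names translate_class and translate_pattern are hypothetical, and the sketch assumes balanced brackets with no escaped brackets or hyphens inside a class.

```python
import re


def translate_class(cls: str) -> str:
    """Rewrite one class, e.g. "[0-9-[0-6-[0-3]]]", without subtraction syntax."""
    idx = cls.find("-[")
    if idx == -1:
        return cls  # plain class, nothing to subtract
    base = cls[:idx] + "]"        # e.g. "[0-9]"
    subtracted = cls[idx + 1:-1]  # e.g. "[0-6-[0-3]]", may itself be nested
    # The negative lookahead rejects the subtracted characters first; the
    # non-capturing group keeps a following quantifier such as {2} attached
    # to the whole construct.
    return f"(?:(?!{translate_class(subtracted)}){base})"


def translate_pattern(pattern: str) -> str:
    """Translate every character class found in a pattern (assumes balanced brackets)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "[":
            out.append(pattern[i])
            i += 1
            continue
        # Find the "]" that closes this class, counting nested brackets
        depth = 0
        j = i
        while True:
            if pattern[j] == "[":
                depth += 1
            elif pattern[j] == "]":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        out.append(translate_class(pattern[i:j + 1]))
        i = j + 1
    return "".join(out)


POSTCODE = (
    "(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)"
    "|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY]))))"
    " [0-9][A-Z-[CIKMOV]]{2})"
)

# XML Schema patterns match the whole value, hence fullmatch
postcode = re.compile(translate_pattern(POSTCODE))
nested = re.compile(translate_pattern("[0-9-[0-6-[0-3]]]"))

assert postcode.fullmatch("SW8 2LP")
assert [d for d in "0123456789" if nested.fullmatch(d)] == list("0123789")
```

Because a lookahead consumes no input, the nested case falls out of the recursion: the inner subtraction becomes a lookahead inside the outer lookahead.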