
Algorithm for Determining the Scope of HTML Tags

IP.com Disclosure Number: IPCOM000123496D
Original Publication Date: 1998-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 3 page(s) / 81K

Publishing Venue

IBM

Related People

Wosnick, SB: AUTHOR (and 2 others)

Abstract

An HTML (HyperText Markup Language) tag need not have an explicit end tag. End tags can be required, optional or forbidden. Also, some HTML tags can be nested only within certain tags, and some cannot be nested at all. An HTML tag can be ended explicitly by its end tag, or implicitly by the start or end of another tag. When processing HTML tags, it is necessary to be able to determine when an HTML tag has ended. This algorithm gives an efficient way of determining when the scope of an HTML tag is ended.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

   The HTML specification defines what each HTML tag can
contain, which tags it can be contained in, and whether its end tag
is required, optional or forbidden.  By encoding this information in
a table and applying a small set of processing rules, one can easily
determine when the scope of an HTML tag has ended.
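A minimal Python sketch of such a table follows. The end-tag rules for the listed tags match the HTML 4.01 specification, but the element class assigned to each tag is an illustrative guess (the abbreviated text does not reproduce the original table), and the names TAG_TABLE, lookup and opens_scope are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class EndTag(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    FORBIDDEN = "forbidden"


class TagInfo(NamedTuple):
    element_class: str  # one of the element classes in the hierarchy
    end_tag: EndTag


# End-tag rules follow HTML 4.01; the element_class values are
# hypothetical assignments chosen only for illustration.
TAG_TABLE = {
    "div": TagInfo("Block Element", EndTag.REQUIRED),
    "p": TagInfo("Single Block Element", EndTag.OPTIONAL),
    "b": TagInfo("Inline Element", EndTag.REQUIRED),
    "br": TagInfo("Empty Element", EndTag.FORBIDDEN),
    "img": TagInfo("Empty Element", EndTag.FORBIDDEN),
}


def lookup(tag):
    """Look up a tag case-insensitively (HTML tag names are case-insensitive)."""
    return TAG_TABLE[tag.lower()]


def opens_scope(tag):
    """A tag whose end tag is forbidden has no content, so it never opens a scope."""
    return lookup(tag).end_tag is not EndTag.FORBIDDEN
```

Keeping the end-tag rule and the element class in one record means a parser needs a single lookup per tag to decide both whether to push it onto a stack of open elements and which implicit-end rules apply to it.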

   A hierarchy of elements is defined, and each HTML tag
belongs to exactly one of these elements.  The hierarchy is as follows:
   Single Block Element    Block Element ..... Single or Block Element
            |                    |                        .
            ......................                        .
                      |                                   .
                Text Element ..............................
                      |
          ...........................
          |                         |
    Leaf Element              Empty Element
          |
   Inline Element
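The diagram can be read as a directed graph in which each line points from an element to the element(s) drawn beneath it. The sketch below assumes that reading, treats the dotted line from Single or Block Element as one more edge into Text Element, and uses a hypothetical helper below to list every element reachable downward. For Single Block Element and Leaf Element the result matches the containment lists in the definitions; the diagram does not show that a Block Element may contain another Block Element or that a Text Element may contain another Text Element.

```python
# Assumption: each line in the diagram is read as "may contain", pointing
# from an element to the element(s) drawn directly beneath it.
HIERARCHY = {
    "Single Block Element": ["Text Element"],
    "Block Element": ["Text Element"],
    "Single or Block Element": ["Text Element"],  # the dotted connection
    "Text Element": ["Leaf Element", "Empty Element"],
    "Leaf Element": ["Inline Element"],
    "Empty Element": [],
    "Inline Element": [],
}


def below(element):
    """Return every element reachable downward from `element` in the diagram."""
    seen = set()
    pending = list(HIERARCHY[element])
    while pending:
        current = pending.pop()
        if current not in seen:
            seen.add(current)
            pending.extend(HIERARCHY[current])
    return seen
```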

   The elements are defined as follows:
  o  Single Block Element
     -  can contain Text Elements, Leaf Elements, Inline Elements,
        Empty Elements or character text
     -  is implicitly ended by a Single Block Element or a Block
        Element
  o  Block Element
     -  can contain other Block Elements, Text Elements, Leaf
        Elements, Inline Elements, Empty Elements or character
        text
     -  is implicitly ended by the end of the file
  o  Text Element
     -  can contain other Text Elements, Leaf Elements, Inline
        Elements, Empty Elements or character text
     -  is implicitly ended by a Block Element
  o  Leaf Element
     -  can contain Inline Elements or character text
  ...
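The implicit-end rules above suggest a stack-based scan over the tags. The Python sketch below is one possible reading, not the disclosure's own algorithm (its remaining rules are cut off in this abbreviated text): it applies only the rules stated for Single Block, Block and Text Elements, assumes that a start tag closes the outermost open element it implicitly ends together with everything nested inside it, and uses a hypothetical tag-to-class table. The names scopes, TAG_CLASS and IMPLICITLY_ENDED_BY are illustrative.

```python
from enum import Enum, auto


class Cls(Enum):
    SINGLE_BLOCK = auto()
    BLOCK = auto()
    TEXT = auto()
    LEAF = auto()
    INLINE = auto()
    EMPTY = auto()


# Hypothetical tag-to-class table, for illustration only.
TAG_CLASS = {
    "div": Cls.BLOCK,
    "p": Cls.SINGLE_BLOCK,
    "font": Cls.TEXT,
    "a": Cls.LEAF,
    "b": Cls.INLINE,
    "br": Cls.EMPTY,
}

# Implicit-end rules from the definitions: an open element of the key class
# is ended by the start of any element whose class is in the value set.
# Block Elements are omitted: in this sketch they end implicitly only at EOF.
IMPLICITLY_ENDED_BY = {
    Cls.SINGLE_BLOCK: {Cls.SINGLE_BLOCK, Cls.BLOCK},
    Cls.TEXT: {Cls.BLOCK},
}


def scopes(tokens):
    """Return (tag, reason) pairs in the order in which each scope ends.

    tokens is a sequence of ("start", tag) or ("end", tag) pairs.
    """
    stack = []  # currently open tags, outermost first
    ended = []

    def close_from(i, reason):
        # End stack[i] and everything nested inside it, innermost first.
        while len(stack) > i:
            ended.append((stack.pop(), reason))

    for kind, tag in tokens:
        if kind == "start":
            cls = TAG_CLASS[tag]
            # Assumption: the outermost open element that this start tag
            # implicitly ends is closed, together with everything inside it.
            for i, open_tag in enumerate(stack):
                if cls in IMPLICITLY_ENDED_BY.get(TAG_CLASS[open_tag], set()):
                    close_from(i, "implicit")
                    break
            if cls is Cls.EMPTY:
                ended.append((tag, "empty"))  # end tag forbidden: no content
            else:
                stack.append(tag)
        else:
            # An explicit end tag closes the innermost open element with that
            # name; elements still open inside it end implicitly with it.
            # An end tag with no matching open element is ignored.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == tag:
                    close_from(i + 1, "implicit")
                    close_from(i, "explicit")
                    break
    close_from(0, "eof")  # anything still open ends at end of file
    return ended
```

For the markup `<div><p>one<b>bold<p>two</div>`, tokenized as start tags and end tags with the character text dropped, scopes returns [("b", "implicit"), ("p", "implicit"), ("p", "implicit"), ("div", "explicit")]: the second `<p>` ends the first paragraph and the bold text inside it, and `</div>` ends the open paragraph along with the division.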