Consistency check of words on link among hypertext documents
Disclosure Number: IPCOM000021855D
Original Publication Date: 2004-Feb-13
Included in the Prior Art Database: 2004-Feb-13
Document File: 2 page(s) / 26K

A program is disclosed that checks the consistency of terms used in a set of hypertext documents. The program picks out anchor texts in a document that do not appear in the referenced document specified in the anchor tag. It helps document editors and directors decide whether to modify terms to keep the documents consistent.

This is the abbreviated version, containing approximately 54% of the total text.

Consistency check of words on link among hypertext documents

The consistency check algorithm is shown below.

1. For each target HTML document, extract the anchor text (e.g., the text enclosed by a pair of <a> and </a> tags), the referencing URL (e.g., the URL given in the 'href' attribute of the anchor tag), and the text of the referenced document, and then construct a tuple from them.

    .....<a href="page02.html">インストール</a>.....    (インストール is Japanese for "install")

    Constructed tuple:
      Anchor text:                  インストール
      Referencing URL:              http://..../page02.html
      Referenced document strings:  text of the document at http://..../page02.html

                         Fig. 1: Making a tuple of text, URL, and document
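
Step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the disclosed program: the names AnchorCollector, build_tuples, and load_document are hypothetical, and the referenced document is obtained through a caller-supplied function rather than a real HTTP request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (anchor text, href) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (text, href) pairs
        self._href = None   # href of the <a> currently open, if any
        self._parts = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._parts).strip()
            self.anchors.append((text, self._href))
            self._href = None


def build_tuples(base_url, html, load_document):
    """Return (anchor text, referencing URL, referenced document text) tuples.

    load_document is an assumed, caller-supplied function that maps a URL
    to the text of the document at that URL.
    """
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    tuples = []
    for text, href in collector.anchors:
        url = urljoin(base_url, href)  # resolve relative links such as "page02.html"
        tuples.append((text, url, load_document(url)))
    return tuples
```

For example, with an in-memory dictionary standing in for the document store, build_tuples("http://example.com/page01.html", '<a href="page02.html">インストール</a>', pages.get) yields one tuple whose URL is http://example.com/page02.html. The example.com addresses are placeholders.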

2. For each anchor text in a tuple, expand it into several words, including synonyms and translations of the anchor text, by using an existing word or translation database. In Fig. 2, the original anchor text is expanded into three words: the text itself, a synonym, and a translation.

    Tuple: (anchor text, referencing URL, referenced document strings)
           (インストール, http://..../page02.html, text of the document at http://..../page02.html)

    The anchor text is expanded into:
      Text itself (A)
      Synonym (A')
      Translation (A'')

    Example shown: インストール and its translation "Install"

                           Fig. 2: Expansion of anchor text
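
The expansion in step 2 might look like the following minimal sketch. The disclosure only says that an existing word or translation database is used, so the two small dictionaries below are assumed stand-ins for it, and the function name expand_anchor_text and the sample synonym are hypothetical.

```python
def expand_anchor_text(text, synonyms, translations):
    """Return the anchor text plus its synonyms and translations, without duplicates."""
    expanded = [text]  # the text itself always comes first
    for candidate in synonyms.get(text, []) + translations.get(text, []):
        if candidate not in expanded:
            expanded.append(candidate)
    return expanded


# Toy stand-ins for an existing word or translation database (assumed data).
SYNONYMS = {"インストール": ["セットアップ"]}
TRANSLATIONS = {"インストール": ["Install"]}

words = expand_anchor_text("インストール", SYNONYMS, TRANSLATIONS)
# words holds the text itself, then the synonym, then the translation
```

An anchor text that is missing from both dictionaries expands to a one-element list containing only itself, so later steps can treat every anchor uniformly.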

3. For each expanded word, search for the word in the referenced document. If none of the expanded words is found, the anchor text is not used in the referenced document. This means that the two documents (the document that holds the anchor text and the referenced document) may be unrelated. It depends on the semantics of the documents whether this is a problem or...
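
The search in step 3 can be sketched as follows, assuming each record already carries the expanded words from step 2. The function name find_unused_anchors and the record layout are hypothetical, and case-insensitive substring matching is an assumption; the disclosure does not say how a word is matched against the document.

```python
def find_unused_anchors(records):
    """Return (anchor text, URL) pairs whose expanded words never occur in the referenced text.

    records: iterable of (anchor text, expanded words, URL, referenced document text).
    """
    unused = []
    for anchor_text, words, url, document_text in records:
        haystack = document_text.casefold()
        # Assumed matching rule: case-insensitive substring search.
        if not any(word.casefold() in haystack for word in words):
            unused.append((anchor_text, url))
    return unused


# Placeholder data: the first referenced page mentions "install", the second does not.
records = [
    ("インストール", ["インストール", "Install"],
     "http://example.com/page02.html", "How to install the program"),
    ("インストール", ["インストール", "Install"],
     "http://example.com/page03.html", "Release notes"),
]
report = find_unused_anchors(records)
```

Here only the link to page03.html is reported, because neither the anchor text nor its translation occurs in that page. The report is a list of candidates for an editor to review rather than a list of confirmed errors.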