A program is disclosed that checks the consistency of terms used in a set of hypertext documents. The program picks up anchor texts within a document that are not used in the referenced document specified in the anchor tag. This helps document editors and directors decide whether to modify the terms to keep them consistent across documents.

The consistency check algorithm is shown below.

1. For each target HTML document, extract the anchor text (e.g., the text enclosed by a pair of <a> and </a> tags in an HTML document), the referencing URL (e.g., the URL given in the 'href' attribute of the anchor tag), and the text of the referenced document, and then construct a tuple from them.

    Source document:  .....<a href="page02.html">インストール</a>.....

    Constructed tuple:
      Anchor text:                  インストール
      Referencing URL:              http://..../page02.html
      Referenced document strings:  text of http://..../page02.html

                         Fig. 1: Making a tuple of text, URL, and document
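Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed program itself: the names `AnchorTuple`, `AnchorCollector`, and `make_tuples` are hypothetical, and the `fetch_text` callback that returns the text of a referenced document is assumed to be supplied by the caller, so the sketch does not depend on any particular way of retrieving documents.

```python
from html.parser import HTMLParser
from typing import Callable, List, NamedTuple, Optional, Tuple
from urllib.parse import urljoin


class AnchorTuple(NamedTuple):
    """Hypothetical record for the tuple built in step 1."""
    anchor_text: str      # text enclosed by <a> and </a>
    url: str              # absolute URL derived from the href attribute
    referenced_text: str  # text of the referenced document


class AnchorCollector(HTMLParser):
    """Collects (anchor text, href) pairs from one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchors that carry an href reference another document.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._chunks = []

    def handle_data(self, data):
        # Accumulate text only while inside an anchor with an href.
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append(("".join(self._chunks).strip(), self._href))
            self._href = None


def make_tuples(html: str, base_url: str,
                fetch_text: Callable[[str], str]) -> List[AnchorTuple]:
    """Build one tuple per anchor found in `html`.

    `fetch_text` is an assumed callback mapping a URL to document text.
    """
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    tuples = []
    for text, href in collector.pairs:
        url = urljoin(base_url, href)  # resolve relative hrefs
        tuples.append(AnchorTuple(text, url, fetch_text(url)))
    return tuples


# Usage with an in-memory stand-in for the referenced documents.
pages = {"http://example.com/page02.html": "Install the program first."}
sample = '<p>See <a href="page02.html">インストール</a> for details.</p>'
print(make_tuples(sample, "http://example.com/page01.html", pages.__getitem__))
```

Passing the document lookup as a callback keeps the extraction logic testable without network access; a real checker would presumably fetch and strip the markup of each referenced page instead.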

2. For each anchor text in a tuple, expand it into several words, including synonyms and translations of the anchor text, by using an existing word or translation database. In Fig. 2, the original anchor text is expanded into three words: the text itself, a synonym, and a translated word.

    Tuple:
      Anchor text:                  インストール
      Referencing URL:              http://..../page02.html
      Referenced document strings:  text of http://..../page02.html

    Expansion of the anchor text:
      Text itself (A):    インストール
      Synonym (A')
      Translation (A''):  Install

                           Fig. 2: Expansion of anchor text
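The expansion in step 2 can be sketched as below. The function name `expand_anchor_text` and the dictionary-based synonym and translation tables are assumptions made for illustration; the disclosure only says that an existing word or translation database is used. The synonym entry in the usage example is likewise an invented sample value, while the translation "Install" follows Fig. 2.

```python
from typing import List, Mapping, Sequence


def expand_anchor_text(anchor_text: str,
                       synonyms: Mapping[str, Sequence[str]],
                       translations: Mapping[str, Sequence[str]]) -> List[str]:
    """Return the anchor text followed by its synonyms and translations.

    Both tables are hypothetical stand-ins for an existing word or
    translation database. Duplicates are dropped, order is preserved.
    """
    expanded = [anchor_text]  # the text itself (A)
    candidates = list(synonyms.get(anchor_text, ()))       # synonyms (A')
    candidates += list(translations.get(anchor_text, ()))  # translations (A'')
    for word in candidates:
        if word not in expanded:
            expanded.append(word)
    return expanded


# Usage with tiny illustrative tables (sample data, not from the disclosure).
synonym_table = {"インストール": ["セットアップ"]}
translation_table = {"インストール": ["Install"]}
print(expand_anchor_text("インストール", synonym_table, translation_table))
```

Keeping the original text first in the list means that a later search can try the literal anchor text before falling back to synonyms or translations.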

3. For each expanded word, search for the word in the referenced document. If none of the expanded words is found, the anchor text is not used in the referenced document. This means that the two documents (the document that holds the anchor text and the document referenced) may have no relation. It depends on the semantics of the documents whether this is a problem or...