
Organization of a List of Italian Words and Related Search Algorithm

IP.com Disclosure Number: IPCOM000057291D
Original Publication Date: 1988-Apr-01
Included in the Prior Art Database: 2005-Feb-15

Publishing Venue

IBM

Related People

Authors:
Paoli, C

Abstract

The article describes how a list of Italian words is organized on disk as a tree structure keyed on sequences of three consecutive characters (triplets), together with an algorithm that checks whether a given word is present in the list. An examination of about 360,000 words shows that 1,800 different triplets can occur at the beginning of a word, that about 3,600 different triplets can occur in any position of a word, and that fewer than 30 words share the same first six characters in 99% of cases. The list is structured so that the two initial triplets of a word address the portion of the disk where the word, if it exists, is stored. The roughly 300 words shorter than four characters are checked against an exhaustive alphabetic list.
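
The lookup scheme in the abstract can be illustrated with a minimal in-memory sketch in Python. This is not the disclosed implementation: a dictionary stands in for the addressed portions of the disk, the names `bucket_key`, `build_index` and `contains` are hypothetical, and the handling of words of four or five characters (which do not have two full triplets) is an assumption, since the abstract does not describe it.

```python
from bisect import bisect_left
from collections import defaultdict

# Words shorter than this are kept in the exhaustive alphabetic list
# (the abstract gives the threshold as "less than four characters").
SHORT_LIMIT = 4


def bucket_key(word):
    """Return the two leading triplets used to address a bucket.

    Assumption: for a word of four or five characters the second
    triplet is simply shorter; the abstract does not say how such
    words are keyed.
    """
    return word[:3], word[3:6]


def build_index(words):
    """Split words into a sorted short-word list and triplet-addressed buckets."""
    unique = sorted(set(words))
    short_words = [w for w in unique if len(w) < SHORT_LIMIT]
    buckets = defaultdict(list)  # stands in for the disk regions
    for w in unique:
        if len(w) >= SHORT_LIMIT:
            buckets[bucket_key(w)].append(w)
    return short_words, dict(buckets)


def contains(word, short_words, buckets):
    """Check whether a word is present in the indexed list."""
    if len(word) < SHORT_LIMIT:
        # Binary search in the exhaustive alphabetic list of short words.
        i = bisect_left(short_words, word)
        return i < len(short_words) and short_words[i] == word
    # The two initial triplets select the only bucket that can hold the word.
    return word in buckets.get(bucket_key(word), [])


if __name__ == "__main__":
    short_words, buckets = build_index(
        ["casa", "casalinga", "casalinghe", "cantare", "che", "il"]
    )
    print(contains("casalinga", short_words, buckets))
    print(contains("casalingo", short_words, buckets))
```

Because the abstract reports fewer than 30 words per six-character prefix in 99% of cases, each bucket is small enough that a linear scan within it is a reasonable choice for a sketch like this one.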