Spanish Suffix Table for Extracting Word Stems

IP.com Disclosure Number: IPCOM000118670D
Original Publication Date: 1997-May-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 62K

Publishing Venue

IBM

Related People

Porter, TW: AUTHOR

Abstract

Disclosed is a suffix rules file specific to the Spanish language which is to be used in conjunction with the Paice stemming algorithm.

This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a suffix rules file specific to the Spanish
language which is to be used in conjunction with the Paice stemming
algorithm.

      Stemming words is one approach to generating a search and
retrieval index that is smaller and more efficient than a full-text
index.  One published stemming algorithm is the Paice Stemmer.  It
requires a list of suffixes, together with rules to apply when
removing each suffix, to generate a word stem.  A suffix rules file
has been published for English but not for other languages.  This
disclosure provides a listing of a Spanish suffix rules file.  Note
that the application of the algorithm is not case sensitive.

      The Paice algorithm works by parsing a "token" or "word" from
the input stream.  It then reverses the order of the characters in
the token and compares the reversed token against the suffix rules
file.  The suffix rules file is checked in the order given, from the
first line to the last line.  If the first token of a rule, which is
<n> characters long, exactly matches the first <n> characters of the
reversed input token, then the rule is applied.  The rule consists of
the remaining tokens on that line of the suffix rules file.
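      The matching step described above can be sketched in Python as
follows.  This is a minimal illustration, not the disclosed
implementation: the function name find_first_matching_rule and the
sample rule lines are hypothetical, and the placeholders X, Y and Z
stand in for the remaining rule tokens, whose format is not covered
by this sketch.

```python
def find_first_matching_rule(token, rule_lines):
    """Return (key, remaining_tokens) for the first rule whose key matches
    the start of the reversed token, or None if no rule matches."""
    # Matching is not case sensitive, so fold case before reversing.
    reversed_token = token.lower()[::-1]
    for line in rule_lines:  # rules are checked in file order
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, remaining = fields[0].lower(), fields[1:]
        # n is len(key): compare the key with the first n reversed characters.
        if reversed_token.startswith(key):
            return key, remaining
    return None


# Hypothetical rule lines, invented for illustration only; they are NOT
# taken from the disclosed Spanish suffix rules file.
sample_rules = ["sotneim X", "se Y", "s Z"]

print(find_first_matching_rule("Movimientos", sample_rules))
```

      Because the suffixes are stored reversed, a simple prefix test on
the reversed token is enough, and listing longer suffixes ahead of
shorter ones lets the more specific rule match first.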

      The optional second token in a suffix rules file line may be an
asterisk (*).  This indicates that the rule is to be applied only if
this is the first rule which matched the input token.  The next token
in the suffix rules fil...
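
      The asterisk check described above might look like the
following Python sketch.  It assumes that a rule line is a sequence
of whitespace-separated tokens; the names parse_rule and rule_applies
and the sample lines are hypothetical.  The tokens after the optional
asterisk are carried along unparsed, because their meaning is not
given in this excerpt.

```python
def parse_rule(line):
    """Split a rule line into (key, first_match_only, rest)."""
    fields = line.split()
    key = fields[0].lower()
    # The optional second token may be an asterisk.
    first_match_only = len(fields) > 1 and fields[1] == "*"
    rest = fields[2:] if first_match_only else fields[1:]
    return key, first_match_only, rest


def rule_applies(rule, reversed_token, earlier_rule_matched):
    """Decide whether a parsed rule applies to an already reversed token."""
    key, first_match_only, _rest = rule
    if not reversed_token.startswith(key):
        return False
    # An asterisk rule is applied only if no earlier rule has matched.
    return not (first_match_only and earlier_rule_matched)


# Hypothetical rule line, invented for illustration only.
rule = parse_rule("se * Y")
print(rule_applies(rule, "serolf", earlier_rule_matched=False))
print(rule_applies(rule, "serolf", earlier_rule_matched=True))
```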