Spam Policing Scheme

IP.com Disclosure Number: IPCOM000019943D
Original Publication Date: 2003-Oct-13
Included in the Prior Art Database: 2003-Oct-13
Document File: 1 page(s) / 56K

Publishing Venue

IBM

This is the abbreviated version, containing approximately 72% of the total text.

The ASN nodes implement a distributed SPAM database that is continuously updated and synchronized. Each mail message is represented by a "fuzzy" hash-code: messages that differ only in a few words, or that use the same words in a different order, produce the same hash-code. Hash-codes of this type are used, for example, in the Distributed Checksum Clearinghouse (DCC) project. The algorithm executed on the ASN nodes comprises several parts:

1. Each message is categorized as no SPAM, potential SPAM or very likely SPAM. The categorization combines the well-known technique of Bayesian SPAM filtering with frequency observations, i.e., how often a mail message has been seen during a period of time. The result is a pair of numbers: the filter output, in the range from 0 (no SPAM) to 1 (most likely SPAM), and the observation frequency.

2. For messages with a SPAM probability above "potential SPAM", a "fuzzy" hash-code is computed. This hash-code, together with the SPAM categorization, is distributed to all neighboring ASN nodes. This is called a SPAM database update.

3. An ASN node that receives a SPAM database update checks whether the messages in the update exist in its own database. For existing messages, a new SPAM categorization is computed based on the input of the neighbors. The newly computed SPAM categorization is then propagated further.

4. For messages in the "very likely SPAM"...
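
Steps 1 to 3 above can be illustrated with a minimal Python sketch. It is not the disclosed implementation, and everything it names is an assumption made for illustration: the thresholds POTENTIAL_SPAM and VERY_LIKELY_SPAM, the class AsnNode, and the helpers fuzzy_hash, category and merge. The hash is a simple stand-in that ignores only word order, case and repetition (it does not tolerate substituted words, unlike the DCC-style checksums the text refers to). The Bayesian filter output and the observation frequency are taken as inputs. The text does not say how a new categorization is computed from the neighbors' input, so the sketch takes the maximum of each number, which also guarantees that propagation terminates. Step 4 is left out because its text is cut off above.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical thresholds: the text names the categories but gives no values.
POTENTIAL_SPAM = 0.5
VERY_LIKELY_SPAM = 0.9


def fuzzy_hash(message):
    """Stand-in "fuzzy" hash-code: hashes the sorted set of lowercased words,
    so word order, case and repetition do not change the result."""
    words = sorted(set(re.findall(r"[a-z0-9]+", message.lower())))
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Categorization:
    score: float    # filter output: 0 (no SPAM) to 1 (most likely SPAM)
    frequency: int  # how often the message was seen in the observation period


def category(c):
    if c.score >= VERY_LIKELY_SPAM:
        return "very likely SPAM"
    if c.score >= POTENTIAL_SPAM:
        return "potential SPAM"
    return "no SPAM"


def merge(a, b):
    # Assumed combination rule (not from the text): keep the larger value of each.
    return Categorization(max(a.score, b.score), max(a.frequency, b.frequency))


class AsnNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.database = {}  # hash-code -> Categorization

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def observe(self, message, score, frequency):
        """Steps 1 and 2: given the categorization of a local message,
        hash it and send a database update if it is at least potential SPAM."""
        if score < POTENTIAL_SPAM:
            return
        key = fuzzy_hash(message)
        local = Categorization(score, frequency)
        known = self.database.get(key)
        self.database[key] = local if known is None else merge(known, local)
        self._send(key, self.database[key], exclude=None)

    def receive_update(self, key, update, sender):
        """Step 3: recompute the categorization of a known message and
        propagate it further, but only if it actually changed."""
        known = self.database.get(key)
        if known is None:
            return  # assumption: the excerpt does not cover unknown messages
        merged = merge(known, update)
        if merged != known:
            self.database[key] = merged
            self._send(key, merged, exclude=sender)

    def _send(self, key, categorization, exclude):
        for neighbor in self.neighbors:
            if neighbor is not exclude:
                neighbor.receive_update(key, categorization, sender=self)
```

In this sketch, an update changes a neighbor's entry only when it raises the score or the frequency, and a node forwards an update only after its own entry has changed. Because both numbers can only increase and are bounded by the values already present in the network, forwarding stops even when the node graph contains cycles. A scheme that averages the neighbors' input would need a different stopping rule, such as a hop limit.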