Spam Suppression via Client Side "Honeypot" Technique

IP.com Disclosure Number: IPCOM000132093D
Original Publication Date: 2005-Dec-01
Included in the Prior Art Database: 2005-Dec-01
Document File: 1 page(s) / 27K

Publishing Venue

IBM

Abstract

Recognizing that many people have multiple email addresses (personal, such as one's ISP account and perhaps a web-based account via, say, Hotmail; organizational, such as via a volunteer charity or a business; or indirect, via a mailing list), and that security personnel have had much success using "honeypots" to deflect attempts at unauthorized use of information systems, we propose using a honeypot technique on the email client to detect SPAM that would otherwise evade detection by the existing techniques.

Worldwide, more than 60% of email is SPAM. As a result, businesses and individuals lose thousands of non-productive hours dealing with this onslaught of unsolicited mail, and their client systems may also be infected through security lapses with SPYWARE, which transmits personal information to unauthorized parties, or with trojans, which turn the machine into a SPAM BOT (a machine used to send SPAM). Email servers and clients have both taken various steps to suppress SPAM, but although much of it is suppressed, a great deal still shows up in one's personal mail folder. Often this is because spammers slightly vary the content of messages and mail headers to evade the document classification algorithms (e.g., Bayesian filters) often used on the client system, as well as the header analysis and duplicate-message comparison techniques used at the server. What is needed is an additional, personalized filtering technique that reduces the number of messages that escape the current techniques.

Document fingerprint algorithms compare pairs of documents and generate a probability that they share content. This class of algorithm is used for tasks such as identifying plagiarized documents. A typical application compares a student paper against a library of scholarly works to identify unattributed copying of portions of one or more of those works into the student submission. The more sophisticated of these algorithms can identify copied content fragments even when a fragment has been slightly modified.
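
One common way to realize the pairwise comparison described above is w-shingling with Jaccard resemblance: each document is reduced to the set of its overlapping k-word sequences ("shingles"), and the score is the size of the intersection divided by the size of the union. The sketch below assumes this technique purely for illustration, since the disclosure does not name a specific fingerprint algorithm. The function names `shingles` and `resemblance` and the default `k = 3` are hypothetical choices, and the score is a resemblance in [0, 1] used as a proxy for the likelihood of shared content, not a calibrated probability.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short documents: treat the whole word sequence as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(doc_a: str, doc_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' shingle sets, in [0.0, 1.0]."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a and not b:
        return 1.0  # two empty documents are trivially identical
    return len(a & b) / len(a | b)


# A one-word substitution breaks only the shingles that overlap the changed word.
original = "Buy cheap watches now at the lowest prices ever offered online"
variant = "Buy cheap watches today at the lowest prices ever offered online"
print(resemblance(original, variant))  # → 0.5
```

Here 6 of the 12 distinct shingles are shared, so the lightly edited copy still scores far above an unrelated text (which would score 0.0). This partial tolerance of small edits is the property the paragraph attributes to the more sophisticated algorithms. Production systems typically hash and sample shingles (for example with MinHash) so that large libraries can be compared efficiently.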

Document classification algorithms build a knowledge base of the characteristics associated with one or more classifications (e.g., 'project x', 'medical procedure', 'Spam', ...) and then use that knowledge base to generate classification probabilities for a given subject document. The Bayesian algorithm is a simple document classifier for a single class, based on word probabilities stored in a knowledge base. Many mail servers and mail clients use classification algorithms to identify Spam.
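
The single-class Bayesian classifier described above can be sketched as a multinomial naive Bayes filter that scores the class 'spam' against its complement. This is a generic textbook formulation assumed here for illustration; the disclosure does not say which Bayesian variant mail clients use. The names `NaiveBayesSpamFilter`, `train`, and `spam_probability`, the 'ham' label for non-spam, and the toy training messages are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes scoring one class ('spam') against its
    complement ('ham'). The per-class word counts form the knowledge base."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: dict[str, int] = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one labelled message ('spam' or 'ham') to the knowledge base."""
        if label not in self.word_counts:
            raise ValueError(f"unknown label: {label!r}")
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text) under the naive word-independence assumption."""
        if min(self.doc_counts.values()) == 0:
            raise ValueError("train at least one spam and one ham message first")
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Log prior: fraction of training messages carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + vocab_size
            for word in tokenize(text):
                # Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
                score += math.log((counts[word] + 1) / denominator)
            log_scores[label] = score
        # Normalize in log space (subtract the max) to avoid overflow/underflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


nb = NaiveBayesSpamFilter()
nb.train("cheap watches buy now limited offer", "spam")
nb.train("win money now click here", "spam")
nb.train("project x status meeting moved to friday", "ham")
nb.train("please review the medical procedure notes", "ham")
print(nb.spam_probability("buy cheap watches now"))               # well above 0.5
print(nb.spam_probability("status meeting notes for project x"))  # well below 0.5
```

Because each word contributes an independent likelihood term, retraining simply updates the counts. This also shows why small wording changes can weaken the filter: words it has never seen receive nearly equal smoothed probabilities in both classes and so add little evidence either way.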

Ou...