
Method and System for Jointly Detecting Spammer and Spam Emails with Minimal Supervised Effort

IP.com Disclosure Number: IPCOM000237447D
Publication Date: 2014-Jun-18

Publishing Venue

The IP.com Prior Art Database

Related People

Achint Thomas: INVENTOR [+4]

Abstract

A method and system are disclosed for jointly detecting spammers and spam emails with minimal supervised effort. The method and system introduce a two-dimensional anti-spam measurement in an unsupervised framework for the joint detection of spam emails and spammers by exploiting the heterogeneous user-email network structure. The method and system incorporate blacklists and update them automatically.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 17% of the total text.

Description

Disclosed is a method and system for jointly detecting spammers and spam emails with minimal supervised effort. The method and system jointly detect spam emails and spammers within a heterogeneous email network, which is built from email data to capture the intra- and inter-relationships of users and emails. The method and system introduce a two-dimensional anti-spam measurement and construct an iterative, propagation-based algorithm that exploits the heterogeneous network structure.

The method and system employ a heterogeneous email graph to capture the intra- and inter-relationships of users and emails. A typical email dataset contains two types of objects: users (email addresses) and emails. Let n and m be the numbers of users and emails in the user set and the email set, respectively. The heterogeneous email network captures the intra- and inter-relationships among users and emails, as shown in Fig. 1.

Figure 1

The heterogeneous email network includes three sub-networks: a user-user communication network, a user-email network, and an email-email network. Each sub-network is represented by its own adjacency matrix.

The adjacency matrix of the user-user communication network is built from the emails' headers. Each email header contains a sender, stored in the From field, and a list of recipients, stored in the To and Cc fields. A link is added from each sender to each of that sender's recipients, so the resulting user-user network is directed. The network contains two types of users, spammers and non-spammers, which are related to each other by four types of relations: from spammers to non-spammers, from non-spammers to non-spammers, from spammers to spammers, and from non-spammers to spammers, as shown in Fig. 2(a).

Figure 2(a)
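
The construction of the directed user-user network from email headers can be illustrated with a minimal Python sketch. The function name, the dictionary layout of an email (keys "from", "to", and "cc"), and the choice of binary (0/1) matrix entries are assumptions made for illustration only; the disclosure does not specify whether links are weighted.

```python
def build_user_user_adjacency(emails):
    """Build a directed user-user adjacency matrix from email headers.

    `emails` is an iterable of dicts with a "from" string and optional
    "to" and "cc" lists (a hypothetical layout, not from the disclosure).
    Returns (users, matrix), where matrix[i][j] == 1 if users[i] sent at
    least one email to users[j]. Binary entries are an assumption.
    """
    users = []   # index -> address, in order of first appearance
    index = {}   # address -> index

    def user_id(address):
        if address not in index:
            index[address] = len(users)
            users.append(address)
        return index[address]

    edges = set()
    for mail in emails:
        sender = user_id(mail["from"])
        # Recipients come from both the To and Cc fields.
        for recipient in list(mail.get("to", [])) + list(mail.get("cc", [])):
            edges.add((sender, user_id(recipient)))

    n = len(users)
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1  # directed link: sender -> recipient
    return users, matrix


# Toy data, made up for illustration.
sample_emails = [
    {"from": "alice@example.com", "to": ["bob@example.com"], "cc": ["carol@example.com"]},
    {"from": "bob@example.com", "to": ["alice@example.com"], "cc": []},
]
sample_users, sample_matrix = build_user_user_adjacency(sample_emails)
```

Because links run from sender to recipient, the matrix is generally not symmetric: in the toy data, carol@example.com receives an email but sends none, so her row is all zeros.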

Two key observations hold for the user-user network: non-spammers seldom initiate communication with spammers, and spammers frequently send emails to each other to disguise themselves as normal users. These properties distinguish the user-user network from other types of social networks, such as those built on friendship relations or on follow relations.
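
As an illustration of the four relation types in the user-user network, the following sketch tallies directed links by the class of sender and recipient. It assumes that a set of known spammers is available (for example, from a blacklist) and that the network is stored as a binary adjacency matrix; the function name and the toy data are hypothetical and are not taken from the disclosure.

```python
def count_relation_types(users, matrix, spammers):
    """Count directed links for each (sender class, recipient class) pair.

    `users` lists addresses in matrix order, `matrix[i][j]` is 1 when
    users[i] sent email to users[j], and `spammers` is a set of addresses
    assumed to be known spammers (an assumption for this sketch).
    """
    classes = ("spammer", "non-spammer")
    counts = {(src, dst): 0 for src in classes for dst in classes}

    def label(address):
        return "spammer" if address in spammers else "non-spammer"

    for i, row in enumerate(matrix):
        for j, linked in enumerate(row):
            if linked:
                counts[(label(users[i]), label(users[j]))] += 1
    return counts


# Toy network: s1 and s2 are spammers, u1 and u2 are non-spammers.
toy_users = ["s1", "s2", "u1", "u2"]
toy_matrix = [
    [0, 1, 1, 1],  # s1 -> s2, u1, u2
    [1, 0, 0, 0],  # s2 -> s1
    [0, 0, 0, 1],  # u1 -> u2
    [0, 0, 0, 0],  # u2 sends nothing
]
toy_counts = count_relation_types(toy_users, toy_matrix, {"s1", "s2"})
```

In this toy network, the count for links from non-spammers to spammers is zero while spammers link to each other in both directions, which mirrors the two observations above.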

 denotes the adjacency matrix of the user-email network where a u...