Method and System for Identifying Templates in Machine Generated Email

IP.com Disclosure Number: IPCOM000241874D
Publication Date: 2015-Jun-05
Document File: 1 page(s) / 22K

Publishing Venue

The IP.com Prior Art Database

Related People

Zohar Karnin: INVENTOR [+3]

Abstract

A method and system are disclosed for identifying templates in machine-generated email by parsing email content. The method and system identify substrings in the email content and extract them. The extracted substrings are then compared across a plurality of emails. If the emails contain the same substrings, they are taken to share the same template.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 54% of the total text.

Description

Disclosed are a method and system for identifying templates in machine-generated email by parsing email content. The method and system identify substrings in the email content and extract them. The extracted substrings are then compared across a plurality of emails. If the emails contain the same substrings, they are taken to share the same template.

In accordance with the method and system, identification of templates in email messages includes an offline phase and an online phase. During the offline phase, a pool of email messages from a domain is identified and collected. Each email message is transformed into a small set of fingerprints in such a way that similar email messages have roughly the same set of fingerprints. Thereafter, clusters of email messages that have roughly the same set of fingerprints are identified. Each cluster of email messages is a template.  For every template a few fingerprints which appear in a maxim...
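
The offline phase described above can be illustrated with a minimal sketch. The disclosure does not specify how fingerprints are computed or how clusters are formed, so everything below is an assumption made for illustration: the function names (fingerprints, jaccard, cluster), the choice to fingerprint each line of the body after masking digit runs, the use of Jaccard overlap as the measure of "roughly the same" fingerprint set, and the 0.5 threshold are all hypothetical. Only the offline phase is sketched, because the description of the online phase is truncated in this abbreviated text.

```python
import hashlib
import re


def fingerprints(body: str) -> frozenset:
    """Turn an email body into a small set of fingerprints.

    Assumed scheme (not from the disclosure): one short hash per
    non-empty line, after lowercasing and masking runs of digits so
    that order numbers, dates and amounts do not change the result.
    """
    prints = set()
    for line in body.splitlines():
        normalized = re.sub(r"\d+", "#", line.strip().lower())
        if normalized:
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
            prints.add(digest[:12])
    return frozenset(prints)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two fingerprint sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(bodies, threshold=0.5):
    """Greedily group emails whose fingerprint sets roughly match.

    Each email joins the first cluster whose representative (its first
    member) overlaps by at least `threshold`; otherwise it starts a new
    cluster. Returns lists of indices into `bodies`.
    """
    clusters = []  # pairs of (representative fingerprints, member indices)
    for index, body in enumerate(bodies):
        prints = fingerprints(body)
        for representative, members in clusters:
            if jaccard(prints, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((prints, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    pool = [
        "Hi Alice,\nYour order 1234 has shipped.\n"
        "Track it at example.com/track\nThanks, The Shop",
        "Hi Bob,\nYour order 9876 has shipped.\n"
        "Track it at example.com/track\nThanks, The Shop",
        "Weekly newsletter\nTop stories this week\nUnsubscribe here",
    ]
    print(cluster(pool))  # prints [[0, 1], [2]]
```

In this sketch the two shipping notices share three of five distinct line fingerprints (only the greeting line differs), so they fall into one cluster, which the disclosure would call a template. The newsletter shares none and forms its own cluster. A greedy single pass keeps the example short; a production system would likely need a more scalable grouping strategy.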