Method and System for Categorizing Email Sender Domains by Exploiting Web Search Results

IP.com Disclosure Number: IPCOM000239717D
Publication Date: 2014-Nov-27
Document File: 2 page(s) / 32K

Publishing Venue

The IP.com Prior Art Database

Related People

Songjian Chen: INVENTOR [+3]

Abstract

A method and system are disclosed for categorizing one or more email sender domains by exploiting one or more web search results. To categorize the email sender domains, the method and system extracts one or more features for each domain from the web search results. Thereafter, the method and system utilizes one or more binary classifiers for each category, using training data built from the features extracted from the web search results.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 52% of the total text.

Description

Disclosed is a method and system for categorizing one or more email sender domains by exploiting one or more web search results. The method and system combines the wisdom of web users and search engines in order to categorize the email sender domains.

The method and system categorizes the one or more email sender domains in two major steps.

In the first step, the method and system extracts one or more features for the email sender domains from the web search results. The features are drawn from at least one of two sources: (a) one or more queries issued by web users that lead to clicks on the domain, and (b) one or more web pages displayed in the search results when the email sender domain is used as the query.

In accordance with the first step, each domain is sent to the search engine as a query, and the method and system parses the one or more search result pages obtained as a result of the query. From these pages, the method and system collects one or more keywords and phrases. The method and system then retains the top ‘M’ keywords/phrases ranked by Term Frequency-Inverse Document Frequency (TF-IDF) score. The method and system also collects, from search logs, one or more search queries for each domain that lead to clicks on the domain. Thereafter, the method an...
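
The keyword-retention part of the first step can be illustrated with a minimal sketch. This is not the disclosed implementation: the function names (tokenize, top_m_keywords), the sample domains and text, the smoothed IDF formula, and the restriction to single-word terms are all assumptions made for illustration. The disclosure itself only states that the top ‘M’ keywords/phrases are retained based on TF/IDF scores.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_m_keywords(pages_by_domain, m):
    """Return the top-m terms per domain, ranked by a TF-IDF score.

    pages_by_domain maps each sender domain to the text collected from its
    search result pages (a hypothetical input format). Each domain's text is
    treated as one document for the IDF computation.
    """
    term_counts = {d: Counter(tokenize(t)) for d, t in pages_by_domain.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many domains' result pages each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    result = {}
    for domain, counts in term_counts.items():
        total = sum(counts.values()) or 1
        # Smoothed IDF (an assumption): terms present for every domain score 0.
        scores = {
            term: (count / total) * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[domain] = ranked[:m]
    return result


# Hypothetical search-result text for three made-up sender domains.
pages = {
    "bank.example": "online banking login account banking",
    "shop.example": "online shopping deals cart shopping",
    "news.example": "online news headlines news",
}
keywords = top_m_keywords(pages, 2)
```

In this toy input, the term "online" appears for every domain and therefore receives a score of zero, so it is never retained, while domain-specific terms such as "banking" rank first. Queries collected from search logs could be passed through the same function as additional text per domain, though how the disclosure combines the two sources is not stated in the abbreviated text.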