
Method and System for Predicting Advertisement Relevance using Phrase-Based Click and Text Features

IP.com Disclosure Number: IPCOM000195757D
Publication Date: 2010-May-17
Document File: 2 page(s) / 40K

Publishing Venue

The IP.com Prior Art Database

Related People

Eren Manavoglu: INVENTOR (plus six additional inventors)

Abstract

A method and system for predicting advertisement relevance using phrase-based click and text features is disclosed. The method performs relevance filtering for rare queries and advertisements using phrase-based click and text features.

This is the abbreviated version, containing approximately 52% of the total text.

Description

Disclosed is a method and system for relevance filtering for rare queries and advertisements. To handle rare queries, salient phrases with probabilistic weights are extracted using a segmentation approach, and phrase-based features are built for a search relevance model. In addition, user click information is extended to tail query-ad pairs that do not accumulate sufficient traffic of their own but do have sufficient statistics when click history is aggregated over their segments.
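The segment-based history idea can be illustrated with a minimal sketch. It rests on assumptions the disclosure does not state: the segmenter returns a list of (phrase, probability) pairs, click history is stored in a hypothetical table keyed by (phrase, ad_id) holding clicks and impressions, and each phrase's counts are scaled by its probability before pooling. The names `phrase_click_rate` and `min_impressions` (and its default of 10) are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inputs: a segmented query as (phrase, probability) pairs, and
# phrase-level click history keyed by (phrase, ad_id) -> (clicks, impressions).
Segmentation = List[Tuple[str, float]]
PhraseHistory = Dict[Tuple[str, str], Tuple[int, int]]


def phrase_click_rate(
    segmentation: Segmentation,
    ad_id: str,
    history: PhraseHistory,
    min_impressions: float = 10.0,
) -> Optional[float]:
    """Estimate a click rate for a tail query-ad pair from phrase-level history.

    Each phrase's clicks and impressions are weighted by the phrase's
    segmentation probability before pooling, so less certain phrases
    contribute less evidence. Returns None when the pooled (weighted)
    impressions are still below min_impressions.
    """
    weighted_clicks = 0.0
    weighted_impressions = 0.0
    for phrase, weight in segmentation:
        clicks, impressions = history.get((phrase, ad_id), (0, 0))
        weighted_clicks += weight * clicks
        weighted_impressions += weight * impressions
    if weighted_impressions < min_impressions:
        return None
    return weighted_clicks / weighted_impressions
```

A query such as "cheap flights to paris" may never have been seen with a given ad, yet the phrases "cheap flights" and "to paris" can each have accumulated impressions with that ad across many other queries; pooling them is what lets the tail pair receive a click feature at all.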

In one instance, relevance filtering uses a binary classifier trained to distinguish relevant from non-relevant advertisements for a given query. The baseline model has 19 text-based features: query length, plus six features that compare the query with each of several zones of an advertisement. The zones include, but are not limited to, the title, the description, and the display URL; with these three zones, the count is 1 + 6 × 3 = 19 features. The six per-zone features are word overlap (unigram and bigram), character overlap (unigram and bigram), cosine Term Frequency-Inverse Document Frequency (TF-IDF) similarity, and a proximity feature that counts the query bigrams whose word order is preserved in the advertisement zone. Historical click rates for a query-ad pair may also be used as a feature, since they provide a strong indication of relevance. When a specific query-ad pair lacks sufficient click history, aggregate history across all advertisements in the campaign, or in the entire account, is used instead; this aggregate reflects observed click behavior on similar advertisements from the same advertiser.
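A minimal sketch of the 19 baseline text features follows. The disclosure names the features but not their exact formulas, so several choices here are assumptions: query length is counted in words, each overlap feature is the fraction of the query's distinct n-grams that also appear in the zone, character n-grams are taken over the space-joined tokens, IDF weights come from a caller-supplied table, and proximity counts query bigrams (a, b) where a occurs somewhere before b in the zone. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Zones named in the disclosure (which notes they are not the only possible ones).
ZONES = ("title", "description", "display_url")


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumerics so display URLs yield words too.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(items: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    # Works for token lists (word n-grams) and plain strings (character n-grams).
    return {tuple(items[i:i + n]) for i in range(len(items) - n + 1)}


def overlap(query_grams: Set[Tuple[str, ...]], zone_grams: Set[Tuple[str, ...]]) -> float:
    # Fraction of the query's distinct n-grams that also occur in the zone.
    return len(query_grams & zone_grams) / len(query_grams) if query_grams else 0.0


def tfidf_cosine(q: List[str], z: List[str], idf: Dict[str, float]) -> float:
    # Terms missing from the (hypothetical) IDF table default to weight 1.0.
    qv = {t: c * idf.get(t, 1.0) for t, c in Counter(q).items()}
    zv = {t: c * idf.get(t, 1.0) for t, c in Counter(z).items()}
    dot = sum(w * zv.get(t, 0.0) for t, w in qv.items())
    norm = math.sqrt(sum(w * w for w in qv.values())) * math.sqrt(sum(w * w for w in zv.values()))
    return dot / norm if norm else 0.0


def proximity(q: List[str], z: List[str]) -> int:
    # Count query bigrams (a, b) where a occurs before b somewhere in the zone,
    # i.e. the first occurrence of a precedes the last occurrence of b.
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for i, t in enumerate(z):
        first.setdefault(t, i)
        last[t] = i
    return sum(1 for a, b in zip(q, q[1:]) if a in first and b in last and first[a] < last[b])


def baseline_text_features(query: str, ad: Dict[str, str], idf: Dict[str, float]) -> List[float]:
    q = tokenize(query)
    q_chars = " ".join(q)
    features = [float(len(q))]  # query length in words
    for zone in ZONES:
        z = tokenize(ad.get(zone, ""))
        z_chars = " ".join(z)
        features += [
            overlap(ngrams(q, 1), ngrams(z, 1)),  # word unigram overlap
            overlap(ngrams(q, 2), ngrams(z, 2)),  # word bigram overlap
            overlap(ngrams(q_chars, 1), ngrams(z_chars, 1)),  # character unigram overlap
            overlap(ngrams(q_chars, 2), ngrams(z_chars, 2)),  # character bigram overlap
            tfidf_cosine(q, z, idf),  # cosine TF-IDF similarity
            float(proximity(q, z)),  # order-preserving query bigrams
        ]
    return features  # 1 + 6 * len(ZONES) = 19 values
```

For the query "cheap flights paris" and an ad titled "Cheap Flights to Paris", the title zone yields a word-unigram overlap of 1.0, a word-bigram overlap of 0.5 (only "cheap flights" survives intact), and a proximity count of 2, since both query bigrams keep their word order in the title even though "to" intervenes.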
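The click-history fallback described above can be sketched as a simple backoff from the query-ad pair to the campaign and then to the account. This assumes, beyond what the text says, that campaign and account aggregates are still kept per query (keyed by (query, campaign_id) and (query, account_id)), and that "sufficient" history means a hypothetical minimum impression count (`min_impressions`, default 100). Returning the level that was used lets the classifier treat a campaign-level rate differently from a pair-level one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ClickStats:
    clicks: int
    impressions: int


# Hypothetical history tables, each keyed by (query, id).
History = Dict[Tuple[str, str], ClickStats]


def click_rate_feature(
    query: str,
    ad_id: str,
    campaign_id: str,
    account_id: str,
    pair_history: History,
    campaign_history: History,
    account_history: History,
    min_impressions: int = 100,
) -> Tuple[Optional[float], str]:
    """Return (click_rate, level), backing off from the query-ad pair to the
    campaign and then to the account when a level has too few impressions."""
    levels = (
        ("pair", pair_history.get((query, ad_id))),
        ("campaign", campaign_history.get((query, campaign_id))),
        ("account", account_history.get((query, account_id))),
    )
    for level, stats in levels:
        if stats is not None and stats.impressions >= min_impressions:
            return stats.clicks / stats.impressions, level
    return None, "none"  # no level had enough history
```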

Phrase-based overlap features and phrase-based historical click-rate features are then used to improve the baseline model. The phrases are extracted with a machine-learned segmentation model. Linear-chain Conditional Random Fields (CRFs) are used for sequential labeling tasks such as segmentation and part-of-speech tagging, as well as in web search. CRFs facilitate use of arbitrary feature fu...
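As an illustration of how a linear-chain CRF can segment a query into phrases, the sketch below runs Viterbi decoding over B/I labels (B begins a phrase, I continues it). It assumes the per-token label scores (the weighted sums of feature functions) and the label-transition scores have already been computed; feature design and training are not shown, and the names `viterbi`, `to_phrases`, and the B/I scheme are illustrative choices rather than details from the disclosure. Viterbi returns only the single best segmentation; probabilistic phrase weights would instead come from marginal inference (forward-backward).

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ("B", "I")  # B: token begins a phrase, I: token continues the current phrase


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring label sequence for a linear-chain model with additive
    (log-space) per-token scores and label-to-label transition scores."""
    n = len(emissions)
    if n == 0:
        return []
    # A phrase cannot start with I, so the first token is forced to B.
    score = {"B": emissions[0]["B"], "I": float("-inf")}
    backptrs: List[Dict[str, str]] = []
    for t in range(1, n):
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for cur in LABELS:
            # Best previous label for ending in `cur` at position t.
            best_prev = max(LABELS, key=lambda p: score[p] + transitions[(p, cur)])
            new_score[cur] = score[best_prev] + transitions[(best_prev, cur)] + emissions[t][cur]
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: score[lab])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def to_phrases(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    # Group tokens into phrases: a new phrase starts at every B label.
    phrases: List[List[str]] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B" or not phrases:
            phrases.append([tok])
        else:
            phrases[-1].append(tok)
    return [" ".join(p) for p in phrases]
```

With scores that favor a boundary before "to", the tokens `["cheap", "flights", "to", "paris"]` decode to `B I B I`, giving the phrases "cheap flights" and "to paris". Because the transition table is a separate input, constraints or preferences between adjacent labels can be changed without touching the per-token feature scores.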