A Method and System for Predicting Gender and Age of Users

IP.com Disclosure Number: IPCOM000239716D
Publication Date: 2014-Nov-27
Document File: 3 page(s) / 63K

Publishing Venue

The IP.com Prior Art Database

Related People

Allie Watfa: INVENTOR [+2]

Abstract

A method and system are disclosed for predicting the gender and age of users. The method and system predict gender based on the browsing behavior of the users. The proposed solution is built on a known framework, such as a Pig framework, and predicts users' gender at an hourly granularity. The framework includes two main steps: first, learning from publisher side impression data and an age tendency associated with webpages, and second, predicting unknown users' age from demographic information using a Bayesian framework.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 47% of the total text.

Description

Disclosed is a method and system for predicting gender and age of users.

The method and system uses a data science approach to reduce the number of users of unknown gender in reports such as Yahoo Ad Manager Plus* (YAM+) reports.  It uses data from publisher side platforms to predict the gender of unknown users with the Hadoop MapReduce** framework.  Because YAM+ reports data from multiple exchanges, the method and system cannot predict the gender of unknown users coming from other exchanges.  For some users, however, it uses data from a publisher side platform, namely Right Media Exchange*** (RMX), to predict the unknown genders and give advertisers more insight into user data.

The method and system makes its predictions from the browsing patterns of users of known gender who were active within the same hour.  It uses the simple yet powerful Naive Bayes classification algorithm, and selects the training set from the same hour for which the predictions are to be made.  The input data comes from a YAM+ batch data processing pipeline.  The pipeline produces five-minute feeds of the YAM+ impressions that are reported to advertisers.  The pipeline also produces the rmx_ybb_impressions feed, which is the impression feed, after Traffic Protection filtering, from the RMX (YAX) publisher side platform.
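The Naive Bayes step described above can be illustrated with a minimal sketch. It is not the disclosed implementation (which runs on Hadoop); it only assumes that each training example pairs a known gender with a visited section_id from the same hour, and that an unknown user is represented by the list of sections visited. The function names, the Laplace smoothing parameter `alpha`, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(known_events):
    """Count (gender, section_id) pairs from one hour of known-gender events.

    known_events: iterable of (gender, section_id) tuples, gender in {"M", "F"}.
    """
    class_counts = Counter()
    section_counts = defaultdict(Counter)
    vocab = set()
    for gender, section_id in known_events:
        class_counts[gender] += 1
        section_counts[gender][section_id] += 1
        vocab.add(section_id)
    return class_counts, section_counts, vocab


def predict(model, visited_sections, alpha=1.0):
    """Return (most likely gender, log scores) for an unknown user.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    Sections never seen in training are skipped.
    """
    class_counts, section_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for gender, n in class_counts.items():
        log_p = math.log(n / total)  # log prior P(gender)
        denom = n + alpha * len(vocab)
        for section_id in visited_sections:
            if section_id not in vocab:
                continue
            # log likelihood P(section | gender), smoothed
            log_p += math.log((section_counts[gender][section_id] + alpha) / denom)
        scores[gender] = log_p
    return max(scores, key=scores.get), scores


# Hypothetical training data for one hour.
events = (
    [("M", "sports")] * 8 + [("M", "fashion")] * 2
    + [("F", "fashion")] * 7 + [("F", "sports")] * 3
)
model = train(events)
label_a, _ = predict(model, ["sports", "sports"])
label_b, _ = predict(model, ["fashion"])
```

Working in log space avoids numeric underflow when a user has visited many sections, and retraining per hour keeps the model aligned with the hourly granularity the disclosure describes.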

The RMX impressions feed is a five-minute feed that lists the impression events that occurred in that five-minute window.  The feed typically contains approximately 300,000 impression events.  Each impression event in RMX includes the following three fields, among many others: event_guid, gender and section_id.

The section_id field contains the id of the section of a particular webpage that was displayed to the user.  This id is used to get a count of people with known genders visiting that particular webpage or section.  The event_guid field is the unique event identifier, which is used to join the RMX impression events with the YAM+ impression events.  The gender field is one of M, F and U, depending on whether the gender is known on the publisher side.  Similarly, the YAM+ impressions feed is a five-minute feed containing approximately one million events.  Each event includes the f...
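
The per-section counting and the event_guid join described above might look like the following in-memory sketch. The real pipeline operates on Hadoop feeds; here events are plain dictionaries, the helper names are hypothetical, and the only YAM+ field assumed is event_guid, since the remaining YAM+ fields are not listed in this abbreviated text.

```python
from collections import Counter


def known_gender_counts(rmx_events):
    """Count RMX impressions per (section_id, gender), ignoring gender "U"."""
    counts = Counter()
    for event in rmx_events:
        if event["gender"] in ("M", "F"):
            counts[(event["section_id"], event["gender"])] += 1
    return counts


def join_on_guid(rmx_events, yam_events):
    """Attach RMX gender and section_id to YAM+ events sharing an event_guid.

    YAM+ events with no matching RMX event (for example, those from other
    exchanges) are left out of the result.
    """
    rmx_by_guid = {event["event_guid"]: event for event in rmx_events}
    joined = []
    for yam in yam_events:
        rmx = rmx_by_guid.get(yam["event_guid"])
        if rmx is not None:
            joined.append(
                {**yam, "gender": rmx["gender"], "section_id": rmx["section_id"]}
            )
    return joined


# Hypothetical sample events.
rmx = [
    {"event_guid": "a", "gender": "M", "section_id": 1},
    {"event_guid": "b", "gender": "U", "section_id": 1},
    {"event_guid": "c", "gender": "F", "section_id": 2},
]
yam = [{"event_guid": "a"}, {"event_guid": "b"}, {"event_guid": "z"}]

counts = known_gender_counts(rmx)
joined = join_on_guid(rmx, yam)
```

Joined events whose gender is U are the ones a classifier would need to fill in, while the (section_id, gender) counts supply its training statistics.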