System and Method for Utilizing Meta Analyses for a Multi-Level Multi-Arm Bandit in Displaying Advertisement

IP.com Disclosure Number: IPCOM000246611D
Publication Date: 2016-Jun-21
Document File: 3 page(s) / 135K

Publishing Venue

The IP.com Prior Art Database

Related People

Hongxia Yang: INVENTOR [+5]

Abstract

Disclosed is a method and system for utilizing meta-analyses for a multi-level multi-arm bandit in display advertising. The multi-level multi-arm bandit, together with a meta-analysis evaluation framework, is used to balance cost-per-action (CPA) exploitation against exploration of the bidding space across different bidding prediction models.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Description

A method and system are disclosed for utilizing meta-analyses for a multi-level multi-arm bandit in display advertising. The multi-level multi-arm bandit, together with a meta-analysis evaluation framework, is used to balance cost-per-action (CPA) exploitation against exploration of the bidding space across different bidding prediction models.

The method and system use the multi-level multi-arm bandit (MLMB) to handle overlapping arms and delayed conversion-rate feedback in a unified framework by extending Thompson sampling. First, Thompson sampling can be described as follows. For winning impressions, let $Y_t = (y_1, \ldots, y_t)$ denote the sequence of observed rewards and let $a_t = (a_1, \ldots, a_t)$ denote the strategy, that is, the arm/model selected at each time step.

In one scenario, each arm is backed by one kind of machine learning model for conversion rate (CVR) prediction, so an arm also represents a CVR prediction model. Let $\beta$ be the collection of parameters that characterize the MLMB, such as the respective rewards of each arm or the impact of covariates. Let $f(y_t \mid a_t = k, \beta)$ be the likelihood function (reward distribution) and $\pi(\beta)$ be the prior distribution. For instance, binary rewards in this scenario (such as click or not, or conversion or not) can be defined as

$$f(y_t \mid a_t = k, \beta) = \beta_k^{y_t} (1 - \beta_k)^{1 - y_t}, \quad y_t \in \{0, 1\},$$

where $\beta_k$ is the probability of success in arm $k$, and for each $k$ the method and system assign a $\mathrm{Beta}(\alpha_0, \alpha_1)$ prior:

$$\pi(\beta_k) \propto \beta_k^{\alpha_0 - 1} (1 - \beta_k)^{\alpha_1 - 1}.$$

Then the resulting posterior after observing $s$ successes (e.g., clicks or conversions) in $n$ trials (impressions) of arm $k$ is $\mathrm{Beta}(s + \alpha_0, n - s + \alpha_1)$. Thompson sampling therefore draws one sample from each arm's posterior distribution and pulls the arm with the highest sampled value.
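
The selection rule described above can be illustrated with a minimal Python sketch. It covers only the basic Beta-Bernoulli Thompson sampling step, not the multi-level extension, arm overlap, or delayed feedback handled by the MLMB. The function names `thompson_select` and `simulate`, the uniform prior (alpha0 = alpha1 = 1), and the simulated success rates are assumptions made for illustration and do not come from the disclosure.

```python
import random


def thompson_select(successes, trials, alpha0=1.0, alpha1=1.0, rng=random):
    """Pick an arm by Thompson sampling.

    For each arm with s successes in n trials, draw one sample from the
    posterior Beta(s + alpha0, n - s + alpha1) and return the index of the
    arm whose sample is largest.
    """
    samples = [
        rng.betavariate(s + alpha0, n - s + alpha1)
        for s, n in zip(successes, trials)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds=3000, seed=0):
    """Run the sampler against hypothetical fixed success rates.

    true_rates is an illustrative stand-in for the unknown per-arm
    conversion probabilities; rewards are observed immediately here,
    which is a simplification.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    trials = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_select(successes, trials, rng=rng)
        reward = 1 if rng.random() < true_rates[arm] else 0  # binary reward
        trials[arm] += 1
        successes[arm] += reward
    return successes, trials


if __name__ == "__main__":
    s, n = simulate([0.02, 0.05, 0.30])
    for k, (sk, nk) in enumerate(zip(s, n)):
        print(f"arm {k}: {sk} successes in {nk} impressions")
```

In a run of this sketch, most impressions would be expected to go to the arm with the highest underlying rate, while the other arms still receive occasional pulls. That is the exploitation/exploration balance the posterior sampling is meant to provide.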

Further, on the demand-side platform (DSP+), the learning arms can be broadly divided into three levels: rule-based optimum arms or publisher-based dynamic strategies, machine learning arms, and a default bid strategy. Here, the machine learning arms can be...