An Online Experimentation Platform Monitoring Tool

IP.com Disclosure Number: IPCOM000241963D
Publication Date: 2015-Jun-11
Document File: 5 page(s) / 332K

Publishing Venue

The IP.com Prior Art Database

Related People

Miao Chen: INVENTOR [+3]

Abstract

An online experimentation platform monitoring tool is disclosed. The monitoring tool implements statistical tests at two stages to reveal problematic experiment layers and to identify problematic hash values. The tool automatically fetches data from a Hadoop cluster and generates reports on a dashboard on a daily basis. It is built to support a large-scale experimentation system with large user counts, multiple products, and a multi-layer experimentation platform.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 35% of the total text.

Description

In the traffic splitting system, user units, e.g. a user identifier (ID) or browser cookie, are hashed through a well-defined hash function. In a multi-layer experimentation platform, the count of user units on each experimentation layer is required to be uniformly distributed across all hash values. In theory, this is easily achieved by selecting a good hash function. In practice, however, the traffic distribution can be non-uniform for various reasons. For example, some experiments can contaminate the user unit ID, so that the number of user units assigned to a given hash range is higher or lower than expected. The contaminated hash range is then no longer a healthy range for future experiments. Thus, the health status of the traffic splitting system must be monitored, and problematic experiment layers and hash value ranges must be reported.

Online controlled experiments (A/B tests) are frequently used to evaluate innovative approaches at Internet companies. One key premise of an online experiment is that users are randomly assigned to different experiment groups, so that there is no systematic bias among user groups. The experimentation platform therefore needs to keep track of the health status of the traffic splitting system.
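The hashing step described above can be illustrated with a minimal sketch. The disclosure does not name the hash function, the size of the hash space, or how layers are separated, so the use of MD5, a per-layer salt, 1000 hash values per layer, and the helper names below are all assumptions made for illustration only.

```python
import hashlib
from collections import Counter

# Assumption: each layer has 1000 hash values; the disclosure gives no number.
NUM_HASH_VALUES = 1000


def hash_value(user_unit: str, layer_salt: str,
               num_values: int = NUM_HASH_VALUES) -> int:
    """Map a user unit (e.g. user ID or cookie) to a hash value for one layer.

    Assumption: MD5 over "<layer_salt>:<user_unit>" stands in for the
    platform's real, unspecified hash function.
    """
    digest = hashlib.md5(f"{layer_salt}:{user_unit}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_values


def bucket_id(hv: int, hash_values_per_bucket: int = 10) -> int:
    """Group consecutive hash values into a bucket (hypothetical scheme)."""
    return hv // hash_values_per_bucket


def hash_value_counts(user_units, layer_salt: str,
                      num_values: int = NUM_HASH_VALUES) -> list:
    """Count user units per hash value on one layer, including empty values."""
    counts = Counter(hash_value(u, layer_salt, num_values) for u in user_units)
    return [counts.get(v, 0) for v in range(num_values)]
```

With a well-behaved hash function, the list returned by `hash_value_counts` should be close to flat for every layer; a range that is noticeably over- or under-populated is the kind of contamination the monitoring tool is meant to detect.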

Disclosed is an online experimentation platform monitoring tool. The tool is a traffic splitting monitoring tool that implements statistical tests at two stages: the first reveals problematic experiment layers, and the second identifies problematic hash values. The traffic splitting monitoring tool automatically fetches data from a Hadoop cluster and generates reports on a dashboard on a daily basis. Further, it is built to support a large-scale experimentation system with large user counts, multiple products, and a multi-layer experimentation platform.
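The abbreviated text does not say which statistical tests are used at the two stages. The sketch below shows one plausible arrangement, offered as an assumption rather than as the disclosed method: stage one runs a chi-square goodness-of-fit test of a layer's per-hash-value counts against a uniform distribution, and stage two, only for layers that fail, flags individual hash values whose counts deviate from the expected count by more than a Bonferroni-corrected normal threshold. The function names, the significance level of 0.05, and the Wilson-Hilferty approximation for the chi-square tail (chosen to avoid external dependencies) are all illustrative choices.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def layer_chi_square(counts):
    """Stage 1 (assumed): chi-square goodness-of-fit against uniform counts.

    Returns (statistic, approximate p-value). The p-value uses the
    Wilson-Hilferty normal approximation, reasonable for many hash values.
    """
    k, total = len(counts), sum(counts)
    if k < 2 or total <= 0:
        raise ValueError("need at least two hash values and a positive total")
    expected = total / k
    stat = sum((c - expected) ** 2 for c in counts) / expected
    df = k - 1
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return stat, 1 - _STD_NORMAL.cdf(z)


def suspicious_hash_values(counts, alpha=0.05):
    """Stage 2 (assumed): per-hash-value z-test with Bonferroni correction.

    Returns (hash_value, count, z_score) for each value that deviates.
    """
    k, total = len(counts), sum(counts)
    p = 1 / k
    expected = total * p
    sd = math.sqrt(total * p * (1 - p))  # binomial standard deviation
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / (2 * k))  # two-sided, k tests
    return [(v, c, (c - expected) / sd)
            for v, c in enumerate(counts)
            if abs(c - expected) / sd > z_crit]


def monitor_layer(counts, alpha=0.05):
    """Run stage 1; run stage 2 only if the layer looks non-uniform."""
    _, p_value = layer_chi_square(counts)
    if p_value >= alpha:
        return {"healthy": True, "p_value": p_value, "flagged": []}
    return {"healthy": False, "p_value": p_value,
            "flagged": suspicious_hash_values(counts, alpha)}
```

Running the cheap layer-level test first keeps the number of per-hash-value comparisons small, which matters when many layers and products are checked every day.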

On a daily basis, the traffic splitting monitoring tool automatically pulls data from the Hadoop cluster using an Oozie scheduler and, using a Pig script, hashes each active user unit with the same hash function as the experimentation platform. This generates a mapping from each user unit to a hash value and, in turn, to a bucket identifier (ID) assignment. The processed data is stored in a local database. In theory, the user unit...