Browse Prior Art Database

Machine Learning based Sequence Classification Algorithm

IP.com Disclosure Number: IPCOM000234928D
Publication Date: 2014-Feb-17
Document File: 4 page(s) / 26K

Publishing Venue

The IP.com Prior Art Database

Related People

Chaitanya GSK: INVENTOR

Abstract

A method is disclosed for decoding a Conditional Random Field (CRF) algorithm wherein the CRF algorithm is a machine learning based sequence classification algorithm. The CRF algorithm computes probabilities of sequences for a given set of data and stores the sequences in a model. The CRF algorithm queries the model to decode the model and return a best sequence set.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 54% of the total text.

Machine Learning based Sequence Classification Algorithm

Abstract

A method is disclosed for decoding a Conditional Random Field (CRF) algorithm wherein the CRF algorithm is a machine learning based sequence classification algorithm.  The CRF algorithm computes probabilities of sequences for a given set of data and stores the sequences in a model.  The CRF algorithm queries the model to decode the model and return a best sequence set.

Description

Disclosed is a method for decoding a Conditional Random Field (CRF) algorithm wherein the CRF algorithm is a machine learning based sequence classification algorithm.  The CRF algorithm computes probabilities of sequences and stores in a model.  The CRF algorithm queries the model to decode the model and return a best sequence set.

In accordance with the method, the algorithm performs a learning procedure and a decoding procedure.  During the learning procedure probabilities of sequences for a given set of data are computed and stored in a model.  During the decoding stage, the model is queried to return a best sequence set.  A Viterbi decoder is used to compute the best sequence set.   

The Viterbi decoder computes the best sequence set Q= {q1, q2, q3, …qT} for a given observation sequence O= {O1, O2, O3, …OT}.  A quantity δi is defined as

δi = maxof(Qi and Oi).  This value stands for a best score along a single path at time t.

Additionally, a back array is defined wherein the back array is used to track δt(i).  The Vierbo decoder uses a forward-backward procedure to figure out the best state sequence.  The Viterbi decoder computes best scores of all elements in column of a matrix and backtracks the path.

score(i,1) = probFromModel[i]*emissions[i][o-­‐1]

score(i,t) = emissions[i][o ­ t] * max{k in states} (score(k, t­1)*transition[k][i])

where, T = number of items in the input string

            L = number of labels in the model.

/* Compute the scores at (t, *). */

for t, 1 : T

/* Compute the score of (t, j). */

for i, 0 : L

for j, 0: L

/* Transit from (t-1, i) to (t, j). */

trans = TRANS_SCORE[i]

score = prev[i] + trans[j]

// Store this path if it has the maximum score.

if (max_score < score) {

   max_score = score

   /* Backward link (#t, #j) -> (#t-1, #i). */

   back[j] = i

}

/* Add the state score on (t, j). */

cur[j] = max_score + state[j]

The sequence of labels is built by searching for the best value in a last score array.  Using this index as a starting point in the back-pointer array, the best sequence is decoded.

/* Backtracking initiation: Find the node...