Browse Prior Art Database

An efficient method of scoring a sequence rules model

IP.com Disclosure Number: IPCOM000238318D
Publication Date: 2014-Aug-18
Document File: 6 page(s) / 109K

Publishing Venue

The IP.com Prior Art Database

Abstract

The present invention provides an efficient method of scoring a sequence rules model which can work on both non-distributed and distributed systems.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 45% of the total text.

Page 01 of 6

An efficient method of scoring a sequence rules model

The sequence rules model represents rules for various sets or items, sequence rules describe patterns in sequences.For instance, a rule can express that after purchasing products A and B, customers tend to buy product C sooner or later as well. With sequences rules, you can predict what comes next, it's widely used in various business areas. For example, in the retail industry, you can find typical series of purchases.

Once a model is built, the scoring process is a very frequent procedure for making decision, so the performance is the most important criterion for a scoring engine. Compared with the traditional method that called obverse looking up, here, we propose a method of reverse looking up scoring a sequence rules model to improve the performance greatly, and this new method can be easily deployed into a distributed system with the workload balanced.

The behavior of scoring a sequence rules model is very different from other common models; it's not record by record, and needs to read multiple records at the same time for a same group field that groups the item sets into transaction groups. For the method ofobverse looking up, it firstly constructs all possible antecedent sequences, the scale of possible antecedent sequences can be very huge, which grows almost by geometric ratio. The following is the formula to compute the total number of max possible antecedent sequences for a single transaction group:

Notes: Ris the number of records which have the same group value; Nis the number of item fields, Mis the number of rules in PMML. Suppose R= 10 and N= 3, the number of max possible antecedent sequences will be 1073741823
Once all possible antecedents are available, it needs to traverse them all, and find their consequent sequences in all sequence rules of the PMML model, if the consequent sequence exists, then record it and its confidence. Finally, sort all consequent sequences by their confidences, return the top Krules as the

final predictions. Its time complex is O( ).

1


Page 02 of 6

For the proposed method of reverse looking up, it does not construct any possible antecedent sequence, it justreads into all sequence rules of the PMML model into memory, and then traverse them all, if one rule exists in the input data, then record it and its confidence, the next is same as the previous

method, return the top Krules as the final predictions. Its time complex is O(M N R ), more Nand Rincreasing, more performance improved, its performance is improved greatly for big data.

Unlike other models based on record by record, we cansee the procedure of scoring a sequence rules model is a very massive task, and the scale of a sequence rules model in many practical areas such as bigsupermarket chains can be millions to billions rules, when use this big model to make predictions for a big data, the scoring method based on a non-distributed system will be a main bottleneck of performanc...