One-Versus-All (OVA) Classification System for ICD10 Coding
Publication Date: 2015-Oct-06
The IP.com Prior Art Database
Abstract
This paper describes a method to train the Library of Support Vector Machines (LibSVM) linear kernel classifier for an ICD10 code. The method uses a fixed large number of randomly drawn negatives without regard to the number of positive examples that are available. An implemented ensemble approach is leveraged that uses independent features to eliminate false positives from primary ICD10 code classifiers with the help of assistant classifiers, which are trained for ICD10 code characteristics instead of the code itself.
Support Vector Machines (SVMs) are popular and powerful classification algorithms used in many classification problems, including text mining and document classification. SVMs have a natural applicability to medical coding because clinical notes and encounters are text documents with a set of assigned codes, so an SVM classifier can be trained for each code. It is straightforward to apply such a model in run-time coding; however, a primary concern is the scalability of the system.
Because the coding problem can be viewed as a multi-class classification problem with thousands of classes (one per code), it is not practical to train classifiers that compare each pair of classes. A one-versus-all (OVA) training strategy is therefore proposed: one classifier per class, trained to distinguish positive examples of that class from all other examples. Of the kernels available in SVM implementations, a linear kernel is selected for computational performance, based on the characteristics of the datasets.
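The OVA strategy above can be sketched as follows, using scikit-learn's LinearSVC as a stand-in for the LibSVM linear kernel; the notes, ICD10 codes, and bag-of-words features below are illustrative assumptions, not data from the paper:

```python
# Minimal OVA sketch: one binary linear classifier per ICD10 code,
# positives vs. all other examples. Illustrative data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

notes = [
    "patient presents with acute sinusitis",
    "chronic sinusitis noted on exam",
    "fracture of the left femur after fall",
    "closed femur fracture with surgical repair",
]
# Each note carries a set of assigned ICD10 codes (hypothetical codes).
labels = [{"J01.90"}, {"J32.9"}, {"S72.302A"}, {"S72.302A"}]

X = CountVectorizer().fit_transform(notes)
all_codes = sorted(set().union(*labels))

# For each code, positives are the notes assigned that code and
# negatives are all remaining notes in the training set.
classifiers = {}
for code in all_codes:
    y = [1 if code in assigned else 0 for assigned in labels]
    classifiers[code] = LinearSVC().fit(X, y)
```

At run time, a document is passed through every per-code classifier, and the codes whose classifiers fire are proposed; this is where the scalability concern raised above comes from, since thousands of models must be evaluated per document.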
The SVM approach presented has several important properties. The approach:
1. Implements data-driven SVM parameter computation instead of using default values or expensive grid optimization.
2. Assembles feature vectors using corpus-independent term frequency, implementing a new normalization formula to build feature vectors from term frequency (TF) instead of using binary or term frequency-inverse document frequency (TF-IDF) normalization. TF-IDF is a numerical statistic that reflects how important a word is to a document in a corpus.
3. Uses a large fixed number of negative examples in model training in order to control the false positive rate.
4. Leverages posterior probability instead of raw score to define positives in the SVM coding process.
5. Utilizes a lightweight model-integration strategy to make the one-versus-all SVM classification system more scalable in the coding process.
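Properties 3 and 4 above can be illustrated with a minimal sketch. The fixed negative-sample size and the sigmoid parameters below are assumptions for illustration; the paper does not disclose its actual values or its posterior-probability formula, so a standard Platt-style sigmoid is shown:

```python
import math
import random

# Assumed fixed negative-sample size (property 3): the same number of
# negatives is drawn regardless of how many positives are available.
N_NEG = 1000

def build_training_set(positives, negative_pool, rng):
    """Pair all positives with a fixed-size random draw of negatives."""
    k = min(N_NEG, len(negative_pool))
    negatives = rng.sample(negative_pool, k)
    examples = positives + negatives
    targets = [1] * len(positives) + [0] * len(negatives)
    return examples, targets

def posterior(raw_score, a=-1.0, b=0.0):
    """Platt-style sigmoid mapping a raw SVM score to a probability
    (property 4: threshold on posterior, not on the raw score)."""
    return 1.0 / (1.0 + math.exp(a * raw_score + b))

rng = random.Random(0)
examples, targets = build_training_set(
    ["pos_note_1", "pos_note_2"],
    [f"neg_note_{i}" for i in range(5000)],
    rng,
)
```

Thresholding on a calibrated posterior rather than the raw margin makes the positive/negative cutoff comparable across the thousands of per-code classifiers, which otherwise produce scores on different scales.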
Each of these properties is described in detail below. All evaluation scores cited in this document were obtained in the following experimental setting:
Data set used: 16,463 nosologist-annotated clinical notes with dictionary annotations. Two-thirds of the notes were used to train the classifiers and the remaining one-third were u...