Automated teacher data selection instrument on deep learning system

IP.com Disclosure Number: IPCOM000249041D
Publication Date: 2017-Jan-30
Document File: 1 page(s) / 59K

Publishing Venue

The IP.com Prior Art Database

Abstract

We developed an automated QA sifting function for Watson Natural Language Classifier (NLC) that uses multiple classifiers trained on user feedback data and the original training data.

This is the abbreviated version, containing approximately 79% of the total text.

When we train a corpus in IBM Watson Natural Language Classifier (NLC) using existing data such as a customer's FAQ or other knowledge sources, the NLC trainers enter many questions into Watson and also enter the correct answers through the feedback function. Subject-matter experts (SMEs) in the relevant area then review the QA pairs generated by the trainers against certain criteria and decide which pairs are adequate to register in Watson NLC as new training data. To reduce the SMEs' workload, a batch job could carry out this review procedure. However, if the QA pairs contain noisy data caused by mistakes, the batch job may register wrong QA pairs in Watson. Wrong QA pairs must therefore be sifted (filtered) out in advance to keep them from being registered. We developed an automated QA sifting function for Watson NLC.

We gather new QA feedback data from Watson NLC over a certain period and divide it into several groups. In the diagram attached below, the number of groups is 3. We then build several NLC corpora, each from the existing QA data plus one of the groups of new QA feedback data. The number of corpora is also 3 in the diagram. Using these corpora, we can now check whether each feedback item is good or not. If the new QA feedback data contains a noisy QA pair, Watson NLC answers correctly only 1/n times when that noisy pair is tested, because ther...
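
The grouping and cross-checking step described above can be sketched in Python. This is only an illustration under stated assumptions, not the disclosed implementation: `train_toy_classifier` is a hypothetical nearest-neighbour stand-in for training a Watson NLC corpus (no Watson API is called), the sample questions and labels are invented, and the `min_agreement` threshold in `sift` is an assumed decision rule, since the extracted text breaks off before stating one.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]          # (question text, answer/class label)
Classifier = Callable[[str], str]


def split_into_groups(pairs: List[QAPair], n: int) -> List[List[QAPair]]:
    """Divide the feedback pairs into n groups (round-robin)."""
    return [pairs[i::n] for i in range(n)]


def train_toy_classifier(training_pairs: List[QAPair]) -> Classifier:
    """Hypothetical stand-in for an NLC corpus: 1-nearest-neighbour
    on Jaccard word overlap. A real system would train Watson NLC here."""
    indexed = [(set(q.lower().split()), label) for q, label in training_pairs]

    def classify(question: str) -> str:
        words = set(question.lower().split())
        # Return the label of the most similar training question.
        best = max(indexed, key=lambda item: len(words & item[0]) / len(words | item[0]))
        return best[1]

    return classify


def agreement_counts(existing: List[QAPair], feedback: List[QAPair],
                     n: int) -> List[Tuple[QAPair, int]]:
    """Build n classifiers (existing data + one feedback group each) and
    count how many of them reproduce each feedback pair's label."""
    groups = split_into_groups(feedback, n)
    classifiers = [train_toy_classifier(existing + group) for group in groups]
    return [(pair, sum(clf(pair[0]) == pair[1] for clf in classifiers))
            for pair in feedback]


def sift(existing: List[QAPair], feedback: List[QAPair], n: int,
         min_agreement: int = 2) -> List[QAPair]:
    """Assumed decision rule: keep a pair only if at least min_agreement
    of the n classifiers agree with its label."""
    return [pair for pair, count in agreement_counts(existing, feedback, n)
            if count >= min_agreement]


# Invented sample data for illustration only.
EXISTING = [
    ("how do I reset my password", "password"),
    ("I forgot my password", "password"),
    ("what are your opening hours", "hours"),
    ("when does the store open", "hours"),
]
FEEDBACK = [
    ("please reset my password", "password"),
    ("what time do you open", "hours"),
    ("my password has expired", "hours"),   # noisy pair: wrong label
]

if __name__ == "__main__":
    for pair, count in agreement_counts(EXISTING, FEEDBACK, 3):
        print(pair, f"{count}/3")
```

With this toy data and n = 3, the two consistent pairs are confirmed by all three classifiers, while the mislabeled pair is confirmed only by the one classifier whose training data contains it, which matches the 1/n behaviour described above. `sift(EXISTING, FEEDBACK, 3)` therefore drops only the mislabeled pair.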