DOMAIN-SPECIFIC LANGUAGE MODEL USING DOMAIN LITERATURE AND EXPERTS' SPOKEN LANGUAGE

IP.com Disclosure Number: IPCOM000251170D
Publication Date: 2017-Oct-20

Publishing Venue

The IP.com Prior Art Database

Related People

Pranjal Daga: AUTHOR [+6]

Abstract

Techniques are provided to digitize and infer domain specific knowledge from conversations such as voice and video recordings. Existing Artificial Intelligence (AI) / Machine Learning (ML) algorithms and tools fail to provide an acceptable accuracy in transcription or semantic context generation. Described is a methodology to automate the process of building a Domain Language Model (DLM) using domain specific literature and experts' spoken language model. This improves the accuracy of transcription of audio and video to an acceptable level. The DLM may be used to infer knowledge from any form of customer conversations or any conversation recording, for use in conversational AI, bots, search, and troubleshooting assistants.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 16% of the total text.

Copyright 2017 Cisco Systems, Inc.

AUTHORS: Pranjal Daga, Qihong Shao, Dmitry Goloubew, Gyana Dash, Antonio Nucci, Carlos Pignataro

CISCO SYSTEMS, INC.

DETAILED DESCRIPTION

Many Information Technology (IT) service organizations depend on their ability to maintain and use domain knowledge. Human interactions are an important source of such knowledge. Examples include conversations between the IT service organization and its customers during interactive troubleshooting sessions, planning workshops, and brainstorming sessions. Even in the age of electronic communications, many valuable interactions still occur in verbal form (e.g., in real and virtual meetings, face to face, over electronic media, etc.). To become a source of actionable knowledge, these conversations must first be digitized, which is the first step of their “data to information to knowledge to wisdom” (DIKW) journey. For more information regarding DIKW, see https://en.wikipedia.org/wiki/DIKW_pyramid.

Existing AI/ML algorithms and tools fail to provide acceptable accuracy in transcription or semantic context generation when inferring domain specific knowledge from unstructured content and voice and video recordings (e.g., service request/ticketing systems, customer meeting recordings, etc.). For verbal conversations, the first phase of the process is transcribing the speech into text. State of the art automatic speech recognition (ASR) systems for general conversations have been reported to match or exceed human transcription accuracy. For more information regarding ASR, see https://www.microsoft.com/en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/.

However, transcription quality in specialized knowledge domains remains comparatively poor. For instance, attempting to transcribe vendor-customer or intra-vendor conversations often results in high double-digit error rates. These high error rates can be attributed to several causes, including domain terminology, out of domain speaking, and non-native language interactions.
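
The disclosure does not say which metric underlies these error rates. A common choice for ASR evaluation is word error rate (WER): the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch under that assumption. The function name and the sample sentences are hypothetical and are not taken from the disclosure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and case-insensitive matching.
    Production scoring tools usually apply richer text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a general-purpose recognizer splits the acronym
# "OSPF" into several tokens and mishears the product name "Nexus".
reference = "configure OSPF on the Nexus switch"
hypothesis = "configure oh s p f on the next switch"
wer = word_error_rate(reference, hypothesis)
```

In this hypothetical example, two misrecognized domain terms account for five word errors against six reference words. WER can exceed 100% when the output contains many inserted words, which is one way a handful of domain terms can dominate the error rate of a short utterance.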

With respect to domain terminology, fields rich with domain terminology (such as IT or medicine) often include m...