
A Voice Assistant to Recognize Dialect

IP.com Disclosure Number: IPCOM000249156D
Publication Date: 2017-Feb-08
Document File: 3 page(s) / 140K

Publishing Venue

The IP.com Prior Art Database


The invented system recognizes dialect speech, which can help people who cannot speak a standard language to communicate with external electronic devices. In summary, a universal framework for recognizing various dialects is proposed, and a user-friendly interface for interacting with other systems is implemented.




1. The problem

Many people cannot speak standard languages (such as Mandarin or English) well; they can only speak a local dialect. Our invention provides a voice assistant that lets them communicate directly with external electronic devices in their local dialect. For example, a person can speak a local dialect to open a browser directly on his or her computer.

Figure 1: The problem

2. Main Idea

A method and system for translating various dialects into system actions:

― Use an end-to-end neural network to build a speech model;

― Use the speech model to translate dialect speech into text;

― Map the text to a system action through semantic analysis.
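The three steps above can be sketched as a simple pipeline. The speech-model stage is stubbed out (a real system would use a trained end-to-end neural network), and the semantic-analysis stage is a hypothetical keyword lookup, shown only to illustrate the text-to-action mapping:

```python
# Minimal sketch of the dialect-to-action pipeline (illustrative only).
# A real speech model would be a trained neural network; here it is
# stubbed with a lookup table so the overall flow is runnable.

# Stages 1-2: speech model -- stand-in for dialect audio -> text.
FAKE_SPEECH_MODEL = {
    "audio_clip_001": "open the browser",
    "audio_clip_002": "play music",
}

def speech_to_text(audio_id: str) -> str:
    """Stand-in for the end-to-end neural speech model."""
    return FAKE_SPEECH_MODEL.get(audio_id, "")

# Stage 3: semantic analysis -- map recognized text to a system action.
ACTION_KEYWORDS = {
    "browser": "launch_browser",
    "music": "start_music_player",
}

def text_to_action(text: str) -> str:
    """Hypothetical keyword-based semantic analysis."""
    for keyword, action in ACTION_KEYWORDS.items():
        if keyword in text:
            return action
    return "no_action"

def dialect_to_action(audio_id: str) -> str:
    """Full pipeline: dialect speech -> text -> system action."""
    return text_to_action(speech_to_text(audio_id))
```

All identifiers here (`FAKE_SPEECH_MODEL`, `ACTION_KEYWORDS`, and the function names) are illustrative assumptions, not part of the disclosed system.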

3. How it works

(1) System Framework

As shown in Figure 2, the invented system chiefly consists of a system engine, a speech model, voice input, a user interface, semantic analysis, and an actuator. The system engine processes the voice input and outputs discrete words based on the speech model. The words are then mapped to system actions through semantic analysis. The actuator invokes the concrete system responses and presents the results to the user via the user interface and the action speaker.


Figure 2: System framework

(2) Speech Model

A speech model translates dialect speech into text. As shown in Figure 3, adequate dialect speech data are collected. After preprocessing, an end-to-end neural network model is used to understand the spoken input and output the corresponding text. The neural network includes convolutional ...
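The preprocessing step mentioned above typically frames the raw waveform into short overlapping windows before it is fed to the neural network. A minimal stdlib-only sketch follows; the frame and hop sizes are illustrative assumptions, not values from the disclosure:

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D sample sequence into overlapping frames.

    For a 16 kHz signal, frame_len=400 and hop=160 correspond to the
    common 25 ms window with a 10 ms hop (values assumed here for
    illustration). Partial frames at the tail are dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

# Example: 1 second of dummy audio at 16 kHz.
frames = frame_signal([0.0] * 16000)
```

Each frame would then be converted to features (e.g., a spectrogram) and passed through the network's convolutional and recurrent layers to produce text.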