Subsetting baseforms to handle accent/sociolect variation
Original Publication Date: 2005-Aug-02
Included in the Prior Art Database: 2005-Aug-02
A major issue for speech recognition is accent and sociolect coverage. This proposal circumvents this issue by maintaining the general applicability of the technology and subsetting transcriptions/baseforms to enhance recognition capabilities and hitrate. The proposal is based on differences between the way in which the same word is pronounced across accents and sociolects.
A major issue for speech recognition is accent (and sociolect) coverage. The common solutions include:
(a) dynamic speaker adaptation, whereby the recognition engine gradually shifts its state and transition probabilities to suit a specific set of users;
(b) application developer intervention, whereby the developer provides many transcriptions (or "baseforms") for the same word to cater for speakers of different accents.
Dynamic adaptation under (a) does not cater for multiple variations per se; it shifts the entire acoustic model towards a single, specific user population and thereby loses the ability to cope with multiple accents. Application developer intervention under (b) inflates the "pronunciation dictionary" (the transcriptions making up the "baseform pool"), with downstream effects on performance and efficiency. More importantly, it fails to take advantage of predictable subsetting that could be propagated at runtime.
This proposal circumvents both issues by maintaining the general applicability of the technology and subsetting transcriptions/baseforms to enhance recognition capabilities and hitrate.
This proposal subsets on predictable accent and sociolect differences. A trivial example is words like "bath" and "cup", which Northern and Southern British English speakers pronounce differently. As soon as the recognition engine returns a result, all subsequent recognition attempts can be restricted to the matching subset. A smaller baseform pool is then managed, and hitrate increases without modification to the acoustic model, because audio input is steered to the appropriate subset of paths. This would also have a knock-on effect on both initial acoustic model training and adaptation itself.
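The subsetting step can be illustrated with a minimal sketch. It assumes each baseform carries an accent tag and that the engine reports which baseform matched; the names Baseform, POOL and subset_pool are hypothetical, and the phone symbols are informal ASCII placeholders rather than any engine's actual phone set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Baseform:
    word: str
    phones: Tuple[str, ...]   # placeholder phone symbols (assumption)
    accent: Optional[str]     # None means the baseform is shared by all accents


# Hypothetical baseform pool: "bath" and "cup" each have an accent-specific
# transcription, while "ten" is assumed to be accent-neutral.
POOL = [
    Baseform("bath", ("b", "a", "th"), "northern"),
    Baseform("bath", ("b", "aa", "th"), "southern"),
    Baseform("cup", ("k", "uh", "p"), "northern"),
    Baseform("cup", ("k", "ah", "p"), "southern"),
    Baseform("ten", ("t", "eh", "n"), None),
]


def subset_pool(pool: List[Baseform], matched: Baseform) -> List[Baseform]:
    """Restrict the pool to baseforms compatible with the matched accent.

    If the matched baseform is accent-neutral it carries no evidence,
    so the full pool is kept.
    """
    if matched.accent is None:
        return list(pool)
    return [b for b in pool if b.accent in (None, matched.accent)]


# The engine recognises "bath" via its Northern baseform, so later
# recognition attempts only consider Northern and accent-neutral baseforms.
active = subset_pool(POOL, POOL[0])
for b in active:
    print(b.word, " ".join(b.phones), b.accent)
```

In this sketch the acoustic model is never touched; only the set of candidate transcriptions shrinks, which is the effect the proposal describes. A real system would presumably want more than one matching result before committing to a subset, but that policy is not specified here.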
This proposal is based on common and known (documented) differences between the way in which the same word is pronounced across accents and sociolects.
There are both "rules" and common observations about how accents a...