VOICE INTERFACE FOR OPTICAL PRODUCT
Publication Date: 2005-Jan-24
The IP.com Prior Art Database
User interfaces of current products rely mainly on visual indications output by the device (e.g. a bar-graph display or an alphanumeric display). This is sufficient in most cases. However, problems arise for blind users, or when the user is too far from the device to read the display. This is especially true when the information is not just a "loudness level" or the like but highly detailed, e.g. the playlist of an MP3 device with tens or hundreds of "free text" entries. In such cases it is desirable for the device to output acoustic indications, preferably as voice output.

The disadvantage is that a speech synthesizer capable of generating voice output in several languages requires a large amount of memory. One way to solve this would be to pre-record the information to be output, e.g. the title and artist name of an MP3 song. However, if more detailed information about a given title is available, a large amount of memory is again needed.

Solution: A first proposal is to extract the information that is to be output as voice when an MP3 disk is created, and to store the corresponding voice data on the same disk. The device then needs no large storage capacity for voice generation, because the disk itself carries the necessary information. The capacity required on the disk for this voice information is assumed to be modest, since it covers approximately 200 songs of about 40 words each (roughly 8,000 words per disk). Storage on the disk could be organized similarly to the OSTA database used for artist, genre, etc. It is proposed to mark the voice data with a prefix, e.g. %text%, to distinguish it from other data. However, a large database is still required for the text-to-voice conversion of the extracted information.

A second proposal relies on IPA (the International Phonetic Alphabet), a set of symbols for describing pronunciation. Since an ID3 tag is a comment field without length restriction, the auxiliary data field may be large.
It is proposed to store on the disk a database describing how to pronounce all the tags. This database would not contain the complete voice output, only the information needed to generate it. An alternative is to store the compressed voice itself. As the synthesizer, it is proposed to use e.g. "festival", a multi-language, license-free speech synthesis system written in C++ (University of Edinburgh), or "flite", written in C.

Advantage: Enables a voice-assisted user interface with free-text messages without requiring large storage capacity.
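The following minimal Python sketch makes the two proposals more concrete on the player side. It relies on two assumptions that the disclosure does not define. First, the %text% prefix is read as percent signs delimiting the text to be spoken inside an ordinary tag field. Second, the on-disk pronunciation database is modeled as a simple word-to-IPA dictionary. The names `extract_voice_text` and `to_pronunciation`, the sample field, and the sample lexicon are all hypothetical.

```python
import re

# Hypothetical convention: text intended for voice output is wrapped in
# percent signs inside an ordinary tag field, e.g. "%Yesterday by The Beatles%".
VOICE_MARKER = re.compile(r"%([^%]+)%")


def extract_voice_text(field: str) -> list[str]:
    """Return every %...%-delimited segment found in a tag field."""
    return [m.strip() for m in VOICE_MARKER.findall(field)]


def to_pronunciation(text: str, lexicon: dict[str, str]) -> list[str]:
    """Map each word to its IPA transcription from the (hypothetical) on-disk lexicon.

    Words missing from the lexicon are passed through unchanged, leaving
    them to the synthesizer's own letter-to-sound rules.
    """
    result = []
    for word in text.split():
        # Normalize case and strip surrounding punctuation before lookup.
        key = word.lower().strip(".,;:!?\"'")
        result.append(lexicon.get(key, word))
    return result


if __name__ == "__main__":
    # Illustrative lexicon as it might be stored on the disk: word -> IPA.
    lexicon = {
        "yesterday": "ˈjɛstədeɪ",
        "by": "baɪ",
        "the": "ðə",
        "beatles": "ˈbiːtəlz",
    }
    field = "TIT2=Yesterday %Yesterday by The Beatles%"
    for segment in extract_voice_text(field):
        print(to_pronunciation(segment, lexicon))
```

In this design, the disk carries only short marked text and a small per-disk lexicon, which matches the storage argument above. The resulting IPA strings would still have to be mapped to the phoneme input expected by the chosen synthesizer. This sketch does not attempt that mapping.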