
System and Method Using a Trusted Secure Voice-to-Text Server for Non-Real-Time Conversion on a Mobile Device

IP.com Disclosure Number: IPCOM000131894D
Publication Date: 2005-Nov-21
Document File: 2 page(s) / 30K

Publishing Venue

The IP.com Prior Art Database

Abstract

A user receives a text message on a wireless communication device and wishes to provide a detailed reply. The incoming correspondence may be very quick to read while the potential reply is very long, and the user is not in a position to use the physical keypad of the device to compose a reply conveniently. An example is an email that is read while walking.

The same applies to composing an original message, which can take an extended time when the user is not in a position to use a keypad (e.g. while walking).

A 'reply with voice'/'compose with voice' solution is ideal for such a situation. Importantly, I view this as an ideal 3G/UMTS extended application, due to the architecture proposed.

Preamble:

An innovative 5-function thumbwheel (a thumbwheel with additional up-click and down-click actions) is instructive as an example of possible usage with reply-with-voice functionality. There can also be dedicated reply-with-voice buttons that have multiple functions (similar to a push-to-talk button).

While within an email, this 5-function thumbwheel performs the standard scroll up/scroll down/click functions. Its 2 additional actions, up-click and down-click, would need to be context-sensitive:

- "up-click" is "Voice Command" mode

- "down-click" is "Reply/compose with Voice" mode

In 'up-click', the device microprocessor (uP) has a rudimentary capability to perform voice recognition of the standard mobile device command menu, likely by voice pattern training at the device. For example, the user may select reply, reply all, reply with voice, reply all with voice, delete, mark unread, file, etc. using only their voice. The 'Voice Command' mode is not essential to the invention.

In 'down-click', as long as the application is in context (i.e. reply mode), the user's voice is recorded while 'down-click' is pressed.

Use Case:

1. An incoming email is read.

2. The user selects 'reply with voice'.

3. The device is now in compose-reply mode. While the user holds 'down-click', the device records sound. An icon is then shown to indicate a voice recording. The email may NOT be sent at this point.

4. A menu item is now available (either via up-click voice recognition or via the standard thumbwheel click) to 'confirm voice2text'. Once this is selected, the voice file is securely sent to the trusted voice2text (V2T) server. The V2T server is associated with the email service; that is to say, the voice file reaches the V2T server by either the GME protocol or another bearer path (such as a bootstrapped VPN over the Internet). Since this is not 'real-time', powerful and accurate algorithms may be employed on the V2T server to provide an accurate conversion. Once the V2T conversion is completed, the V2T server corresponds with the service to send a 'meta-email' with the user's voice message converted to text. The turnaround time is envisioned to be ~20s.

NOTE: In one embodiment there is a dependence on unlimited file attachment upload capability. In another, less preferable embodiment, similar to MDS page fragmentation, the voice file may be sent in datagram sizes as per MDP limitations (e.g. 30KB or soon to be 60KB fragments).

5. The user visually browses the V2T text conversions and sends the email without ever having tapped a key.

It is easy to extend this with editing functions, e.g. the user dictates and converts V2T sentence by sentence, and the V2T reply email is coded (e.g. by colour) for each voice file/sentence that is converted. By selecting one colour, the user may thus choose that sentence to be overwritten and reconverted with new voice text as an editing function. It is also easy to conceive of originating emails and calendar events or any PIM function in this fashion, including directory lookup, subject and text dictations.

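
The fragmented-upload embodiment mentioned in the NOTE to step 4 can be illustrated with a minimal sketch. The 30KB figure comes from the text; everything else here (the `Fragment` record, the function names, and the idea of tagging each fragment with an index and a total count) is an assumption made for illustration and is not part of the disclosure or of any real MDS/MDP interface.

```python
from dataclasses import dataclass

# Assumed limit: the 30KB example figure from the text.
FRAGMENT_SIZE = 30 * 1024


@dataclass
class Fragment:
    """Hypothetical datagram carrying one slice of the voice file."""
    index: int      # position of this fragment in the sequence
    total: int      # total number of fragments in the voice file
    payload: bytes  # at most `size` bytes of voice data


def fragment_voice_file(data: bytes, size: int = FRAGMENT_SIZE) -> list[Fragment]:
    """Split a recorded voice file into datagram-sized fragments."""
    if size <= 0:
        raise ValueError("fragment size must be positive")
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Fragment(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(fragments: list[Fragment]) -> bytes:
    """Rebuild the voice file at the server, tolerating out-of-order arrival."""
    if not fragments:
        return b""
    total = fragments[0].total
    by_index = {f.index: f.payload for f in fragments}
    if set(by_index) != set(range(total)):
        raise ValueError("missing fragments")
    return b"".join(by_index[i] for i in range(total))
```

Under these assumptions a 70KB recording is split into three fragments (30KB, 30KB and 10KB), and the server can rebuild the original file even if the fragments arrive in a different order.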
Miscellaneous Thoughts:

- Language selection for the V2T server may be important for accuracy, and the localization of the device may be included in a header of the voice file.

- In no way does this invention preclude direct V2T conversion on the device. However, given current device uP speeds and the existing uP utilization by the radio/apps, I suspect that recognizing random speech patterns on the device would actually consume more battery and time than compressing the actual voice files and sending them over the air (OTA) for remote non-real-time processing (some analysis is likely required to confirm this). Even server-based, real-time, commercial random voice recognition software is highly inaccurate (e.g. Bell Sympatico 'Emily').

- One potentially applicable compromise is for the V2T server to 'teach' the wireless device to recognize known sounds. For example, in off hours, the V2T server could send a compressed text vs. voice profile lookup table of entries that have historically been shown at the V2T server to be highly reliable. These phonemes are then recognized immediately by the wireless device and not sent OTA.

- Another (less desirable) option is for the actual voice attachment to be the email reply. The recipient would then need to 'play' the reply.
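
The 'teach the device' compromise above can be sketched roughly as follows. The disclosure does not say how a voice profile is represented, so the string keys, the table contents, and the function name below are all hypothetical placeholders; the sketch only shows the split between sounds resolved on the device and sounds still sent OTA.

```python
# Hypothetical table pushed by the V2T server in off hours. The keys stand in
# for whatever compact "voice profile" the device can compute; the values are
# the text the server has historically found reliable for that profile.
TAUGHT_TABLE = {
    "vp:0x3a91": "yes",
    "vp:0x77c2": "thanks",
}


def split_local_and_remote(segment_keys, table):
    """Resolve known segments on the device and list the rest for OTA upload.

    Returns (local_text, send_ota): local_text maps a segment position to its
    recognized text, and send_ota lists the positions that still need the server.
    """
    local_text = {}
    send_ota = []
    for position, key in enumerate(segment_keys):
        if key in table:
            local_text[position] = table[key]
        else:
            send_ota.append(position)
    return local_text, send_ota
```

With this split, only the segments missing from the table would be compressed and uploaded, which is the battery and airtime saving the compromise is aiming for.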

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

MOBILE DEVICE “REPLY WITH VOICE”


Disclosed Anonymously
