Synchronizing Speech on Multiple Systems

IP.com Disclosure Number: IPCOM000114424D
Original Publication Date: 1994-Dec-01
Included in the Prior Art Database: 2005-Mar-28
Document File: 2 page(s) / 63K

Publishing Venue

IBM

Related People

Paradine, C: AUTHOR [+2]

Abstract

Real time transfer of audio information such as speech between computer systems is conventionally performed by a digital sampling process.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Real time transfer of audio information such as speech between
computer systems is conventionally performed by a digital sampling
process.

      Speech samples are captured on one system using a sampling
clock (typically 8 kHz).  These samples are transmitted to another
system, which plays them out through a speaker using a clock that is
nominally the same but in practice drifts with respect to the first
clock.  This drift is too small to affect the continuity of the
speech.  However, over a long period of time the receiver may fall
behind the sender, so that the listener suffers an unnecessarily long
delay before hearing the sender, or the receiver may overrun the
sender.
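
      To give a sense of scale for the drift described above, the
following is a minimal sketch of the arithmetic.  The 100
parts-per-million clock mismatch is an illustrative assumption, not a
figure from the disclosure, and the function names are hypothetical.

```python
def drift_ms(elapsed_s: float, mismatch_ppm: float) -> float:
    """Playout offset (ms) accumulated after elapsed_s seconds between two
    clocks whose rates differ by mismatch_ppm parts per million."""
    return elapsed_s * mismatch_ppm / 1_000_000 * 1000


def surplus_samples(elapsed_s: float, nominal_hz: float, mismatch_ppm: float) -> float:
    """Samples produced by the faster clock beyond what the slower one handles."""
    return elapsed_s * nominal_hz * mismatch_ppm / 1_000_000


if __name__ == "__main__":
    # Assumed values: one hour of speech, 8,000 samples per second nominal
    # rate, 100 ppm mismatch between the capture and playback clocks.
    print(f"offset after one hour: {drift_ms(3600, 100):.0f} ms")
    print(f"surplus samples: {surplus_samples(3600, 8000, 100):.0f}")
```

      Under these assumed values the offset after an hour is roughly a
third of a second.  Any single packet is barely affected, but the
error accumulates steadily in one direction.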

      The packets of samples (whether fixed or variable length) also
suffer a variable delay as they pass through the system, chiefly from
the network but also from the hardware and software at the nodes;
that is, they are not self-synchronising.

      This disclosure provides a simple way of maintaining a fairly
constant delay between the sender and receiver, even though no common
reference clock exists.

      Running with a small delay at the receiver (e.g., 100-200 ms)
is acceptable because it lets the receiver absorb fluctuations in the
arrival time of packets over the network and present continuous
speech to the listener.  If this delay, or latency, is L, then
continuity is lost when a packet is delayed by more than L.
(Generally the gap has to be filled with silence until the packet
arrives.)  This condition is called an overrun: the playback hardware
has outstripped the provider.  This...
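
      The overrun condition described above can be sketched as a
simple check against the latency L.  This is an illustration under
stated assumptions, not the method of the disclosure: the function
name and the sample delays are hypothetical, each delay is taken to be
the packet's extra transit time beyond the nominal value, and lateness
is measured against a fixed playout schedule that is not shifted by
earlier overruns.

```python
def find_overruns(extra_delay_ms, latency_ms):
    """Return (packet index, lateness in ms) for each packet whose extra
    delay exceeds the receiver latency L."""
    return [
        (i, d - latency_ms)
        for i, d in enumerate(extra_delay_ms)
        if d > latency_ms
    ]


if __name__ == "__main__":
    # Hypothetical per-packet delays beyond the nominal transit time, in ms.
    delays = [20, 150, 90, 210]
    for index, late_by in find_overruns(delays, latency_ms=100):
        print(f"packet {index}: overrun, {late_by} ms of silence needed")
```

      A packet delayed by exactly L is treated here as arriving just
in time; only a delay strictly greater than L counts as an overrun.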