
System and method to isolate speaker's voice in VOIP conferencing solutions.

IP.com Disclosure Number: IPCOM000249209D
Publication Date: 2017-Feb-10
Document File: 4 page(s) / 50K

Publishing Venue

The IP.com Prior Art Database


Disclosed is a system for isolating a speaker's voice in a multi-conference environment. The system introduces a novel concept of real-time voice comparison on a relay server, which has information about all running conferences, to find whether one user's voice is present in multiple conferences. It also triggers an algorithm that detects which voices are needed in a conference by finding the intended participants. If other users' voices are not found in existing conferences, the system notifies end users in real time and asks for their permission, through an innovative user interface, to allow those voices.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 42% of the total text.


System and method to isolate speaker's voice in VOIP conferencing solutions.

With advances in technology that provide audio-video conferencing capability, the use of audio-video conferencing has grown manifold. Organizational adoption of audio-video conferencing has risen for various collaboration meetings, but it is still largely based on pre-determined, pre-defined and scheduled meetings for a specific project or team. Audio-video communication has also gained a lot of traction for 1xq calling capability and in businesses such as banking, retail and wholesale markets; however, again that is largely for the sake. No analytics or data consumption techniques are being applied. With audio-video communication gaining traction, it is important to leverage this technology to its fullest, especially in the age of today's social collaboration tools and platforms.

Most of the time in a conference call, every participant injects some amount of background noise, which disturbs the whole conference. An improved and feasible system or method is therefore required that reduces or completely removes the background noise, so that each speaker is clearly audible to everyone. This article discloses a system that isolates the participant's voice from the background before sending it to the Media Control Unit. The system gives users confidence that their voice and video come through clearly, without having to worry about background noise.

The core of the solution is the isolation of the speaker's voice from unwanted voices / audio packets in a conference.

This can be achieved by creating a system in which:

1. The client sends audio packets to the remote party / MCU only when video image processing detects lip movement in the captured and processed video frame.

2. The Mixer Unit (MCU) looks up the incoming audio/voice samples against other ongoing conferences and removes them when a voice match is found.

3. The MCU uploads to the client the voice samples that it has identified as content to remove.

4. The client sends to the MCU only those audio packets that do not match the received audio samples.
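Steps 2 to 4 above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes each audio sample has already been reduced to a fixed-length numeric "voiceprint" vector, and it uses cosine similarity with an arbitrary threshold as a stand-in for whatever voice-matching method the system actually applies. The names `Mixer`, `find_removal_samples`, `filter_packets` and `MATCH_THRESHOLD` are hypothetical.

```python
import math
from dataclasses import dataclass, field

# Assumed similarity cut-off; the disclosure does not specify a matching rule.
MATCH_THRESHOLD = 0.95


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Mixer:
    """Hypothetical MCU that knows the voices present in every running conference."""
    # conference id -> voiceprints currently heard in that conference
    active_voices: dict = field(default_factory=dict)

    def register(self, conference_id, voiceprint):
        self.active_voices.setdefault(conference_id, []).append(voiceprint)

    def find_removal_samples(self, conference_id, incoming):
        """Step 2: collect incoming voiceprints that match a voice in another conference."""
        removal = []
        for voiceprint in incoming:
            for other_id, voices in self.active_voices.items():
                if other_id == conference_id:
                    continue
                if any(cosine_similarity(voiceprint, v) >= MATCH_THRESHOLD for v in voices):
                    removal.append(voiceprint)
                    break
        return removal


def filter_packets(packets, removal_samples):
    """Step 4: the client keeps only packets that match none of the removal samples."""
    return [
        p for p in packets
        if not any(cosine_similarity(p, r) >= MATCH_THRESHOLD for r in removal_samples)
    ]


mixer = Mixer()
mixer.register("conf-B", [1.0, 0.0, 0.0])          # a voice active in another conference
incoming = [[1.0, 0.01, 0.0], [0.0, 1.0, 0.0]]     # voiceprints arriving for conf-A
removal = mixer.find_removal_samples("conf-A", incoming)  # step 2
kept = filter_packets(incoming, removal)           # steps 3 and 4, run on the client
```

In this sketch the removal samples returned by the mixer play the role of the voice samples uploaded to the client in step 3; the client then applies the same matching rule locally before sending.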

Advantages:

1. Users can take calls from a common area.

2. The mechanism helps to avoid disturbances from audio sources other than the participant by:

+ not sending the unwanted voices or audio packets into a conference call.
+ removing the unwanted voices from the conference call.

3. Users do not need to operate mute/unmute to avoid disturbances.


1. Isolating speaker's voice from other unwanted voices and discarding other voices based on video image/frame processing

-- In a conference call, the client keeps receiving video frames.
-- The client processes the video image and finds whether there is any lip movement that indicates talking status for the user in focus / very close in the video frame.
-- On not detecting any lip movement, the image processing engine sets a not-talking flag.
-- On receiving voice signals, the client requests the lip movement f...
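The lip-movement gate described so far can be sketched as a small state holder. This is only an illustration under stated assumptions: the lip-movement detector itself is not shown and is represented by a boolean passed in per video frame, and the class and method names (`LipMovementGate`, `on_video_frame`, `on_audio_packet`) are hypothetical. The sketch covers only the behaviour stated above, namely that audio is forwarded only while lip movement is detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LipMovementGate:
    """Hypothetical client-side gate driven by the image processing engine."""
    talking: bool = False  # starts in the not-talking state

    def on_video_frame(self, lip_movement_detected: bool) -> None:
        # The image processing engine updates the flag for every processed frame.
        self.talking = lip_movement_detected

    def on_audio_packet(self, packet: bytes) -> Optional[bytes]:
        # Forward the packet only while the user in focus appears to be talking;
        # otherwise discard it (returned as None here).
        return packet if self.talking else None


gate = LipMovementGate()
dropped = gate.on_audio_packet(b"noise")     # no lip movement seen yet
gate.on_video_frame(True)
forwarded = gate.on_audio_packet(b"speech")  # lip movement detected
gate.on_video_frame(False)
dropped_again = gate.on_audio_packet(b"noise")
```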