Automatic Speaker Identification in Telephone Conferences

IP.com Disclosure Number: IPCOM000018574D
Original Publication Date: 2003-Jul-24
Included in the Prior Art Database: 2003-Jul-24
Document File: 3 page(s) / 44K


This article introduces automatic speaker identification in telephone conferences. The idea is to make use of the fact that a corporate intranet or the Internet is available in parallel to the telephone network. This parallel data network is used for communicating information about participants and speakers in telephone conferences. All participants can see on their portable computer the names and still pictures of all other conference participants. The still picture and/or the name of the currently speaking participant is highlighted.

This is the abbreviated version, containing approximately 37% of the total text.

Telephone conversations are part of everyday work at corporations with multiple office locations. However, particularly in large corporations, not everyone participating in a telephone conference is able to recognize all other participants by their voices. A fundamental problem of telephone conferences is thus that productivity and effectiveness are suboptimal because not everyone actually knows who is speaking. In addition, noise and distortion due to poor telephony links may make it even harder to determine the speaker.

Video conferencing tools transmit both voice and video signals, so the speaker can be identified in most cases. Video conferencing can be run over the telephony network (typically ISDN) or over a data network (e.g. a corporate IP intranet or the Internet). Video conferencing over telephony networks has several drawbacks: the poor video signal (resolution and delay) on long-distance calls (e.g. Europe to Japan); the difficulty of associating names with faces in the video display; the high telephone cost; the large investment cost; the small number of video conferencing rooms typically available; the overhead of reserving a room and setting up the video conference; and the difficulty of participating passively in a meeting for social reasons. The drawbacks of video conferencing over data networks are that data networks (e.g. IP networks) do not yet provide the quality-of-service guarantees needed to transmit audio and video signals, and that there are some investment costs (e.g. for a camera).

Thus, a mechanism for automatic identification of speakers in telephone conferences is introduced. The key idea is to use a corporate intranet or the Internet for communicating information about the participants and current speakers of a telephone conference.
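The key idea can be illustrated with a minimal sketch of the state that the data network would have to keep in sync: a list of participants and a marker for the current speaker. This is only an assumption-laden illustration, not the mechanism of the disclosure. The class names, field names and the text rendering are hypothetical, and the sketch does not cover how the current speaker is detected or how updates are transported over the intranet or the Internet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    """One conference participant as displayed to the others (hypothetical fields)."""
    participant_id: str
    name: str
    position: str
    picture_url: str  # location of the still picture (assumed to be a URL)


@dataclass
class ConferenceRoster:
    """Shared view of who is in the conference and who is currently speaking."""
    participants: Dict[str, Participant] = field(default_factory=dict)
    current_speaker_id: Optional[str] = None

    def join(self, participant: Participant) -> None:
        # Add (or update) a participant in the roster.
        self.participants[participant.participant_id] = participant

    def leave(self, participant_id: str) -> None:
        # Remove a participant; clear the highlight if they were speaking.
        self.participants.pop(participant_id, None)
        if self.current_speaker_id == participant_id:
            self.current_speaker_id = None

    def set_speaker(self, participant_id: Optional[str]) -> None:
        # Mark one known participant as the current speaker, or None for silence.
        if participant_id is not None and participant_id not in self.participants:
            raise KeyError(f"unknown participant: {participant_id}")
        self.current_speaker_id = participant_id

    def render(self) -> List[str]:
        # Text stand-in for the display: the current speaker is prefixed with ">>".
        lines = []
        for pid, p in self.participants.items():
            marker = ">>" if pid == self.current_speaker_id else "  "
            lines.append(f"{marker} {p.name} ({p.position})")
        return lines
```

In such a sketch, each participant's portable computer would hold a copy of the roster and call `set_speaker` whenever it receives a speaker update over the data network; `render` stands in for highlighting the still picture and/or name in a real user interface.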

The proposed idea has several advantages. Productivity and effectiveness during telephone conferences are increased. A speaker can be identified by still picture, name and position. Virtually no installation is required. There are no investment costs and no additional operational costs.

A typical telephone conference proceeds as follows. The invited meeting participants are given a dial-in number and passcode, which they use to dial in from arbitrary locations over the telephony network. A participant can dial in from a meeting room (together with other conference participants), an office, home, a mobile phone, a hotel room, etc. In today's business world, conference participants typically have a portable computer available. In most cases, such a portable computer is already connected to a corporate intranet or the Internet by either wired or wireless access technology. Furthermore, a standard portable computer today has a built-in microphone.

The idea is to make use of the fact that a corporate intranet or the Internet is available in parallel to the telephony network. This parallel data netw...