Multiple Note Music Recognition

IP.com Disclosure Number: IPCOM000100476D
Original Publication Date: 1990-Apr-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 2 page(s) / 98K

Publishing Venue

IBM

Related People

Grice, DG: AUTHOR [+4]

Abstract

In the analysis of speech signals, the determination of the pitch of the voice can be an important piece of information. There are several methods traditionally used to solve this problem, but they all assume that the signal being analyzed came from a single voice and, therefore, has a single pitch. What is presented here is a method for determining the pitches of multiple signals added together. The description is for a system used to analyze musical notes, but the same technique can be applied to speech signals involving multiple speakers. An application of this algorithm is the automatic transcription of music being played.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Multiple Note Music Recognition

       In the analysis of speech signals, the determination of
the pitch of the voice can be an important piece of information.
There are several methods traditionally used to solve this problem,
but they all assume that the signal being analyzed came from a single
voice and, therefore, has a single pitch. What is presented here is a
method for determining the pitches of multiple signals added
together.  The description is for a system used to analyze musical
notes but the same technique can be applied to speech signals
involving multiple speakers.  An application of this algorithm is the
automatic transcription of music being played.

      The first step in the process is the calculation of the power
spectrum of the signal and the extraction of the peaks of this
distribution.  It is assumed that these peaks correspond to either
pitch frequencies or harmonics of a pitch frequency.  Harmonics can
be of any amplitude but an amplitude threshold is set for those peaks
that can constitute a unique note.
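
      As an illustration only, the peak extraction step might be
sketched as follows. The function name spectral_peaks and the
parameters note_ratio and floor_ratio are assumptions made for this
sketch and do not come from the disclosure. In particular, the
disclosure does not say how the note threshold is chosen, and the
small noise floor is added here only to keep numerical round-off
from registering as peaks.

```python
import numpy as np

def spectral_peaks(signal, fs, note_ratio=0.5, floor_ratio=1e-6):
    """Return (frequencies, powers, can_be_note) for power-spectrum peaks.

    note_ratio and floor_ratio are illustrative assumptions: a peak may be
    treated as a note of its own only if its power is at least note_ratio
    times the largest power; peaks below floor_ratio times the largest
    power are ignored as numerical noise.
    """
    signal = np.asarray(signal, dtype=float)
    m = len(signal)
    power = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(m, d=1.0 / fs)         # bin spacing is fs / m

    # A peak is an interior bin larger than its left neighbour and at
    # least as large as its right neighbour.
    mid = power[1:-1]
    is_peak = (mid > power[:-2]) & (mid >= power[2:])
    is_peak &= mid > floor_ratio * power.max()
    idx = np.nonzero(is_peak)[0] + 1

    # Harmonics may have any amplitude, but only sufficiently large
    # peaks are allowed to stand for a note in their own right.
    can_be_note = power[idx] >= note_ratio * power.max()
    return freqs[idx], power[idx], can_be_note
```

      With one second of signal sampled at 8000 Hz, the bin spacing
FS/M is 1 Hz, so a 200 Hz tone and a weaker 400 Hz tone each fall
exactly on a bin and are reported as two separate peaks.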

      Given a list of frequency peaks, the next step is to decide
which set of notes and harmonics is most likely to have produced that
set.  (While not part of the implementation described here, higher
level information, such as the types of instruments producing the
notes, the notes that have been sounding previously, etc., could be
used to enhance the estimation of the most likely candidates
currently being played.)  A sifting algorithm is used to determine a
most likely note being played.  A large peak is chosen and assumed to
be the Nth harmonic of some note F0, where N and F0 are unknown.
Given the range of F0's to be encountered (limited by the instruments
being played), a range of N's can be calculated.  For each of the N's
in this range, a quality/confidence measure is calculated as follows.
Given the peak at N x F0, F0 is known for a given N to within some
tolerance.  (The power spectrum will have peaks that are precise only
to FS/M where FS is the sampling frequency of the digital system and
M is the number of signal points used to calculate the power
spectrum.)  Given this tolerance, it is possible to predict where the
other harmonics of the F0 should be.  Since not all of the harmonics
will be present, the quality measure consists of determining the N
that produces the highest percentage of harmonics present in the
actual signal, and given equal percentages for 2 N's choosing the N
that covers th...
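
      The candidate search described above might be sketched as
follows, purely as an illustration. The function name
harmonic_scores, the limit max_harmonics, the cutoff f_max, and the
exact form of the matching tolerance are assumptions made for this
sketch and are not taken from the disclosure. The tie-breaking rule
for equal percentages is not reproduced, because this abbreviated
text breaks off before stating it. The sketch therefore returns the
score for every candidate N and leaves the final choice to the
caller.

```python
import math

def harmonic_scores(peak_freqs, chosen_peak, f0_min, f0_max,
                    resolution, f_max, max_harmonics=8):
    """Score each candidate N for a peak assumed to lie at N x F0.

    resolution is FS/M, the spacing of the power-spectrum bins.
    Returns {N: (F0, fraction of expected harmonics found)}.
    """
    # The allowed F0 range bounds the harmonic numbers N to consider.
    n_lo = max(1, math.ceil(chosen_peak / f0_max))
    n_hi = math.floor(chosen_peak / f0_min)

    scores = {}
    for n in range(n_lo, n_hi + 1):
        f0 = chosen_peak / n
        # Harmonics k x F0 considered for this candidate (assumed limits).
        expected = [k for k in range(1, max_harmonics + 1) if k * f0 <= f_max]
        found = 0
        for k in expected:
            # Assumed tolerance: the chosen peak is known to +/- resolution/2,
            # which scales by k/N when predicting harmonic k, and the peak
            # being matched carries its own +/- resolution/2.
            tol = 0.5 * resolution * (k / n + 1)
            if any(abs(p - k * f0) <= tol for p in peak_freqs):
                found += 1
        scores[n] = (f0, found / len(expected))
    return scores
```

      For example, with peaks at 220, 440, 660 and 880 Hz, the 440
Hz peak chosen, and F0 limited to 100-500 Hz, the candidates are
N = 1 to 4. With max_harmonics set to 4, N = 2 (F0 = 220 Hz) is the
only candidate for which every expected harmonic is present.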