Pitch detection algorithm



A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or virtually periodic signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain or the frequency domain.

PDAs are used in various contexts (e.g. phonetics, music information retrieval, speech coding, musical performance systems), so different demands may be placed upon the algorithm. There is as yet no single ideal PDA, so a variety of algorithms exist, most falling broadly into the classes given below [D. Gerhard. Pitch Extraction and Fundamental Frequency: History and Current Techniques (http://www.cs.uregina.ca/Research/Techreports/2003-06.pdf), technical report, Dept. of Computer Science, University of Regina, 2003].

Time-domain approaches

In the time domain, a PDA typically estimates the period of the quasiperiodic signal, then inverts that value to give the frequency.

One simple approach is to measure the distance between zero-crossing points of the signal (i.e. the zero-crossing rate). However, this does not work well with complex waveforms composed of multiple sine waves with differing periods, since the additional components can add or shift crossings within a single period. Nevertheless, there are cases in which zero-crossing can be a useful measure, for example in some speech applications where a single source is assumed. The algorithm's simplicity makes it "cheap" to implement.
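
The zero-crossing idea can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the choice to count only rising crossings (with linear interpolation between samples) are assumptions made for the example.

```python
import numpy as np

def zero_crossing_pitch(signal, sample_rate):
    """Estimate frequency from the mean spacing of rising zero crossings."""
    signal = np.asarray(signal, dtype=float)
    # Indices i where the signal goes from negative to non-negative
    # between sample i and sample i + 1.
    rising = np.flatnonzero((signal[:-1] < 0) & (signal[1:] >= 0))
    if len(rising) < 2:
        return None  # not enough crossings to measure a period
    # Linear interpolation locates each crossing between the two samples.
    frac = -signal[rising] / (signal[rising + 1] - signal[rising])
    crossings = rising + frac
    # Mean distance between crossings is the period, in samples.
    period_seconds = np.mean(np.diff(crossings)) / sample_rate
    return 1.0 / period_seconds

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
print(zero_crossing_pitch(np.sin(2 * np.pi * 220.0 * t + 0.3), sample_rate))
```

For a clean sine wave the estimate is close to the true frequency. A waveform with a strong harmonic can cross zero more than once per period, in which case this estimate would be too high.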

More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASDF (average squared difference function), and the similar autocorrelation work this way. These algorithms can give quite accurate results for highly periodic signals. However, they have false detection problems (often "octave errors"), can sometimes cope badly with noisy signals (depending on the implementation) and, in their basic implementations, do not deal with polyphonic sounds (which involve multiple musical notes of different pitches).
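
A basic difference-function estimator of this kind might look like the sketch below. The function name, the default search range of 50 to 500 Hz, and the `squared` flag (which switches from AMDF to ASDF) are assumptions made for illustration.

```python
import numpy as np

def amdf_pitch(frame, sample_rate, fmin=50.0, fmax=500.0, squared=False):
    """Pick the trial lag at which the frame best matches a shifted copy."""
    frame = np.asarray(frame, dtype=float)
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    # Compare the same number of samples at every lag so costs are comparable.
    n = len(frame) - lag_max
    if n <= 0:
        raise ValueError("frame is too short for the requested fmin")
    lags = np.arange(lag_min, lag_max + 1)
    costs = []
    for lag in lags:
        diff = frame[:n] - frame[lag:lag + n]
        # ASDF averages squared differences, AMDF averages magnitudes.
        costs.append(np.mean(diff ** 2) if squared else np.mean(np.abs(diff)))
    best_lag = lags[int(np.argmin(costs))]
    return sample_rate / best_lag

sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 210.0 * t) + 0.5 * np.sin(2 * np.pi * 420.0 * t)
print(amdf_pitch(tone, sample_rate))
```

Because the lag is restricted to whole samples, the result is quantized to `sample_rate / lag`. The cost is also low at integer multiples of the true period, which is one way the octave errors mentioned above can arise when noise or rounding favours a longer lag.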

Current time-domain pitch detector algorithms tend to build upon the basic methods referred to above, with additional refinements to bring the performance more in line with a human assessment of pitch. For example, the YIN algorithm [A. de Cheveigné and H. Kawahara. YIN, a fundamental frequency estimator for speech and music (http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf). The Journal of the Acoustical Society of America, 111:1917, 2002. doi:10.1121/1.1458024] and the MPM algorithm [P. McLeod and G. Wyvill. A smarter way to find pitch (http://csweb.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf). In Proceedings of the International Computer Music Conference (ICMC’05), 2005] are both based upon autocorrelation.
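
As an illustration of such refinements, the sketch below follows two steps described in the YIN paper: the cumulative mean normalized difference function and an absolute threshold. It is a simplified sketch rather than the full algorithm, since it omits the paper's parabolic interpolation and later steps. The function name, the default search range, and the threshold value of 0.1 are assumptions made for the example.

```python
import numpy as np

def yin_sketch(frame, sample_rate, fmin=50.0, fmax=500.0, threshold=0.1):
    """Simplified YIN-style estimate (difference function + normalization)."""
    frame = np.asarray(frame, dtype=float)
    lag_max = int(sample_rate / fmin)
    lag_min = max(1, int(sample_rate / fmax))
    n = len(frame) - lag_max
    if n <= 0:
        raise ValueError("frame is too short for the requested fmin")
    # Squared difference function d(tau) for tau = 0 .. lag_max.
    d = np.array([np.sum((frame[:n] - frame[tau:tau + n]) ** 2)
                  for tau in range(lag_max + 1)])
    # Cumulative mean normalized difference:
    # d'(0) = 1, d'(tau) = d(tau) * tau / sum(d(1..tau)).
    cmnd = np.ones_like(d)
    taus = np.arange(1, lag_max + 1)
    cmnd[1:] = d[1:] * taus / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Take the first lag that dips below the threshold, then follow the
    # dip down to its local minimum.
    tau = lag_min
    while tau <= lag_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= lag_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    # No dip below the threshold: fall back to the global minimum.
    tau = lag_min + int(np.argmin(cmnd[lag_min:]))
    return sample_rate / tau

sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 400.0 * t)
print(yin_sketch(tone, sample_rate))
```

Choosing the first dip below the threshold, rather than the global minimum, is what lets this approach prefer the shortest matching period over its multiples.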

Frequency-domain approaches

In the frequency domain, polyphonic detection is possible, usually using the fast Fourier transform (FFT) to convert the signal to a frequency spectrum. This requires more processing power as the desired accuracy increases, although the well-known efficiency of the FFT algorithm makes it suitably efficient for many purposes.
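
A minimal sketch of the basic spectral step is shown below: window the frame, take the FFT, and report the frequency of the largest bin. The function name and the use of a Hann window are assumptions for the example, and the strongest peak is not necessarily the fundamental, so this is a building block rather than a complete pitch detector.

```python
import numpy as np

def fft_peak_frequency(frame, sample_rate):
    """Return the centre frequency of the strongest non-DC spectral bin."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))          # reduces spectral leakage
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    spectrum[0] = 0.0                        # ignore the DC component
    return freqs[int(np.argmax(spectrum))]

sample_rate = 8000
n = 2048
t = np.arange(n) / sample_rate
print(fft_peak_frequency(np.sin(2 * np.pi * 440.0 * t), sample_rate))
print("bin spacing in Hz:", sample_rate / n)
```

The bin spacing is `sample_rate / n`, so halving the spacing means doubling the frame length, which illustrates the trade-off between accuracy and processing cost.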

Popular frequency-domain algorithms include the harmonic product spectrum [Pitch Detection Algorithms (http://cnx.org/content/m11714/latest/), online resource from Connexions]; cepstral analysis; maximum likelihood, which attempts to match the frequency-domain characteristics to pre-defined frequency maps (useful for detecting the pitch of fixed-tuning instruments); and the detection of peaks due to harmonic series [Mitre, Adriano; Queiroz, Marcelo; Faria, Régis. Accurate and Efficient Fundamental Frequency Determination from Precise Partial Estimates (http://www.ime.usp.br/~mqz/Mitre_AESBR2006.pdf). Proceedings of the 4th AES Brazil Conference, 113-118, 2006].
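
The harmonic product spectrum can be sketched as below: the magnitude spectrum is multiplied by copies of itself downsampled by 2, 3, and so on, so that the harmonics line up at the fundamental bin. The function name, the Hann window, and the default of four harmonics are assumptions made for illustration.

```python
import numpy as np

def hps_pitch(frame, sample_rate, num_harmonics=4):
    """Harmonic product spectrum estimate of the fundamental frequency."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    # Only bins whose highest considered harmonic still lies in the spectrum.
    limit = len(spectrum) // num_harmonics
    product = spectrum[:limit].copy()
    for h in range(2, num_harmonics + 1):
        # spectrum[::h][k] is the magnitude at bin h * k, i.e. harmonic h of bin k.
        product *= spectrum[::h][:limit]
    product[0] = 0.0                         # ignore the DC bin
    peak_bin = int(np.argmax(product))
    return peak_bin * sample_rate / len(frame)

sample_rate = 8000
n = 4096
t = np.arange(n) / sample_rate
f0 = 220.0
# Test tone whose second harmonic is stronger than its fundamental.
tone = (0.5 * np.sin(2 * np.pi * f0 * t) + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.7 * np.sin(2 * np.pi * 3 * f0 * t) + 0.6 * np.sin(2 * np.pi * 4 * f0 * t))
print(hps_pitch(tone, sample_rate))
```

In this example the strongest single peak is the second harmonic, yet the product peaks near the fundamental because only there do all four factors coincide with harmonics.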

To improve on the pitch estimate derived from the discrete Fourier spectrum, techniques such as "spectral reassignment" (phase based) or "Grandke interpolation" (magnitude based) can be used to go beyond the resolution provided by the FFT analysis.
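
A magnitude-based interpolation of this kind can be sketched as follows. The sketch assumes a Hann window and a single dominant sinusoid; under those assumptions the ratio α of the two largest adjacent bins gives a fractional bin offset δ = (2α − 1)/(α + 1). The function name is hypothetical and the code is an illustration, not a reference implementation.

```python
import numpy as np

def grandke_frequency(frame, sample_rate):
    """Refine the FFT peak frequency from the two largest adjacent bins."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Periodic Hann window; the offset formula below is specific to Hann.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    mag = np.abs(np.fft.rfft(frame * window))
    # Largest bin, excluding the ends so both neighbours exist.
    k = int(np.argmax(mag[1:-1])) + 1
    # The true peak lies between the largest bin and its larger neighbour.
    lower = k if mag[k + 1] >= mag[k - 1] else k - 1
    # Assumes a nonzero peak; an all-zero frame would divide by zero here.
    alpha = mag[lower + 1] / mag[lower]
    delta = (2.0 * alpha - 1.0) / (alpha + 1.0)   # fractional bin offset
    return (lower + delta) * sample_rate / n

sample_rate = 8000
n = 4096
t = np.arange(n) / sample_rate
print(grandke_frequency(np.sin(2 * np.pi * 440.3 * t + 0.4), sample_rate))
print("bin spacing in Hz:", sample_rate / n)
```

For a clean sinusoid the refined estimate falls well inside one bin spacing of the true frequency, which is the sense in which such techniques go beyond the raw FFT resolution.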

Fundamental frequency of speech

The fundamental frequency of speech can vary from 40 Hz for low-pitched male voices to 600 Hz for children or high-pitched female voices [Huang, Xuedong; Alex Acero; Hsiao-Wuen Hon. Spoken Language Processing. Prentice Hall PTR, 2001. ISBN 0-13-022616-5. p. 325].

Autocorrelation methods need at least two pitch periods to detect pitch. To detect a fundamental frequency of 40 Hz, whose period is 25 milliseconds (ms), at least 50 ms of the speech signal must therefore be analyzed. However, during 50 ms, speech with a higher fundamental frequency may not keep the same fundamental frequency throughout the window.
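
The arithmetic behind these figures can be written out as a short sketch. The helper names are hypothetical and the two-period requirement is taken from the paragraph above.

```python
def min_window_ms(f0_hz, periods=2):
    """Shortest window (ms) that spans the given number of pitch periods."""
    return 1000.0 * periods / f0_hz

def periods_in_window(f0_hz, window_ms):
    """Number of pitch periods that fit in a window of the given length."""
    return f0_hz * window_ms / 1000.0

print(min_window_ms(40.0))             # window needed for a 40 Hz voice
print(periods_in_window(600.0, 50.0))  # periods of a 600 Hz voice in 50 ms
```

A 50 ms window sized for a 40 Hz voice spans 30 periods of a 600 Hz voice, which gives the pitch ample time to change within the window.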

See also

* Frequency estimation
* Linear predictive coding


Wikimedia Foundation. 2010.
