Talk:MATLAB Programming/Phase Vocoder and Encoder

If you have questions, comments, deletion requests, cheques, or offers of employment, please contact me by email, Special:Emailuser/Jsalsman. Thank you. Jsalsman 03:57, 10 October 2007 (UTC)

Integrated into Wikibooks
Note this is now part of MATLAB Programming. Jsalsman 04:52, 10 October 2007 (UTC)

History from en.wp
* (cur) (last) 04:07, 10 October 2007 1of3 (Talk | contribs) (1,974 bytes) (→See also - wikibooks) (undo)
* (cur) (last) 14:58, 9 October 2007 141.3.74.36 (Talk) (1,924 bytes) (→See also - deleted lemma) (undo)
* (cur) (last) 15:44, 2 June 2007 75.35.115.68 (Talk) (2,038 bytes) (explain subpage see-also for example code) (undo)
* (cur) (last) 17:11, 27 May 2007 Nrcprm2026 (Talk | contribs) (2,016 bytes) (→See also - get example code out of userspace) (undo)
* (cur) (last) 02:11, 25 April 2007 Nrcprm2026 (Talk | contribs) (2,018 bytes) (→External links - restub, +cat) (undo)
* (cur) (last) 08:29, 8 April 2007 Nrcprm2026 (Talk | contribs) (→See also) (undo)
* (cur) (last) 23:28, 14 February 2007 Yewlongbow (Talk | contribs) (undo)
* (cur) (last) 05:42, 25 November 2006 Alaibot (Talk | contribs) m (Robot: Automated text replacement (- +{uncategorizedstub|November 2006}})) (undo)
* (cur) (last) 13:15, 8 November 2006 Bluebot (Talk | contribs) (tagging, added uncategorised tag) (undo)
* (cur) (last) 22:06, 4 January 2006 Retodon8 (Talk | contribs) m (RVV by 193.145.56.194) (undo)
* (cur) (last) 16:17, 20 December 2005 193.145.56.194 (Talk) (→See also) (undo)
* (cur) (last) 14:16, 19 October 2005 Omegatron (Talk | contribs) (→External links) (undo)
* (cur) (last) 14:15, 19 October 2005 Omegatron (Talk | contribs) (→External links) (undo)
* (cur) (last) 14:12, 19 October 2005 Omegatron (Talk | contribs) (→External links) (undo)
* (cur) (last) 14:09, 19 October 2005 Omegatron (Talk | contribs) (undo)
* (cur) (last) 14:03, 19 October 2005 Omegatron (Talk | contribs) (undo)
* (cur) (last) 13:47, 19 October 2005 Omegatron (Talk | contribs) (cut and paste move) (undo)
* (cur) (last) 03:48, 19 October 2005 Mnorris (Talk | contribs) (undo)
* (cur) (last) 03:47, 19 October 2005 Mnorris (Talk | contribs) (undo)
* (cur) (last) 14:08, 26 April 2005 Jshadias (Talk | contribs) (undo)
* (cur) (last) 16:55, 26 January 2005 Omegatron (Talk | contribs)

This code has nothing to do with the phase vocoder
Hi there,

The basic principle of the phase vocoder is to determine phase relations between frames from the observed phase difference in adjacent frames. For this to work, the encoder needs to store the observed phase differences. As the encoder here calculates only the amplitude spectrum, and the phases/frequencies used are the nominal frequencies of the spectral bins rather than the phases of the observed signal, this code does not produce anything that remotely resembles even the very first algorithm proposed by Flanagan in 1966 (see Phase vocoder), much less the work by Laroche or the transient processing proposed by Roebel. This code should be removed, because nearly all of the statements made about it are fundamentally wrong.

Raebolex (discuss • contribs) 13:01, 2 July 2011 (UTC)
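The analysis principle described above can be sketched in a few lines. This is a hypothetical NumPy illustration, not the disputed MATLAB code: the frequency of a partial is recovered from the measured inter-frame phase difference, whereas the nominal bin frequency alone can be off by a sizable fraction of a bin. All numeric values here are arbitrary demo choices.

```python
import numpy as np

fs = 8000.0      # sample rate, Hz
N = 256          # window / FFT size (bin width = fs/N = 31.25 Hz)
hop = 64         # analysis hop size in samples
f_true = 1010.0  # deliberately placed between two bin centres

t = np.arange(N + hop) / fs
x = np.sin(2 * np.pi * f_true * t)
win = np.hanning(N)

X0 = np.fft.rfft(win * x[:N])           # frame at time 0
X1 = np.fft.rfft(win * x[hop:hop + N])  # frame one hop later

k = int(np.argmax(np.abs(X0)))          # peak bin
f_nominal = k * fs / N                  # nominal bin frequency

# Flanagan-style estimate: unwrap the measured inter-frame phase
# difference around the advance expected from the nominal bin frequency.
expected = 2 * np.pi * k * hop / N
dphi = np.angle(X1[k]) - np.angle(X0[k]) - expected
dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # principal value in [-pi, pi]
f_measured = f_nominal + dphi * fs / (2 * np.pi * hop)
```

With these values the peak lands in bin 32, so `f_nominal` is 1000 Hz, while `f_measured` recovers roughly 1010 Hz from the stored phases; that is the point of keeping the observed phase differences.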


 * Are there any reasons why maintaining phase coherence during resynthesis, after time- and/or frequency-scale modification, is not more central to the basic principle of the phase vocoder? That is precisely what the code here does. Any means of maintaining phase coherence during resynthesis after modification cannot preserve most of the original phase information, and must discard all of it for heavily polyphonic sounds. If you have counterexamples, please share them. When the STFT bins and frames are small enough, what benefit is there in using any of the initial signal's phase information?
 * Instead of deleting the code in its entirety, why not provide an alternative improved implementation, and improve the specific statements you believe to be flawed? Jsalsman (discuss • contribs) 14:16, 2 July 2011 (UTC)

Sorry, I'm afraid you misunderstand the term "maintain phase coherence": "maintain" means "preserve as it was". In your code you throw away the phases during the analysis; therefore, during synthesis you cannot maintain them! I already said so: please read the papers you mention, notably the paper by Laroche. You will see it is all about the phase relations as they are in the original sound. Constructing phases from amplitudes is a completely different story and has nothing to do with the phase vocoder. Here you construct phases from nominal bin frequencies, but in real sounds phases are only weakly related to bin frequencies. If you want to keep the code, then you either have to rewrite it completely so that it reflects the phase vocoder, or rename the section so as not to make people believe it has anything to do with the phase vocoder.

The first thing I would propose is that you remove all the comments in the introduction about what your code does. I can assure you that there is not a single bit in your code that reflects the ideas of Laroche, and you are even further away from the other paper. I am really sorry, but if you have read these papers then you have misunderstood everything. Raebolex (discuss • contribs) 18:28, 2 July 2011 (UTC)


 * I'm willing to change as much commentary and code comments as we need to reach an agreement, but I'm reluctant to modify the actual code because it's been used successfully in audio editors for pitch modification and time scaling, and (along with a pitch tracker) a software autotune process. "Coherence" usually denotes self-consistency. In the resynthesis process, maintaining phase coherence is a very local task working horizontally across frames after selecting peaks vertically across bins, at least since Puckette's work.  If we accept that any phase vocoder will have to discard most if not all of the original signal's phase information when pitch- or time-scaling a signal as simple as a single human voice, then does it make any sense if "coherence" refers to preservation of the original signal's phase?  So, is it still appropriate to refer to the resynthesis portion as a phase vocoder?  Is there any reason to believe that preservation of any of the original signal's phase is an essential or necessary aspect of phase vocoders in general?  The fact that it must be discarded in many if not most cases leads me to the opposite conclusion.
 * Alternatively, if the code here was modified to record and start with the original signal's phases, would that satisfy your concerns?
 * I've changed the commentary about the 1999 and 2003 sources; please let me know what you think. Jsalsman (discuss • contribs) 20:52, 2 July 2011 (UTC)
 * ...Another possibility might be to call this a "Phase-Locked Vocoder" after Puckette (1995) who pioneered the basic idea. Would that be preferable? Jsalsman (discuss • contribs) 23:03, 2 July 2011 (UTC)

Your code indeed ensures phase coherence in some sense, though not in the phase vocoder sense. Look, for example, at the passage in Puckette's paper where he generates the synthesis spectrum Y(u_i, k): the phases are clearly derived from the input spectrum. Yours aren't. The most central idea of the phase vocoder is the equation w(k)=... at the top of column 2 on page 1 of the Puckette paper. This equation was introduced by Flanagan when he established the term phase vocoder in 1966. See here for a very brief summary of what the basic phase vocoder principle is supposed to be. As long as you don't use that equation, you cannot call your code a phase vocoder!

So what are you doing, and why does it somehow work? As far as I understand your code, you use the FFT maximum bins as estimates of the sinusoidal frequencies and synthesize the phase according to those frequencies, assuming the sinusoids are stationary. For that you use an idea introduced by Miller Puckette in his 1995 paper. So this is a sort of sinusoidal model that generates sinusoids directly in the spectral domain. It is very common for sinusoidal models not to respect the phase relations in the original signal and to consider phase coherence only on the synthesis side. By the way, after having studied your code I come to the conclusion that it resembles the ideas presented in US patent 5401897! But I think your implementation makes so many errors and approximations that nobody will consider this an infringement issue.
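As I read it, the synthesis-phase rule being described can be sketched as follows (a hypothetical Python illustration of the stationary-sinusoid assumption, not the MATLAB code itself): the phase of a chosen peak bin k is advanced each hop by the nominal bin frequency, with no reference to the measured phases of the input signal.

```python
import numpy as np

N, hop = 256, 64                 # FFT size and hop size (arbitrary demo values)
k = 32                           # peak bin picked from the amplitude spectrum
omega_k = 2 * np.pi * k / N      # nominal bin frequency, radians per sample

# Synthesis phase of bin k at frame m depends only on k, hop and m,
# never on the analyzed signal's phases:
phases = [omega_k * hop * m for m in range(4)]

# Wrapped to (-pi, pi]. For k=32, hop=64, N=256 the per-frame advance is an
# exact multiple of 2*pi, so every frame gets (numerically) the same phase.
wrapped = [float(np.angle(np.exp(1j * p))) for p in phases]
```

This makes the contrast explicit: a Flanagan-style analysis would replace `omega_k` by a frequency estimated from inter-frame phase differences of the input.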

With respect to correctness: an example of a problem is the alternating-phase principle that Puckette introduced. It is correct only if the window size equals the FFT size (variable N in the paper). In your code this relation is not ensured. There are other inconsistencies, but I won't go any further here. If you fixed them all, you would indeed have something useful.

> Alternatively, if the code here was modified to record and start with the original signal's phases, would that satisfy your concerns?

No, this doesn't make much of a difference. See my comment on what is essential for a phase vocoder above. I suggest calling this code an (approximate) spectral domain analysis/resynthesis with an additive sinusoidal model.

Raebolex (discuss • contribs) 10:43, 3 July 2011 (UTC)


 * On one hand you say that deriving phases from the input spectrum is part of the most central idea of a phase vocoder, but on the other hand you say if I changed the code to use those input phases it wouldn't make much of a difference. I've had detailed discussions about this algorithm and its properties with engineers from CCRMA, where you cite that summary.  I propose that we try to get a third opinion about this from a DSP email list or similar. Jsalsman (discuss • contribs) 19:50, 3 July 2011 (UTC)

You need to derive the phases over time from the input signal, not only the starting phase. If you want your code to be an example of the phase vocoder, then you have to establish the bin frequency from the phase differences using the equation provided by Flanagan. This equation has been used in all the papers about the phase vocoder you have cited above and is the central idea of the phase vocoder. Now, you have said first that you don't want to change your code (because it has worked here and there), and then that you may change the starting phase. But changing only the starting phase does not introduce the principal idea of the phase vocoder into the implementation, and therefore does not solve the issue. So please change the title and the first sentence of the wiki page.

Note that the fact that you discussed things with CCRMA engineers does not qualify the code as a phase vocoder example. Please provide a convincing argument for why a phase vocoder code example should not use the fundamental phase vocoder equation. If you cannot provide it yourself, then you may ask a third party to provide it. Once again, I am not claiming that your code does not work; I am claiming that it cannot be called a phase vocoder.

Please don't remove the dispute indication because the dispute is not resolved! Raebolex (discuss • contribs) 07:14, 7 July 2011 (UTC)


 * Okay, how about renaming this "Phase-locked vocoder"? Jsalsman (discuss • contribs) 17:34, 8 July 2011 (UTC)

If you look at the phase-locked vocoder from Puckette, then you have only a part of it. I'd think you can say "Phase-Locked vocoder" if, in the introduction, you explain that this implementation does not exactly follow the paper from Puckette but incorporates some of its central ideas. For further information, readers should then look at the paper. Raebolex (discuss • contribs) 15:32, 9 July 2011 (UTC)


 * This algorithm is far more efficient than Puckette's or Dolson et al's. Perhaps it might be useful to provide a comparison table of execution times for resynthesis. Are there any reliable sources supporting your assertion that a phase vocoder must satisfy Flanagan's original equation? Jsalsman (discuss • contribs) 17:31, 9 July 2011 (UTC)
 * How long do you need to find such reliable sources, if they exist, before the dispute can be considered resolved? Jsalsman (discuss • contribs) 03:27, 19 July 2011 (UTC)

Is this a joke? Please read my posts above. All the papers that you cite use this equation. The term phase vocoder was introduced based on this equation. I've given you a link to the page of Julius Smith of CCRMA, one of the major authorities on audio signal processing. If you understand what is written in all the references I have given, then you have reliable sources. Your question seems to indicate that you don't understand what is written in the papers! In that case, please read Dolson's tutorial article in the Computer Music Journal. It is written much less mathematically.

Please note that increased efficiency is not a valid argument. And compared to the algorithm proposed by Dolson and Laroche, I have strong doubts that your code is more efficient: you have to do a sort over all bins, which is most likely less efficient than what Dolson and Laroche have proposed. In any case, by removing parts from an algorithm you can always make it more efficient. For example, you can remove perceptual masking from MP3 compression; it becomes more efficient, but it won't work as well, and it should not be called MP3.

I won't discuss this here anymore. If you don't understand the arguments I have given, then I think the discussion is useless. I will therefore add an official deletion request to make other contributors aware of this wrong and misleading content. Raebolex (discuss • contribs) 17:20, 19 July 2011 (UTC)


 * Where has Julius Smith claimed that the algorithm described here doesn't conform to Flanagan's phase vocoder equation? If he has said or implied such a thing, he is mistaken. For example, consider low frequencies: any phase-locked vocoder satisfies Flanagan's equation at low enough frequencies even if it doesn't satisfy it at higher frequencies. Also, can you please explain what you mean about partial order alternatives to a full sort? I can't think of any partial orders which would be more useful than a full sort in the average case. Jsalsman (discuss • contribs) 19:23, 23 July 2011 (UTC)

Ok, here we go; more discussion. The basic equation that Flanagan introduced is equation 10 in the reference given by Julius Smith. You don't have this equation. Smith writes at several points in the text that the phase vocoder is based on an analysis of instantaneous phases and on converting phase differences to frequencies:

The digital computer made it possible for the phase vocoder to easily support phase modulation of the synthesis oscillators as well as implementing their amplitude envelopes. Thus, in addition to computing the instantaneous amplitude at the output of each (complex) band-pass filter, the instantaneous phase was also computed. (Phase could be converted to frequency by taking a time derivative.) ... The phase vocoder also relaxed the requirement of pitch-following (needed in the vocoder), because the phase modulation computed by the analysis stage automatically fine-tuned each sinusoidal component within its filter-bank channel.

Since Flanagan's paper, all papers about the phase vocoder have used Flanagan's eq. 10 in one form or another, notably all the papers you have mentioned in your examples. You don't use it. That's the problem. With respect to the equation that your code satisfies: you probably mean eq. 5 in Flanagan 1966. This equation is not specific to the phase vocoder; it's simply a sinusoidal model (see X. Serra's paper on SMS). This is exactly what you do in your implementation: a sinusoidal model. Now, the phase vocoder is a means to implement a sinusoidal model, but not every implementation of a sinusoidal model should be called a phase vocoder.
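For reference, the discrete-time form of that phase-difference-to-frequency relation, in my own summarizing notation (hop size R, FFT size N, measured STFT phases φ_k, and nominal bin frequency Ω_k = 2πk/N), can be written as:

```latex
% Instantaneous frequency of bin k at frame m from measured phases,
% with princarg[] denoting the principal value in (-\pi, \pi]:
\hat{\omega}_k(m) \;=\; \Omega_k \;+\; \frac{1}{R}\,
  \operatorname{princarg}\!\bigl[\varphi_k(m) - \varphi_k(m-1) - \Omega_k R\bigr],
\qquad \Omega_k = \frac{2\pi k}{N}
```

The disputed code uses only Ω_k; the bracketed correction term, built from the measured phases, is what the papers cited above have in common.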

With respect to the sort: a standard phase vocoder algorithm does not require a sort; it processes every bin independently of all the others. Dolson/Laroche don't even need to calculate phases for all bins, only for a small subset. That's why I don't think your implementation is efficient. Raebolex (discuss • contribs) 17:16, 27 July 2011 (UTC)


 * I admit I haven't cited Flanagan's paper or equation, and I agree it would be instructive to do so. If you have measurements of the efficiency of sorting bins by amplitude when deciding the order in which to propagate phases, please share them. If you wanted to start propagating only from the top-N amplitude bins, I think you would still need to perform a full sort, but it might be possible to optimize the inner loop of that sort. Jsalsman (discuss • contribs) 19:03, 21 August 2011 (UTC)
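On the efficiency point: selecting the top-N bins does not strictly require a full sort; a linear-time partial selection yields the same set. A hypothetical NumPy illustration (not the MATLAB code under discussion):

```python
import numpy as np

# Picking the top-N amplitude bins to seed phase propagation:
# np.argpartition performs an O(n) partial selection, versus
# O(n log n) for the full sort in np.argsort.
rng = np.random.default_rng(0)
mag = rng.random(4096)       # stand-in for an amplitude spectrum
N_top = 16

top_full = np.argsort(mag)[-N_top:]               # full sort
top_part = np.argpartition(mag, -N_top)[-N_top:]  # partial selection

# Both pick the same set of bins; only the internal order may differ.
same = set(top_full.tolist()) == set(top_part.tolist())
```

If the bins must then be visited in strict amplitude order, sorting just the selected N_top indices afterwards is still cheaper than sorting all bins.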

If you want to know how to do a very efficient and rather high-quality phase vocoder, just read: Laroche and Dolson, "Improved Phase Vocoder Time-Scale Modification of Audio", IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 3, pages 323–332, 1999. You will find a download link for this paper in the reference section of the Wikipedia entry for the phase vocoder. Raebolex (discuss • contribs) 20:10, 22 August 2011 (UTC)
 * I was corresponding with Dolson in 1999. The algorithm improvement described on page 7 of their paper is precisely what this MATLAB code does: "the new phase updating technique begins with a coarse peak-picking stage where vocoder channels are searched for local maxima." Jsalsman (discuss • contribs) 21:34, 26 September 2011 (UTC)