How to do audio echo cancellation

musicvid10 wrote on 3/24/2010, 1:53 PM

I thought I heard something . . .

david_f_knight wrote on 3/26/2010, 8:04 PM

I finished my proof-of-concept test program. I've uploaded it for anyone to play around with:

echo_cancellation_v1.0.zip

The results were pretty much as I anticipated, though more sensitive to any mismatch between the impulse response and the recording being de-echoed.

Bottom line: if the impulse response is recorded properly so that the timing of each echo from the source off its reflecting surface to the microphone can be accurately determined, then the echo can be completely removed. However, due to the sensitivity of this approach to any inaccuracies in the echo timing information, its use is only viable when the locations of the sound source, environment, and microphone do not change at all. Because of this inflexibility, it is not viable in most situations.

Sound travels just over one foot per millisecond in air at sea level, and one full cycle of a 1 kHz tone also takes one millisecond. So, if the distance relationships between the impulse source, reflective surfaces, and impulse microphone differ by as little as one inch from the distance relationships between the audio source, reflective surfaces, and camcorder microphone, then the phase relationships may be significantly off, especially for higher frequencies. If the phase relationships are significantly off, then subtracting an initial sound from its echo will not properly cancel the echo waveform.
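To put rough numbers on the one-inch figure, here is a minimal sketch that converts an error in echo path length into a phase shift at a few frequencies. The speed of sound (343 m/s, roughly room temperature) and the function name are assumptions chosen for illustration; they do not come from the test program.

```python
# Assumed round figure for the speed of sound in air at about 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0
INCH_IN_METRES = 0.0254


def phase_error_degrees(path_error_m, frequency_hz):
    """Phase shift (in degrees) caused by an error in the echo path length."""
    delay_error_s = path_error_m / SPEED_OF_SOUND_M_PER_S
    # One full cycle (360 degrees) lasts 1 / frequency seconds.
    return 360.0 * frequency_hz * delay_error_s


for freq in (250, 1000, 4000, 8000):
    shift = phase_error_degrees(INCH_IN_METRES, freq)
    print(f"{freq:>5} Hz: {shift:6.1f} degrees")
```

Under these assumptions a one-inch error is roughly 27 degrees at 1 kHz, and the shift grows in proportion to frequency, which is why the higher frequencies suffer first.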

The test program allows adding or subtracting echoes, so it is possible to verify its operation by starting with an audio file and adding echo/reverb to it (calculated from an impulse response peaks file), then taking that new file and subtracting echo/reverb from it (calculated from the same impulse response peaks file). When echo/reverb is added this way, it cancels perfectly.
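The add-then-subtract round trip can be sketched as follows. This is not the code from the download; it is one plausible way to do it, assuming the impulse response peaks are given as (delay in samples, gain) pairs with every delay at least one sample. The names `add_echoes` and `subtract_echoes` are hypothetical.

```python
def add_echoes(dry, peaks):
    """Return dry plus delayed, scaled copies of itself.

    peaks: list of (delay_in_samples, gain) pairs, a hypothetical stand-in
    for the contents of an impulse response peaks file.
    """
    wet = list(dry)
    for n in range(len(dry)):
        for delay, gain in peaks:
            if n >= delay:
                wet[n] += gain * dry[n - delay]
    return wet


def subtract_echoes(wet, peaks):
    """Recover the dry signal by subtracting echoes of already-recovered samples.

    Requires every delay to be >= 1 so that dry[n - delay] is already known.
    """
    dry = []
    for n in range(len(wet)):
        sample = wet[n]
        for delay, gain in peaks:
            if n >= delay:
                sample -= gain * dry[n - delay]
        dry.append(sample)
    return dry


original = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0, 5.0, -8.0]
peaks = [(3, 0.5), (7, 0.25)]

wet = add_echoes(original, peaks)
recovered = subtract_echoes(wet, peaks)
print(max(abs(a - b) for a, b in zip(recovered, original)))

# Same gains, but the first delay is off by one sample: cancellation fails.
mismatched = subtract_echoes(wet, [(4, 0.5), (7, 0.25)])
print(max(abs(a - b) for a, b in zip(mismatched, original)))
```

The subtraction works on the recovered samples rather than the wet ones, so an echo of an echo is removed as well. Changing a delay by a single sample, as in the last two lines, is enough to leave a clearly audible residue.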

As an aside, I discovered that you can get some pretty weird and possibly interesting effects for spooky or sci-fi or nightmare type scenes by trying to cancel echo/reverb with very small changes to the times in the impulse response peaks file used for adding it.

The test program works only with WAV files containing mono 16-bit PCM samples (any sample rate is okay). Sound Forge can be used to convert audio files in other formats to suitable WAV files. The download includes a suitable example audio file and several example impulse response peaks files.
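For anyone who wants to check a file before feeding it to the program, here is a minimal sketch using Python's standard `wave` module. It only verifies the mono and 16-bit constraints stated above; the function name is hypothetical and is not part of the download.

```python
import array
import sys
import wave


def read_mono_16bit_wav(path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file.

    Raises ValueError if the file is not mono or not 16-bit.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return sample_rate, samples
```

A call such as `read_mono_16bit_wav("input.wav")` (the file name is only a placeholder) returns the sample rate and the samples, or raises an error for stereo or non-16-bit files.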

musicvid10 wrote on 3/26/2010, 10:48 PM

An unusually responsive and imaginative undertaking. Congratulations on your efforts!

From your readme: "However, it is also extremely sensitive to any inaccuracy of the impulse response peaks files as compared to reality. By reality, I mean the impulse response must have been recorded identically to the way the audio file to be de-echoed/de-reverberated was recorded"
Acknowledged. And, you propose to achieve predictive phase matching and cancellation, just how? (You may have a view of the forest, but I see all the trees. ;?)

"The results were pretty much as I anticipated, though more sensitive to any mismatch between the impulse response and the recording being de-echoed."
When in doubt, refer to response #1 above. Through your own efforts, you have discovered the additive factor Kelly and I both referred to.

Uhh, if you pre-emptively and intentionally create an impulse response that exactly mirrors what you either want to keep or exclude in the output, how is that predictive? The challenge was to not use the dry input to mirror what you wanted to keep in the output. That is "reality."

"As an aside, I discovered that you can get some pretty weird and possibly interesting effects for spooky or sci-fi or nightmare type scenes by trying to cancel echo/reverb with very small changes to the times in the impulse response peaks file used for adding it."
Been there, done that. Maybe you can turn that into an effect that you can sell for some $. Maybe even more quickly than echo reduction.

But then, you may be on to something here. Keep inquiring.
I'll test your contribution more thoroughly in the coming weeks.

david_f_knight wrote on 3/27/2010, 2:46 PM

"... you propose to achieve predictive phase matching and cancellation, just how? (You may have a view of the forest, but I see all the trees. ;?)"

If you know exactly how much later an echo occurs after a sound, then you can subtract that sound (properly scaled) from the waveform that much later and phase relationships will be matched. If the calculated delay is a little bit off from the actual delay, then there will be phase mismatches, and they will be frequency-dependent.
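The frequency dependence can be illustrated with a small sketch. It models a single echo only and ignores any feedback of the leftover error into later samples, so it is a first-order estimate rather than a description of what the test program does. The 44.1 kHz sample rate and the function name are assumptions.

```python
import math


def residual_echo_ratio(frequency_hz, timing_error_s):
    """Leftover echo amplitude relative to the uncancelled echo.

    Subtracting a copy that is early or late by timing_error_s leaves
    |1 - exp(-j*2*pi*f*dt)| = 2*|sin(pi*f*dt)| of the echo at frequency f.
    """
    return 2.0 * abs(math.sin(math.pi * frequency_hz * timing_error_s))


SAMPLE_RATE_HZ = 44100  # assumed sample rate
one_sample = 1.0 / SAMPLE_RATE_HZ

for freq in (100, 1000, 7350, 15000):
    ratio = residual_echo_ratio(freq, one_sample)
    print(f"{freq:>6} Hz: residual = {ratio:.3f} x original echo")
```

With a timing error of one sample at this rate, low frequencies still cancel well, the residual equals the original echo at one sixth of the sample rate (7350 Hz), and above that the "cancelled" echo actually comes out louder than it went in.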

"Uhh, if you pre-emptively and intentionally create an impulse response that exactly mirrors what you either want to keep or exclude in the output, how is that predictive?"

The impulse response is descriptive rather than predictive. It can be essentially anything (except that the test program has a delay line limit of 500,000 samples, so any impulse responses beyond that limit are presently ignored). Its purpose is to describe the acoustical interactions among all relevant factors. But given that description, it is possible to predict (i.e., simulate) what will happen in that acoustical environment given a dry source, or to infer what happened given the wet result to recover the dry source.

"The challenge was to not use the dry input to mirror what you wanted to keep in the output."

I'm not sure I understand your point. I don't use the dry input to mirror anything in the de-echoed/de-reverberated output. The program cancels echoes with no inputs other than the wet recording and the impulse response peaks file that describes the acoustical environment responsible for creating the wet recording.

Incidentally, I have just thought of a way to make the algorithm adaptive to accommodate slight inaccuracies in the impulse response peaks file, such as would be caused by movement of the microphone, sound source, or any reflective surfaces during recording.