How to do audio echo cancellation

musicvid10 wrote on 3/20/2010, 8:32 PM

There is no effective way to cancel echo.
You can reduce the effects of echo a bit using the technique posted here (use sparingly):
http://www.sonycreativesoftware.com/forums/ShowMessage.asp?ForumID=3&MessageID=545526

justmetlb777 wrote on 3/21/2010, 11:13 AM

Thank you - I was hoping for push-button simplicity. It seems as though such an automated process would be needed - I will try it, though. Do the Pro versions do this?

Chienworks wrote on 3/21/2010, 11:46 AM

It's not the software that is the issue. It's simply an almost entirely impossible task. There's no practical way to do it with any software or hardware that's ever existed, and maybe not with any that ever will.

justmetlb777 wrote on 3/21/2010, 12:18 PM

ummmm...... don't they do this in Star Trek and CSI? Anyway, if it can be done manually with some varied results at least (using your procedure), it does seem that process can be automated into a plugin. But I do understand what you are saying about the complexity of removing all echoing. (I would be happy with some echo minimization)

musicvid10 wrote on 3/21/2010, 1:16 PM

"ummmm...... don't they do this in Star Trek and CSI?"
Oh, so you're looking for a hardware-based method like broadcasters use. Got $5,000 to spend?
http://www.izotope.com/products/audio/anrb/

"it does seem that process can be automated into a plugin."
If you're interested in writing your own plugins or scripts using Sony's SDK, you are welcome to incorporate my techniques, with applicable credits.

;?)

Chienworks wrote on 3/21/2010, 1:32 PM

"ummmm...... don't they do this in Star Trek and CSI?"

They also fly at warp speeds and transport people through space. Do you think that's real too? ;)

Eigentor wrote on 3/22/2010, 6:08 AM

What,.......it's not? Back in high school I used to ..............

david_f_knight wrote on 3/22/2010, 2:57 PM

What is sometimes done in movies to clean up the audio is to dub it. That is, throw out the audio recorded during filming, and replace it with audio recorded in a studio afterwards.

If you want to really be able to eliminate echo (or reverberation, as I imagine you really mean), you probably really need impulse response recordings made in the same environments at the same positions as you recorded your videos' audio from. With the impulse response known, the echo or reverberation can be predicted and hence subtracted (i.e., canceled) with a digital filter. Should not be a difficult task and should be quite effective, but nobody records impulse responses so it's not an option.

By the way, there's nothing magic about hardware. Lots of people seem to think that if some process is implemented in hardware, then it is somehow superior to what can be done with software. All a hardware solution offers is the potential for faster operation. Any digital process implemented in hardware can also be identically implemented in software.

Chienworks wrote on 3/22/2010, 3:57 PM

I'll also point out that for an impulse response recording to be useful it has to be exactly the same as the room environment recorded with the vocals. This is simply impossible. Since the vocal itself also affects the environment and the recording response, it's more like impossible squared.

Also, the concept has another flaw. It's not really the sound of the environment that one needs to remove, it's the vocal bouncing around the room that is the problem. The only recording possible of this is the original recording. Since the wanted vocal is mixed in with the reverberation there's no way to separate them in order to get just the reverberation to subtract from the original. Of course, if it was possible to separate them then we'd just keep the wanted vocal to begin with and the whole process would be moot.

MSmart wrote on 3/22/2010, 8:47 PM

I'd like to know what type of room the OP is talking about. How far was the cam/mic from the subject(s)..... closer is better.

david_f_knight wrote on 3/22/2010, 9:18 PM

No, it's not impossible. For best results you have to record the impulse response at the time of the video recording, in the same place as the camcorder's microphone (preferably with the same microphone). That has to be done for every position the camcorder is used from. If the camcorder is moved, or the acoustical environment changes during the clip's recording, you might also have a problem. Though, realistically, the acoustical environment doesn't usually change much when you move the camcorder a little. As I wrote, no one (besides audio engineers, sonar operators, geophysical scientists, or certain other specialists) records impulse responses, so the option isn't generally viable for everyday video recordings. An impulse response recording might be as short as just a few tens of milliseconds long (however long it takes for the echo/reverberation to die out from a single short sound). Also, the sounds within an environment do not alter the acoustics of that environment. So there isn't anything impossible or impossible squared about the impulse response concept.

The whole point of the impulse response concept is to characterize the acoustical environment, not the sounds within the environment. You don't subtract the impulse response from the original recording, you use the impulse response to define a digital filter that models the acoustical characteristics of the environment, and then feed the video's audio recording into that filter and subtract the modeled response from it. That results in subtracting the modeled reverberation/echo from the source, canceling the recorded reverberation/echo. To the extent the digital filter accurately models the acoustical environment in which the audio recording was made, it will cancel the reverberation/echo. There's nothing moot about this process. A potential problem in a non-controlled setting, though, is that there will also be background noise recorded along with the impulse response and that will cause the digital filter to model that noise in addition to the reverberation/echo. That may or may not be a problem, depending on the nature of the noise.
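The "model the reverberation, then subtract it" idea described above can be tried on a toy signal. This is only a sketch under loud assumptions: the impulse response is known exactly, is noise-free, and is normalized so the direct sound sits at index 0 with gain 1.0. The names `convolve`, `subtract_modeled_reverb`, and `error_energy` are made up for illustration and do not come from any product mentioned in this thread.

```python
def convolve(signal, kernel):
    """Full linear convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def subtract_modeled_reverb(wet, impulse_response):
    """One pass: predict the reflections from the wet signal and subtract them.

    Assumes impulse_response[0] == 1.0 (direct sound); everything after
    index 0 is treated as reflections.
    """
    reflections = [0.0] + list(impulse_response[1:])
    modeled = convolve(wet, reflections)
    # zip() stops at len(wet), so the estimate has the same length as wet.
    return [w - m for w, m in zip(wet, modeled)]


def error_energy(a, b):
    """Sum of squared sample differences between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data (hypothetical): a short dry burst and a single echo
# three samples later at half amplitude.
dry = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
impulse_response = [1.0, 0.0, 0.0, 0.5]

wet = convolve(dry, impulse_response)[:len(dry)]
estimate = subtract_modeled_reverb(wet, impulse_response)

print("error before:", error_energy(wet, dry))
print("error after: ", error_energy(estimate, dry))
```

In this toy case one pass does not fully remove the echo. Because the model is fed the wet signal rather than the unknown dry one, it leaves a weaker, later echo (here at twice the delay and a quarter of the error energy). Whether anything like this survives noise and an imperfect impulse response is exactly what the thread is arguing about.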

musicvid10 wrote on 3/22/2010, 10:21 PM

David,
Theories similar (or identical) to yours have been promoted here off and on for many years; in that time, there has never been one practical demonstration of using "reverse" impulse modeling to reduce echo or any ambient, that I am aware of.

Certainly, from any knowledge of acoustic physics, this is an impractical approach. The reasons Kelly gave are only the tip of the iceberg. You simply cannot apply negative feedback unless you start with an exact inverse phase copy of the audio you wish to cancel. Anything else is additive. An acoustic impulse is a model or template for adding an effect, and cannot under any circumstances predict nor reproduce an exact waveform representation of a given audio sample, to be used for negation. The "dog chasing its tail" argument Kelly presented to you previously is an absolutely correct response, IMO.
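The point about needing an exact inverse-phase copy can be illustrated with a pure tone. This is a minimal sketch, assuming a 1 kHz sine at a 48 kHz sample rate (both values chosen only for the example); `tone` and `rms` are hypothetical helper names.

```python
import math


def tone(freq_hz, sample_rate, num_samples, delay_samples=0):
    """Sine wave, optionally delayed by a whole number of samples."""
    return [
        math.sin(2.0 * math.pi * freq_hz * (i - delay_samples) / sample_rate)
        for i in range(num_samples)
    ]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


RATE = 48000      # assumed sample rate
N = 4800          # 0.1 s, a whole number of 1 kHz cycles

original = tone(1000, RATE, N)

# Case 1: an exact inverted copy, perfectly aligned.
exact_inverse = [-v for v in original]
perfect = [a + b for a, b in zip(original, exact_inverse)]

# Case 2: the same inverted copy arriving 12 samples (0.25 ms) late,
# which is a quarter cycle at 1 kHz.
late_inverse = [-v for v in tone(1000, RATE, N, delay_samples=12)]
misaligned = [a + b for a, b in zip(original, late_inverse)]

print("original rms:  ", rms(original))
print("perfect rms:   ", rms(perfect))
print("misaligned rms:", rms(misaligned))
```

With perfect alignment the sum is silence. With a quarter-cycle timing error the "cancelling" copy makes the result louder than the original, which is the sense in which a mismatched copy is additive.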

Acoustic Mirror and more complex impulse modeling applications are useful and in some cases powerful methods of adding a layer to dry audio, when it is necessary to model an acoustic footprint. However, none of these applications have any intelligence to achieve phase matching, which would be a necessary prerequisite for anything you have said to make any sense. The notion of reverse-engineering impulse modeling to "remove" an acoustic layer is nothing more than speculation, to the best of my knowledge, education, and experience, which goes back more than thirty-five years.

That being said, we will all welcome your practical demonstration with examples and detailed methodology used. As a two-month poster on these forums, we welcome your input, but don't think for a second that this ground has not been covered before, more than once. Using the handy "Search" feature on the forums will lead you to some more enlightened discussion on the very theory you seem to be promoting.

But good luck with your practical tests, and we eagerly await your results and reproducible examples!

Chienworks wrote on 3/23/2010, 4:30 AM

I'm not sure where the idea of "negative room impulse" came from. I saw some speculation years ago as part of a discussion over Sonic Foundry's impulse products. It seems some folks got the wild idea that if impulses could be added to simulate various reverberant spaces then they could also be subtracted from reverberation as long as you had the exact impulse recording. It was purely wishful thinking with no basis whatsoever.

It's sorta like saying, "well, if I can use this can of orange spray paint to make this da Vinci master look warmer, then later on if I decide I don't like it I can spray on a can of purple paint to restore it back to the original." The subtleties and details are submerged and lost. It's not possible to bring them back.

Eigentor wrote on 3/23/2010, 7:05 AM

If you really want to eliminate noise, echo, reverberation, multi-path.... record audio in a vacuum.

musicvid10 wrote on 3/23/2010, 9:23 AM

"If you really want to eliminate noise, echo, reverberation, multi-path.... record audio in a vacuum."

Acknowledging the fact that was a joke, I'm sure you know audio doesn't exist in a vacuum -- no medium, no sound waves.

Eigentor wrote on 3/23/2010, 10:49 AM

In space, no one can hear you scream.

david_f_knight wrote on 3/23/2010, 3:46 PM

musicvid: That being said, we will all welcome your practical demonstration with examples and detailed methodology used.

I might take this challenge up, just to see what can be done, because I think it is an interesting topic. Apparently, others do, as well. Obviously, if I do, it will take some time to complete because it is not trivial.

musicvid: As a two-month poster on these forums, we welcome your input, but don't think for a second that this ground has not been covered before, more than once. Using the handy "Search" feature on the forums will lead you to some more enlightened discussion on the very theory you seem to be promoting.

I didn't think for one second that this ground was or was not covered here before. That thought never crossed my mind because I guess it didn't seem relevant to me. Lots of (most?) topics have been discussed repeatedly in this forum.

My initial post actually discussed three different topics, only the first of which was intended as potentially practical advice for the original poster. The second of my topics, re: impulse response, I admitted was not a (generally) viable option, which I again reiterated in my second post. Perhaps I am wrong and it can never work under any circumstances. But in any case, it's not fair to characterize something I've said twice was not generally viable as something I am promoting. It's something I am discussing.

After reading your post, I have reviewed some of the posts previously made here re: impulse response. Of those I looked at, I haven't seen any that discussed what I've described, though I have seen some that discussed what you apparently thought I described, i.e., convolving a reversed impulse response with the original signal.

What I'm describing is in some sense analogous to how Dolby noise reduction works. And we all know that Dolby noise reduction does in fact work. Only instead of having an active encoding that Dolby uses, that task is performed passively by the acoustic environment, and (principally) in the time domain rather than in the frequency domain. The impulse response that must be recorded allows determining what that particular "encoding" was. Once the "encoding" is known, an inverse "encoding" can be designed to (hopefully) restore the signal minus the echoes/reverberation added by the acoustical environment. The decoder does not need to use any feedback, just a delay line as long as the impulse response with as many taps as there are acoustical reflections. But at this point, it's only a theory in my head; I'm not saying it does work, but that it might work.
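The tapped delay line "decoder" can be examined for the simplest possible room: one direct path plus a single echo. The sketch below is an assumption-laden toy (echo gain 0.5, echo delay 3 samples, four taps, all chosen for the example), and `feedforward_inverse` is a hypothetical name. It builds a feed-forward tap set from the series 1, -g, g², -g³, ... spaced at the echo delay and checks what the room and decoder do together.

```python
def convolve(a, b):
    """Full linear convolution of two lists of floats."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def feedforward_inverse(echo_gain, echo_delay, num_taps):
    """Feed-forward (no feedback) approximation to undoing a single echo.

    Tap k sits at k * echo_delay samples and has weight (-echo_gain) ** k.
    """
    taps = [0.0] * ((num_taps - 1) * echo_delay + 1)
    for k in range(num_taps):
        taps[k * echo_delay] = (-echo_gain) ** k
    return taps


# Assumed toy room: direct sound plus one echo, 3 samples late, at half level.
room = [1.0, 0.0, 0.0, 0.5]
decoder = feedforward_inverse(echo_gain=0.5, echo_delay=3, num_taps=4)

# Room followed by decoder: ideally this would be a lone 1.0 at index 0.
combined = convolve(room, decoder)
print(combined)
```

In this example the combined response is a clean 1.0 followed by one small leftover echo at the end of the delay line (gain 0.5 to the fourth power, sign flipped). A finite feed-forward line pushes the residual later and lower but does not remove it entirely; removing it completely for this room would take either infinitely many taps or a recursive filter.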

As you suggest, ultimately it doesn't matter what anyone's opinion is, only what results prove to be true.

This discussion about impulse responses has kind of hijacked the original poster's question. MSmart did make a useful suggestion for future recordings (but not applicable to existing recordings). If you record audio with a microphone much closer to the subject than the camcorder, the relative strength of the desired signal to the noise (i.e., echo/reverberation) is larger allowing for lower gain during recording, reducing the recorded volume of the noise. Ultimately, recording a clean signal will always be superior to any post-production attempt at cleanup/correction.
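As a rough worked example of why the closer microphone helps: under the idealized assumption that the direct sound follows the inverse-square law (about 6 dB per halving of distance) while the reverberant level in the room stays roughly constant, the improvement can be estimated from the two distances alone. The function name and the 4 m and 0.5 m distances are made up for illustration.

```python
import math


def direct_sound_gain_db(far_distance_m, near_distance_m):
    """Increase in direct-sound level when moving the mic closer.

    Assumes free-field inverse-square behaviour for the direct sound;
    real rooms and directional sources will deviate from this.
    """
    return 20.0 * math.log10(far_distance_m / near_distance_m)


# Camcorder mic at 4 m versus a separate mic at 0.5 m from the speaker.
gain = direct_sound_gain_db(4.0, 0.5)
print(f"direct sound is about {gain:.1f} dB stronger at the close mic")
```

Under those assumptions the recording gain can come down by the same amount, and the reverberation is recorded that much lower relative to the voice.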

musicvid10 wrote on 3/23/2010, 4:39 PM

"What I'm describing is in some sense analogous to how Dolby noise reduction works."

What you are suggesting is in no sense analogous to the way DNR works. Not even a little bit.
The rest of what you are promoting has no basis in acoustic physics that I know of. Your one correct statement, that the algorithm would work as a function of time, nails down the reason it doesn't work. Once you gain a better understanding of complex phase relationships, you will see why this is so. If not by feedback, how do you propose to achieve phase cancellation? If there was some magical predictive image extraction that could be accomplished by taking room impulses, I promise you it would have been done by now. And it really has nothing to do with "encoding" and "decoding."

Suggest you start with my basic technique, add your own refinements, and let us know what you come up with. Variations of that basic approach, which stem from an analog recording trick that is about forty years old, have been and are still in use by broadcasters, filmmakers, music producers, recording studios, ENG and surveillance agencies around the world since that time.

And good luck to you. You have a great imagination.

Chienworks wrote on 3/23/2010, 6:20 PM

Noise Reduction, whether it be Dolby, DBX, or any other method, is used solely to reduce the noise introduced by the recording and transmission mechanisms. It has nothing to do with reducing source noise. In that respect it's not even in the same realm as reverberation/echo reduction.

Dolby works well because the noise it's reducing is rather constant, is not affected by the signal being recorded, and the signal is initially separate from the noise. None of these apply to reverberation/echo since the unwanted material is continually varying, varies with the signal as well as varying apart from the signal, and is already mixed with the signal before being picked up by the transducer.

Impulses aren't encoding. They're additive to the original signal. They're a one-way modification that loses the original sufficiently that it can't be recovered again, not even by a "negative" impulse, should such a thing even exist.

The recipe that musicvid posted can work wonders, but it doesn't really remove the reverb. It merely modifies the sound in such a way that the things we want to hear become more pronounced while that which we don't want to hear is somewhat diminished. This is done at the cost of modifying the sound, perhaps to the point where it's not as good as the original, but at least in some cases like speech it can make what we want to hear more intelligible.

The only conceivable way to remove reverberation is to have a system intelligent enough to listen to the recording, be able to generalize what the original signal must have sounded like without the reverb, map out the signal that is intended to be conveyed, and recreate that signal from scratch by imitating the original performance. I have a brother who's very good at that ... give him a bad recording from an arena or an old scratchy record, he can listen to it a few times, then grab his synthesizer and play the song, sing the words, recite the speech, etc. to make a new recording. I've used his abilities to save butts a few times.

Hardware/software? We could be generations away from anything capable of performing this task electronically. I think it will come eventually, but it won't be reverb removal; it will be regeneration of the audio from scratch. And even then my brother will probably still be able to do it cheaper and faster.

Or, put the mic up next to the performer. It's the best, simplest, fastest, and cheapest solution.

musicvid10 wrote on 3/23/2010, 6:37 PM

"This is done at the cost of modifying the sound, perhaps to the point where it's not as good as the original, but at least in some cases like speech it can make what we want to hear more intelligible."

Thanks for saying what I have for years, only you said it better. Too often in areas like aural perception, the medicine is worse for the patient than the disease.

But the very notion of predictive acoustic modeling? Certainly not in my lifetime. I mean, we're barely scratching the surface with emergent adaptive technology.

Maybe when we fly at warp speeds, transport ourselves through space and time, and leap tall buildings in a single bound . . .

david_f_knight wrote on 3/23/2010, 8:31 PM

Thank you for all your constructive input. I can't respond to all the issues raised as quickly as you can raise more. As I wrote previously, ultimately, opinions don't matter, only demonstrable (and relevant) facts. The burden is on me to prove my claims, if possible. I don't believe arguing can either prove or disprove them.

I don't know whether I will have enough time to pursue this because it is not a trivial undertaking, or whether anyone will still care about it when I conclude any effort I do make.

musicvid10 wrote on 3/23/2010, 9:01 PM

Well, you could do one very simple demonstration, the outcome of which would determine the feasibility of moving forward or not. The experimental model was inferred quite directly in Kelly's last post.

1) Take a good quality vocal recording, without objectionable ambiance, background noise, echo, or reverberation.
2) Apply an Acoustic Mirror impulse to the whole file, and verify that you can hear a difference in the output.
3) You now have the original recording, the impulse, and the result at your disposal in a 100% controlled environment. Now, remove the effect from the output, using any means of your choosing, except overlaying or imaging the reference (that would be cheating).
4) If you are unable to do this (even partially) in a controlled environment, you would probably be foolish to proceed in a real-world setting, where you have only the impulse response and the acoustically wet recording, but no reference with which to compare your "solution".
5) If you are able to reverse the effect, even partially, using the impulse response and output file, you probably will win the Nobel Peace Prize for Physics, and will have plenty of dough and corporate sponsorship for you to continue your work.

Shouldn't take more than a day or two of practical experiments to determine whether your mental course is worth pursuing any further. And, we eagerly await your results (really).
;?)
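The controlled test in steps 1) to 5) above can also be run numerically before touching any real audio. The sketch below is a toy under stated assumptions, not a claim about Acoustic Mirror or real rooms: the "dry" samples and impulse responses are invented, there is no noise, and the first impulse response sample is non-zero. `deconvolve` is a hypothetical helper that undoes the convolution by time-domain long division, using only the wet output and an impulse response, never the dry reference.

```python
def convolve(a, b):
    """Full linear convolution of two lists of floats."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def deconvolve(wet, impulse_response, dry_length):
    """Solve wet = dry * impulse_response for dry, one sample at a time.

    Requires impulse_response[0] != 0. Uses only the wet signal and the
    impulse response.
    """
    h0 = impulse_response[0]
    dry = []
    for n in range(dry_length):
        acc = wet[n]
        # Remove the contribution of earlier dry samples seen through the room.
        for k in range(1, min(n, len(impulse_response) - 1) + 1):
            acc -= impulse_response[k] * dry[n - k]
        dry.append(acc / h0)
    return dry


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Steps 1 and 2: a made-up dry signal and a made-up impulse response.
dry = [0.3, -0.7, 0.5, 0.1, -0.2, 0.4, 0.0, -0.6]
true_ir = [1.0, 0.0, 0.6, 0.0, 0.3]
wet = convolve(dry, true_ir)

# Step 3: try to undo the effect with the exact impulse response...
recovered = deconvolve(wet, true_ir, len(dry))
# ...and with a slightly wrong one (0.5 instead of 0.6 for the first echo).
wrong_ir = [1.0, 0.0, 0.5, 0.0, 0.3]
recovered_wrong = deconvolve(wet, wrong_ir, len(dry))

# The dry reference is used here only to score the result.
print("error with exact impulse response:", max_abs_error(recovered, dry))
print("error with wrong impulse response:", max_abs_error(recovered_wrong, dry))
```

In this noise-free digital toy the exact impulse response brings back the dry samples to within rounding error, while a small mistake in the impulse response leaves audible-scale errors. That does not settle the real-world question: measured room responses are noisy, change with position, and are often not minimum phase, in which case this kind of naive division can become unstable.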

Eigentor wrote on 3/24/2010, 6:34 AM

Does anyone hear me screaming?

Chienworks wrote on 3/24/2010, 7:57 AM

I wasn't in the forest when you fell down.