Acoustic Echo Cancellation (AEC) is the removal of reflected sound. It’s generally used in conferencing systems (although there are other applications) to stop your microphone sending the other participant’s audio back to them. Here’s a quick overview of how it works, leaving out details like adaptive filtering, non-linear processing (NLP), echo return loss (ERL), and so forth.
It’s important to make a distinction, especially when troubleshooting: an AEC system makes the experience better for the people at the other end, not for you. If you can hear yourself and it’s due to bad, faulty, or missing AEC, then it’s the other side’s issue. There are, however, other situations where you can hear yourself that have nothing to do with AEC, such as an incorrectly configured mixer or Digital Signal Processor (DSP).
Another important distinction: DSPs and AEC aren’t a magic bullet for acoustic issues. They don’t prevent reverberation or other acoustic anomalies in the room, which can only really be solved physically (absorption, pickup patterns, placement, etc.). In fact, the better the acoustics of the room, the better AEC will work; I have seen AEC basically fail entirely in very reverberant spaces.
The Problem
First, the problem: when the other person speaks, their voice comes out of our speakers. Our microphones pick up this sound from the speakers. Without any measures in place, nothing stops this sound being sent back to the other person, so the other end hears themselves.
The Fix
One way we can attack this is to block the audio picked up by our microphone while our speaker is producing audio. This basically turns the system into a half-duplex system: communication can only happen in one direction at a time. This works great in theory, but not so much in practice. It leads to unnatural and stilted conversations; in normal conversation there is usually some overlap, chances for people to interrupt, etc., and none of that can happen with this method.
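To make the half-duplex idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular DSP does it: the function names, the two-sample frames, and the 0.01 threshold are all made up for the example.

```python
def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def half_duplex_gate(mic_frames, speaker_frames, threshold=0.01):
    """Mute each microphone frame while the loudspeaker is active.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    sent = []
    for mic, spk in zip(mic_frames, speaker_frames):
        if rms(spk) > threshold:
            sent.append([0.0] * len(mic))  # far end is talking: block the mic
        else:
            sent.append(list(mic))         # far end is silent: pass the mic through
    return sent


# Frame 1: only the far end talks. Frame 2: both sides talk. Frame 3: only we talk.
speaker_frames = [[0.5, -0.5], [0.5, -0.5], [0.0, 0.0]]
mic_frames = [[0.1, -0.1], [0.4, -0.3], [0.3, -0.3]]

sent = half_duplex_gate(mic_frames, speaker_frames)
print(sent)
```

The second frame is the one that hurts: we were talking too, but the gate throws our speech away along with the echo.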
So what does AEC do? It takes some reference audio that we don’t want to pick up in the microphone (often the person at the other end coming out of our speakers) and removes it from the total audio signal picked up by our microphone. This leaves us with everything else that our microphone is picking up (e.g. our speech), so the other side’s audio (the echo) is no longer sent back to them. As a bonus, it does this while preserving full-duplex communication.
Note that it doesn’t remove any background noise; it can only distinguish and remove the audio that it knows shouldn’t exist (the reference).
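For the curious, here is a rough sketch of "remove the reference from the microphone signal" in Python. It assumes a normalized least-mean-squares (NLMS) adaptive filter, which is one common textbook approach and exactly the kind of detail this overview otherwise skips; a real DSP layers much more on top. Everything here is hypothetical: the function names, the made-up `room_path` echo, the random "far end" signal, and a quiet sine wave standing in for local sound.

```python
import math
import random


def nlms_echo_canceller(mic, reference, taps=8, mu=0.1, eps=1e-6):
    """Subtract an adaptive estimate of the reference's echo from the mic signal."""
    weights = [0.0] * taps   # current guess at the speaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what is left after removing the estimated echo
        cleaned.append(error)
        # NLMS update: nudge the weights so the next echo estimate is closer.
        step = mu * error / (sum(x * x for x in history) + eps)
        weights = [w + step * x for w, x in zip(weights, history)]
    return cleaned


def power(signal):
    """Mean-square level of a signal."""
    return sum(s * s for s in signal) / len(signal)


random.seed(0)
n = 4000
reference = [random.uniform(-1.0, 1.0) for _ in range(n)]  # far-end audio sent to our speakers
room_path = [0.0, 0.6, 0.3, -0.1]                          # made-up speaker-to-mic echo path
echo = [
    sum(room_path[k] * reference[i - k] for k in range(len(room_path)) if i - k >= 0)
    for i in range(n)
]
local = [0.05 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # local sound, not in the reference
mic = [e + l for e, l in zip(echo, local)]

cleaned = nlms_echo_canceller(mic, reference)

# Compare levels over the last 1000 samples, after the filter has had time to adapt.
tail = slice(n - 1000, n)
leftover_echo = [c - l for c, l in zip(cleaned[tail], local[tail])]
print("echo power at the mic:   ", power(echo[tail]))
print("echo power after AEC:    ", power(leftover_echo))
print("local power at the mic:  ", power(local[tail]))
print("output power after AEC:  ", power(cleaned[tail]))
```

The echo is removed because the filter is given the reference and can learn how it arrives at the microphone. The local sound is not in the reference, so it passes straight through, which is why AEC does nothing for background noise.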
Actually setting up AEC in a DSP can be a bit to wrap your head around, and can be easy to stuff up. Let me know if there is further explanation or help I can give.