Active Speaker Localization (ASL) is the process of spatially localizing an active speaker (talker) in an environment using either audio, vision or both.