A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.
<span class="description-source">Source: Action Detection from a Robot-Car Perspective</span>