OmniLab
To evaluate the effectiveness of NToP in real-world scenarios, we collect a new dataset, OmniLab, with a top-view omnidirectional camera mounted on the ceiling at a height of 2.5 m in two different rooms (bedroom and living room). Five actors (3 male, 2 female) perform 16 actions from the CMU-MoCap database (brooming, cleaning windows, down and get up, drinking, fall-on-face, in chair and stand up, pull object, push object, rugpull, turn left, turn right, upbend from knees, upbend from waist, up from ground, walk, walk-old-man) in the two rooms with varying clothing. Each recorded action lasts 2.5 s, which yields 60 images per scene at a frame rate of 24 FPS. The camera position is fixed and the image resolution is 1200 × 1200 pixels. In total, 4800 frames are collected. Annotations of 17 keypoints following the COCO convention are first estimated by a keypoint detector and subsequently refined by four human annotators in two passes to ensure high annotation quality. The bottom figure shows a few examples from OmniLab with person bounding boxes and keypoint annotations.
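To make the annotation layout concrete, the following is a minimal sketch of how one OmniLab person annotation could be represented in the COCO keypoint style. The keypoint names and the flat `[x, y, v]` layout follow the public COCO format. The function `make_annotation`, the tight bounding box derived from the keypoints, and the assumption that OmniLab is stored this way are illustrative only and are not taken from the dataset release.

```python
from typing import Dict, List, Tuple

# The 17 keypoint names in the standard COCO order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

IMAGE_SIZE = 1200                       # pixels per side, as stated in the text
FRAMES_PER_CLIP = int(2.5 * 24)         # 2.5 s at 24 FPS gives 60 frames


def make_annotation(points: List[Tuple[float, float, int]],
                    image_id: int, ann_id: int) -> Dict:
    """Build a COCO-style keypoint annotation (hypothetical helper).

    points holds 17 (x, y, v) tuples, where v is the COCO visibility flag:
    0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible.
    """
    if len(points) != len(COCO_KEYPOINTS):
        raise ValueError("expected 17 keypoints")

    flat: List[float] = []
    labeled: List[Tuple[float, float]] = []
    for x, y, v in points:
        if v not in (0, 1, 2):
            raise ValueError("visibility must be 0, 1, or 2")
        if v == 0:
            # COCO stores unlabeled keypoints as (0, 0, 0).
            flat.extend([0, 0, 0])
            continue
        if not (0 <= x < IMAGE_SIZE and 0 <= y < IMAGE_SIZE):
            raise ValueError("keypoint lies outside the image")
        flat.extend([x, y, v])
        labeled.append((x, y))

    if not labeled:
        raise ValueError("at least one keypoint must be labeled")

    # Assumption: a tight box around the labeled keypoints, in COCO
    # [x, y, width, height] form. Real person boxes are usually larger.
    xs = [p[0] for p in labeled]
    ys = [p[1] for p in labeled]
    bbox = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,               # "person" in COCO
        "keypoints": flat,
        "num_keypoints": len(labeled),
        "bbox": bbox,
    }
```

A record built this way carries 51 values in `keypoints` (17 triplets), and `num_keypoints` counts only the keypoints with a non-zero visibility flag, which matches how COCO evaluation tools interpret the field.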