UP-COUNT
The newly introduced UP-COUNT dataset consists of drone footage captured with the cameras of DJI Mini 2 family UAVs. It covers diverse environments, including streets, plazas, public transport stops, parks, and other green recreational areas. We recorded 202 unique videos and extracted one frame per second from each, yielding 10,000 images at a resolution of 3840 × 2160 pixels. The recordings were made at different flight altitudes and speeds and with varying crowd densities. Acquisition conditions also vary in time of day and lighting, which produces challenging shadows. Altitude information is additionally provided for each image.

People's heads were then annotated manually, resulting in 352,487 labelled instances. During labelling, each image was annotated and checked by two different people, and the continuity of labels within each sequence was reviewed. Flight altitudes across the sequences range from 26.0 meters (lowest) to 101.0 meters (highest), with an average of 60.3 meters. The bottom image shows the most crowded frame (1,039 people instances), while the average count is 35.25 people per image. Because the camera position is not stationary, crowd counts and backgrounds vary considerably, which better reflects real-world scenarios.

The UP-COUNT dataset is divided into training, validation, and test subsets containing 141, 30, and 31 sequences, respectively. The splits were prepared using altitude-based stratified sampling of sequences, which yields comparable altitude distributions across the splits.
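The text does not describe the extraction tooling. As an illustration only, the sketch below shows how frame indices for one-second sampling could be chosen, assuming a constant frame rate; the function name `one_second_frame_indices` is hypothetical, and decoding the frames themselves is left out.

```python
def one_second_frame_indices(num_frames: int, fps: float) -> list[int]:
    """Indices of the frames to keep when sampling one frame per second.

    Hypothetical helper. Assumes a constant frame rate, so frame i is shown
    at time i / fps; the frame nearest to each whole second is kept.
    """
    if num_frames < 0 or fps <= 0:
        raise ValueError("num_frames must be >= 0 and fps must be > 0")
    indices: list[int] = []
    second = 0
    while True:
        idx = round(second * fps)  # frame closest to this timestamp
        if idx >= num_frames:
            break
        if not indices or idx != indices[-1]:  # avoid duplicates if fps < 1
            indices.append(idx)
        second += 1
    return indices


# Example: a 60 s clip at 30 fps yields 60 sampled frames.
print(len(one_second_frame_indices(1800, 30.0)))  # 60
```

In practice, `num_frames` and `fps` could be read from the video container (for example via OpenCV's `cv2.VideoCapture` with `CAP_PROP_FRAME_COUNT` and `CAP_PROP_FPS`), but whether the authors used such a tool is not stated.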
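The paper does not specify how the altitude strata were defined. One plausible reading is sketched below under hypothetical choices: each sequence is summarized by a single altitude, sequences are cut into five equal-count altitude bins, and each bin is divided 70/15/15 between training, validation, and test. The function name `altitude_stratified_split` and all parameters are illustrative. Whole sequences are assigned to a split, so frames from one video never leak across splits.

```python
import random


def altitude_stratified_split(
    seq_altitudes: dict[str, float],
    num_bins: int = 5,
    fractions: tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 0,
) -> dict[str, list[str]]:
    """Assign whole sequences to train/val/test, stratified by altitude.

    Hypothetical sketch: sequences are sorted by altitude and cut into
    `num_bins` equal-count bins; each bin is shuffled and divided according
    to `fractions`. Split sizes match `fractions` only up to per-bin rounding.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    ordered = sorted(seq_altitudes, key=seq_altitudes.get)
    n = len(ordered)
    splits: dict[str, list[str]] = {"train": [], "val": [], "test": []}
    for b in range(num_bins):
        # Equal-count (quantile) altitude bin over the sorted sequences.
        stratum = ordered[b * n // num_bins:(b + 1) * n // num_bins]
        rng.shuffle(stratum)
        n_val = round(len(stratum) * fractions[1])
        n_test = min(round(len(stratum) * fractions[2]), len(stratum) - n_val)
        n_train = len(stratum) - n_val - n_test
        splits["train"] += stratum[:n_train]
        splits["val"] += stratum[n_train:n_train + n_val]
        splits["test"] += stratum[n_train + n_val:]
    return splits


# Example with 202 synthetic sequences spanning 26.0 m to 101.0 m.
alts = {f"seq{i:03d}": 26.0 + 75.0 * i / 201 for i in range(202)}
splits = altitude_stratified_split(alts)
print({k: len(v) for k, v in splits.items()})  # {'train': 142, 'val': 30, 'test': 30}
```

With these illustrative settings the sizes come out as 142/30/30 rather than the 141/30/31 reported above, because rounding happens per bin; the authors' exact binning and allocation rule are not given in the text.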