Cross-View Cross-Scene Multi-View Crowd Counting Dataset
CVCS is a synthetic multi-view people dataset containing 31 scenes, of which 23 are used for training and the remaining 8 for testing. Scene sizes range from about 10 m × 20 m to 90 m × 80 m. Each scene contains 100 multi-view frames. The ground-plane map resolution is 900×800, where each grid cell corresponds to 0.1 m in the real world. During training, 5 views are randomly selected 5 times per scene frame in each iteration; during evaluation, the same number of views is randomly selected 21 times.
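
The grid scale and view-sampling protocol described above can be illustrated with a minimal Python sketch. It is not the dataset's official loader. The names `world_to_grid` and `sample_view_sets`, the axis order (900 cells along x, 800 along y), the origin at a map corner, and the choice to sample views without replacement within each set are all assumptions made for illustration.

```python
import math
import random

# Values taken from the dataset description.
GRID_W, GRID_H = 900, 800   # ground-plane map resolution (axis order is an assumption)
CELLS_PER_M = 10            # each grid cell covers 0.1 m, i.e. 10 cells per meter


def world_to_grid(x_m, y_m):
    """Convert ground-plane coordinates in meters to (col, row) grid indices.

    Hypothetical helper: assumes the map origin sits at a corner of the scene.
    """
    col = math.floor(x_m * CELLS_PER_M)
    row = math.floor(y_m * CELLS_PER_M)
    if not (0 <= col < GRID_W and 0 <= row < GRID_H):
        raise ValueError(f"({x_m}, {y_m}) m lies outside the {GRID_W}x{GRID_H} map")
    return col, row


def sample_view_sets(view_ids, num_views=5, num_samples=5, rng=None):
    """Draw `num_samples` random sets of `num_views` camera views for one scene frame.

    Hypothetical helper: views inside a set are distinct (an assumption);
    different sets may overlap.
    """
    rng = rng or random.Random()
    return [rng.sample(view_ids, num_views) for _ in range(num_samples)]


if __name__ == "__main__":
    # The map spans 900 / 10 = 90 m by 800 / 10 = 80 m, matching the largest scene size.
    print(world_to_grid(45.0, 40.0))

    views = list(range(20))  # hypothetical camera IDs; the real count per scene may differ
    train_sets = sample_view_sets(views, num_views=5, num_samples=5)
    eval_sets = sample_view_sets(views, num_views=5, num_samples=21)
    print(len(train_sets), len(eval_sets))
```

Under this reading, training uses `num_samples=5` and evaluation uses `num_samples=21`, with 5 views per set in both cases.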