AntM2C

Ant-Group Multi-Scenario Multi-Modal CTR dataset

Images, Tabular, Texts · Introduced 2024-04-17

We release AntM2C, a large-scale Multi-Scenario Multi-Modal CTR dataset built from real industrial data from Alipay. It covers CTR data from diverse business scenarios, including advertisements, consumer coupons, mini-programs, and videos. Unlike existing datasets, AntM2C provides not only ID-based features but also five textual features and one image feature for both users and items, supporting finer-grained multi-modal CTR prediction.

The dataset comprises 110 million user exposure-click samples from 5 scenarios in the Alipay app, covering 670,000 users and 180,000 items, with 37 features. For more details, see the homepage: https://www.atecup.cn/dataSetDetailOpen/1