CCPM

Chinese Classical Poetry Matching

Modality: Texts. License: Unknown. Introduced: 2021-06-03.

Introduction

CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.

The main task of this dataset is as follows: given a description in modern Chinese, a model must select, from four candidate lines of Chinese classical poetry, the one line that best matches the description semantically.
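To make the shape of the task concrete, here is a minimal sketch of a four-way selector. It is NOT the dataset authors' method. It swaps in a deliberately naive character-overlap score purely for illustration, and the function names (char_overlap, select_line) and the toy instance are hypothetical, invented for this example rather than taken from CCPM.

```python
from __future__ import annotations


def char_overlap(description: str, line: str) -> int:
    # Naive, illustrative score: number of distinct characters shared
    # by the modern description and a candidate classical line.
    return len(set(description) & set(line))


def select_line(description: str, candidates: list[str]) -> int:
    # Return the index (0-3) of the highest-scoring candidate.
    # max() keeps the first maximal index, so ties go to the lowest index.
    if len(candidates) != 4:
        raise ValueError(f"expected 4 candidates, got {len(candidates)}")
    return max(range(len(candidates)),
               key=lambda i: char_overlap(description, candidates[i]))


# Hypothetical toy instance (not from CCPM): an invented modern paraphrase
# plus four well-known classical lines.
description = "明月照在松林间"
candidates = ["明月松间照", "清泉石上流", "春眠不觉晓", "白日依山尽"]
print(select_line(description, candidates))  # prints 0
```

Character overlap is only a crude proxy: it breaks down whenever the modern description uses different vocabulary from the classical line, which is common in paraphrase, so a real model needs semantic rather than surface matching.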

Size

It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.
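As a quick arithmetic check, the three splits add up to the stated total and correspond to roughly an 80/10/10 division:

```latex
21{,}778 + 2{,}720 + 2{,}720 = 27{,}218, \qquad
\frac{21{,}778}{27{,}218} \approx 0.800, \qquad
\frac{2{,}720}{27{,}218} \approx 0.100
```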

Format

Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
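The sketch below shows one way to load and validate an instance with these three fields and to score predictions. It rests on assumptions that should be checked against the released files: that each instance is stored as one JSON object per line (JSON Lines), and that the keys are literally translation, choice, and answer as named in this description. The names CCPMInstance, parse_instance, and accuracy are hypothetical, and accuracy is used as the metric only because it is the natural choice for a four-way selection task; this card does not name an official metric.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CCPMInstance:
    translation: str   # description in modern Chinese
    choice: list[str]  # four candidate lines of classical poetry
    answer: int        # index of the correct line, 0-3


def parse_instance(line: str) -> CCPMInstance:
    # Parse one JSON object (assumed: one instance per line) and validate it
    # against the format described above.
    record = json.loads(line)
    inst = CCPMInstance(
        translation=str(record["translation"]),
        choice=[str(c) for c in record["choice"]],
        answer=int(record["answer"]),
    )
    if len(inst.choice) != 4:
        raise ValueError(f"expected 4 candidates, got {len(inst.choice)}")
    if not 0 <= inst.answer <= 3:
        raise ValueError(f"answer index out of range: {inst.answer}")
    return inst


def accuracy(instances: Sequence[CCPMInstance], predictions: Sequence[int]) -> float:
    # Fraction of instances whose predicted index equals the gold answer.
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    correct = sum(inst.answer == pred for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Hypothetical toy record (not from CCPM), written in the assumed format.
inst = parse_instance(
    '{"translation": "明月照在松林间", '
    '"choice": ["清泉石上流", "明月松间照", "春眠不觉晓", "白日依山尽"], '
    '"answer": 1}'
)
print(accuracy([inst], [1]))  # prints 1.0
```

Validating the candidate count and the answer range at load time catches malformed records early, before they silently distort an accuracy figure.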

Source: https://github.com/THUNLP-AIPoet/CCPM