Introduction
CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.
The main task is multiple-choice matching: given a description in modern Chinese, the model must select, from four candidate lines of Chinese classical poetry, the one that best matches the description semantically.
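The selection step can be sketched as an argmax over the four candidates. The snippet below is only an illustration: `pick_line` and `char_overlap` are hypothetical names, the character-overlap scorer is a toy stand-in for a real model, and the description and candidate lines are made up for this example rather than taken from the dataset.

```python
from typing import Callable, Sequence


def pick_line(
    description: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
) -> int:
    """Return the index of the candidate that scores highest against the description."""
    return max(range(len(candidates)), key=lambda i: score(description, candidates[i]))


def char_overlap(description: str, line: str) -> float:
    """Toy scorer (an assumption, not the dataset authors' method):
    fraction of the line's distinct characters that also occur in the description."""
    chars = set(line)
    return len(chars & set(description)) / len(chars) if chars else 0.0


# Illustrative input only; this instance is not drawn from CCPM.
description = "明亮的月光洒在床前"
candidates = ["床前明月光", "春眠不觉晓", "白日依山尽", "红豆生南国"]

predicted = pick_line(description, candidates, char_overlap)
print(predicted, candidates[predicted])
```

In practice the scorer would be replaced by a trained model; only the argmax over four candidates is fixed by the task definition.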
Size
It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.
Format
Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
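As a rough sketch of how this structure could be checked when loading the data, the function below validates one instance against the description above. It assumes each instance is available as a Python dict (for example, parsed from one JSON object per line, which is an assumption about the file layout), and it uses the field names exactly as written here; the released files may spell them differently. `validate_instance` is a hypothetical helper and the example record is invented.

```python
import json


def validate_instance(instance: dict) -> None:
    """Raise ValueError if the instance does not match the described format."""
    translation = instance.get("translation")
    choice = instance.get("choice")
    answer = instance.get("answer")

    if not isinstance(translation, str):
        raise ValueError("translation must be a string")
    if not (
        isinstance(choice, list)
        and len(choice) == 4
        and all(isinstance(line, str) for line in choice)
    ):
        raise ValueError("choice must be a list of four strings")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(answer, bool) or not isinstance(answer, int) or not 0 <= answer <= 3:
        raise ValueError("answer must be an integer between 0 and 3")


# Invented record for illustration; not taken from the dataset.
raw = '{"translation": "明亮的月光洒在床前", "choice": ["床前明月光", "春眠不觉晓", "白日依山尽", "红豆生南国"], "answer": 0}'
record = json.loads(raw)
validate_instance(record)
print(record["choice"][record["answer"]])
```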
Source: https://github.com/THUNLP-AIPoet/CCPM