Tianyi Tang, Junyi Li, Zhipeng Chen, Yiwen Hu, Zhuohao Yu, Wenxun Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets, and further incorporates 45 PLMs spanning general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design interfaces that support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be carried out in a uniform way. Despite this rich functionality, the library remains easy to use, either through the friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Dialogue | Persona-Chat | BLEU-1 | 49.581 | BART (TextBox 2.0) |
| Dialogue | Persona-Chat | BLEU-2 | 39.24 | BART (TextBox 2.0) |
| Dialogue | Persona-Chat | Distinct-1 | 1.44 | BART (TextBox 2.0) |
| Dialogue | Persona-Chat | Distinct-2 | 8.89 | BART (TextBox 2.0) |
| Machine Translation | WMT2016 Romanian-English | BLEU-4 | 37.48 | BART (TextBox 2.0) |
| Machine Translation | WMT2016 English-Romanian | BLEU-4 | 37.2 | BART (TextBox 2.0) |
| Style Transfer | GYAFC | Accuracy | 94.37 | BART (TextBox 2.0) |
| Style Transfer | GYAFC | BLEU-4 | 76.93 | BART (TextBox 2.0) |
| Style Transfer | GYAFC | Harmonic mean | 84.74 | BART (TextBox 2.0) |
| Question Answering | SQuAD1.1 | Exact Match | 86.44 | BART (TextBox 2.0) |
| Question Answering | SQuAD1.1 | F1 | 93.04 | BART (TextBox 2.0) |
| Text Generation | ADGEN | BLEU-4 | 10.2 | BART (TextBox 2.0) |
| Text Generation | CSL | ROUGE-L | 64.34 | BART (TextBox 2.0) |
| Text Generation | LCSTS | ROUGE-L | 42.96 | BART (TextBox 2.0) |
| Text Generation | CommonGen | BLEU-4 | 28.18 | BART (TextBox 2.0) |
| Text Generation | CommonGen | CIDEr | 12.98 | BART (TextBox 2.0) |
| Text Generation | CommonGen | SPICE | 33 | BART (TextBox 2.0) |
| Text Simplification | Wiki-Auto + Turk | BLEU-4 | 90.81 | BART (TextBox 2.0) |
| Text Simplification | Wiki-Auto + Turk | METEOR | 57.58 | BART (TextBox 2.0) |
| Text Simplification | Wiki-Auto + Turk | ROUGE-2 | 83.36 | BART (TextBox 2.0) |
| Abstractive Text Summarization | CNN/Daily Mail | ROUGE-1 | 44.47 | BART (TextBox 2.0) |
| Abstractive Text Summarization | CNN/Daily Mail | ROUGE-2 | 21.5 | BART (TextBox 2.0) |
| Abstractive Text Summarization | CNN/Daily Mail | ROUGE-L | 41.35 | BART (TextBox 2.0) |
| Data-to-Text Generation | WebNLG | BLEU-4 | 67.33 | BART (TextBox 2.0) |
| Data-to-Text Generation | WebNLG | METEOR | 47.78 | BART (TextBox 2.0) |
| Data-to-Text Generation | WebNLG | ROUGE-L | 76.83 | BART (TextBox 2.0) |
| Question Generation | SQuAD1.1 | BLEU-4 | 25.08 | BART (TextBox 2.0) |
| Question Generation | SQuAD1.1 | METEOR | 26.73 | BART (TextBox 2.0) |
| Question Generation | SQuAD1.1 | ROUGE-L | 52.55 | BART (TextBox 2.0) |
| Task-Oriented Dialogue Systems | MULTIWOZ 2.0 | BLEU-4 | 20.17 | BART (TextBox 2.0) |
| Task-Oriented Dialogue Systems | MULTIWOZ 2.0 | Score | 100.07 | BART (TextBox 2.0) |
| Story Generation | WritingPrompts | BLEU-1 | 33.79 | BART (TextBox 2.0) |
| Story Generation | WritingPrompts | BLEU-2 | 15.78 | BART (TextBox 2.0) |
| Story Generation | WritingPrompts | Distinct-4 | 78.762 | BART (TextBox 2.0) |
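The GYAFC rows report a "Harmonic mean" alongside Accuracy and BLEU-4. Assuming this follows the common convention in style-transfer evaluation (the harmonic mean of style accuracy and BLEU-4, an assumption not stated in the table itself), the value can be approximately reproduced from the two rounded scores above:

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores: 2ab / (a + b)."""
    return 2 * a * b / (a + b)

# Rounded GYAFC scores from the table above (BART, TextBox 2.0).
accuracy = 94.37
bleu4 = 76.93

hm = harmonic_mean(accuracy, bleu4)
print(f"{hm:.2f}")  # ~84.76
```

The result (~84.76) differs slightly from the reported 84.74, which is expected since the table shows the component scores rounded to two decimals.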