Chiyu Song, Hongliang He, Haofei Yu, Pengfei Fang, Leyang Cui, Zhenzhong Lan
Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. Current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which encodes each context-candidate pair separately and ranks the candidates by their fitness scores. However, Cross-Encoder re-encodes the same lengthy context for every candidate, resulting in high computational cost. Poly-Encoder addresses this problem by reducing the interaction between context and candidates, but at the cost of a performance drop. In this work, we develop a new paradigm called Uni-Encoder, which keeps full attention over each pair, as in Cross-Encoder, while encoding the context only once, as in Poly-Encoder. Uni-Encoder encodes all the candidates together with the context in one forward pass. We use the same positional embedding for all candidates to ensure they are treated equally, and we design a new attention mechanism to avoid confusion among candidates. Uni-Encoder can also simulate other ranking paradigms by using different attention and response concatenation methods. Extensive experiments show that the proposed paradigm achieves new state-of-the-art results on four benchmark datasets with high computational efficiency. For instance, it improves R10@1 by 2.9% with approximately 4x faster inference on the Ubuntu V2 dataset.
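
The two ingredients named in the abstract, shared positional embeddings and a restricted attention pattern, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `build_uni_encoder_inputs`, the flag `context_sees_candidates`, and the exact mask layout are assumptions made for illustration. The sketch assumes the input is the context followed by all candidates, that every candidate restarts its position ids right after the context, and that a candidate token may attend to the context and to its own candidate only.

```python
import numpy as np

def build_uni_encoder_inputs(context_len, candidate_lens, context_sees_candidates=True):
    """Hypothetical helper: position ids and attention mask for
    the sequence [context; cand_1; ...; cand_n].

    mask[q, k] is True when query token q may attend to key token k.
    """
    total = context_len + sum(candidate_lens)
    position_ids = np.empty(total, dtype=np.int64)
    segment = np.zeros(total, dtype=np.int64)  # 0 = context, i = i-th candidate

    position_ids[:context_len] = np.arange(context_len)
    offset = context_len
    for i, n in enumerate(candidate_lens, start=1):
        # Every candidate reuses the same positions, starting right after the context.
        position_ids[offset:offset + n] = context_len + np.arange(n)
        segment[offset:offset + n] = i
        offset += n

    is_context = segment == 0
    # Tokens attend within their own segment, and every token attends to the context.
    mask = (segment[:, None] == segment[None, :]) | is_context[None, :]
    if context_sees_candidates:
        # Assumption: context tokens may also attend to all candidate tokens.
        mask = mask | is_context[:, None]
    return position_ids, mask

position_ids, mask = build_uni_encoder_inputs(context_len=3, candidate_lens=[2, 2])
print(position_ids)
print(mask.astype(int))
```

With a 3-token context and two 2-token candidates, both candidates receive positions 3 and 4, and the blocks of the mask that would let one candidate attend to another are zero. In a real model such a mask would be passed to the encoder's attention layers; how that is wired depends on the framework and is not shown here.
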
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Conversational Response Selection | Douban | MAP | 0.648 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | MRR | 0.688 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | P@1 | 0.518 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | R10@1 | 0.327 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | R10@2 | 0.557 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | R10@5 | 0.865 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Douban | MAP | 0.622 | Uni-Encoder |
| Conversational Response Selection | Douban | MRR | 0.662 | Uni-Encoder |
| Conversational Response Selection | Douban | P@1 | 0.481 | Uni-Encoder |
| Conversational Response Selection | Douban | R10@1 | 0.303 | Uni-Encoder |
| Conversational Response Selection | Douban | R10@2 | 0.514 | Uni-Encoder |
| Conversational Response Selection | Douban | R10@5 | 0.852 | Uni-Encoder |
| Conversational Response Selection | Persona-Chat | MRR | 0.922 | Uni-Encoder |
| Conversational Response Selection | Persona-Chat | R20@1 | 0.869 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v2, Ranking) | R10@1 | 0.859 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v2, Ranking) | R10@2 | 0.938 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v2, Ranking) | R10@5 | 0.99 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@1 | 0.916 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@2 | 0.965 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@5 | 0.994 | Uni-Enc+BERT-FP |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@1 | 0.886 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@2 | 0.946 | Uni-Encoder |
| Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@5 | 0.989 | Uni-Encoder |
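
For readers unfamiliar with the metric names in the table, the sketch below shows how recall at k among n candidates (written Rn@k, e.g. R10@1) and reciprocal rank are commonly computed for a single context. It assumes exactly one correct candidate per context; datasets with several correct candidates, as well as MAP and P@1, need a slightly different treatment that is not covered here. The function names are illustrative, not taken from the paper.

```python
def ranked_indices(scores):
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def recall_at_k(scores, gold_index, k):
    """1.0 if the correct candidate is among the top-k ranked candidates, else 0.0."""
    return 1.0 if gold_index in ranked_indices(scores)[:k] else 0.0

def reciprocal_rank(scores, gold_index):
    """1 / rank of the correct candidate (rank 1 = best score)."""
    return 1.0 / (ranked_indices(scores).index(gold_index) + 1)

# Toy example with 3 candidates; candidate 2 is the correct one and ranks second.
scores = [0.1, 0.7, 0.2]
print(recall_at_k(scores, gold_index=2, k=1))  # 0.0
print(recall_at_k(scores, gold_index=2, k=2))  # 1.0
print(reciprocal_rank(scores, gold_index=2))   # 0.5
```

The reported values are these per-context quantities averaged over the evaluation set, so R10@1 = 0.859 means the correct response was ranked first among 10 candidates for about 85.9% of contexts.
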