Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

Yuri Kuratov, Mikhail Arkhipov

2019-05-17

Reading Comprehension, Question Answering, Paraphrase Identification, Sentiment Analysis, Natural Language Inference, Transfer Learning


Abstract

This paper introduces methods for adapting multilingual masked language models to a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks, including reading comprehension, natural language inference, and sentiment analysis. At present there are two alternative approaches to training such models: monolingual and multilingual. While language-specific models show superior performance, multilingual models allow transfer from one language to another and can solve tasks for different languages simultaneously. This work shows that transfer learning from a multilingual model to a monolingual model yields a significant performance gain on tasks such as reading comprehension, paraphrase detection, and sentiment analysis. Furthermore, multilingual initialization of the monolingual model substantially reduces training time. Pre-trained models for the Russian language are open-sourced.
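
As a rough illustration of what multilingual initialization could look like at the embedding layer, the sketch below builds an embedding table for a monolingual vocabulary by copying rows from a multilingual table for tokens that both vocabularies share, and drawing small random vectors for the rest. This is a toy under stated assumptions, not the authors' procedure: the function name `init_monolingual_embeddings`, the toy vocabularies, the 0.02 standard deviation, and the random fallback for unseen tokens are all hypothetical choices made for illustration.

```python
import random


def init_monolingual_embeddings(multi_vocab, multi_emb, mono_vocab, dim, seed=0):
    """Build an embedding table for mono_vocab, reusing multilingual rows where possible.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    rng = random.Random(seed)
    # Map each multilingual token to its row index for fast lookup.
    token_to_row = {tok: i for i, tok in enumerate(multi_vocab)}

    mono_emb = []
    reused = 0
    for tok in mono_vocab:
        row = token_to_row.get(tok)
        if row is not None:
            # Shared token: copy the pre-trained multilingual vector.
            mono_emb.append(list(multi_emb[row]))
            reused += 1
        else:
            # New token: assumed fallback is a small random vector.
            mono_emb.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return mono_emb, reused


# Toy data (made up): a 4-token multilingual table and a 3-token monolingual vocabulary.
multi_vocab = ["[CLS]", "the", "при", "##вет"]
multi_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mono_vocab = ["[CLS]", "при", "привет"]

mono_emb, reused = init_monolingual_embeddings(multi_vocab, multi_emb, mono_vocab, dim=2)
print(f"reused {reused} of {len(mono_vocab)} embeddings")
```

In this toy setup, the more tokens the two vocabularies share, the more of the monolingual model starts from pre-trained weights rather than from scratch, which is one plausible reading of why such initialization would shorten training.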

Results

Task	Dataset	Metric	Value	Model
Question Answering	SQuAD1.1	F1	84.6	RuBERT
Sentiment Analysis	RuSentiment	Weighted F1	72.63	RuBERT
