RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model
Milan Straka, Jakub Náplava, Jana Straková, David Samuel
2021-05-24 · Semantic Parsing
Abstract
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
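Since the model is published on the Hugging Face Hub under the ID `ufal/robeczech-base`, it can be loaded with standard tooling. The snippet below is a minimal sketch, assuming the `transformers` and `torch` libraries are installed; the Czech example sentence is illustrative only.

```python
# Minimal sketch: load RobeCzech from the Hugging Face Hub and extract
# contextual embeddings (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModel.from_pretrained("ufal/robeczech-base")

# Tokenize an illustrative Czech sentence into subword tokens.
inputs = tokenizer("Praha je hlavní město České republiky.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per subword token; hidden size 768 for a base-sized model.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, num_tokens, 768])
```

For a downstream task such as the semantic parsing result reported below, these contextual representations would typically be consumed by a task-specific model (here, PERIN) rather than used directly.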
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Parsing | PTG (Czech, MRP 2020) | F1 | 92.36 | PERIN + RobeCzech |
Related Papers
- Where, What, Why: Towards Explainable Driver Attention Prediction (2025-06-29)
- Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering (2025-05-20)
- Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models (2025-05-16)
- Sigma: A dataset for text-to-code semantic parsing with statistical analysis (2025-04-05)
- Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing (2025-04-04)
- ZOGRASCOPE: A New Benchmark for Property Graphs (2025-03-07)
- Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs (2025-03-03)
- Disambiguate First Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing (2025-02-25)