Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer

Wenqi Zhao, Liangcai Gao, Zuoyu Yan, Shuai Peng, Lin Du, Ziyin Zhang

2021-05-06 | Handwritten Mathematical Expression Recognition | Data Augmentation | Language Modelling

Abstract

Encoder-decoder models have made great progress on handwritten mathematical expression recognition recently. However, it is still a challenge for existing methods to assign attention to image features accurately. Moreover, those encoder-decoder models usually adopt RNN-based models in their decoder part, which makes them inefficient in processing long $\LaTeX{}$ sequences. In this paper, a transformer-based decoder is employed to replace RNN-based ones, which makes the whole model architecture very concise. Furthermore, a novel training strategy is introduced to fully exploit the potential of the transformer in bidirectional language modeling. Compared to several methods that do not use data augmentation, experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%. Similarly, on CROHME 2016 and CROHME 2019, we improve the ExpRate by 1.92% and 2.28% respectively.
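The abstract does not spell out how the bidirectional training strategy builds its targets. As a rough sketch only, one common way to train a single decoder in both reading directions is to present each LaTeX token sequence twice: once left-to-right and once reversed, with the start and end markers swapped. The token names `<sos>` and `<eos>`, the helper `bidirectional_targets`, and the marker-swapping scheme are all assumptions made for illustration, not details confirmed by the text above.

```python
# Hypothetical special tokens; the names are assumptions for this sketch.
SOS, EOS = "<sos>", "<eos>"


def bidirectional_targets(tokens):
    """Return (left_to_right, right_to_left) target sequences for one expression.

    The right-to-left sequence reverses the tokens and swaps the boundary
    markers, so one decoder can be trained on both directions.
    """
    tokens = list(tokens)
    left_to_right = [SOS] + tokens + [EOS]
    right_to_left = [EOS] + tokens[::-1] + [SOS]
    return left_to_right, right_to_left


# Example: the LaTeX expression "a + b" as a token list.
l2r, r2l = bidirectional_targets(["a", "+", "b"])
print(l2r)  # ['<sos>', 'a', '+', 'b', '<eos>']
print(r2l)  # ['<eos>', 'b', '+', 'a', '<sos>']
```

Under this assumption, both sequences would be placed in the same training batch so that the decoder's parameters are shared across directions.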

Results

Task                                             Dataset      Metric   Value  Model
Handwritten Mathematical Expression Recognition  CROHME 2016  ExpRate  52.31  BTTR
Handwritten Mathematical Expression Recognition  HME100K      ExpRate  64.1   BTTR
Handwritten Mathematical Expression Recognition  CROHME 2019  ExpRate  52.96  BTTR
Handwritten Mathematical Expression Recognition  CROHME 2014  ExpRate  53.96  BTTR
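
ExpRate (expression recognition rate) is the percentage of test expressions whose predicted token sequence matches the reference exactly. A minimal sketch of that computation follows; the function name `exp_rate` and the toy data are illustrative assumptions, and official CROHME evaluation tooling may normalize or compare expressions differently.

```python
def exp_rate(predictions, references):
    """Percentage of predictions that exactly match their reference sequence."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # An expression counts as correct only if every token matches.
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy data (not from any benchmark): 3 of 4 predictions match exactly.
preds = [["x", "^", "2"], ["a", "+", "b"], ["\\frac", "1", "2"], ["y"]]
refs = [["x", "^", "2"], ["a", "-", "b"], ["\\frac", "1", "2"], ["y"]]
print(exp_rate(preds, refs))  # 75.0
```

Because a single wrong token makes the whole expression count as an error, ExpRate is a strict metric, which helps explain why the values in the table sit well below typical per-symbol accuracies.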

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)