Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features

Bruce W. Lee, Yoo Sung Jang, Jason Hyung-Jong Lee

2021-09-25 · EMNLP 2021 · Text Classification

Paper · PDF · Code (official)

Abstract

We report two essential improvements in readability assessment: (1) three novel features in advanced semantics and (2) timely evidence that traditional ML models (e.g. Random Forest, using handcrafted features) can be combined with transformers (e.g. RoBERTa) to augment model performance. First, we explore suitable transformers and traditional ML models. Then, we extract 255 handcrafted linguistic features using self-developed extraction software. Finally, we assemble these to create several hybrid models, achieving state-of-the-art (SOTA) accuracy on popular datasets in readability assessment. The use of handcrafted features helps model performance on smaller datasets. Notably, our RoBERTa-RF-T1 hybrid achieves near-perfect classification accuracy of 99%, a 20.3% increase over the previous SOTA.
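The hybrid idea described in the abstract can be sketched in a few lines: concatenate a transformer's soft class predictions with handcrafted linguistic features and train a Random Forest on the joint representation, evaluated with 5-fold cross-validation as in the reported metric. The sketch below is an assumption-laden illustration, not the authors' code: random arrays stand in for the fine-tuned RoBERTa probabilities and for the 255 handcrafted features produced by their extraction software.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins: in the paper these come from a fine-tuned RoBERTa
# and from the authors' 255-feature extraction software; random data here
# only demonstrates the plumbing of the hybrid.
rng = np.random.default_rng(0)
n_docs, n_classes, n_handcrafted = 200, 3, 255

y = rng.integers(0, n_classes, size=n_docs)                    # readability levels
transformer_probs = rng.dirichlet(np.ones(n_classes), n_docs)  # soft predictions
handcrafted = rng.normal(size=(n_docs, n_handcrafted))         # linguistic features

# Hybrid: transformer soft predictions + handcrafted features -> Random Forest.
X_hybrid = np.hstack([transformer_probs, handcrafted])
rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(rf, X_hybrid, y, cv=5, scoring="accuracy")
print(round(scores.mean(), 3))
```

With real inputs, the transformer contributes contextual semantics while the handcrafted features carry explicit linguistic signal, which is what the paper credits for the gains on smaller datasets.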

Results

Task                | Dataset                                 | Metric            | Value | Model
Text Classification | WeeBit (Readability Assessment)         | Accuracy (5-fold) | 0.905 | BART-RF-T1 hybrid
Text Classification | OneStopEnglish (Readability Assessment) | Accuracy (5-fold) | 0.99  | RoBERTa-RF-T1 hybrid

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
The Trilemma of Truth in Large Language Models (2025-06-30)
Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack (2025-06-30)
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems (2025-06-25)
Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? (2025-06-21)
SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping (2025-06-19)
Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages (2025-06-12)