A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss

Wenbiao Li, Ziyang Wang, Yunfang Wu

2022-10-19 | Text Classification

Abstract

For readability assessment, traditional methods mainly employ machine learning classifiers with hundreds of linguistic features. Although deep learning has become the dominant approach for almost all NLP tasks, it remains less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) for readability assessment. Specifically, we present a new difficulty-knowledge-guided semi-supervised method to extract topic features that complement the traditional linguistic features. From the linguistic features, we employ projection filtering to extract orthogonal features that supplement the BERT representations. Furthermore, we design a new length-balanced loss to handle the widely varying length distribution of the data. Our model achieves state-of-the-art performance on two English benchmark datasets and one dataset of Chinese textbooks, and reaches near-perfect accuracy of 99% on one English dataset. Moreover, our proposed model obtains results comparable to those of human experts in a consistency test.
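
The idea of extracting orthogonal features by projection can be illustrated with a small sketch. This is not the authors' implementation: it only shows the generic vector operation of removing from a feature vector the component that lies along a second vector, so that what remains is orthogonal to it. The names `orthogonal_component`, `f` (standing in for a linguistic feature vector) and `h` (standing in for a BERT representation) are hypothetical, and the sketch assumes both vectors already share the same dimension.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(f, h, eps=1e-12):
    """Return the part of f that is orthogonal to h, i.e. f - proj_h(f).

    Assumption: f and h have the same dimension. How the paper maps
    linguistic features into the BERT space is not covered here.
    """
    hh = dot(h, h)
    if hh < eps:
        # h is (numerically) the zero vector, so there is nothing to remove.
        return list(f)
    scale = dot(f, h) / hh
    return [fi - scale * hi for fi, hi in zip(f, h)]


# Toy vectors, chosen only for illustration.
h = [1.0, 0.0, 0.0]   # stand-in for a BERT representation
f = [2.0, 3.0, 0.0]   # stand-in for a linguistic feature vector
f_orth = orthogonal_component(f, h)
```

In this toy case the component of `f` along `h` is removed, so `f_orth` carries only the information that `h` does not already encode. Supplementing one representation with the other in this way is the intuition the abstract describes.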

Results

Task	Dataset	Metric	Value	Model
Text Classification	WeeBit (Readability Assessment)	Accuracy (5-fold)	0.927	BERT-FP-LBL
Classification	WeeBit (Readability Assessment)	Accuracy (5-fold)	0.927	BERT-FP-LBL
