TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Neural CRF Model for Sentence Alignment in Text Simplifica...

Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

2020-05-05ACL 2020 6Semantic SimilaritySemantic Textual SimilarityText Simplification
PaperPDFCode(official)

Abstract

The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all the previous work on monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality compared to the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.

Results

TaskDatasetMetricValueModel
Text SimplificationNewselaSARI36.6CRF Alignment + Transformer

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression2025-07-08FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection2025-07-06LineRetriever: Planning-Aware Observation Reduction for Web Agents2025-06-30Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval2025-06-26DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning2025-06-26Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation2025-06-25Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models2025-06-25