Segmentation en phrases : ouvrez les guillemets sans perdre le fil
Sandrine Ollinger, Denis Maurel
2024-07-29
Sentence segmentation
Abstract
This paper presents a graph cascade for sentence segmentation of XML documents. Our proposal produces sentences nested inside sentences for cases introduced by quotation marks and hyphens, and pays particular attention to parenthetical clauses set off by parentheses and to lists introduced by colons. We describe how the tool works, compare its results with those available in 2019 on the same dataset, and evaluate the system's performance on a test corpus.
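To make the idea of "sentences inside sentences" concrete, here is a minimal plain-Python sketch. It is not the authors' graph cascade: it is a simplified illustration that assumes French guillemets (« ») as the only quotation marks, assumes quotations are not nested inside one another, and treats `.`, `!` and `?` as the only sentence terminators. The names `Sentence`, `split_flat` and `segment` are hypothetical and chosen for this example only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """A sentence plus the sentences found inside its quotations."""
    text: str
    children: list[Sentence] = field(default_factory=list)


def split_flat(text: str) -> list[str]:
    """Split on . ! ? while ignoring terminators located inside « »."""
    spans = []
    depth = 0   # > 0 while we are inside a quotation
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "«":
            depth += 1
        elif ch == "»":
            depth = max(0, depth - 1)
        elif ch in ".!?" and depth == 0:
            # Absorb a run of terminators such as "?!" or "..."
            j = i + 1
            while j < len(text) and text[j] in ".!?":
                j += 1
            spans.append(text[start:j].strip())
            start = j
            i = j
            continue
        i += 1
    tail = text[start:].strip()
    if tail:
        spans.append(tail)
    return spans


def segment(text: str) -> list[Sentence]:
    """Return outer sentences, each carrying the sentences of its quotations."""
    result = []
    for outer in split_flat(text):
        sentence = Sentence(outer)
        # Assumption: quotations are not nested, so a lazy match is enough.
        for quoted in re.findall(r"«\s*(.*?)\s*»", outer):
            sentence.children.extend(segment(quoted))
        result.append(sentence)
    return result


if __name__ == "__main__":
    sample = "Il a dit : « Viens. Je t'attends ! » puis il est parti. Elle a souri."
    for s in segment(sample):
        print(s.text)
        for child in s.children:
            print("    ", child.text)
```

The sketch keeps the quotation inside its enclosing sentence and records the quoted sentences as children, so the outer sentence is never cut in two by punctuation inside the quotation. It deliberately leaves out the cases the abstract also mentions (hyphens, parentheses, lists introduced by colons) and does not handle XML markup.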