Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Sequence Labeling Approach to the Task of Sentence Boundary Detection

The Anh Le

Published: 2020-01-20
Venue: ICMLSC 2020: Proceedings of the 4th International Conference on Machine Learning and Soft Computing 2020 1
Tasks: Speech Recognition, Automatic Speech Recognition, Structured Prediction, Topic Classification, Automatic Speech Recognition (ASR), Sentiment Analysis, Part-Of-Speech Tagging, speech-recognition, named-entity-recognition, Intent Detection, Chatbot, Named Entity Recognition, Boundary Detection, Named Entity Recognition (NER)

Abstract

One of the keys to enabling chatbots to communicate with humans in a more natural way is the ability to handle long and complex user utterances. To achieve this goal, we propose integrating a Sentence Boundary Detection (SBD) module into the chatbot architecture. Its role is to take as input a user's utterance from an automatic speech recognition device, in which sentence boundaries are not available, and to output the corresponding list of punctuated sentences for downstream modules such as Intent Detection, Topic Classification, Sentiment Analysis, Named Entity Recognition, and Coreference Recognition. To address the SBD task, we reformulate it as a sequence labeling task. In this way, both deep neural network models (e.g., Bi-directional Long Short-Term Memory, Convolutional Neural Network) and structured prediction models (e.g., Hidden Markov Model, Maximum Entropy Model, Conditional Random Field) can be leveraged. After reformulating the SBD task, we built a hybrid deep neural network model and achieved good performance on both the CornellMovie-Dialog and DailyDialog datasets.
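
To make the reformulation concrete, the following is a minimal sketch of how sentence boundary detection can be cast as per-token labeling. It is not the paper's implementation: the abstract does not state the label set, so the two tags (`EOS`, `O`), the whitespace tokenization, and the function names `to_sequence_labels` and `labels_to_sentences` are all assumptions made for illustration. The sketch covers only the data encoding and decoding; any sequence labeler (a BiLSTM, a CRF, and so on) would sit between the two functions and predict the labels. It also recovers boundaries only, not the punctuation marks themselves.

```python
from typing import List, Tuple

# Hypothetical tag names: the abstract does not specify the label set.
BOUNDARY = "EOS"  # token is the last word of a sentence
INSIDE = "O"      # any other token


def to_sequence_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten punctuated sentences into an unpunctuated, lowercased
    token stream (mimicking ASR output) with one label per token."""
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        # Strip surrounding punctuation and drop tokens that become empty.
        words = [w.strip(".,?!").lower() for w in sentence.split()]
        words = [w for w in words if w]
        if not words:
            continue
        tokens.extend(words)
        # Only the final word of each sentence carries the boundary tag.
        labels.extend([INSIDE] * (len(words) - 1) + [BOUNDARY])
    return tokens, labels


def labels_to_sentences(tokens: List[str], labels: List[str]) -> List[str]:
    """Decode a labeled token stream back into a list of sentences."""
    sentences: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == BOUNDARY:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing tokens with no predicted boundary
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    toks, labs = to_sequence_labels(["Hi there.", "How are you?"])
    print(list(zip(toks, labs)))
    print(labels_to_sentences(toks, labs))
```

With this encoding, training data can be derived from any corpus of punctuated dialog turns, and the labeler's predictions map back to a sentence list that downstream modules can consume one sentence at a time.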
