Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

2022-06-29 | Speech-to-Text, Slot Filling, Spoken Language Understanding, Intent Classification

Abstract

In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building those models is very fast and takes only a few seconds. The method is also completely language independent. A comparison on different benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
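
To make the idea of embedding intents and entities into a transducer more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the template syntax with `[slot]` placeholders, the names `State`, `build_fst`, and `parse`, and the example intents are all assumptions made for illustration. The sketch also matches a clean text transcript exactly, word by word, whereas the paper combines the transducer with the output of a Speech-to-Text model. Entity values are limited to single words to keep the code short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class State:
    # arcs maps an input word to (output symbol, next state)
    arcs: dict = field(default_factory=dict)
    # intent label, set only on accepting states
    intent: Optional[str] = None


def build_fst(templates, entities):
    """Compile sentence templates into a word-level transducer (toy version).

    templates: {intent: ["turn on the light in the [room]", ...]}
    entities:  {"room": ["kitchen", "bedroom"]}  (single-word values only)
    """
    start = State()
    for intent, sentences in templates.items():
        for sentence in sentences:
            frontier = [start]
            for token in sentence.split():
                if token.startswith("[") and token.endswith("]"):
                    slot = token[1:-1]
                    # One arc per entity value; the arc output tags the slot.
                    choices = [(v, f"{slot}={v}") for v in entities[slot]]
                else:
                    # Plain words are consumed without producing output.
                    choices = [(token, "")]
                next_frontier = []
                for state in frontier:
                    for word, output in choices:
                        # Simplification: an existing arc for this word is
                        # reused as-is, so conflicting outputs are not handled.
                        if word not in state.arcs:
                            state.arcs[word] = (output, State())
                        next_frontier.append(state.arcs[word][1])
                frontier = next_frontier
            for state in frontier:
                state.intent = intent
    return start


def parse(fst, transcript):
    """Run a transcript through the transducer; return (intent, slots) or None."""
    state, slots = fst, {}
    for word in transcript.lower().split():
        if word not in state.arcs:
            return None  # the transducer does not accept this transcript
        output, state = state.arcs[word]
        if output:
            name, value = output.split("=", 1)
            slots[name] = value
    if state.intent is None:
        return None  # transcript ended before reaching an accepting state
    return state.intent, slots


# Hypothetical example data, not taken from any benchmark in the paper.
templates = {
    "lights_on": ["turn on the light in the [room]", "switch on the [room] light"],
    "lights_off": ["turn off the light in the [room]"],
}
entities = {"room": ["kitchen", "bedroom"]}

fst = build_fst(templates, entities)
print(parse(fst, "turn on the light in the kitchen"))
print(parse(fst, "play some music"))
```

Building the structure only requires walking over the templates once, which is consistent with the abstract's point that no training is involved. A real system would additionally have to cope with recognition errors, which this exact-match sketch rejects outright.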

Results

Task | Dataset | Metric | Value | Model
Dialogue | Snips-SmartSpeaker | Accuracy-EN (%) | 87.9 | Finstreder (Conformer, character-based)
Dialogue | Snips-SmartSpeaker | Accuracy-FR (%) | 86.5 | Finstreder (Conformer, character-based)
Dialogue | Snips-SmartSpeaker | Accuracy-EN (%) | 80.4 | Finstreder (Conformer)
Dialogue | Snips-SmartSpeaker | Accuracy-FR (%) | 78.3 | Finstreder (Conformer)
Dialogue | Snips-SmartSpeaker | Accuracy-EN (%) | 77.6 | Finstreder (Quartznet)
Dialogue | Snips-SmartSpeaker | Accuracy-FR (%) | 77.8 | Finstreder (Quartznet)
Dialogue | Snips-SmartLights | Accuracy (%) | 89 | Finstreder (Conformer, character-based)
Dialogue | Snips-SmartLights | Accuracy (%) | 88 | Finstreder (Conformer)
Dialogue | Snips-SmartLights | Accuracy (%) | 84.8 | Finstreder (Quartznet)
Dialogue | Fluent Speech Commands | Accuracy (%) | 99.8 | Finstreder (Conformer + AMT, character-based)
Dialogue | Fluent Speech Commands | Accuracy (%) | 99.7 | Finstreder (Quartznet + AMT)
Dialogue | Fluent Speech Commands | Accuracy (%) | 99.5 | Finstreder (Conformer)
Dialogue | Fluent Speech Commands | Accuracy (%) | 99.2 | Finstreder (Quartznet)
Dialogue | Fluent Speech Commands | Accuracy (%) | 98.7 | Amazon Alexa
Dialogue | Timers and Such | Accuracy (%) | 95.4 | Finstreder (Conformer)
Dialogue | Timers and Such | Accuracy (%) | 90 | Finstreder (Quartznet)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 87.9 | Finstreder (Conformer, character-based)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 86.5 | Finstreder (Conformer, character-based)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 80.4 | Finstreder (Conformer)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 78.3 | Finstreder (Conformer)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 77.6 | Finstreder (Quartznet)
Spoken Language Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 77.8 | Finstreder (Quartznet)
Spoken Language Understanding | Snips-SmartLights | Accuracy (%) | 89 | Finstreder (Conformer, character-based)
Spoken Language Understanding | Snips-SmartLights | Accuracy (%) | 88 | Finstreder (Conformer)
Spoken Language Understanding | Snips-SmartLights | Accuracy (%) | 84.8 | Finstreder (Quartznet)
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 99.8 | Finstreder (Conformer + AMT, character-based)
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 99.7 | Finstreder (Quartznet + AMT)
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 99.5 | Finstreder (Conformer)
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 99.2 | Finstreder (Quartznet)
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 98.7 | Amazon Alexa
Spoken Language Understanding | Timers and Such | Accuracy (%) | 95.4 | Finstreder (Conformer)
Spoken Language Understanding | Timers and Such | Accuracy (%) | 90 | Finstreder (Quartznet)
Intent Classification | SLURP | Accuracy (%) | 53.11 | Finstreder (Conformer)
Intent Classification | SLURP | Accuracy (%) | 43.15 | Finstreder (Quartznet)
Slot Filling | SLURP | F1 | 0.395 | Finstreder (Conformer)
Slot Filling | SLURP | F1 | 0.313 | Finstreder (Quartznet)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 87.9 | Finstreder (Conformer, character-based)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 86.5 | Finstreder (Conformer, character-based)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 80.4 | Finstreder (Conformer)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 78.3 | Finstreder (Conformer)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 77.6 | Finstreder (Quartznet)
Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 77.8 | Finstreder (Quartznet)
Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 89 | Finstreder (Conformer, character-based)
Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 88 | Finstreder (Conformer)
Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 84.8 | Finstreder (Quartznet)
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.8 | Finstreder (Conformer + AMT, character-based)
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.7 | Finstreder (Quartznet + AMT)
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.5 | Finstreder (Conformer)
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.2 | Finstreder (Quartznet)
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 98.7 | Amazon Alexa
Dialogue Understanding | Timers and Such | Accuracy (%) | 95.4 | Finstreder (Conformer)
Dialogue Understanding | Timers and Such | Accuracy (%) | 90 | Finstreder (Quartznet)
