Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the user's intent (what the user wants the system to do) and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers that, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building such a model is very fast and takes only a few seconds. The method is also completely language-independent. A comparison on several benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
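To make the idea of "embedding intents and entities into a Finite State Transducer" more concrete, here is a minimal, hypothetical Python sketch. It is not the authors' implementation. It assumes a made-up template format (`[slot]` markers plus a list of values per slot) and works on word-level transcripts with exact matching only. Every expanded template becomes one path from the start state; input arcs consume words, output arcs emit the words plus entity tags, and the accepting state emits the intent label. The names `build`, `transduce`, and `to_slu` are illustrative. The combination with a Speech-to-Text model's decoding described in the abstract is not shown here.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Transducer:
    """A tiny nondeterministic word-level finite state transducer.

    arcs[state] holds (input_word, next_state, output_symbols) triples;
    finals maps an accepting state to the symbols emitted on acceptance.
    """
    arcs: dict = field(default_factory=dict)
    finals: dict = field(default_factory=dict)
    num_states: int = 1  # state 0 is the start state

    def new_state(self) -> int:
        self.num_states += 1
        return self.num_states - 1

    def add_arc(self, src, word, dst, out):
        self.arcs.setdefault(src, []).append((word, dst, tuple(out)))


def build(intents, entities):
    """Compile intent templates into one transducer (hypothetical format).

    intents:  {intent_name: ["template with [slot] markers", ...]}
    entities: {slot_name: ["possible value", ...]}
    """
    fst = Transducer()
    for intent, templates in intents.items():
        for template in templates:
            # Each token is either a literal word or a [slot] that expands to its values.
            choices = []
            for tok in template.split():
                if tok.startswith("[") and tok.endswith("]"):
                    slot = tok[1:-1]
                    choices.append([(slot, v.split()) for v in entities[slot]])
                else:
                    choices.append([(None, [tok])])
            # One linear path per combination of slot values.
            for combo in product(*choices):
                state = 0
                for slot, words in combo:
                    for i, word in enumerate(words):
                        out = [word]
                        if slot and i == 0:
                            out = [f"<{slot}>"] + out  # open entity tag
                        if slot and i == len(words) - 1:
                            out = out + [f"</{slot}>"]  # close entity tag
                        nxt = fst.new_state()
                        fst.add_arc(state, word, nxt, out)
                        state = nxt
                fst.finals[state] = (f"intent={intent}",)
    return fst


def transduce(fst, transcript):
    """Return every output sequence whose path matches the transcript exactly."""
    active = [(0, ())]  # (state, outputs emitted so far)
    for word in transcript.split():
        active = [
            (dst, out + emitted)
            for state, out in active
            for w, dst, emitted in fst.arcs.get(state, [])
            if w == word
        ]
    return [fst.finals[s] + out for s, out in active if s in fst.finals]


def to_slu(symbols):
    """Turn one output symbol sequence into (intent, {slot: value})."""
    intent = symbols[0].split("=", 1)[1]
    found, current, words = {}, None, []
    for sym in symbols[1:]:
        if sym.startswith("</"):
            found[current] = " ".join(words)
            current, words = None, []
        elif sym.startswith("<"):
            current = sym[1:-1]
        elif current:
            words.append(sym)
    return intent, found


intents = {
    "lights_on": ["turn on the lights in the [room]"],
    "lights_off": ["turn off the lights in the [room]"],
}
entities = {"room": ["kitchen", "living room"]}
fst = build(intents, entities)
for result in transduce(fst, "turn off the lights in the living room"):
    print(to_slu(result))  # ('lights_off', {'room': 'living room'})
```

Keeping the transducer nondeterministic avoids conflicts when two templates share a word with different outputs; a practical system would instead use an FST toolkit to determinize and minimize the graph and would search it jointly with the acoustic model's scores rather than matching a fixed transcript.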
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 87.9 | Finstreder (Conformer, character-based) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 86.5 | Finstreder (Conformer, character-based) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 80.4 | Finstreder (Conformer) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 78.3 | Finstreder (Conformer) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-EN (%) | 77.6 | Finstreder (Quartznet) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartSpeaker | Accuracy-FR (%) | 77.8 | Finstreder (Quartznet) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 89 | Finstreder (Conformer, character-based) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 88 | Finstreder (Conformer) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Snips-SmartLights | Accuracy (%) | 84.8 | Finstreder (Quartznet) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.8 | Finstreder (Conformer + AMT, character-based) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.7 | Finstreder (Quartznet + AMT) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.5 | Finstreder (Conformer) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 99.2 | Finstreder (Quartznet) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 98.7 | Amazon Alexa |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Timers and Such | Accuracy (%) | 95.4 | Finstreder (Conformer) |
| Dialogue, Spoken Language Understanding, Dialogue Understanding | Timers and Such | Accuracy (%) | 90 | Finstreder (Quartznet) |
| Intent Classification | SLURP | Accuracy (%) | 53.11 | Finstreder (Conformer) |
| Intent Classification | SLURP | Accuracy (%) | 43.15 | Finstreder (Quartznet) |
| Slot Filling | SLURP | F1 | 0.395 | Finstreder (Conformer) |
| Slot Filling | SLURP | F1 | 0.313 | Finstreder (Quartznet) |