ESPNet

Computer VisionIntroduced 200023 papers

Description

ESPNet is a convolutional neural network for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power.

Papers Using This Method

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning2025-05-31 ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech2024-09-24 ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration2024-09-14 The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization2024-07-23 A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system2024-04-29 EURO: ESPnet Unsupervised ASR Open-source Toolkit2022-11-30 ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding2022-07-19 TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline2022-06-27 NatiQ: An End-to-end Text-to-Speech System for Arabic2022-06-15 EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers2022-03-31 A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies2022-01-14 An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition2021-10-09 DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows2021-05-30 User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis2020-12-15 Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition2020-11-02 A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline2020-09-22 Frame-To-Frame Consistent Semantic Segmentation2020-08-03 ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit2019-10-24 Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text2019-04-30 C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation2018-12-12