Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition

Sarah Alyami, Hamzah Luqman

2025-04-02 · parameter-efficient fine-tuning · Sign Language Recognition
Paper · PDF · Code (official)

Abstract

Continuous sign language recognition (CSLR) focuses on interpreting and transcribing sequences of sign language gestures in videos. In this work, we propose CLIP sign language adaptation (CLIP-SLA), a novel CSLR framework that adapts the powerful pre-trained visual encoder of the CLIP model to sign language tasks through parameter-efficient fine-tuning (PEFT). We introduce two variants, SLA-Adapter and SLA-LoRA, which integrate PEFT modules into the CLIP visual encoder, enabling fine-tuning with minimal trainable parameters. The effectiveness of the proposed frameworks is validated on four datasets: Phoenix2014, Phoenix2014-T, CSL-Daily, and Isharah-500, where both CLIP-SLA variants outperformed several SOTA models with fewer trainable parameters. Extensive ablation studies emphasize the effectiveness and flexibility of the proposed methods with different vision-language models for CSLR. These findings showcase the potential of adapting large-scale pre-trained models for scalable and efficient CSLR, paving the way for future advancements in sign language understanding.
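The SLA-LoRA variant described above follows the standard low-rank adaptation recipe: a frozen linear projection inside the visual encoder is augmented with a trainable low-rank update. A minimal NumPy sketch of that idea (shapes, names, and hyperparameter values here are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 4  # hidden size, LoRA rank, scaling factor (illustrative)

# Frozen pre-trained weight of one linear projection in the visual encoder.
W = rng.standard_normal((d, d))

# Trainable low-rank factors. B starts at zero so the adapted layer initially
# reproduces the frozen model; only A and B would receive gradients.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def lora_forward(x):
    """y = x W^T + (alpha/r) * x A^T B^T  -- frozen path plus low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((3, d))

# With B = 0 the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

# Parameter-efficiency argument: 2*d*r trainable values vs d*d for full
# fine-tuning (32 vs 64 at these toy sizes; the gap grows with d).
print(2 * d * r, "trainable vs", d * d, "full fine-tuning")
```

At realistic encoder widths (d in the hundreds or thousands, r small) the trainable-parameter count shrinks by orders of magnitude, which is the "minimal trainable parameters" claim in the abstract.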

Results

Task | Dataset | Metric | Value | Model
Sign Language Recognition | RWTH-PHOENIX-Weather 2014 | Word Error Rate (WER) | 18.8 | SLA-Adapter
Sign Language Recognition | RWTH-PHOENIX-Weather 2014 T | Word Error Rate (WER) | 19.4 | SLA-LoRA
Sign Language Recognition | CSL-Daily | Word Error Rate (WER) | 25.8 | SLA-LoRA
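Word Error Rate, the metric reported above, is the minimum number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A small self-contained sketch (illustrative only, not the paper's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference -> WER 0.2, i.e. 20%.
print(wer("the cat sat on mat", "the dog sat on mat"))  # 0.2
```

A table value of 18.8 therefore means 18.8 errors per 100 reference glosses; lower is better.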

Related Papers

- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
- Exploring Adapter Design Tradeoffs for Low Resource Music Generation (2025-06-26)
- WordCon: Word-level Typography Control in Scene Text Rendering (2025-06-26)
- Optimising Language Models for Downstream Tasks: A Post-Training Perspective (2025-06-26)
- Progtuning: Progressive Fine-tuning Framework for Transformer-based Language Models (2025-06-26)
- Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models (2025-06-26)
- Hierarchical Sub-action Tree for Continuous Sign Language Recognition (2025-06-26)