Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search
Andor Diera, Lukas Galke, Ansgar Scherp
2024-11-26
Code Search
Abstract
Low isotropy in an embedding space impairs performance on tasks involving semantic inference. Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate this issue. We analyze various code language models, examining the isotropy of their embedding spaces and its influence on search effectiveness. We propose a modified ZCA whitening technique to control isotropy levels in embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
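The abstract does not spell out the whitening formula, so the following is only a minimal sketch of how a softened ZCA whitening step could look. It assumes the common regularized form, in which a constant eps is added to the eigenvalues of the embedding covariance before taking the inverse square root: W = U (Lambda + eps I)^(-1/2) U^T. The function names soft_zca_fit and soft_zca_apply and the parameter eps are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def soft_zca_fit(X, eps=1e-2):
    """Estimate the mean and a regularized ZCA whitening matrix.

    X   : array of shape (n_samples, dim), one embedding per row.
    eps : assumed regularizer; small values approach full whitening,
          large values shrink all directions almost uniformly and so
          leave the original geometry largely intact.
    """
    mu = X.mean(axis=0, keepdims=True)
    Xc = X - mu
    # Covariance of the centered embeddings, shape (dim, dim).
    cov = Xc.T @ Xc / X.shape[0]
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Guard against tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    # W = U diag(1 / sqrt(lambda + eps)) U^T
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mu, W


def soft_zca_apply(X, mu, W):
    """Center embeddings with mu and project them with W."""
    return (X - mu) @ W
```

In a search setting, one plausible use is to fit mu and W on the embeddings of a corpus, apply the transform to the stored embeddings and to each query embedding, and then rank by cosine similarity as before. Whether code and query embeddings should share a single transform or use separate ones is a design choice the abstract does not settle.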