Description
Spectral DeTuning is a method that recovers the weights of a pre-fine-tuning model from a small number of models fine-tuned from it with low-rank adaptation (LoRA). In contrast to previous attacks, which attempt to recover pre-fine-tuning capabilities, Spectral DeTuning aims to recover the exact pre-fine-tuning weights. The recoverability of these weights is a vulnerability, and Spectral DeTuning can exploit it against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
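The description above does not spell out the algorithm, so the following is only a toy sketch of why a few LoRA fine-tunes can leak the shared starting weights. It assumes that each fine-tuned matrix equals the common pre-fine-tuning matrix plus a residual of rank at most r, and that r is known to the attacker. Under that assumption, one plausible recovery scheme alternates between two steps: estimate each model's low-rank residual with a truncated SVD, then average what is left over. The function names (`truncate_rank`, `recover_base_weights`, `make_toy_problem`), the matrix sizes, and the iteration count are all hypothetical choices for illustration, not taken from the source.

```python
import numpy as np


def truncate_rank(matrix, rank):
    """Best rank-`rank` approximation of `matrix` via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def recover_base_weights(finetuned, rank, n_iters=200):
    """Estimate the shared base matrix from several low-rank fine-tunes.

    Assumption (not from the source): finetuned[i] = base + residual_i,
    where every residual_i has rank <= `rank`.
    """
    stacked = np.stack(finetuned)
    # Start from the plain average of the fine-tuned matrices.
    base_est = stacked.mean(axis=0)
    for _ in range(n_iters):
        # Step 1: with the base fixed, the best low-rank residual for each
        # model is the truncated SVD of (fine-tuned - base estimate).
        residuals = np.stack(
            [truncate_rank(w - base_est, rank) for w in finetuned]
        )
        # Step 2: with the residuals fixed, the best base is the average
        # of the fine-tuned matrices with their residuals removed.
        base_est = (stacked - residuals).mean(axis=0)
    return base_est


def make_toy_problem(dim=32, rank=2, n_models=8, seed=0):
    """Synthetic stand-in for real checkpoints: random base + rank-r updates."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((dim, dim))
    finetuned = [
        base + rng.standard_normal((dim, rank)) @ rng.standard_normal((rank, dim))
        for _ in range(n_models)
    ]
    return base, finetuned


if __name__ == "__main__":
    base, finetuned = make_toy_problem()
    recovered = recover_base_weights(finetuned, rank=2)
    naive = np.mean(finetuned, axis=0)
    print("naive average error:", np.linalg.norm(naive - base))
    print("recovered error:    ", np.linalg.norm(recovered - base))
```

On this synthetic problem the alternating estimate should land far closer to the base matrix than the plain average does, which illustrates the distinction drawn above between recovering capabilities and recovering the exact weights. A real attack would apply such a procedure layer by layer to actual checkpoints, which this sketch does not attempt.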