Self-Supervised Vision Transformers for Malware Detection

Sachith Seneviratne, Ridwan Shariffdeen, Sanka Rasnayaka, Nuran Kasthuriarachchi

2022-08-15

Malware Family Detection, Binary Classification, Malware Classification, Self-Supervised Learning, Malware Type Detection, Malware Detection

Abstract

Malware detection plays a crucial role in cyber-security as malware volumes grow and cyber-attacks become more sophisticated. Previously unseen malware that security vendors have not yet identified is often used in these attacks, making it essential to find a solution that can self-learn from unlabeled sample data. This paper presents SHERLOCK, a self-supervised deep learning model based on the Vision Transformer (ViT) architecture for malware detection. SHERLOCK is a novel malware detection method that learns unique features to differentiate malware from benign programs using an image-based binary representation. Experimental results on 1.2 million Android applications across a hierarchy of 47 types and 696 families show that self-supervised learning can achieve an accuracy of 97% for binary classification of malware, which is higher than existing state-of-the-art techniques. The proposed model also outperforms state-of-the-art techniques for multi-class malware classification of types and families, with macro-F1 scores of 0.497 and 0.491, respectively.
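
The abstract mentions an image-based binary representation without spelling out the conversion. A minimal sketch of one common approach is shown below: each byte of a file is treated as a grayscale pixel and the byte stream is wrapped into rows of fixed width. The function name `bytes_to_grayscale`, the default width of 256, and the zero padding are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map raw bytes to rows of grayscale pixel values (0-255).

    Hypothetical helper: the row width and zero padding are assumptions
    made for illustration, not the preprocessing used by SHERLOCK.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        return []
    # Pad with zero bytes so the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Each byte becomes one pixel; slice the stream into fixed-width rows.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    sample = bytes([0, 255, 16, 32, 7])
    for row in bytes_to_grayscale(sample, width=2):
        print(row)
```

The resulting rows could then be resized and split into patches for a ViT-style model; that step is omitted here because the abstract does not describe it.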

Results

Task	Dataset	Metric	Value	Model
Malware Classification	MalNet	F1 score	0.878	SHERLOCK (family)
Malware Classification	MalNet	F1 score	0.876	SHERLOCK (type)
Malware Classification	MalNet	F1 score	0.854	SHERLOCK
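
The abstract reports macro-F1 for the multi-class tasks. As a reminder of what that metric measures, the sketch below computes a per-class F1 score and then takes the unweighted mean, so rare families count as much as common ones. The function name `macro_f1` and the toy labels are illustrative assumptions; this is not the evaluation code behind the values in the table.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = ["adware", "adware", "trojan", "spyware"]
    preds = ["adware", "trojan", "trojan", "spyware"]
    print(round(macro_f1(truth, preds), 4))
```

Because every class contributes equally, macro-F1 on a hierarchy with hundreds of imbalanced families can be much lower than overall accuracy.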
