0/1 Deep Neural Networks via Block Coordinate Descent

Hui Zhang, Shenglong Zhou, Geoffrey Ye Li, Naihua Xiu

2022-06-19


Abstract

The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). Because it returns 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the absence of useful subgradient information) have impeded its development for several decades. Although there is an impressive body of work on designing DNNs with continuous activation functions that can be regarded as surrogates of the step function, the step function itself still possesses some advantageous properties, such as complete robustness to outliers and the ability to attain the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function as the activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it with a block coordinate descent (BCD) method. Moreover, we derive closed-form solutions for the sub-problems of BCD and establish its convergence properties. Furthermore, we integrate $\ell_{2,0}$-regularization into 0/1 DNNs to accelerate the training process and compress the network scale. As a result, the proposed algorithm achieves desirable performance in classifying the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
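To make the two central objects of the abstract concrete, the following is a minimal sketch of a forward pass with the 0/1 step activation and of an $\ell_{2,0}$ count on a weight matrix. It is not the authors' implementation and does not include the BCD training procedure. The function names (`step`, `dense`, `forward_01`, `l20_norm`), the linear output layer, and the convention that $\ell_{2,0}$ counts nonzero rows (rather than columns) are all assumptions made for illustration.

```python
import math

def step(t):
    """0/1 step activation: 1 for strictly positive input, 0 otherwise."""
    return 1.0 if t > 0 else 0.0

def dense(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward_01(x, weights, biases):
    """Forward pass of a hypothetical fully connected 0/1 network.

    Hidden layers apply the step activation; the last layer is left
    linear (an assumption, not taken from the paper).
    """
    h = list(x)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = [step(t) for t in dense(W, b, h)]
    return dense(weights[-1], biases[-1], h)

def l20_norm(W):
    """Number of rows of W with nonzero Euclidean norm (row convention assumed)."""
    return sum(1 for row in W if math.sqrt(sum(w * w for w in row)) > 0)

# Toy example with made-up weights.
W1 = [[1.0, -1.0], [0.0, 0.0], [2.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0]]
b2 = [0.0]

out = forward_01([1.0, 2.0], [W1, W2], [b1, b2])
print(out)           # hidden pre-activations are [-1, 0, 4], so only the third unit fires
print(l20_norm(W1))  # the all-zero second row is not counted
```

Under the row convention assumed here, a zero row of `W1` corresponds to a hidden unit that ignores its input, which gives some intuition for why penalizing the $\ell_{2,0}$ count can compress the network. The step activation also maps any positive pre-activation, however large, to 1, which is one way to read the robustness to outliers mentioned above.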

