Hui Zhang, Shenglong Zhou, Geoffrey Ye Li, Naihua Xiu
The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). Because it outputs 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the absence of informative subgradients) have impeded its development for several decades. Although there is an impressive body of work on designing DNNs with continuous activation functions that can be regarded as surrogates of the step function, the step function itself still possesses some advantageous properties, such as complete robustness to outliers and the capability of attaining the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function used as the activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descent (BCD) method. Moreover, we derive closed-form solutions for the sub-problems of BCD and establish its convergence properties. Furthermore, we integrate $\ell_{2,0}$-regularization into the 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm achieves desirable performance in classifying the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
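To make the objects in this abstract concrete, here is a minimal NumPy sketch, not the authors' implementation, of (i) a forward pass through a fully connected 0/1 DNN and (ii) the $\ell_{2,0}$ quantity together with its standard row-wise hard-thresholding proximal operator. That operator is one closed-form way a group-sparse penalty can zero out whole neurons. The layer layout (no biases, a linear output layer), the convention that each row of a weight matrix corresponds to one neuron, and all function names (`step`, `zero_one_forward`, `l20_norm`, `prox_l20`) are assumptions made for illustration. The step function's derivative is 0 wherever it exists, so standard backpropagation receives no useful signal through it.

```python
import numpy as np

def step(z: np.ndarray) -> np.ndarray:
    """0/1 step activation: 1 where z > 0, 0 otherwise (including z == 0)."""
    return (z > 0).astype(z.dtype)

def zero_one_forward(x: np.ndarray, weights: list) -> np.ndarray:
    """Forward pass of a hypothetical fully connected 0/1 network.

    Assumed layout: weights[l] has shape (d_l, d_{l-1}), biases are omitted,
    every hidden layer applies the step function, and the last layer is linear.
    """
    h = x
    for W in weights[:-1]:
        h = step(W @ h)  # hidden units are binary: 0 or 1
    return weights[-1] @ h

def l20_norm(W: np.ndarray) -> int:
    """ell_{2,0} 'norm' (assumed row-wise): number of rows with nonzero 2-norm."""
    return int(np.count_nonzero(np.linalg.norm(W, axis=1)))

def prox_l20(Z: np.ndarray, lam: float) -> np.ndarray:
    """Row-wise hard thresholding solving
        argmin_X 0.5 * ||X - Z||_F^2 + lam * ||X||_{2,0}.

    Keeping row z_i costs lam; zeroing it costs 0.5 * ||z_i||^2.
    So a row survives iff ||z_i|| > sqrt(2 * lam); ties are sent to zero.
    A zeroed row makes that neuron's pre-activation 0, i.e. the neuron is pruned.
    """
    keep = np.linalg.norm(Z, axis=1) > np.sqrt(2.0 * lam)
    return Z * keep[:, None]
```

For example, with `x = np.array([1.0, -2.0])`, `W1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])`, and `W2 = np.array([[1.0, 1.0, 1.0]])`, the hidden layer computes `step([1, -2, -1]) = [1, 0, 0]`, so `zero_one_forward(x, [W1, W2])` returns `array([1.])`. Applying `prox_l20` with `lam = 1.0` to `[[3, 4], [0.1, 0.1]]` keeps the first row (norm 5 exceeds $\sqrt{2}$) and zeroes the second, leaving an $\ell_{2,0}$ value of 1.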