Mohammad Rezaei, Farnaz Farahanipad, Alex Dillhoff, Vassilis Athitsos
Despite the significant progress that depth-based 3D hand pose estimation methods have made in recent years, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised method to significantly reduce the dependence on labeled training data. The proposed method consists of two identical networks trained jointly: a teacher network and a student network. The teacher network is trained using both the available labeled and unlabeled samples. It leverages the unlabeled samples via a loss formulation that encourages estimation equivariance under a set of affine transformations. The student network is trained using the unlabeled samples with their pseudo-labels provided by the teacher network. For inference at test time, only the student network is used. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art semi-supervised methods by large margins.
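The equivariance idea in the abstract can be illustrated with a small sketch. It is a toy 2D stand-in, not the authors' networks or their exact loss: it assumes the consistency term measures the squared distance between the estimate for a transformed input, f(T(x)), and the transformed estimate for the original input, T(f(x)). All names here (`apply_affine`, `equivariance_loss`, `centroid_estimator`, `biased_estimator`, `pseudo_labels`) are hypothetical and chosen for illustration only.

```python
import math


def apply_affine(points, matrix, offset):
    """Apply p -> A p + b to a list of 2D points."""
    (a, b), (c, d) = matrix
    tx, ty = offset
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def rotation(degrees):
    """Return a 2x2 rotation matrix as nested tuples."""
    r = math.radians(degrees)
    return ((math.cos(r), -math.sin(r)), (math.sin(r), math.cos(r)))


def equivariance_loss(estimator, sample, matrix, offset):
    """Mean squared distance between f(T(x)) and T(f(x)).

    Assumed form of the consistency term; no labels are needed,
    so it can be evaluated on unlabeled samples.
    """
    pred_of_transformed = estimator(apply_affine(sample, matrix, offset))
    transformed_pred = apply_affine(estimator(sample), matrix, offset)
    squared = [
        (u - p) ** 2 + (v - q) ** 2
        for (u, v), (p, q) in zip(pred_of_transformed, transformed_pred)
    ]
    return sum(squared) / len(squared)


def centroid_estimator(points):
    """Toy 'pose' estimator: one keypoint at the centroid (affine-equivariant)."""
    n = len(points)
    return [(sum(x for x, _ in points) / n, sum(y for _, y in points) / n)]


def biased_estimator(points):
    """Toy estimator with a fixed offset, which breaks equivariance under rotation."""
    cx, cy = centroid_estimator(points)[0]
    return [(cx + 1.0, cy)]


def pseudo_labels(teacher, unlabeled_samples):
    """Pair each unlabeled sample with the teacher's estimate for the student to fit."""
    return [(sample, teacher(sample)) for sample in unlabeled_samples]


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    A, b = rotation(90), (1.0, -1.0)
    print(equivariance_loss(centroid_estimator, square, A, b))
    print(equivariance_loss(biased_estimator, square, A, b))
    print(pseudo_labels(centroid_estimator, [square]))
```

In this toy setting the equivariant estimator incurs (numerically) zero loss while the biased one is penalized, which is the signal the teacher would receive from unlabeled data under the stated assumption. The `pseudo_labels` helper mirrors the second stage described above, where the teacher's estimates serve as targets for the student.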
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Hand Pose Estimation | MSRA Hands | Average 3D Error | 7.18 | Teacher-Student |
| Hand Pose Estimation | ICVL Hands | Average 3D Error | 5.99 | Teacher-Student |
| Hand Pose Estimation | NYU Hands | Average 3D Error | 8.01 | Teacher-Student |