Chengde Wan, Thomas Probst, Luc van Gool, Angela Yao
We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via multi-task network cascades, which are trained end-to-end. The pixel-wise estimations can be directly translated into a vote casting scheme. A variant of mean shift is then used to aggregate local votes while enforcing consensus between the estimated 3D pose and the pixel-wise 2D and 3D estimations by design. Our method is efficient and highly accurate. On the MSRA and NYU hand datasets, our method outperforms all previous state-of-the-art approaches by a large margin. On the ICVL hand dataset, our method achieves accuracy similar to the nearly saturated result reported to date and outperforms various other proposed methods. Code is available $\href{"https://github.com/melonwan/denseReg"}{\text{online}}$.
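The vote casting and aggregation step described above can be illustrated with a minimal sketch for a single joint. This is not the released implementation: the function names `cast_votes` and `weighted_mean_shift`, the linear decoding of distance from the heat-map value, the Gaussian kernel, and the `radius` and `bandwidth` parameters are all assumptions made for illustration. Each pixel is taken to contribute a back-projected 3D point, a 3D heat-map value in [0, 1], and a unit direction pointing toward the joint.

```python
import numpy as np


def cast_votes(points, heat, directions, radius):
    """Turn per-pixel estimates into 3D votes for one joint.

    points:     (N, 3) back-projected 3D positions of the pixels
    heat:       (N,)   3D heat-map values in [0, 1]
    directions: (N, 3) unit vectors pointing from each pixel to the joint
    radius:     support radius of the heat map (assumed linear fall-off)
    """
    # Assumption: heat = 1 - distance / radius, so distance is decoded linearly.
    dist = (1.0 - heat) * radius
    votes = points + dist[:, None] * directions
    # The heat value doubles as the confidence weight of each vote.
    return votes, heat


def weighted_mean_shift(votes, weights, bandwidth, n_iter=50, tol=1e-6):
    """Find a mode of the weighted votes with a Gaussian-kernel mean shift."""
    # Start from the most confident vote.
    mode = votes[np.argmax(weights)].astype(float)
    for _ in range(n_iter):
        sq_dist = np.sum((votes - mode) ** 2, axis=1)
        kernel = weights * np.exp(-0.5 * sq_dist / bandwidth ** 2)
        total = kernel.sum()
        if total <= 0.0:
            break
        new_mode = (kernel[:, None] * votes).sum(axis=0) / total
        converged = np.linalg.norm(new_mode - mode) < tol
        mode = new_mode
        if converged:
            break
    return mode
```

With noise-free inputs every vote lands exactly on the joint, so the aggregated estimate reproduces it; with noisy network outputs, the kernel weighting lets confident, mutually consistent votes dominate the result.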
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Hand | MSRA Hands | Average 3D Error | 7.2 | Dense Pixel-wise Estimation |
| Hand | ICVL Hands | Average 3D Error | 7.3 | Dense Pixel-wise Estimation |
| Hand | NYU Hands | Average 3D Error | 10.2 | Dense Pixel-wise Estimation |
| Pose Estimation | MSRA Hands | Average 3D Error | 7.2 | Dense Pixel-wise Estimation |
| Pose Estimation | ICVL Hands | Average 3D Error | 7.3 | Dense Pixel-wise Estimation |
| Pose Estimation | NYU Hands | Average 3D Error | 10.2 | Dense Pixel-wise Estimation |
| Hand Pose Estimation | MSRA Hands | Average 3D Error | 7.2 | Dense Pixel-wise Estimation |
| Hand Pose Estimation | ICVL Hands | Average 3D Error | 7.3 | Dense Pixel-wise Estimation |
| Hand Pose Estimation | NYU Hands | Average 3D Error | 10.2 | Dense Pixel-wise Estimation |
| 3D | MSRA Hands | Average 3D Error | 7.2 | Dense Pixel-wise Estimation |
| 3D | ICVL Hands | Average 3D Error | 7.3 | Dense Pixel-wise Estimation |
| 3D | NYU Hands | Average 3D Error | 10.2 | Dense Pixel-wise Estimation |
| 1 Image, 2*2 Stitchi | MSRA Hands | Average 3D Error | 7.2 | Dense Pixel-wise Estimation |
| 1 Image, 2*2 Stitchi | ICVL Hands | Average 3D Error | 7.3 | Dense Pixel-wise Estimation |
| 1 Image, 2*2 Stitchi | NYU Hands | Average 3D Error | 10.2 | Dense Pixel-wise Estimation |