Lahav Lipson, Zachary Teed, Jia Deng
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We also introduce multi-level convolutional GRUs, which propagate information across the image more efficiently. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-Stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%, and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | Spring | 1px total | 15.273 | RAFT-Stereo |
| 3D | Spring | 1px total | 15.273 | RAFT-Stereo |
| Stereo Disparity Estimation | Middlebury 2014 | D1 Error (2px) | 4.74 | RAFT-Stereo |
| Stereo Depth Estimation | Spring | 1px total | 15.273 | RAFT-Stereo |
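
The pixel-threshold metrics reported above (for example, 1px error) are usually computed as the percentage of valid pixels whose predicted disparity differs from the ground truth by more than a threshold. The following is a minimal sketch of that idea, not the official evaluation code of any benchmark: the function name `bad_pixel_rate`, the flat-list input, and the use of `None` to mark pixels without ground truth are all assumptions made for illustration. Each benchmark defines its own validity masks, evaluation resolution, and thresholds.

```python
def bad_pixel_rate(pred, gt, threshold=1.0):
    """Percentage of valid pixels with absolute disparity error above `threshold`.

    `pred` and `gt` are flat sequences of disparities in pixels.
    A ground-truth value of None marks a pixel with no valid measurement
    (an assumed convention for this sketch); such pixels are skipped.
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    bad = sum(1 for e in errors if e > threshold)
    return 100.0 * bad / len(errors)


# Toy example: three valid pixels, one of which is off by more than 1 px.
pred = [10.0, 12.5, 7.0, 3.0]
gt = [10.4, 10.0, None, 3.0]
rate = bad_pixel_rate(pred, gt, threshold=1.0)
```

Under this definition a lower value is better, and changing `threshold` to 2.0 gives the corresponding 2px variant.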