Shreyas S. Shivakumar, Neil Rodrigues, Alex Zhou, Ian D. Miller, Vijay Kumar, Camillo J. Taylor
In this work, we propose long-wave infrared (LWIR) imagery as a viable supporting modality for learning-based semantic segmentation. First, we address the problem of RGB-thermal camera calibration by proposing a passive calibration target and procedure that is both portable and easy to use. Second, we present PST900, a dataset of 894 synchronized and calibrated RGB and thermal image pairs with per-pixel human annotations across four distinct object classes drawn from the DARPA Subterranean Challenge. Lastly, we propose a CNN architecture for fast semantic segmentation that combines RGB and thermal imagery in a way that also leverages RGB imagery independently. We compare our method against state-of-the-art approaches and show that it outperforms them on our dataset.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | PST900 | mIoU | 68.4 | PSTNet |
| Semantic Segmentation | MFN Dataset | mIoU | 48.4 | PSTNet |
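The mIoU values in the table are mean intersection-over-union scores. As a rough illustration of how such a score is typically computed, the sketch below builds a confusion matrix from flattened per-pixel label lists and averages the per-class IoU, TP / (TP + FP + FN). It is a minimal, dependency-free sketch, not the authors' evaluation code: the function names are hypothetical, and whether the background class is included in the average, as well as how classes absent from both prediction and ground truth are handled (here they are skipped), are assumptions.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes absent from both inputs."""
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp  # pixels of class c predicted as something else
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # assumption: classes that never appear are left out of the mean
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no classes present in pred or target")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy example with 3 classes; per-class IoU is 1/2, 2/3 and 1.
    score = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
    print(f"mIoU = {100 * score:.2f}")
```

Leaderboards usually report this value multiplied by 100, which is the convention the table appears to follow.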