FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition

Hongje Seong, Junhyuk Hyun, Euntai Kim

2019-07-17 | Scene Recognition

Abstract

Scene recognition is an image recognition problem that aims to predict the category of the place where an image was taken. In this paper, a new scene recognition method based on a convolutional neural network (CNN) is proposed. The method fuses the object and scene information in the given image, and the CNN framework is named FOS (fusion of object and scene) Net. In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The SCL is based on two traits unique to scenes: 'sceneness' spreads across the image, and the scene class does not change over the image. FOSNet was evaluated on the three most popular scene recognition datasets and achieves state-of-the-art performance on two of them: 60.14% on Places 2 and 90.37% on MIT Indoor 67. On SUN 397 it obtains the second-highest performance, 77.28%.
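The abstract gives only the intuition behind SCL, not its formula. As a rough illustration of that intuition, the sketch below penalizes class scores that differ between neighbouring locations of a per-location score map. Everything here is an assumption made for illustration: the function name `coherence_penalty`, the `[row][col][class]` layout, and the squared-difference form are hypothetical and are not the loss defined in the paper.

```python
from typing import List

# Hypothetical layout: score_map[row][col][class] holds per-location class scores.
ScoreMap = List[List[List[float]]]


def coherence_penalty(score_map: ScoreMap) -> float:
    """Mean squared difference between class scores of adjacent grid cells.

    Illustrative only: a simple smoothness term, NOT the paper's SCL.
    """
    height, width = len(score_map), len(score_map[0])
    total, pairs = 0.0, 0
    for r in range(height):
        for c in range(width):
            # Compare each cell with its right and lower neighbour.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < height and nc < width:
                    a, b = score_map[r][c], score_map[nr][nc]
                    total += sum((x - y) ** 2 for x, y in zip(a, b))
                    pairs += 1
    return total / pairs if pairs else 0.0


# Toy 2x2 maps with two hypothetical scene classes.
uniform = [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.9, 0.1]]]
mixed = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.9, 0.1]]]

penalty_uniform = coherence_penalty(uniform)  # every location agrees
penalty_mixed = coherence_penalty(mixed)      # one location disagrees
```

A map whose locations all predict the same scene incurs no penalty, while a map with one dissenting location incurs a positive one, which is the behaviour the "scene class does not change over the image" observation suggests.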

Results

Task	Dataset	Metric	Value	Model
Scene Parsing	MIT Indoor Scenes	Accuracy	90.3	FOSNet
Scene Parsing	Places365	Top 1 Accuracy	60.14	FOSNet
Scene Parsing	Places365	Top 5 Accuracy	88.86	FOSNet
Scene Parsing	SUN397	Accuracy	77.28	FOSNet
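The table reports Top 1 and Top 5 accuracy. As a reminder of how these metrics are conventionally computed, here is a minimal sketch in plain Python; the function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the FOSNet evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this image.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example: 3 images, 4 hypothetical scene classes.
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 has the highest score
    [0.5, 0.1, 0.3, 0.1],  # true class 2 has the second-highest score
    [0.4, 0.3, 0.2, 0.1],  # true class 3 has the lowest score
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 only the first toy image counts as correct; with k=2 the second image counts as well, which is why a Top 5 figure is never lower than the Top 1 figure on the same dataset.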
