Semantic-Aware Scene Recognition

Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós, Álvaro García-Martín

2019-09-05

Scene Classification, Scene Recognition, Semantic Segmentation

Abstract

Scene recognition is currently one of the most challenging research fields in computer vision. This may be due to the ambiguity between classes: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different from one another. Convolutional Neural Networks (CNNs) have significantly boosted performance in scene recognition, although it still lags far behind that of other recognition tasks (e.g., object or image recognition). In this paper, we describe a novel approach for scene recognition based on an end-to-end multi-modal CNN that combines image and context information by means of an attention module. Context information, in the form of semantic segmentation, is used to gate features extracted from the RGB image by leveraging information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations. This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards that content. Experimental results on four publicly available datasets show that the proposed approach outperforms every other state-of-the-art method while significantly reducing the number of network parameters. All the code and data used in this paper are available at https://github.com/vpulab/Semantic-Aware-Scene-Recognition
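
The gating idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (that is in the repository linked above). It assumes that the RGB and semantic feature maps have the same shape and that the attention weights come from a sigmoid applied to the semantic features. The function name `gate_features`, the sigmoid choice, and the toy values are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(rgb_features, semantic_features):
    """Gate RGB features element-wise with semantic attention weights.

    Both inputs are 2D lists (rows x columns) of the same shape.
    Hypothetical helper: the sigmoid gate is an assumed design, not
    necessarily the one used in the paper.
    """
    if len(rgb_features) != len(semantic_features):
        raise ValueError("feature maps must have the same number of rows")
    gated = []
    for rgb_row, sem_row in zip(rgb_features, semantic_features):
        if len(rgb_row) != len(sem_row):
            raise ValueError("feature maps must have the same row length")
        # A high semantic activation keeps the RGB feature; a low one suppresses it.
        gated.append([r * sigmoid(s) for r, s in zip(rgb_row, sem_row)])
    return gated


# Toy 2x2 feature maps (made-up values).
rgb = [[1.0, 2.0], [3.0, 4.0]]
sem = [[0.0, 10.0], [-10.0, 0.0]]
gated = gate_features(rgb, sem)
```

In this toy case, the position with a strongly positive semantic activation passes its RGB feature almost unchanged, the position with a strongly negative one is driven close to zero, and a neutral activation halves the feature.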

Results

Task	Dataset	Metric	Value	Model
Scene Parsing	MIT Indoor Scenes	Accuracy	87.1	Semantic-Aware Scene Recognition (ResNet-50)
Scene Parsing	Places365	Top 1 Accuracy	56.51	Semantic-Aware Scene Recognition (ResNet-18)
Scene Parsing	Places365	Top 5 Accuracy	86	Semantic-Aware Scene Recognition (ResNet-18)
Scene Parsing	ADE20K	Top 1 Accuracy	62.55	Semantic-Aware Scene Recognition (ResNet-18)
Scene Parsing	SUN397	Accuracy	74.04	Semantic-Aware Scene Recognition (ResNet-50)
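
For readers unfamiliar with the Top 1 and Top 5 metrics in the table, the sketch below shows how top-k accuracy is commonly computed, assuming the usual definition: a prediction counts as correct when the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the paper or its repository.

```python
def top_k_accuracy(scores, labels, k=1):
    """Return the percentage of samples whose true label is among the
    k highest-scoring classes.

    scores: list of per-class score lists, one list per sample.
    labels: list of true class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 4 samples and 3 classes (made-up scores).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can never decrease as k grows, which is consistent with the Top 5 figure for Places365 being higher than the Top 1 figure.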
