


FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition

Hongje Seong, Junhyuk Hyun, Euntai Kim

2019-07-17 | Scene Recognition

Abstract

Scene recognition is an image recognition problem that aims to predict the category of the place where an image was taken. In this paper, a new scene recognition method based on a convolutional neural network (CNN) is proposed. The method fuses the object and scene information in the given image, and the CNN framework is named FOSNet (fusion of object and scene). In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on two traits unique to scenes: the 'sceneness' spreads over the image, and the scene class does not change across the image. FOSNet was evaluated on the three most popular scene recognition datasets and achieves state-of-the-art performance on two of them: 60.14% on Places 2 and 90.37% on MIT indoor 67. On SUN 397 it obtains the second-highest performance, 77.28%.
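The abstract does not give the formula for the scene coherence loss, so the following is only an illustrative sketch of the stated intuition that the scene class should not change across the image. It assumes a hypothetical grid of per-location class probabilities and penalizes disagreement between adjacent cells; the function name `coherence_penalty` and the squared-difference form are assumptions for illustration, not the authors' definition of SCL.

```python
from typing import List

# Hypothetical layout: probs[row][col] is a list of class probabilities
# predicted at that spatial location of the image.
Grid = List[List[List[float]]]


def coherence_penalty(probs: Grid) -> float:
    """Mean squared difference between the class probabilities of
    horizontally and vertically adjacent cells.

    This is an assumed stand-in for a coherence-style penalty, not the
    SCL formula from the paper. It is 0.0 when every location predicts
    the same distribution and grows as neighbours disagree.
    """
    rows = len(probs)
    cols = len(probs[0]) if rows else 0
    total = 0.0
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            # Compare each cell with its right and lower neighbour only,
            # so that every adjacent pair is counted exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    total += sum(
                        (a - b) ** 2
                        for a, b in zip(probs[r][c], probs[nr][nc])
                    )
                    pairs += 1
    return total / pairs if pairs else 0.0


# A grid whose cells all agree has no penalty; two cells that predict
# different classes with full confidence are penalized.
uniform = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
split = [[[1.0, 0.0], [0.0, 1.0]]]
print(coherence_penalty(uniform), coherence_penalty(split))
```

In a training setup, a term of this kind would presumably be added to the usual classification loss; how the paper weights or formulates it is not stated in the abstract.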

Results

Task | Dataset | Metric | Value (%) | Model
Scene Parsing | MIT Indoor Scenes | Accuracy | 90.3 | FOSNet
Scene Parsing | Places365 | Top 1 Accuracy | 60.14 | FOSNet
Scene Parsing | Places365 | Top 5 Accuracy | 88.86 | FOSNet
Scene Parsing | SUN397 | Accuracy | 77.28 | FOSNet
Animation | MIT Indoor Scenes | Accuracy | 90.3 | FOSNet
Animation | Places365 | Top 1 Accuracy | 60.14 | FOSNet
Animation | Places365 | Top 5 Accuracy | 88.86 | FOSNet
Animation | SUN397 | Accuracy | 77.28 | FOSNet
3D Character Animation From A Single Photo | MIT Indoor Scenes | Accuracy | 90.3 | FOSNet
3D Character Animation From A Single Photo | Places365 | Top 1 Accuracy | 60.14 | FOSNet
3D Character Animation From A Single Photo | Places365 | Top 5 Accuracy | 88.86 | FOSNet
3D Character Animation From A Single Photo | SUN397 | Accuracy | 77.28 | FOSNet
2D Semantic Segmentation | MIT Indoor Scenes | Accuracy | 90.3 | FOSNet
2D Semantic Segmentation | Places365 | Top 1 Accuracy | 60.14 | FOSNet
2D Semantic Segmentation | Places365 | Top 5 Accuracy | 88.86 | FOSNet
2D Semantic Segmentation | SUN397 | Accuracy | 77.28 | FOSNet
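
The Places365 results are reported as Top 1 and Top 5 accuracy. As a reminder of what those metrics measure, here is a minimal sketch of top-k accuracy in plain Python. The function name `top_k_accuracy` and the toy scores are illustrative assumptions and are unrelated to the FOSNet evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Percentage of samples whose true label is among the k
    highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: three samples, three classes.
scores = [
    [0.1, 0.7, 0.2],  # true class 1, ranked first
    [0.5, 0.3, 0.2],  # true class 1, ranked second
    [0.2, 0.3, 0.5],  # true class 0, ranked third
]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top-k accuracy never decreases as k grows, which is why the Top 5 figure in the table is higher than the Top 1 figure on the same dataset.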

Related Papers

Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments (2025-03-29)
Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition (2025-03-10)
Contrastive Visual Data Augmentation (2025-02-24)
Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment (2025-01-09)
Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments (2025-01-09)
Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding (2025-01-09)
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining (2025-01-01)
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues (2025-01-01)