Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Mix-FFN

General · Introduced 2000 · 47 papers
Source Paper

Description

Mix-FFN is the feed-forward layer used in the SegFormer architecture. ViT uses positional encoding (PE) to introduce location information, but the resolution of the PE is fixed. When the test resolution differs from the training resolution, the positional code must be interpolated, which often reduces accuracy. To alleviate this problem, CPVT uses a $3 \times 3$ Conv together with the PE to implement a data-driven PE. The authors of Mix-FFN argue that positional encoding is not actually necessary for semantic segmentation. Instead, Mix-FFN exploits the fact that zero padding leaks location information, by applying a $3 \times 3$ Conv directly in the feed-forward network (FFN). Mix-FFN can be formulated as:

$$\mathbf{x}_{out} = \operatorname{MLP}\left(\operatorname{GELU}\left(\operatorname{Conv}_{3 \times 3}\left(\operatorname{MLP}\left(\mathbf{x}_{in}\right)\right)\right)\right) + \mathbf{x}_{in}$$

where $\mathbf{x}_{in}$ is the feature from the self-attention module. Mix-FFN mixes a $3 \times 3$ convolution and an MLP into each FFN.
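The formula can be traced step by step in a small, dependency-free Python sketch. This is an illustration under stated assumptions, not the SegFormer reference code: the function names (`linear`, `depthwise_conv3x3`, `mix_ffn`) are hypothetical, each MLP is taken to be a single per-token linear layer, the convolution is assumed to be depthwise (one 3×3 kernel per channel), and the all-ones weights are chosen only to make the effect easy to see. Every input token is identical, so any difference between output tokens comes from the zero padding alone.

```python
import math

def gelu(v):
    # Exact GELU via the Gaussian error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def linear(tokens, weight, bias):
    # Per-token linear map. tokens: [N][C_in], weight: [C_out][C_in].
    return [[sum(w * t for w, t in zip(row, tok)) + b
             for row, b in zip(weight, bias)] for tok in tokens]

def depthwise_conv3x3(tokens, height, width, kernels):
    # tokens: [H*W][C] in row-major order; kernels: [C][3][3].
    # Zero padding: neighbours outside the grid contribute nothing.
    channels = len(kernels)
    out = [[0.0] * channels for _ in range(height * width)]
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < height and 0 <= xx < width:
                            k = kernels[c][dy + 1][dx + 1]
                            acc += k * tokens[yy * width + xx][c]
                out[y * width + x][c] = acc
    return out

def mix_ffn(x_in, height, width, p):
    # x_out = MLP(GELU(Conv3x3(MLP(x_in)))) + x_in
    h = linear(x_in, p["w1"], p["b1"])
    h = depthwise_conv3x3(h, height, width, p["k"])
    h = [[gelu(v) for v in tok] for tok in h]
    h = linear(h, p["w2"], p["b2"])
    return [[a + b for a, b in zip(o, i)] for o, i in zip(h, x_in)]

# Hypothetical toy setup: 4x4 grid, 1 channel, all-ones weights.
H, W = 4, 4
params = {
    "w1": [[1.0]], "b1": [0.0],
    "k": [[[1.0] * 3 for _ in range(3)]],
    "w2": [[1.0]], "b2": [0.0],
}
x = [[1.0] for _ in range(H * W)]  # identical tokens, no positional encoding
y = mix_ffn(x, H, W, params)

corner, edge, interior = y[0][0], y[1][0], y[5][0]
print(corner < edge < interior)  # border tokens see fewer non-zero neighbours
```

With identical inputs, a corner token sums over 4 in-grid neighbours, an edge token over 6, and an interior token over 9, so the output already depends on position without any explicit PE. In a deep-learning framework the same structure would normally be written with the framework's linear and 2D convolution layers rather than explicit loops.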

Papers Using This Method

From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images (2025-07-03)
Leveraging Modified Ex Situ Tomography Data for Segmentation of In Situ Synchrotron X-Ray Computed Tomography (2025-04-27)
SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling (2025-04-17)
Improving underwater semantic segmentation with underwater image quality attention and muti-scale aggregation attention (2025-03-30)
Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Diabetic Retinopathy Severity Assessment (2025-03-03)
Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving (2025-02-22)
Global Average Feature Augmentation for Robust Semantic Segmentation with Transformers (2024-12-02)
SynDiff-AD: Improving Semantic Segmentation and End-to-End Autonomous Driving with Synthetic Data from Latent Diffusion Models (2024-11-25)
Habaek: High-performance water segmentation through dataset expansion and inductive bias optimization (2024-10-21)
Risk Assessment for Autonomous Landing in Urban Environments using Semantic Segmentation (2024-10-16)
Adapting Segment Anything Model to Melanoma Segmentation in Microscopy Slide Images (2024-10-03)
Semantic Segmentation of Unmanned Aerial Vehicle Remote Sensing Images using SegFormer (2024-10-01)
AMBER -- Advanced SegFormer for Multi-Band Image Segmentation: an application to Hyperspectral Imaging (2024-09-14)
SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes (2024-08-02)
MagicBathyNet: A Multimodal Remote Sensing Dataset for Bathymetry Prediction and Pixel-based Classification in Shallow Waters (2024-05-24)
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation (2024-05-23)
wmh_seg: Transformer based U-Net for Robust and Automatic White Matter Hyperintensity Segmentation across 1.5T, 3T and 7T (2024-02-20)
Kitchen Food Waste Image Segmentation and Classification for Compost Nutrients Estimation (2024-01-26)
CaBuAr: California Burned Areas dataset for delineation (2024-01-21)
U-MixFormer: UNet-like Transformer with Mix-Attention for Efficient Semantic Segmentation (2023-12-11)