Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Scale Aggregation Block

Computer Vision | Introduced 2000 | 7 papers

Description

A Scale Aggregation Block concatenates feature maps at a wide range of scales. Feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. The proposed scale aggregation block is a standard computational module which readily replaces any given transformation $\mathbf{Y}=\mathbf{T}(\mathbf{X})$, where $\mathbf{X}\in \mathbb{R}^{H\times W\times C}$ and $\mathbf{Y}\in \mathbb{R}^{H\times W\times C_o}$, with $C$ and $C_o$ being the input and output channel numbers respectively. $\mathbf{T}$ is any operator such as a convolution layer or a series of convolution layers. Assume we have $L$ scales. Each scale $l$ is generated by sequentially applying a downsampling operator $\mathbf{D}_l$, a transformation $\mathbf{T}_l$ and an upsampling operator $\mathbf{U}_l$:

$$\mathbf{X}^{'}_l=\mathbf{D}_l(\mathbf{X}),$$

$$\mathbf{Y}^{'}_l=\mathbf{T}_l(\mathbf{X}^{'}_l),$$

$$\mathbf{Y}_l=\mathbf{U}_l(\mathbf{Y}^{'}_l),$$

where $\mathbf{X}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C}$, $\mathbf{Y}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C_l}$, and $\mathbf{Y}_l\in \mathbb{R}^{H\times W\times C_l}$. Notably, $\mathbf{T}_l$ has a structure similar to that of $\mathbf{T}$. We can concatenate all $L$ scales together, getting

$$\mathbf{Y}^{'}=\Vert^L_1\mathbf{U}_l(\mathbf{T}_l(\mathbf{D}_l(\mathbf{X}))),$$

where $\Vert$ indicates concatenating feature maps along the channel dimension, and $\mathbf{Y}^{'} \in \mathbb{R}^{H\times W\times \sum^L_1 C_l}$ is the final output feature map of the scale aggregation block.
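As a quick check of the shapes, here is a worked example under assumed values, none of which come from the source: $H=W=56$, $C=64$, $L=3$ scales with downsampling factors $s=1,2,4$ (so that $H_l=H/s$ and $W_l=W/s$), and $C_l=32$ for every scale.

$$
\begin{aligned}
l=1\ (s=1):\quad & \mathbf{X}^{'}_1\in\mathbb{R}^{56\times 56\times 64},\quad \mathbf{Y}^{'}_1\in\mathbb{R}^{56\times 56\times 32},\quad \mathbf{Y}_1\in\mathbb{R}^{56\times 56\times 32}\\
l=2\ (s=2):\quad & \mathbf{X}^{'}_2\in\mathbb{R}^{28\times 28\times 64},\quad \mathbf{Y}^{'}_2\in\mathbb{R}^{28\times 28\times 32},\quad \mathbf{Y}_2\in\mathbb{R}^{56\times 56\times 32}\\
l=3\ (s=4):\quad & \mathbf{X}^{'}_3\in\mathbb{R}^{14\times 14\times 64},\quad \mathbf{Y}^{'}_3\in\mathbb{R}^{14\times 14\times 32},\quad \mathbf{Y}_3\in\mathbb{R}^{56\times 56\times 32}\\
\text{concatenated}:\quad & \mathbf{Y}^{'}\in\mathbb{R}^{56\times 56\times (32+32+32)}=\mathbb{R}^{56\times 56\times 96}
\end{aligned}
$$

The output keeps the input's spatial size, and its channel count is the sum of the per-scale channels. Under these assumptions, choosing the $C_l$ so that they sum to $C_o$ would let the block stand in for $\mathbf{T}$ without changing the shape seen by later layers.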

In the reference implementation, the downsampling $\mathbf{D}_l$ with factor $s$ is implemented by a max-pooling layer with kernel size $s\times s$ and stride $s$. The upsampling $\mathbf{U}_l$ is implemented by resizing with nearest-neighbor interpolation.
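The whole block can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: $\mathbf{T}_l$ is stood in for by a 1x1 convolution (a per-pixel matrix multiply), the function names `max_pool`, `nearest_resize`, `pointwise_conv` and `scale_aggregation` are hypothetical, and the scale factors and channel counts are arbitrary.

```python
import numpy as np


def max_pool(x, s):
    """D_l: downsample an (H, W, C) array with an s x s max pool, stride s."""
    h, w, c = x.shape
    hs, ws = h // s, w // s
    # Crop any remainder rows/columns, then take the max over each s x s window.
    x = x[: hs * s, : ws * s, :]
    return x.reshape(hs, s, ws, s, c).max(axis=(1, 3))


def nearest_resize(y, out_h, out_w):
    """U_l: upsample an (h, w, C) array to (out_h, out_w, C), nearest neighbour."""
    h, w, _ = y.shape
    rows = np.arange(out_h) * h // out_h  # source row index for each output row
    cols = np.arange(out_w) * w // out_w  # source column index for each output column
    return y[rows][:, cols]


def pointwise_conv(x, weight):
    """Assumed stand-in for T_l: a 1x1 convolution, i.e. a per-pixel channel mix."""
    return x @ weight  # (h, w, C) @ (C, C_l) -> (h, w, C_l)


def scale_aggregation(x, factors, weights):
    """Concatenate U_l(T_l(D_l(x))) over all scales along the channel axis."""
    h, w, _ = x.shape
    outputs = []
    for s, weight in zip(factors, weights):
        x_down = max_pool(x, s)                       # D_l
        y_down = pointwise_conv(x_down, weight)       # T_l (assumed 1x1 conv)
        outputs.append(nearest_resize(y_down, h, w))  # U_l
    return np.concatenate(outputs, axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                         # H = W = 8, C = 4
factors = [1, 2, 4]                                        # hypothetical scale factors
weights = [rng.standard_normal((4, 3)) for _ in factors]   # C_l = 3 per scale
y = scale_aggregation(x, factors, weights)
print(y.shape)  # (8, 8, 9)
```

Each scale produces 3 channels at the full 8x8 resolution, so the concatenated output has 9 channels. In a real network $\mathbf{T}_l$ would more likely be a learned 3x3 convolution or a stack of them; the 1x1 form is used here only to keep the sketch short.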

Papers Using This Method

Scale Invariance of Graph Neural Networks (2024-11-28)
ScaleNet: Scale Invariance Learning in Directed Graphs (2024-11-13)
ScaleNet: An Unsupervised Representation Learning Method for Limited Information (2023-10-03)
ScaleNet: Searching for the Model to Scale (2022-07-15)
ScaleNet: A Shallow Architecture for Scale Estimation (2021-12-09)
ScaleNAS: One-Shot Learning of Scale-Aware Representations for Visual Recognition (2020-11-30)
Data-Driven Neuron Allocation for Scale Aggregation Networks (2019-04-20)