Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Ghost Module

Computer Vision · Introduced 2020 · 30 papers
Source Paper: GhostNet: More Features from Cheap Operations

Description

A Ghost Module is a building block for convolutional neural networks that aims to generate more feature maps while using fewer parameters. Specifically, an ordinary convolutional layer is split into two parts. The first part involves ordinary convolutions, but their total number is strictly controlled. Given the intrinsic feature maps from the first part, a series of simple linear operations is then applied to generate more feature maps.

Given the widespread redundancy in the intermediate feature maps computed by mainstream CNNs, Ghost modules aim to reduce it. In practice, given the input data $X\in\mathbb{R}^{c\times h\times w}$, where $c$ is the number of input channels and $h$ and $w$ are the height and width of the input data, respectively, the operation of an arbitrary convolutional layer producing $n$ feature maps can be formulated as

$$Y = X * f + b,$$

where $*$ is the convolution operation, $b$ is the bias term, $Y\in\mathbb{R}^{h'\times w'\times n}$ is the output feature map with $n$ channels, and $f\in\mathbb{R}^{c\times k\times k\times n}$ denotes the convolution filters in this layer. In addition, $h'$ and $w'$ are the height and width of the output data, and $k\times k$ is the kernel size of the convolution filters $f$. During this convolution procedure, the required number of FLOPs is $n\cdot h'\cdot w'\cdot c\cdot k\cdot k$, which is often in the hundreds of millions, since the number of filters $n$ and the channel number $c$ are generally very large (e.g. 256 or 512).
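As a quick sanity check of the FLOPs formula above, a short Python sketch can count the multiply-accumulate operations of one such layer (the function name `conv_flops` is illustrative, not from any library):

```python
# Multiply-accumulate count of an ordinary convolution producing n feature
# maps of size h_out x w_out from c input channels with k x k kernels,
# following n * h' * w' * c * k * k from the text (bias term ignored).
def conv_flops(n, h_out, w_out, c, k):
    return n * h_out * w_out * c * k * k

# A typical mid-network layer: 256 -> 256 channels, 3x3 kernels, 32x32 output.
print(conv_flops(n=256, h_out=32, w_out=32, c=256, k=3))  # 603979776
```

Even this single modestly sized layer already costs roughly $6\times 10^8$ FLOPs, which is the redundancy the Ghost module targets.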

Here, the number of parameters (in $f$ and $b$) to be optimized is explicitly determined by the dimensions of the input and output feature maps. The output feature maps of convolutional layers often contain considerable redundancy, and some of them can be similar to each other. It is therefore unnecessary to generate these redundant feature maps one by one with a large number of FLOPs and parameters. Instead, suppose the output feature maps are "ghosts" of a handful of intrinsic feature maps obtained through cheap transformations. These intrinsic feature maps are fewer in number and are produced by ordinary convolution filters. Specifically, $m$ intrinsic feature maps $Y'\in\mathbb{R}^{h'\times w'\times m}$ are generated using a primary convolution:

$$Y' = X * f',$$

where $f'\in\mathbb{R}^{c\times k\times k\times m}$ denotes the utilized filters, $m\leq n$, and the bias term is omitted for simplicity. The hyper-parameters, such as filter size, stride, and padding, are the same as in the ordinary convolution, so that the spatial size (i.e. $h'$ and $w'$) of the output feature maps stays consistent. To obtain the desired $n$ feature maps, a series of cheap linear operations is applied to each intrinsic feature map in $Y'$ to generate $s$ ghost features according to the following function:

$$y_{ij} = \Phi_{i,j}(y'_i), \quad \forall\, i = 1,\dots,m, \;\; j = 1,\dots,s,$$

where $y'_i$ is the $i$-th intrinsic feature map in $Y'$ and $\Phi_{i,j}$ is the $j$-th linear operation (except the last one) for generating the $j$-th ghost feature map $y_{ij}$; that is to say, each $y'_i$ can have one or more ghost feature maps $\{y_{ij}\}_{j=1}^{s}$. The last operation $\Phi_{i,s}$ is the identity mapping, which preserves the intrinsic feature maps. We can thus obtain $n = m\cdot s$ feature maps $Y = [y_{11}, y_{12}, \cdots, y_{ms}]$ as the output data of a Ghost module. Note that the linear operations $\Phi$ operate on each channel, so their computational cost is much lower than that of ordinary convolution. In practice, there can be several different linear operations in a Ghost module, e.g. $3\times 3$ and $5\times 5$ linear kernels, which are analyzed in the experimental part of the paper.
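The procedure above can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not the reference implementation: the primary convolution is taken to be $1\times 1$, the cheap operations $\Phi_{i,j}$ are single per-channel $3\times 3$ kernels, and the last $\Phi_{i,s}$ is the identity; the names `ghost_module`, `f_primary`, and `f_cheap` are ours.

```python
import numpy as np

def ghost_module(x, f_primary, f_cheap):
    """x: (c, h, w) input; f_primary: (m, c) 1x1 primary kernels;
    f_cheap: (m, s-1, 3, 3) per-channel cheap kernels.
    Returns (m*s, h, w), i.e. n = m*s output feature maps."""
    c, h, w = x.shape
    m, s_minus_1 = f_cheap.shape[:2]
    # Primary convolution: 1x1 kernels mix channels only -> m intrinsic maps Y'
    y_prime = np.tensordot(f_primary, x, axes=([1], [0]))      # (m, h, w)
    outputs = [y_prime]                                        # identity Phi_{i,s}
    xp = np.pad(y_prime, ((0, 0), (1, 1), (1, 1)))             # "same" padding
    for j in range(s_minus_1):                                 # cheap linear ops
        ghost = np.zeros_like(y_prime)
        for i in range(m):                                     # per-channel 3x3
            for a in range(3):
                for b in range(3):
                    ghost[i] += f_cheap[i, j, a, b] * xp[i, a:a + h, b:b + w]
        outputs.append(ghost)
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                   # c = 8 input channels
out = ghost_module(x,
                   rng.standard_normal((4, 8)),        # m = 4 intrinsic maps
                   rng.standard_normal((4, 1, 3, 3)))  # s = 2: one cheap op + identity
print(out.shape)  # (8, 16, 16): n = m*s = 8 output maps
```

Because the cheap operations touch each channel independently, their cost per output pixel is $d\cdot d$ instead of $c\cdot k\cdot k$, which is where the roughly $s\times$ theoretical speedup over an ordinary convolution comes from.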

Papers Using This Method

- GRNN: Recurrent Neural Network based on Ghost Features for Video Super-Resolution (2025-05-14)
- Cross-video Identity Correlating for Person Re-identification Pre-training (2024-09-27)
- A Lightweight Insulator Defect Detection Model Based on Drone Images (2024-08-26)
- IDD-YOLOv5: A Lightweight Insulator Defect Real-time Detection Algorithm (2024-08-19)
- A lightweight YOLOv5-FFM model for occlusion pedestrian detection (2024-08-13)
- LiteYOLO-ID: A Lightweight Object Detection Network for Insulator Defect Detection (2024-06-24)
- Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG (2024-06-11)
- Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks (2024-05-23)
- GRAN: Ghost Residual Attention Network for Single Image Super Resolution (2023-02-28)
- Short-Term Memory Convolutions (2023-02-08)
- GhostNetV2: Enhance Cheap Operation with Long-Range Attention (2022-11-23)
- RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization (2022-11-11)
- Network Amplification With Efficient MACs Allocation (2022-07-01)
- YOLOv5s-GTB: light-weighted and improved YOLOv5s for bridge crack detection (2022-06-03)
- MoCoViT: Mobile Convolutional Vision Transformer (2022-05-25)
- Efficient Convolutional Neural Networks on Raspberry Pi for Image Classification (2022-04-02)
- ThreshNet: An Efficient DenseNet Using Threshold Mechanism to Reduce Connections (2022-01-09)
- GPU-Net: Lightweight U-Net with more diverse features (2022-01-07)
- Ghost-dil-NetVLAD: A Lightweight Neural Network for Visual Place Recognition (2021-12-22)
- GhostShiftAddNet: More Features from Energy-Efficient Operations (2021-09-20)