Papers With Code 2 | ML Benchmarks, SotA Results & Code

Description

Based on the theoretical analyses in the RAN paper, this method introduces a novel multi-scale backbone structure. It applies optimized dilated (atrous) convolutions to high-resolution feature maps. This lets the network efficiently predict motion patterns with larger separable upper bounds while preserving the capturable range of motion at low computational complexity.
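The cost argument can be made concrete by comparing a dilated kernel with a dense kernel that spans the same region. This is a minimal arithmetic sketch, not the paper's code. The helper names `dilated_span` and `conv_weights`, the 3-D setting, and the 16-channel width are illustrative assumptions.

```python
def dilated_span(kernel_size: int, dilation: int) -> int:
    """Input positions covered along one axis by a kernel with the given dilation."""
    return dilation * (kernel_size - 1) + 1


def conv_weights(kernel_size: int, ndim: int, c_in: int, c_out: int) -> int:
    """Weight count of a dense ndim-D convolution (no bias); dilation does not change it."""
    return kernel_size ** ndim * c_in * c_out


if __name__ == "__main__":
    # Hypothetical 3-D setting: 16 input and 16 output channels.
    span = dilated_span(kernel_size=3, dilation=4)       # 9 voxels per axis
    atrous = conv_weights(3, ndim=3, c_in=16, c_out=16)  # 3x3x3 kernel, dilation 4
    dense = conv_weights(span, ndim=3, c_in=16, c_out=16)  # 9x9x9 kernel, same span
    print(span, atrous, dense)  # 9 6912 186624
```

Under these assumptions, the dilated kernel covers the same 9-voxel span per axis with 27 times fewer weights than a dense 9x9x9 kernel.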

To quantify the network's capacity for capturing large deformations, the accessible motion range is defined as follows:

Definition 1: Accessible Motion Range

The capture-range radius $a_k$ of the $k$-th-level registration, performed by the registration module $\mathcal{R}_k$, is defined as the smallest upper bound of its accessible Deformation Displacement Field (DDF):

a_k := \min_{\mathbf{x}}(\{\sup(\|\phi_{k}[\mathbf{x}]\|_{\infty})\})

where $\|\cdot\|_{\infty}$ denotes the $L_\infty$ norm of a vector, $\sup(\cdot)$ denotes the supremum (least upper bound) of its argument over all inputs and all trainable network weights, and $\mathbf{x}$ denotes a coordinate position in the images or DDFs.
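The supremum ranges over all inputs and all trainable weights, so $a_k$ cannot be read off a single prediction. One way to probe it empirically is sketched below, assuming access to a stack of DDFs predicted by the level-$k$ module for many inputs or weight settings. The function name `empirical_capture_radius` and the `(N, d, *spatial)` array layout are hypothetical. The maximum over finitely many samples can only under-estimate each supremum, so the result is a lower bound on $a_k$.

```python
import numpy as np


def empirical_capture_radius(ddfs: np.ndarray) -> float:
    """Empirical lower bound on a_k from sampled DDFs of shape (N, d, *spatial).

    The true sup ranges over all inputs and weights; the max over the N
    observed samples can only under-estimate it.
    """
    per_voxel = np.abs(ddfs).max(axis=1)  # ||phi_k[x]||_inf per sample, shape (N, *spatial)
    sup_est = per_voxel.max(axis=0)       # sampled stand-in for sup, shape (*spatial)
    return float(sup_est.min())           # min over coordinates x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical samples: 8 predicted 2-D DDFs on a 16x16 grid, bounded by 3 pixels.
    samples = 3.0 * np.tanh(rng.normal(size=(8, 2, 16, 16)))
    print(empirical_capture_radius(samples))  # some value below 3.0
```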

To quantify how the network's limited degrees of freedom constrain discontinuities in the estimated DDF, we define the separability of the predicted motion:

Definition 2: Separability Bottleneck of Predicted Motion

The motion separability bottleneck $\Delta_\infty(p)$ is defined as follows. Take all pairs of locations $\mathbf{x}, \mathbf{y} \in \mathbb{Z}^d$ separated by a given Chebyshev distance $p \in \mathbb{N}$. For each pair, take the upper bound of the Chebyshev difference of the network's predicted DDF $\phi$ between the two locations. $\Delta_\infty(p)$ is the minimum of these upper bounds:

\Delta_\infty(p) := \min_{\mathbf{x}, \mathbf{y}}\left\{\sup(\|\phi[\mathbf{x}] - \phi[\mathbf{y}]\|_{\infty}) : \|\mathbf{x} - \mathbf{y}\|_{\infty} = p\right\}

where $p$ denotes the $L_\infty$ (Chebyshev) distance between the two pixels.
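The definition can be made concrete for 2-D fields with the minimal NumPy sketch below. It assumes a stack of sampled DDFs of shape `(N, 2, H, W)`; the function name `empirical_separability` is hypothetical. It enumerates every offset at Chebyshev distance exactly $p$ and aligns each position $\mathbf{x}$ with $\mathbf{y} = \mathbf{x} + \text{offset}$. It then replaces the supremum with a maximum over the samples and takes the minimum over all pairs. As with any finite sample, the result is a lower bound on $\Delta_\infty(p)$.

```python
import numpy as np
from itertools import product


def empirical_separability(ddfs: np.ndarray, p: int) -> float:
    """Empirical lower bound on Delta_inf(p) from sampled 2-D DDFs of shape (N, 2, H, W).

    Returns np.inf if no pair of grid positions lies at Chebyshev distance p.
    """
    h, w = ddfs.shape[-2:]
    best = np.inf
    for dy, dx in product(range(-p, p + 1), repeat=2):
        # Keep offsets at Chebyshev distance exactly p that fit inside the grid.
        if max(abs(dy), abs(dx)) != p or abs(dy) >= h or abs(dx) >= w:
            continue
        # Align x (source window) with y = x + (dy, dx) (target window).
        src = ddfs[..., max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        dst = ddfs[..., max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
        diff = np.abs(src - dst).max(axis=1)    # ||phi[x] - phi[y]||_inf, shape (N, h', w')
        sup_est = diff.max(axis=0)              # sampled stand-in for the sup
        best = min(best, float(sup_est.min()))  # min over pairs (x, y)
    return best


if __name__ == "__main__":
    # A linear ramp phi[x] = x differs by exactly the offset at every position.
    yy, xx = np.mgrid[0:8, 0:8]
    ramp = np.stack([xx, yy]).astype(float)[None]
    print(empirical_separability(ramp, 2))  # 2.0
```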

Theorem: Regional Dependency

The upper bound of the motion difference depends on $a_k$ and $p_k$:

\begin{align*} \forall \mathbf{x}, \mathbf{y} \in \mathbb{Z}^d, \|\mathbf{x} - \mathbf{y}\|_\infty \geq p_{k''} + 2\sum_{k'=k''+1}^{k} a_{k'}, &\quad \sup(\|\phi_{k}[\mathbf{x}] - \phi_{k}[\mathbf{y}]\|_\infty) \geq 2\sum_{k'=k''}^{k} a_{k'}; \\ \exists \mathbf{x}, \mathbf{y} \in \mathbb{Z}^d, \|\mathbf{x} - \mathbf{y}\|_\infty < p_{k''-1} + 2\sum_{k'=k''}^{k} a_{k'}, &\quad \sup(\|\phi_{k}[\mathbf{x}] - \phi_{k}[\mathbf{y}]\|_\infty) = 2\sum_{k'=k''}^{k} a_{k'}; \end{align*}

where $k''$ and $k$ denote two recursion levels satisfying $0 \leq k'' < k$, and $\mathbf{x}, \mathbf{y}$ denote two coordinate positions in the images or DDFs.
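The index ranges in the first statement of the theorem are easy to misread, so the sketch below simply evaluates its two terms. The sum for the distance threshold starts at $k''+1$, while the sum for the motion-difference bound starts at $k''$. The function name `regional_dependency_terms` and all numeric values are hypothetical; in particular, the text does not specify the values of $p_k$.

```python
def regional_dependency_terms(a, p, k, k2):
    """Evaluate the first statement of the theorem for levels k'' = k2 < k.

    Returns (distance threshold, lower bound on the sup motion difference):
        threshold = p[k2] + 2 * sum(a[k2+1 .. k])
        bound     = 2 * sum(a[k2 .. k])
    """
    if not 0 <= k2 < k < len(a):
        raise ValueError("need 0 <= k'' < k < number of levels")
    threshold = p[k2] + 2 * sum(a[k2 + 1 : k + 1])
    bound = 2 * sum(a[k2 : k + 1])
    return threshold, bound


if __name__ == "__main__":
    a = [1, 2, 4, 8]  # hypothetical capture radii a_0..a_3
    p = [1, 2, 4, 8]  # hypothetical p_0..p_3 (not specified in the text)
    print(regional_dependency_terms(a, p, k=3, k2=1))  # (26, 28)
```

For these hypothetical values, the theorem states that any two positions at Chebyshev distance of at least 26 have a motion-difference supremum of at least 28.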

Guided by this result, the Motion-Separable (M-S) structure is designed so that the upsampled feature maps are processed by correspondingly dilated (atrous) convolution layers.
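The basic operation the M-S structure relies on is an atrous convolution applied to a full-resolution map. A minimal single-channel NumPy sketch of that operation follows. The function name `atrous_conv2d`, the 3x3 all-ones kernel, and the dilation rate of 4 are illustrative assumptions. The text does not give the per-level dilation rates, channel counts, or upsampling scheme. The impulse response shows the taps landing `dilation` pixels apart, so a 3x3 kernel reaches 4 pixels in each direction with only nine weights.

```python
import numpy as np


def atrous_conv2d(img: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 2-D atrous (dilated) cross-correlation with zero 'same' padding.

    Assumes an odd-sized kernel so the output keeps the input's shape.
    """
    kh, kw = kernel.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            # Each tap reads the input shifted by a multiple of the dilation rate.
            out += kernel[i, j] * padded[i * dilation:i * dilation + h,
                                         j * dilation:j * dilation + w]
    return out


if __name__ == "__main__":
    impulse = np.zeros((15, 15))
    impulse[7, 7] = 1.0
    response = atrous_conv2d(impulse, np.ones((3, 3)), dilation=4)
    print(np.unique(np.nonzero(response)[0]))  # [ 3  7 11]
```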

M-S structure