Based on the theoretical analyses in the RAN paper, the paper designs a novel multi-scale backbone structure. By applying optimized dilated (atrous) convolutions to high-resolution feature maps, this structure lets the network efficiently predict motion patterns with larger separable upper bounds while maintaining the capturable range of motion at low computational complexity.
To quantify the network's capacity to capture large deformations, the accessible motion range is defined as follows:
Definition 1: Accessible Motion Range
The radius of the capture range $R_k$ of the $k$-level registration by the registration module is defined as the smallest upper bound of its accessible Deformation Displacement Field (DDF) $\boldsymbol{\phi}_k$:

$$R_k := \sup \left\lVert \boldsymbol{\phi}_k(\mathbf{x}) \right\rVert_\infty$$

where $\lVert \cdot \rVert_\infty$ denotes the L-$\infty$ norm of a vector, $\sup$ denotes the supremum or the maximum value of a given function with varying inputs and trainable weights of networks, and $\mathbf{x}$ denotes one coordinate entry of the images or Deformation Displacement Fields.
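As a rough illustration of Definition 1, the sketch below measures the largest L-infinity displacement in a single, already-computed displacement field. It assumes the field is given as a flat list of displacement vectors; the function name and the toy values are hypothetical. Note that this is only an empirical value for one field: the definition takes the supremum over all inputs and trainable weights, so a measurement like this can at best give a lower bound on the capture radius.

```python
def max_chebyshev_displacement(ddf):
    """Largest L-infinity norm over all displacement vectors in `ddf`.

    `ddf` is a flat list of displacement vectors, one per coordinate entry
    (an assumed storage layout, chosen only for this illustration).
    """
    return max(max(abs(component) for component in vec) for vec in ddf)


# Toy 2D field with three displacement vectors (hypothetical values).
toy_ddf = [(1.0, -2.0), (0.5, 0.5), (-3.0, 1.0)]
print(max_chebyshev_displacement(toy_ddf))  # 3.0
```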
To quantify the degree-of-freedom limitation on discontinuities in the estimated Deformation Displacement Field (DDF), we define the separability of the predicted motion:
Definition 2: Separability Bottleneck of Predicted Motion
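The motion separability bottleneck $\Delta(d)$ is defined as the smallest upper bound of the Chebyshev difference of a network's predicted DDF $\boldsymbol{\phi}$ between two locations $\mathbf{x}$ and $\mathbf{y}$ at a specific Chebyshev distance $d$:

$$\Delta(d) := \sup_{\lVert \mathbf{x} - \mathbf{y} \rVert_\infty = d} \left\lVert \boldsymbol{\phi}(\mathbf{x}) - \boldsymbol{\phi}(\mathbf{y}) \right\rVert_\infty$$

where $\lVert \mathbf{x} - \mathbf{y} \rVert_\infty$ denotes the L-$\infty$ distance between the two pixels.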
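The idea behind Definition 2 can be sketched empirically for one given field: take every pair of locations at a fixed Chebyshev distance and record the largest Chebyshev difference between their displacement vectors. The dictionary layout, function names, and toy fields below are assumptions made for illustration; as with the capture range, the definition itself concerns the supremum over all inputs and weights, not a single field.

```python
from itertools import product


def chebyshev(u, v):
    """L-infinity distance between two equal-length tuples."""
    return max(abs(a - b) for a, b in zip(u, v))


def empirical_separability(ddf, distance):
    """Largest Chebyshev difference between displacements of any two
    locations that are exactly `distance` apart (Chebyshev distance).

    `ddf` maps integer coordinate tuples to displacement tuples
    (an assumed layout for this sketch).
    """
    best = 0.0
    for x, y in product(ddf, repeat=2):
        if chebyshev(x, y) == distance:
            best = max(best, chebyshev(ddf[x], ddf[y]))
    return best


# Sliding boundary: the left half moves by -2, the right half by +2.
sliding = {(0, j): ((0.0, -2.0) if j < 2 else (0.0, 2.0)) for j in range(4)}
# Smooth field: displacement grows linearly with position.
smooth = {(0, j): (0.0, 0.5 * j) for j in range(4)}

print(empirical_separability(sliding, 1))  # 4.0
print(empirical_separability(smooth, 1))   # 0.5
```

In this toy case the discontinuous field needs a difference of 4.0 between neighbouring pixels, while the smooth field needs only 0.5, which is the kind of gap a separability bound is meant to capture.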
Theorem: Regional Dependency
The upper boundary of motion difference is related to and :
where denote two recursive numbers satisfying , and denote two coordinate entries of images or DDFs.
Thus, a Motion-Separable structure is designed in which the upsampled feature maps are processed by corresponding atrous (dilated) convolution layers.
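The combination of upsampling followed by atrous convolution can be illustrated with a minimal 1D sketch. This is not the paper's architecture: nearest-neighbour upsampling, the fixed kernel, the dilation rate of 2, and all function names are assumptions chosen to show how a dilated kernel on a higher-resolution signal keeps a wide spatial extent with the same number of weights.

```python
def upsample_nearest(signal, factor):
    """Repeat each sample `factor` times (nearest-neighbour upsampling)."""
    return [value for value in signal for _ in range(factor)]


def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D atrous convolution (cross-correlation form) with zero
    padding. Assumes an odd-length kernel."""
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilation):
    """Spatial extent covered by one dilated kernel, in samples."""
    return (kernel_size - 1) * dilation + 1


coarse = [0.0, 1.0, 0.0, 0.0]
fine = upsample_nearest(coarse, 2)
response = dilated_conv1d(fine, [1.0, 0.0, 1.0], dilation=2)

print(fine)                   # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(response)               # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(receptive_field(3, 2))  # 5
```

With dilation 2, a three-tap kernel on the upsampled signal spans 5 fine samples, whereas the same kernel without dilation would span only 3; the number of weights is unchanged.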