Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Methods

8,725 machine learning methods and techniques


Triplet Loss

The goal of triplet loss, in the context of Siamese networks, is to maximize the joint probability among all score-pairs, i.e. the product of all probabilities. Taking its negative logarithm yields the loss formulation, where a balance weight is used to keep the loss at the same scale for different numbers of instance sets. In its most widely used margin-based form, the loss encourages an anchor $a$ to lie closer to a positive example $p$ than to a negative example $n$ by at least a margin $m$: $\mathcal{L}(a, p, n) = \max\left(d(a, p) - d(a, n) + m, 0\right)$.
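As a concrete illustration, here is a minimal pure-Python sketch of the margin-based form (the function name, Euclidean distance, and default margin are illustrative choices, not part of the original formulation):

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on raw feature vectors: the anchor should
    be closer to the positive than to the negative by at least `margin`."""
    d_ap = math.dist(anchor, positive)  # anchor-positive distance
    d_an = math.dist(anchor, negative)  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)
```

A satisfied triplet (negative far enough away) contributes zero loss; a violated one contributes linearly.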

General · Introduced 2000 · 424 papers

CARLA

CARLA: An Open Urban Driving Simulator

CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. Source: Dosovitskiy et al. Image source: Dosovitskiy et al.

Reinforcement Learning · Introduced 2000 · 422 papers

GAN Least Squares Loss

GAN Least Squares Loss is a least squares loss function for generative adversarial networks. Minimizing this objective function is equivalent to minimizing the Pearson $\chi^{2}$ divergence. The objective functions (here for LSGAN) can be defined as:

$$\min_{D} V(D) = \frac{1}{2}\mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})}\left[\left(D(\mathbf{x}) - b\right)^{2}\right] + \frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p_{z}(\mathbf{z})}\left[\left(D(G(\mathbf{z})) - a\right)^{2}\right]$$

$$\min_{G} V(G) = \frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p_{z}(\mathbf{z})}\left[\left(D(G(\mathbf{z})) - c\right)^{2}\right]$$

where $a$ and $b$ are the labels for fake data and real data, and $c$ denotes the value that $G$ wants $D$ to believe for fake data.
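A minimal sketch of these objectives over batches of raw discriminator scores, assuming the common 0/1 choice for the labels $a$, $b$, $c$ (pure Python, names illustrative):

```python
def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    """Discriminator loss: push scores on real data toward b, fake toward a."""
    real = sum((s - b) ** 2 for s in d_real) / (2 * len(d_real))
    fake = sum((s - a) ** 2 for s in d_fake) / (2 * len(d_fake))
    return real + fake

def lsgan_g_loss(d_fake, c=1.0):
    """Generator loss: push discriminator scores on fake data toward c."""
    return sum((s - c) ** 2 for s in d_fake) / (2 * len(d_fake))
```

Unlike the sigmoid cross-entropy GAN loss, these quadratic penalties keep gradients alive even for samples the discriminator classifies confidently.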

General · Introduced 2000 · 421 papers

Local Response Normalization

Local Response Normalization is a normalization layer that implements the idea of lateral inhibition. Lateral inhibition is a concept in neurobiology that refers to the phenomenon of an excited neuron inhibiting its neighbours: this leads to a peak in the form of a local maximum, creating contrast in that area and increasing sensory perception. In practice, when applying LRN to convolutional neural networks we can either normalize within the same channel or normalize across channels. In the across-channel case:

$$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum^{\min(N-1, i+n/2)}_{j=\max(0, i-n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}$$

where $n$ is the number of neighbouring channels used for normalization, $\alpha$ is a multiplicative factor, $\beta$ an exponent and $k$ an additive factor.
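A sketch of the across-channel case at a single spatial position, using AlexNet-style default hyper-parameters (pure Python, names illustrative):

```python
def lrn_channel(acts, i, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Across-channel LRN for channel i, given the list `acts` of
    per-channel activations at one spatial position: divide by a power
    of the summed squares over the n neighbouring channels."""
    lo = max(0, i - n // 2)
    hi = min(len(acts) - 1, i + n // 2)
    denom = (k + alpha * sum(acts[j] ** 2 for j in range(lo, hi + 1))) ** beta
    return acts[i] / denom
```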

General · Introduced 2000 · 421 papers

Mask R-CNN

Mask R-CNN extends Faster R-CNN to solve instance segmentation tasks. It achieves this by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. In principle, Mask R-CNN is an intuitive extension of Faster R-CNN, but constructing the mask branch properly is critical for good results. Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. This is evident in how RoIPool, the de facto core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, Mask R-CNN utilises a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations. Secondly, Mask R-CNN decouples mask and class prediction: it predicts a binary mask for each class independently, without competition among classes, and relies on the network's RoI classification branch to predict the category. In contrast, an FCN usually performs per-pixel multi-class categorization, which couples segmentation and classification.

Computer Vision · Introduced 2000 · 420 papers

SFT

Shrink and Fine-Tune

Shrink and Fine-Tune, or SFT, is a type of distillation that avoids explicit distillation by copying parameters to a student model and then fine-tuning. Specifically, it extracts a student model from the maximally spaced layers of a fine-tuned teacher; each copied layer is taken in full from the teacher. For example, when creating a BART student with 3 decoder layers from the 12-encoder-layer, 12-decoder-layer teacher, we copy the teacher's full encoder and decoder layers 0, 6, and 11 to the student. When deciding which layers to copy, ties are broken arbitrarily; copying layers 0, 5, and 11 might work just as well. When copying only 1 decoder layer, we copy layer 0, which was found to work better than copying layer 11. The impact of initialization on performance is measured experimentally in Section 6.1 of the source paper. After initialization, the student model continues to fine-tune on the summarization dataset with the original fine-tuning objective.

General · Introduced 2000 · 415 papers

DPO

Direct Preference Optimization

Reinforcement Learning · Introduced 2000 · 409 papers

Counterfactuals

Counterfactual Explanations

Reinforcement Learning · Introduced 2000 · 400 papers

ADMM

Alternating Direction Method of Multipliers

The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle. It takes the form of a decomposition-coordination procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization. It turns out to be equivalent or closely related to many other algorithms as well, such as Douglas-Rachford splitting from numerical analysis, Spingarn's method of partial inverses, Dykstra's alternating projections method, Bregman iterative algorithms for l1 problems in signal processing, proximal methods, and many others. Text Source: https://stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
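To make the decomposition-coordination idea concrete, here is a toy sketch of ADMM on a one-dimensional lasso-type problem, minimize 0.5*(x - v)**2 + lam*|z| subject to x = z (the scalar setting and all names are illustrative):

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    return max(v - t, 0.0) if v > 0 else min(v + t, 0.0)

def admm_scalar_lasso(v, lam, rho=1.0, iters=100):
    """ADMM for: minimize 0.5*(x - v)**2 + lam*|z|  subject to  x == z.

    x-update solves the local quadratic subproblem, z-update solves the
    local l1 subproblem, and the scaled dual variable u coordinates them.
    """
    x = z = u = 0.0
    for _ in range(iters):
        x = (v + rho * (z - u)) / (1.0 + rho)  # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)   # l1 subproblem
        u += x - z                             # dual update on x - z = 0
    return z
```

The known closed-form solution is the soft-thresholded input, soft_threshold(v, lam), which the iterates recover.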

General · Introduced 2000 · 398 papers

Auxiliary Classifier

Auxiliary Classifiers are a type of architectural component that seeks to improve the convergence of very deep networks. They are classifier heads attached to layers before the end of the network. The motivation is to push useful gradients to the lower layers to make them immediately useful and improve convergence during training by combatting the vanishing gradient problem. They are notably used in the Inception family of convolutional neural networks.

General · Introduced 2000 · 395 papers

Non Maximum Suppression

Non Maximum Suppression is a computer vision method that selects a single entity out of many overlapping entities (for example bounding boxes in object detection). The criterion is usually to discard entities that are below a given probability bound. From the remaining entities we repeatedly pick the entity with the highest probability, output that as the prediction, and discard any remaining box whose overlap (IoU) with the box output in the previous step exceeds a threshold. Image Credit: Martin Kersner
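A minimal greedy NMS sketch over axis-aligned boxes given as (x1, y1, x2, y2) tuples with per-box confidence scores (thresholds and names are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Greedy NMS: drop low-confidence boxes, then repeatedly keep the
    highest-scoring box and discard boxes overlapping it too much."""
    cands = sorted((sb for sb in zip(scores, boxes) if sb[0] >= score_thresh),
                   key=lambda sb: sb[0], reverse=True)
    kept = []
    while cands:
        _, best = cands.pop(0)
        kept.append(best)
        cands = [(s, b) for s, b in cands if iou(best, b) < iou_thresh]
    return kept
```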

Computer Vision · Introduced 2000 · 389 papers

LIME

Local Interpretable Model-Agnostic Explanations

LIME, or Local Interpretable Model-Agnostic Explanations, is an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model. It modifies a single data sample by tweaking the feature values and observes the resulting impact on the output. It performs the role of an "explainer" to explain predictions from each data sample. The output of LIME is a set of explanations representing the contribution of each feature to a prediction for a single sample, which is a form of local interpretability. Interpretable models in LIME can be, for instance, linear regression or decision trees, which are trained on small perturbations (e.g. adding noise, removing words, hiding parts of the image) of the original input to provide a good local approximation.

General · Introduced 2000 · 378 papers

Latent Diffusion Model

Diffusion models applied to latent spaces, which are normally built with (Variational) Autoencoders.

General · Introduced 2000 · 366 papers

MoE

Mixture of Experts

General · Introduced 2000 · 366 papers

SNN

Spiking Neural Networks

Spiking Neural Networks (SNNs) are a class of artificial neural networks inspired by the structure and functioning of the brain's neural networks. Unlike traditional artificial neural networks that operate based on continuous firing rates, SNNs simulate the behavior of individual neurons through discrete spikes or action potentials. These spikes are triggered when the neuron's membrane potential reaches a certain threshold, and they propagate through the network, communicating information and triggering subsequent neuron activations. This spike-based communication allows SNNs to capture the temporal dynamics of information processing and exhibit asynchronous, event-driven behavior, making them well-suited for tasks such as temporal pattern recognition, event detection, and real-time processing. SNNs have gained attention due to their potential in efficiently processing and encoding information, offering advantages in energy efficiency, robustness, and compatibility with neuromorphic hardware architectures.

Introduced 2000 · 363 papers

GloVe

GloVe Embeddings

GloVe Embeddings are a type of word embedding that encode the co-occurrence probability ratio between two words as vector differences. GloVe uses a weighted least squares objective $J$ that minimizes the difference between the dot product of the vectors of two words and the logarithm of their number of co-occurrences:

$$J = \sum^{V}_{i, j=1} f\left(X_{ij}\right) \left(w^{T}_{i}\tilde{w}_{j} + b_{i} + \tilde{b}_{j} - \log X_{ij}\right)^{2}$$

where $w_{i}$ and $b_{i}$ are the word vector and bias respectively of word $i$, $\tilde{w}_{j}$ and $\tilde{b}_{j}$ are the context word vector and bias respectively of word $j$, $X_{ij}$ is the number of times word $i$ occurs in the context of word $j$, and $f$ is a weighting function that assigns lower weights to rare and frequent co-occurrences.
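A sketch of the weighting function and a single term of this objective (pure Python; x_max = 100 and alpha = 0.75 follow the commonly used defaults, other names illustrative):

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): down-weights rare pairs, caps frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w_i, w_j, b_i, b_j, x_ij):
    """One term of the GloVe objective for a co-occurring word pair:
    f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)**2."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    return glove_weight(x_ij) * (dot + b_i + b_j - math.log(x_ij)) ** 2
```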

Natural Language Processing · Introduced 2000 · 357 papers

LapEigen

Laplacian EigenMap

Graphs · Introduced 2000 · 297 papers

Laplacian PE

Laplacian Positional Encodings

Laplacian eigenvectors represent a natural generalization of the Transformer positional encodings (PE) for graphs, as the eigenvectors of a discrete line (NLP graph) are the cosine and sinusoidal functions. They help encode distance-aware information (i.e., nearby nodes have similar positional features and farther nodes have dissimilar positional features). Hence, Laplacian Positional Encoding (PE) is a general method to encode node positions in a graph. For each node, its Laplacian PE is given by the node's entries in the $k$ smallest non-trivial eigenvectors.

Graphs · Introduced 2000 · 296 papers

AE

Autoencoders

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”. Along with the reduction side, a reconstructing side is learnt, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Extracted from: Wikipedia Image source: Wikipedia

General · Introduced 2000 · 293 papers

OPT

OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The model uses an AdamW optimizer and weight decay of 0.1. It follows a linear learning rate schedule, warming up from 0 to the maximum learning rate over the first 2000 steps in OPT-175B, or over 375M tokens in the smaller models, and decaying down to 10% of the maximum LR over 300B tokens. The batch sizes range from 0.5M to 4M depending on the model size and are kept constant throughout the course of training.

Natural Language Processing · Introduced 2000 · 285 papers

Spatial Pyramid Pooling

Spatial Pyramid Pooling (SPP) is a pooling layer that removes the fixed-size constraint of the network, i.e. a CNN does not require a fixed-size input image. Specifically, we add an SPP layer on top of the last convolutional layer. The SPP layer pools the features and generates fixed-length outputs, which are then fed into the fully-connected layers (or other classifiers). In other words, we perform some information aggregation at a deeper stage of the network hierarchy (between convolutional layers and fully-connected layers) to avoid the need for cropping or warping at the beginning.

Computer Vision · Introduced 2000 · 285 papers

FCN

Fully Convolutional Network

Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. They employ solely locally connected layers, such as convolution, pooling and upsampling. Avoiding the use of dense layers means less parameters (making the networks faster to train). It also means an FCN can work for variable image sizes given all connections are local. The network consists of a downsampling path, used to extract and interpret the context, and an upsampling path, which allows for localization. FCNs also employ skip connections to recover the fine-grained spatial information lost in the downsampling path.

Computer Vision · Introduced 2000 · 285 papers


SSD

SSD is a single-stage object detection method that discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. The fundamental improvement in speed comes from eliminating bounding box proposals and the subsequent pixel or feature resampling stage. Improvements over competing single-stage methods include using a small convolutional filter to predict object categories and offsets in bounding box locations, using separate predictors (filters) for different aspect ratio detections, and applying these filters to multiple feature maps from the later stages of a network in order to perform detection at multiple scales.

Computer Vision · Introduced 2000 · 278 papers

DCNN

Diffusion-Convolutional Neural Networks

Diffusion-convolutional neural networks (DCNN) is a model for graph-structured data. Through the introduction of a diffusion-convolution operation, diffusion-based representations can be learned from graph structured data and used as an effective basis for node classification. Description and image from: Diffusion-Convolutional Neural Networks

Graphs · Introduced 2000 · 277 papers

AM

Attention Model

Reinforcement Learning · Introduced 2000 · 274 papers

CAM

Class-activation map

Class activation maps can be used to interpret the prediction decision made by a convolutional neural network (CNN). Image source: Learning Deep Features for Discriminative Localization

Natural Language Processing · Introduced 2000 · 267 papers

ICA

Independent Component Analysis

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed nongaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA. ICA is superficially related to principal component analysis and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely. Extracted from: https://www.cs.helsinki.fi/u/ahyvarin/whatisica.shtml. Source papers: Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture; Independent component analysis, a new concept?; Independent component analysis: algorithms and applications.

General · Introduced 2000 · 261 papers

Random Gaussian Blur

Random Gaussian Blur is an image data augmentation technique where we randomly blur the image using a Gaussian kernel. Image Source: Wikipedia

Computer Vision · Introduced 2000 · 260 papers

GA

Genetic Algorithms

Genetic Algorithms are search algorithms that mimic Darwinian biological evolution in order to select and propagate better solutions.
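A toy sketch of the select-crossover-mutate loop on the classic OneMax problem, evolving a bitstring toward all ones (all names and settings are illustrative):

```python
import random

def genetic_onemax(n_bits=20, pop_size=30, generations=60,
                   mut_rate=0.05, seed=0):
    """Minimal genetic algorithm on OneMax: fitness is the number of
    1-bits; better solutions are selected and propagated."""
    rng = random.Random(seed)
    fitness = sum  # count of 1 bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < mut_rate)     # bit-flip mutation
                     for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```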

Reinforcement Learning · Introduced 2000 · 259 papers

YOLOv3

YOLOv3 is a real-time, single-stage object detection model that builds on YOLOv2 with several improvements. These include a new backbone network, Darknet-53, which utilises residual connections (in the words of the author, "those newfangled residual network stuff"), some improvements to the bounding box prediction step, and the use of three different scales from which to extract features (similar to an FPN).

Computer Vision · Introduced 2000 · 258 papers

Random Search

Random Search replaces the exhaustive enumeration of all combinations in Grid Search by selecting them randomly. It applies straightforwardly to discrete settings, but also generalizes to continuous and mixed spaces. It can outperform Grid Search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm; in this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Extracted from Wikipedia. Source: Bergstra and Bengio.
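A minimal sketch: sample each hyperparameter independently from its distribution and keep the best configuration (the objective and search space here are hypothetical):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Random hyperparameter search: draw each hyperparameter from its
    own distribution and keep the best-scoring configuration."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical objective with a peak at lr = 0.1, depth = 4.
space = {
    "lr": lambda rng: 10 ** rng.uniform(-4, 0),  # log-uniform learning rate
    "depth": lambda rng: rng.randint(1, 8),
}
cfg, score = random_search(
    lambda c: -abs(c["lr"] - 0.1) - 0.1 * abs(c["depth"] - 4), space)
```

Note the log-uniform sampling for the learning rate: specifying per-parameter distributions is exactly where prior knowledge enters.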

General · Introduced 2000 · 256 papers

YOLOv8

You Only Look Once

Computer Vision · Introduced 2000 · 254 papers

NT-Xent

Normalized Temperature-scaled Cross Entropy Loss

NT-Xent, or Normalized Temperature-scaled Cross Entropy Loss, is a loss function. Let $\text{sim}(u, v) = u^{T}v / \lVert u\rVert \lVert v\rVert$ denote the cosine similarity between two vectors $u$ and $v$. Then the loss function for a positive pair of examples $(i, j)$ is:

$$\ell_{i,j} = -\log\frac{\exp\left(\text{sim}\left(z_{i}, z_{j}\right)/\tau\right)}{\sum^{2N}_{k=1}\mathbb{1}_{[k \neq i]}\exp\left(\text{sim}\left(z_{i}, z_{k}\right)/\tau\right)}$$

where $\mathbb{1}_{[k \neq i]}$ is an indicator function evaluating to $1$ iff $k \neq i$ and $\tau$ denotes a temperature parameter. The final loss is computed across all positive pairs, both $(i, j)$ and $(j, i)$, in a mini-batch. Source: SimCLR
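A minimal pure-Python sketch of the per-pair loss (batch layout and names are illustrative):

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nt_xent(z, i, j, tau=0.5):
    """NT-Xent loss for the positive pair (i, j) in a batch of embeddings
    z: cross entropy over temperature-scaled cosine similarities,
    excluding the self-similarity term k == i."""
    denom = sum(math.exp(cos_sim(z[i], z[k]) / tau)
                for k in range(len(z)) if k != i)
    pos = math.exp(cos_sim(z[i], z[j]) / tau)
    return -math.log(pos / denom)
```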

General · Introduced 2000 · 251 papers


TS

Spatio-temporal stability analysis

Spatio-temporal feature extraction that measures stability. The proposed method is based on a compression algorithm named Run-Length Encoding.

Computer Vision · Introduced 2000 · 242 papers

fastText

fastText embeddings exploit subword information to construct word embeddings. Representations are learnt for character $n$-grams, and words are represented as the sum of the $n$-gram vectors. This extends the word2vec-type models with subword information, which helps the embeddings understand suffixes and prefixes. Once a word is represented using character $n$-grams, a skipgram model is trained to learn the embeddings.
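A sketch of the n-gram extraction and the sum-of-n-grams word vector (boundary markers < and > as in fastText; the tiny dimension and helper names are illustrative):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with < and > marking word boundaries,
    plus the whole bracketed word itself (as in fastText)."""
    w = f"<{word}>"
    grams = {w}
    for n in range(n_min, n_max + 1):
        grams.update(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

def word_vector(word, gram_vectors, dim=4):
    """fastText-style word vector: sum of the vectors of the word's
    n-grams (unknown grams contribute nothing)."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        for d, x in enumerate(gram_vectors.get(g, [0.0] * dim)):
            vec[d] += x
    return vec
```

Because any unseen word still decomposes into known n-grams, this scheme yields vectors for out-of-vocabulary words.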

Natural Language Processing · Introduced 2000 · 240 papers

HyperNetwork

A HyperNetwork is a network that generates weights for a main network. The behavior of the main network is the same as that of any usual neural network: it learns to map some raw inputs to their desired targets; the hypernetwork, by contrast, takes a set of inputs that contain information about the structure of the weights and generates the weights for that layer.

General · Introduced 2000 · 239 papers

FLIP

FLIP is a difference evaluator for alternating images. Source: https://developer.nvidia.com/blog/flip-a-difference-evaluator-for-alternating-images/

General · Introduced 2000 · 237 papers

ELMo

Embeddings from Language Models, or ELMo, is a type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. A biLM combines both a forward and backward LM. ELMo jointly maximizes the log likelihood of the forward and backward directions. To add ELMo to a supervised model, we freeze the weights of the biLM, concatenate the ELMo vector $\text{ELMo}^{task}_{k}$ with the context-independent token representation $x_{k}$, and pass the ELMo-enhanced representation $\left[x_{k}; \text{ELMo}^{task}_{k}\right]$ into the task RNN.

Natural Language Processing · Introduced 2000 · 234 papers

NTK

Neural Tangent Kernel

General · Introduced 2000 · 233 papers


Spectral Normalization

Spectral Normalization is a normalization technique used for generative adversarial networks, used to stabilize training of the discriminator. Spectral normalization has the convenient property that the Lipschitz constant is the only hyper-parameter to be tuned. It controls the Lipschitz constant of the discriminator by constraining the spectral norm of each layer $g$. The Lipschitz norm $\Vert g \Vert_{\text{Lip}}$ is equal to $\sup_{\mathbf{h}} \sigma\left(\nabla g\left(\mathbf{h}\right)\right)$, where $\sigma\left(W\right)$ is the spectral norm of the matrix $W$ ($L_{2}$ matrix norm of $W$):

$$\sigma\left(W\right) = \max_{\mathbf{h}: \mathbf{h} \neq 0}\frac{\Vert W\mathbf{h} \Vert_{2}}{\Vert \mathbf{h} \Vert_{2}}$$

which is equivalent to the largest singular value of $W$. Therefore, for a linear layer $g\left(\mathbf{h}\right) = W\mathbf{h}$, the norm is given by $\Vert g \Vert_{\text{Lip}} = \sigma\left(W\right)$. Spectral normalization normalizes the spectral norm of the weight matrix $W$ so it satisfies the Lipschitz constraint $\sigma\left(\bar{W}_{\text{SN}}\right) = 1$:

$$\bar{W}_{\text{SN}}\left(W\right) = \frac{W}{\sigma\left(W\right)}$$
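A sketch of estimating $\sigma(W)$ by power iteration, which is how spectral normalization approximates the largest singular value in practice, and of the resulting normalized matrix (pure Python with list-of-lists matrices; the fully converged loop here replaces the single per-update power step used during training):

```python
import math

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def spectral_norm(W, iters=50):
    """Estimate the largest singular value of W by power iteration
    on W^T W."""
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(W, v)
        Wt = list(map(list, zip(*W)))       # transpose of W
        v = matvec(Wt, u)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]           # keep the iterate normalized
    u = matvec(W, v)
    return math.sqrt(sum(x * x for x in u))

def normalize(W):
    """Spectrally normalized weights: W / sigma(W)."""
    s = spectral_norm(W)
    return [[w / s for w in row] for row in W]
```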

General · Introduced 2000 · 220 papers

DDPG

Deep Deterministic Policy Gradient

DDPG, or Deep Deterministic Policy Gradient, is an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. It combines the actor-critic approach with insights from DQNs: in particular, the insights that 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) the network is trained with a target Q network to give consistent targets during temporal difference backups. DDPG makes use of the same ideas along with batch normalization.

Reinforcement Learning · Introduced 2000 · 218 papers

Colorization

Colorization is a self-supervision approach that relies on colorization as the pretext task in order to learn image representations.

General · Introduced 2000 · 213 papers

Discrete Cosine Transform

Discrete Cosine Transform (DCT) is an orthogonal transformation method that decomposes an image to its spatial frequency spectrum. It expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. It is used a lot in compression tasks, e.g. image compression, where high-frequency components can be discarded. It is a type of Fourier-related transform, similar to discrete Fourier transforms (DFTs), but using only real numbers. Image Credit: Wikipedia
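A sketch of the (unnormalized) type-II DCT straight from its defining sum (pure Python; the 1-D case, without the scaling factors some libraries apply):

```python
import math

def dct2(x):
    """Unnormalized type-II DCT of a 1-D sequence: coefficient k measures
    the content of a cosine oscillating at frequency k."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]
```

For a constant signal, all energy lands in the zero-frequency coefficient, which is exactly why smooth image blocks compress well under the DCT.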

General · Introduced 2000 · 212 papers

RetinaNet

RetinaNet is a one-stage object detection model that utilizes a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss in order to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is responsible for computing a convolutional feature map over an entire input image and is an off-the-shelf convolutional network. The first subnet performs convolutional object classification on the backbone's output; the second subnet performs convolutional bounding box regression. The two subnetworks feature a simple design that the authors propose specifically for one-stage, dense detection. We can see the motivation for focal loss by comparing with two-stage object detectors. Here class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., Selective Search, EdgeBoxes, DeepMask, RPN) rapidly narrows down the number of candidate object locations to a small number (e.g., 1-2k), filtering out most background samples. In the second classification stage, sampling heuristics, such as a fixed foreground-to-background ratio, or online hard example mining (OHEM), are performed to maintain a manageable balance between foreground and background. In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples. Formally, the Focal Loss adds a factor $\left(1 - p_{t}\right)^{\gamma}$ to the standard cross entropy criterion:

$$\text{FL}\left(p_{t}\right) = -\left(1 - p_{t}\right)^{\gamma}\log\left(p_{t}\right)$$

Setting $\gamma > 0$ reduces the relative loss for well-classified examples ($p_{t} > 0.5$), putting more focus on hard, misclassified examples. Here $\gamma$ is a tunable focusing parameter.
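The down-weighting effect of the focal term is easy to see in a short sketch (p_t is the predicted probability of the true class; the function name is illustrative):

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss FL(p_t) = -(1 - p_t)**gamma * log(p_t).
    gamma = 0 recovers plain cross entropy; larger gamma shrinks the
    loss of well-classified examples (p_t close to 1)."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With the default gamma = 2, an easy example at p_t = 0.9 is penalized 100x less than under cross entropy, while a hard example near p_t = 0.5 is barely down-weighted.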

Computer Vision · Introduced 2000 · 210 papers

DINO

self-DIstillation with NO labels

DINO (self-distillation with no labels) is a self-supervised learning method that directly predicts the output of a teacher network, built with a momentum encoder, using a standard cross-entropy loss. DINO is often illustrated in the case of one single pair of views for simplicity. The model passes two different random transformations of an input image to the student and teacher networks. Both networks have the same architecture but different parameters. The output of the teacher network is centered with a mean computed over the batch. Each network outputs a $K$-dimensional feature normalized with a temperature softmax over the feature dimension. Their similarity is then measured with a cross-entropy loss. A stop-gradient (sg) operator is applied to the teacher to propagate gradients only through the student. The teacher parameters are updated with an exponential moving average (ema) of the student parameters.

Computer Vision · Introduced 2000 · 208 papers

CutMix

CutMix is an image data augmentation strategy. Instead of simply removing pixels as in Cutout, we replace the removed regions with a patch from another image. The ground truth labels are also mixed proportionally to the number of pixels of combined images. The added patches further enhance localization ability by requiring the model to identify the object from a partial view.
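A sketch of the patch sampling and proportional label mixing (pure Python; the box parameterization follows the usual implementation, names illustrative):

```python
import math, random

def cutmix_box(width, height, lam, rng=random):
    """Sample the CutMix patch: a box covering a (1 - lam) fraction of
    the image area, centered at a uniform random position and clipped
    to the image bounds."""
    cut_w = int(width * math.sqrt(1.0 - lam))
    cut_h = int(height * math.sqrt(1.0 - lam))
    cx, cy = rng.randrange(width), rng.randrange(height)
    x1, y1 = max(cx - cut_w // 2, 0), max(cy - cut_h // 2, 0)
    x2, y2 = min(cx + cut_w // 2, width), min(cy + cut_h // 2, height)
    return x1, y1, x2, y2

def cutmix_label(y_a, y_b, box, width, height):
    """Mix one-hot labels in proportion to the pixels each image keeps
    after the patch from image b is pasted onto image a."""
    x1, y1, x2, y2 = box
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (width * height)
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
```

Note the label weight is recomputed from the clipped box, so the target always reflects the actual pixel ratio.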

Computer Vision · Introduced 2000 · 208 papers

3D Convolution

A 3D Convolution is a type of convolution where the kernel slides in 3 dimensions, as opposed to 2 dimensions with 2D convolutions. One example use case is medical imaging, where a model is constructed using 3D image slices. Additionally, video-based data has a temporal dimension on top of the spatial ones, making it suitable for this module. Image: Lung nodule detection based on 3D convolutional neural networks, Fan et al.

Computer Vision · Introduced 2015 · 208 papers
Page 3 of 175