Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Methods

8,725 machine learning methods and techniques

All · Audio · Computer Vision · General · Graphs · Natural Language Processing · Reinforcement Learning · Sequential

CBHG

CBHG is a building block used in the Tacotron text-to-speech model. It consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (BiGRU). The module is used to extract representations from sequences. The input sequence is first convolved with K sets of 1-D convolutional filters, where the k-th set contains C_k filters of width k (i.e., k = 1, 2, …, K). These filters explicitly model local and contextual information (akin to modeling unigrams, bigrams, up to K-grams). The convolution outputs are stacked together and further max pooled along time to increase local invariances. A stride of 1 is used to preserve the original time resolution. The processed sequence is further passed to a few fixed-width 1-D convolutions, whose outputs are added with the original input sequence via residual connections. Batch normalization is used for all convolutional layers. The convolution outputs are fed into a multi-layer highway network to extract high-level features. Finally, a bidirectional GRU RNN is stacked on top to extract sequential features from both forward and backward context.

General · Introduced 2000 · 65 papers

Tacotron

Tacotron is an end-to-end generative text-to-speech model that takes a character sequence as input and outputs the corresponding spectrogram. The backbone of Tacotron is a seq2seq model with attention. The Figure depicts the model, which includes an encoder, an attention-based decoder, and a post-processing net. At a high-level, the model takes characters as input and produces spectrogram frames, which are then converted to waveforms.

Sequential · Introduced 2000 · 65 papers

WGAN-GP Loss

Wasserstein Gradient Penalty Loss, or WGAN-GP Loss, is a loss used for generative adversarial networks that augments the Wasserstein loss with a gradient norm penalty for random samples x̂ ∼ P_x̂ to achieve Lipschitz continuity:

L = E_x̃∼P_g[D(x̃)] − E_x∼P_r[D(x)] + λ E_x̂∼P_x̂[(‖∇_x̂ D(x̂)‖₂ − 1)²]

It was introduced as part of the WGAN-GP overall model.
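
A minimal numeric sketch of this loss (plain Python, names are illustrative): it uses a linear critic D(x) = w·x + b, for which the input gradient is just w at every point, so the penalty term has a closed form that can be checked by hand.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wgan_gp_loss(w, b, real, fake, lam=10.0):
    """Critic loss for a linear critic D(x) = w.x + b.

    For a linear critic, grad_x D(x) = w everywhere, so the gradient
    penalty at any interpolate x_hat reduces to lam * (||w||_2 - 1)^2.
    """
    d_real = sum(dot(w, x) + b for x in real) / len(real)
    d_fake = sum(dot(w, x) + b for x in fake) / len(fake)
    grad_norm = sum(wi * wi for wi in w) ** 0.5
    penalty = lam * (grad_norm - 1.0) ** 2
    return d_fake - d_real + penalty

real = [[1.0, 0.0], [0.9, 0.1]]
fake = [[-1.0, 0.0], [-0.8, -0.2]]
loss_unit = wgan_gp_loss([1.0, 0.0], 0.0, real, fake)  # ||w|| = 1: zero penalty
loss_big = wgan_gp_loss([3.0, 0.0], 0.0, real, fake)   # ||w|| = 3: penalized
print(loss_unit, loss_big)
```

In practice the gradient at random interpolates between real and fake samples is obtained via automatic differentiation; the linear critic here only serves to make the penalty easy to verify.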

General · Introduced 2000 · 65 papers


Cutout

Cutout is an image augmentation and regularization technique that randomly masks out square regions of the input during training, and can be used to improve the robustness and overall performance of convolutional neural networks. The main motivation for Cutout comes from the problem of object occlusion, which is commonly encountered in many computer vision tasks, such as object recognition, tracking, or human pose estimation. By generating new images that simulate occluded examples, we not only better prepare the model for encounters with occlusions in the real world, but the model also learns to take more of the image context into consideration when making decisions.
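
A toy sketch of the masking step (plain Python, single-channel image as nested lists; the function and argument names are illustrative, not from the paper's code):

```python
import random

def cutout(image, size, rng):
    """Zero out a size x size square at a random center. The square may
    be clipped at the image border, since centers are sampled uniformly."""
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in image]          # copy; original stays intact
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out

rng = random.Random(0)
img = [[1.0] * 6 for _ in range(6)]
aug = cutout(img, 4, rng)
masked = sum(v == 0.0 for row in aug for v in row)
print(masked)  # number of zeroed pixels, at most size * size
```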

Computer Vision · Introduced 2000 · 64 papers

Embedding Dropout

Embedding Dropout is equivalent to performing dropout on the embedding matrix at a word level, where the dropout is broadcast across all of the word vector's embedding dimensions. The remaining non-dropped-out word embeddings are scaled by 1/(1 − p_e), where p_e is the probability of embedding dropout. As the dropout occurs on the embedding matrix that is used for a full forward and backward pass, this means that all occurrences of a specific word will disappear within that pass, equivalent to performing variational dropout on the connection between the one-hot embedding and the embedding lookup. Source: Merity et al., Regularizing and Optimizing LSTM Language Models
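
A sketch of the idea (plain Python, embeddings as a dict of lists; names are illustrative):

```python
import random

def embedding_dropout(embed, p, rng):
    """Drop entire word rows of the embedding matrix with probability p
    and scale survivors by 1/(1-p), so every occurrence of a dropped
    word vanishes for the whole forward/backward pass."""
    scale = 1.0 / (1.0 - p)
    out = {}
    for word, vec in embed.items():
        if rng.random() < p:
            out[word] = [0.0] * len(vec)          # word dropped everywhere
        else:
            out[word] = [scale * v for v in vec]  # survivor, rescaled
    return out

rng = random.Random(1)
embed = {"the": [0.2, 0.4], "cat": [1.0, -1.0], "sat": [0.5, 0.5]}
dropped = embedding_dropout(embed, 0.5, rng)
print(dropped)
```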

General · Introduced 2000 · 64 papers

SBERT

Sentence-BERT

Natural Language Processing · Introduced 2000 · 64 papers

Wizard

Wizard: an unsupervised goat tracking algorithm

Computer vision is an interesting tool for animal behavior monitoring, mainly because it limits animal handling and can record various traits using a single sensor. Previous studies have shown the technique to be suitable for various species and behaviors. However, it remains challenging to collect individual information, i.e., not only to detect animals and behaviors in the video frames, but also to identify them. Animal identification is a prerequisite for gathering individual information in order to characterize individuals and compare them. A common solution to this problem, known as multiple object tracking, consists of detecting the animals in each video frame and then associating detections with a unique animal ID. Associations of detections between two consecutive frames are generally made to maintain coherence of the detection locations and appearances. To extract appearance information, a common solution is to use a convolutional neural network (CNN), trained on a large dataset before running the tracking algorithm. For farmed animals, designing such a network is challenging because large training datasets are still lacking. In this article, we propose an innovative solution in which the CNN used to extract appearance information is parameterized using offline unsupervised training. The algorithm, named Wizard, was evaluated for the purpose of monitoring goats in outdoor conditions. 17 annotated videos were used, for a total of 4 h 30 min, with varying numbers of animals per video (from 3 to 8) and different levels of color difference between animals. First, the ability of the algorithm to track the detected animals was evaluated. When animals were detected, the algorithm found the correct animal ID in 94.82% of the frames. When tracking and detection were evaluated together, we found that Wizard assigned the correct animal ID for 86.18% of the video length.
In situations where the animal detection rate is high, Wizard appears to be a suitable solution for individual behavior analysis experiments based on computer vision.

Computer Vision · Introduced 2000 · 64 papers

Transformer-XL

Transformer-XL (meaning extra long) is a Transformer architecture that introduces the notion of recurrence to the deep self-attention network. Instead of computing the hidden states from scratch for each new segment, Transformer-XL reuses the hidden states obtained in previous segments. The reused hidden states serve as memory for the current segment, which builds up a recurrent connection between the segments. As a result, modeling very long-term dependency becomes possible because information can be propagated through the recurrent connections. As an additional contribution, the Transformer-XL uses a new relative positional encoding formulation that generalizes to attention lengths longer than the one observed during training.

Natural Language Processing · Introduced 2000 · 64 papers

Hierarchical Feature Fusion

Hierarchical Feature Fusion (HFF) is a feature fusion method employed in ESP and EESP image model blocks for degridding. In the ESP module, concatenating the outputs of dilated convolutions gives the ESP module a large effective receptive field, but it introduces unwanted checkerboard or gridding artifacts. To address the gridding artifact in ESP, the feature maps obtained using kernels of different dilation rates are hierarchically added before concatenating them (HFF). This solution is simple and effective and does not increase the complexity of the ESP module.
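
The hierarchical addition can be sketched as follows (plain Python with each branch output flattened to a channel list; assumes equal sizes per branch, names are illustrative):

```python
def hff_concat(branches):
    """Hierarchically add parallel dilated-branch outputs before
    concatenation: branch k contributes the running sum of branches
    0..k, which suppresses gridding artifacts."""
    fused, running = [], None
    for feat in branches:
        running = feat if running is None else [a + b for a, b in zip(running, feat)]
        fused.append(running[:])
    # concatenate the hierarchically-fused branches along the channel axis
    return [v for feat in fused for v in feat]

# three branch outputs (e.g. dilation rates 1, 2, 4), 3 values each
b1, b2, b3 = [1.0, 1.0, 1.0], [0.5, 0.0, 0.5], [0.0, 2.0, 0.0]
fused = hff_concat([b1, b2, b3])
print(fused)
```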

Computer Vision · Introduced 2000 · 64 papers

Weight Tying

Weight Tying improves the performance of language models by tying (sharing) the weights of the embedding and softmax layers. This method also massively reduces the total number of parameters in the language models that it is applied to. Language models are typically composed of an embedding layer, followed by a number of Transformer or LSTM layers, which are finally followed by a softmax layer. Embedding layers learn word representations, such that similar words (in meaning) are represented by vectors that are near each other (in cosine distance). [Press & Wolf, 2016] showed that the softmax matrix, in which every word also has a vector representation, also exhibits this property. This led them to propose sharing the softmax and embedding matrices, which is done today in nearly all language models. This method was independently introduced by Press & Wolf, 2016 and Inan et al., 2016. Additionally, the Press & Wolf paper proposes Three-way Weight Tying, a method for NMT models in which the embedding matrix for the source language, the embedding matrix for the target language, and the softmax matrix for the target language are all tied. That method has been adopted by the Attention Is All You Need model and many other neural machine translation models.
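
A minimal sketch of the tying (plain Python; the class and names are illustrative, not from any paper's code):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class TiedLM:
    """One shared matrix: its rows serve both as the embedding lookup
    and as the softmax projection, halving those parameters."""
    def __init__(self, embed):
        self.embed = embed                      # vocab_size x dim, shared

    def lookup(self, idx):
        return self.embed[idx]

    def logits(self, hidden):
        # output logits are dot products against the SAME rows
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.embed]

E = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
lm = TiedLM(E)
h = lm.lookup(0)          # pretend the network's hidden state equals this
probs = softmax(lm.logits(h))
print(probs)
```

Because the hidden state here equals word 0's embedding, word 0 gets the highest probability, illustrating why tying rewards hidden states that point toward the target word's vector.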

General · Introduced 2000 · 63 papers

PGM

Probability Guided Maxout

A regularization criterion that, unlike dropout and its variants, is deterministic rather than random. It is grounded in the empirical evidence that feature descriptors with larger L2-norm and highly active nodes are strongly correlated with confident class predictions. Thus, the criterion drops a percentage of the most active nodes of the descriptors, in proportion to the estimated class probability.

General · Introduced 2000 · 63 papers

UNet++

UNet++ is an architecture for semantic segmentation based on the U-Net. Through the use of densely connected nested decoder sub-networks, it enhances extracted feature processing and was reported by its authors to outperform the U-Net in Electron Microscopy (EM), Cell, Nuclei, Brain Tumor, Liver and Lung Nodule medical image segmentation tasks.

Computer Vision · Introduced 2000 · 61 papers

Activation Normalization

Activation Normalization is a type of normalization used for flow-based generative models; specifically, it was introduced in the GLOW architecture. An ActNorm layer performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization. These parameters are initialized such that the post-actnorm activations per-channel have zero mean and unit variance given an initial minibatch of data. This is a form of data-dependent initialization. After initialization, the scale and bias are treated as regular trainable parameters that are independent of the data.
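
The data-dependent initialization can be sketched as follows (plain Python over a batch of per-channel values, spatial dimensions omitted; names are illustrative):

```python
import statistics

class ActNorm:
    """Per-channel affine y = s*x + b, where s and b are initialized
    from the first minibatch so outputs have zero mean, unit variance."""
    def __init__(self):
        self.scale, self.bias, self.ready = None, None, False

    def __call__(self, batch):   # batch: list of samples, each a channel list
        channels = list(zip(*batch))
        if not self.ready:       # one-time data-dependent initialization
            mean = [statistics.fmean(c) for c in channels]
            std = [statistics.pstdev(c) for c in channels]
            self.scale = [1.0 / s for s in std]
            self.bias = [-m / s for m, s in zip(mean, std)]
            self.ready = True    # afterwards s, b are ordinary parameters
        return [[x * s + b for x, s, b in zip(row, self.scale, self.bias)]
                for row in batch]

layer = ActNorm()
batch = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]   # 3 samples, 2 channels
out = layer(batch)
print(out)
```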

General · Introduced 2000 · 61 papers

Pythia

Pythia is a suite of decoder-only autoregressive language models all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. The model architecture and hyperparameters largely follow GPT-3, with a few notable deviations based on recent advances in best practices for large scale language modeling.

Natural Language Processing · Introduced 2000 · 60 papers

Consistency Models

General · Introduced 2000 · 59 papers

Soft Actor Critic

Soft Actor Critic, or SAC, is an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy; that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. SAC combines off-policy updates with a stable stochastic actor-critic formulation. The SAC objective has a number of advantages. First, the policy is incentivized to explore more widely, while giving up on clearly unpromising avenues. Second, the policy can capture multiple modes of near-optimal behavior. In problem settings where multiple actions seem equally attractive, the policy will commit equal probability mass to those actions. Lastly, the authors present evidence that it improves learning speed over state-of-the-art methods that optimize the conventional RL objective function.

Reinforcement Learning · Introduced 2000 · 58 papers

GAM

Generalized additive models

Sequential · Introduced 2000 · 57 papers

Denoising Score Matching

Training a denoiser on signals yields a powerful prior over those signals, which can then be used to sample new examples of the signal.

Computer Vision · Introduced 2000 · 57 papers

XLM

XLM is a Transformer based architecture that is pre-trained using one of three language modelling objectives: 1. Causal Language Modeling - models the probability of a word given the previous words in a sentence. 2. Masked Language Modeling - the masked language modeling objective of BERT. 3. Translation Language Modeling - a (new) translation language modeling objective for improving cross-lingual pre-training. The authors find that both the CLM and MLM approaches provide strong cross-lingual features that can be used for pretraining models.

Natural Language Processing · Introduced 2000 · 57 papers

A3C

A3C, Asynchronous Advantage Actor Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy π(a_t | s_t; θ) and an estimate of the value function V(s_t; θ_v). It operates in the forward view and uses a mix of n-step returns to update both the policy and the value function. The policy and the value function are updated after every t_max actions or when a terminal state is reached. The update performed by the algorithm can be seen as ∇_θ′ log π(a_t | s_t; θ′) A(s_t, a_t; θ, θ_v), where A(s_t, a_t; θ, θ_v) is an estimate of the advantage function given by Σ_{i=0}^{k−1} γ^i r_{t+i} + γ^k V(s_{t+k}; θ_v) − V(s_t; θ_v), where k can vary from state to state and is upper-bounded by t_max. The critics in A3C learn the value function while multiple actors are trained in parallel and get synced with global parameters every so often. The gradients are accumulated as part of training for stability; this is like parallelized stochastic gradient descent. Note that while the parameters θ of the policy and θ_v of the value function are shown as being separate for generality, we always share some of the parameters in practice. We typically use a convolutional neural network that has one softmax output for the policy π(a_t | s_t; θ) and one linear output for the value function V(s_t; θ_v), with all non-output layers shared.
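
The k-step advantage computation at the heart of the update can be sketched as follows (plain Python over one short rollout; names and values are illustrative):

```python
def advantage(rewards, values, v_last, gamma):
    """k-step advantage estimates as used in A3C: for each timestep t,
    A_t = sum_i gamma^i * r_{t+i} + gamma^k * V(s_{t+k}) - V(s_t),
    with the sum running to the end of the rollout."""
    advs = []
    R = v_last                      # bootstrap from the value of the last state
    for r, v in zip(reversed(rewards), reversed(values)):
        R = r + gamma * R           # accumulate the discounted return backwards
        advs.append(R - v)
    return list(reversed(advs))

rewards = [1.0, 0.0, 1.0]           # a 3-step rollout (t_max = 3)
values = [0.5, 0.2, 0.4]            # V(s_t) from the critic
advs = advantage(rewards, values, v_last=0.0, gamma=0.9)
print(advs)
```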

Reinforcement Learning · Introduced 2000 · 57 papers

Sparse Autoencoder

A Sparse Autoencoder is a type of autoencoder that employs sparsity to achieve an information bottleneck. Specifically, the loss function is constructed so that activations are penalized within a layer. The sparsity constraint can be imposed with L1 regularization or a KL divergence between the expected average neuron activation ρ̂ and an ideal sparsity target ρ.
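
The KL sparsity penalty can be sketched as follows (plain Python; ρ is the target average activation, names are illustrative):

```python
import math

def kl_sparsity(activations, rho=0.05):
    """KL(rho || rho_hat) summed over hidden units, where rho_hat is
    each unit's average activation over the batch (values in (0, 1))."""
    n = len(activations)
    total = 0.0
    for unit in zip(*activations):          # iterate over hidden units
        rho_hat = sum(unit) / n
        total += (rho * math.log(rho / rho_hat)
                  + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return total

# unit 0 sits exactly at the target rho; unit 1 is far too active
batch = [[0.05, 0.9], [0.05, 0.8], [0.05, 0.7]]
penalty = kl_sparsity(batch)
print(penalty)
```

Only the overly active unit contributes to the penalty; a unit already at the target average incurs zero cost.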

Computer Vision · Introduced 2000 · 57 papers

Activation Regularization

Activation Regularization (AR) is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as α L₂(m ∘ h_t), where m is a dropout mask used by later parts of the model, L₂ is the L2 norm, h_t is the output of the RNN at timestep t, and α is a scaling coefficient. When applied to the output of a dense layer, AR penalizes activations that are substantially away from 0, encouraging activations to remain small.
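
A sketch of the penalty for a single timestep (plain Python; using the squared L2 norm, as common implementations do; treat that and the names as assumptions):

```python
def activation_regularization(h, mask, alpha=2.0):
    """AR penalty alpha * ||m o h||_2^2 for the dropout-masked RNN
    output h at one timestep (o denotes elementwise product)."""
    return alpha * sum((m * x) ** 2 for m, x in zip(mask, h))

h = [0.5, -2.0, 1.0]     # RNN output at timestep t
mask = [1.0, 1.0, 0.0]   # dropout mask used by later parts of the model
penalty = activation_regularization(h, mask)
print(penalty)
```

The large activation (-2.0) dominates the penalty, which is exactly the behavior described above: activations far from 0 are discouraged.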

General · Introduced 2000 · 56 papers

Sarsa

Sarsa is an on-policy TD control algorithm: Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)]. This update is done after every transition from a nonterminal state S_t. If S_{t+1} is terminal, then Q(S_{t+1}, A_{t+1}) is defined as zero. To design an on-policy control algorithm using Sarsa, we estimate q_π for a behaviour policy π and then change π towards greediness with respect to q_π. Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
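
One Sarsa backup can be sketched as follows (plain Python, tabular Q as a dict; states, actions and hyperparameters are illustrative):

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9, terminal=False):
    """One Sarsa backup: Q(S,A) += alpha * [R + gamma*Q(S',A') - Q(S,A)],
    with Q(S',A') taken as zero when S' is terminal."""
    target = r + (0.0 if terminal else gamma * Q[(s2, a2)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
Q = sarsa_update(Q, s=0, a=1, r=1.0, s2=1, a2=0)                 # nonterminal
Q = sarsa_update(Q, s=1, a=0, r=2.0, s2=None, a2=None, terminal=True)
print(Q[(0, 1)], Q[(1, 0)])
```

Note the defining on-policy detail: the bootstrap uses Q(S', A') for the action A' actually selected by the behaviour policy, not the max over actions as in Q-learning.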

Reinforcement Learning · Introduced 1994 · 56 papers

DFA

Direct Feedback Alignment

General · Introduced 2000 · 55 papers

YOLOv2

YOLOv2, or YOLO9000, is a single-stage real-time object detection model. It improves upon YOLOv1 in several ways, including the use of Darknet-19 as a backbone, batch normalization, a high-resolution classifier, and anchor boxes for predicting bounding boxes, among other changes.

Computer Vision · Introduced 2000 · 55 papers

Apollo

Adaptive Parameter-wise Diagonal Quasi-Newton Method

Apollo is a stochastic optimization method that incorporates curvature information by approximating the Hessian with a diagonal matrix, yielding parameter-wise adaptive quasi-Newton updates.

General · Introduced 2000 · 55 papers

Group Normalization

Group Normalization is a normalization layer that divides channels into groups and normalizes the features within each group. GN does not exploit the batch dimension, and its computation is independent of batch sizes. In the case where the group size is 1, it is equivalent to Instance Normalization. As motivation for the method, many classical features like SIFT and HOG had group-wise features and involved group-wise normalization. For example, a HOG vector is the outcome of several spatial cells where each cell is represented by a normalized orientation histogram. Formally, Group Normalization is defined as x̂_i = (1/σ_i)(x_i − μ_i). Here x is the feature computed by a layer, and i is an index. A Group Norm layer computes μ and σ over a set S_i defined as S_i = {k | k_N = i_N, ⌊k_C / (C/G)⌋ = ⌊i_C / (C/G)⌋}. Here G is the number of groups, which is a pre-defined hyper-parameter (G = 32 by default). C/G is the number of channels per group. ⌊·⌋ is the floor operation, and the final condition means that the indexes i and k are in the same group of channels, assuming each group of channels is stored in sequential order along the C axis.
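
A sketch for a single sample with spatial dimensions omitted (plain Python on a channel vector; names are illustrative):

```python
import statistics

def group_norm(x, num_groups, eps=1e-5):
    """Normalize a channel vector x within each of num_groups contiguous
    channel groups; independent of the batch dimension by construction."""
    C = len(x)
    size = C // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        mu = statistics.fmean(group)
        var = statistics.pvariance(group)
        out.extend((v - mu) / (var + eps) ** 0.5 for v in group)
    return out

x = [1.0, 3.0, 10.0, 14.0]          # 4 channels, normalized in 2 groups of 2
y = group_norm(x, num_groups=2)
print(y)
```

With num_groups equal to the channel count this degenerates to per-channel (instance-style) normalization, matching the group-size-1 remark above.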

General · Introduced 2000 · 55 papers

SwAV

Swapping Assignments between Views

SwAV, or Swapping Assignments Between Views, is a self-supervised learning approach that takes advantage of contrastive methods without requiring pairwise comparisons to be computed. Specifically, it simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, SwAV uses a swapped prediction mechanism where we predict the cluster assignment of a view from the representation of another view.

General · Introduced 2000 · 55 papers

PCB

Part-based Convolutional Baseline

Computer Vision · Introduced 2000 · 54 papers

ERNIE

ERNIE is a transformer-based model consisting of two stacked modules: 1) a textual encoder and 2) a knowledgeable encoder, which is responsible for integrating extra token-oriented knowledge information into textual information. This layer consists of stacked aggregators, designed for encoding both tokens and entities as well as fusing their heterogeneous features. To integrate this layer of enhancing representations via knowledge, a special pre-training task is adopted for ERNIE: it involves randomly masking token-entity alignments and training the model to predict all corresponding entities based on aligned tokens (aka denoising entity auto-encoder).

Natural Language Processing · Introduced 2000 · 54 papers

PEGASUS

PEGASUS proposes a transformer-based model for abstractive summarization. It uses a special self-supervised pre-training objective called gap-sentences generation (GSG) that's designed to perform well on summarization-related downstream tasks. As reported in the paper, "both GSG and MLM are applied simultaneously to this example as pre-training objectives. Originally there are three sentences. One sentence is masked with [MASK1] and used as target generation text (GSG). The other two sentences remain in the input, but some tokens are randomly masked by [MASK2]."

Natural Language Processing · Introduced 2000 · 53 papers

SimCSE

SimCSE is a contrastive learning framework for generating sentence embeddings. It utilizes an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. The authors find that dropout acts as minimal “data augmentation” of hidden representations, while removing it leads to a representation collapse. Afterwards a supervised approach is used, which incorporates annotated pairs from natural language inference datasets into the contrastive framework, by using “entailment” pairs as positives and “contradiction” pairs as hard negatives.

Natural Language Processing · Introduced 2000 · 52 papers

AWD-LSTM

ASGD Weight-Dropped LSTM

ASGD Weight-Dropped LSTM, or AWD-LSTM, is a type of recurrent neural network that employs DropConnect for regularization, as well as NT-ASGD for optimization - non-monotonically triggered averaged SGD - which returns an average of the weights from recent iterations rather than the most recent weights. Additional regularization techniques employed include variable length backpropagation sequences, variational dropout, embedding dropout, weight tying, independent embedding/hidden size, activation regularization and temporal activation regularization.

Sequential · Introduced 2000 · 52 papers

ShuffleNet

ShuffleNet is a convolutional neural network designed specifically for mobile devices with very limited computing power. The architecture utilizes two new operations, pointwise group convolution and channel shuffle, to reduce computation cost while maintaining accuracy.

Computer Vision · Introduced 2000 · 51 papers

ShuffleNet Block

A ShuffleNet Block is an image model block that utilises a channel shuffle operation, along with depthwise convolutions, for an efficient architectural design. It was proposed as part of the ShuffleNet architecture. The starting point is the Residual Block unit from ResNets, which is then modified with a pointwise group convolution and a channel shuffle operation.
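
The channel shuffle operation at the heart of the block can be sketched as follows (plain Python on a channel list; the group count and labels are illustrative):

```python
def channel_shuffle(x, groups):
    """Reshape channels to (groups, C // groups), transpose, and flatten,
    so information can mix across the outputs of group convolutions."""
    C = len(x)
    per = C // groups
    # element i of each group becomes adjacent across groups
    return [x[g * per + i] for i in range(per) for g in range(groups)]

# 6 channels produced by 2 groups of 3: a0 a1 a2 | b0 b1 b2
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
print(shuffled)
```

After the shuffle, the next group convolution sees channels from both original groups, which is what lets stacked group convolutions stay expressive.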

General · Introduced 2000 · 51 papers


Maxout

The Maxout Unit is a generalization of the ReLU and the leaky ReLU functions. It is a piecewise linear function that returns the maximum of the inputs, designed to be used in conjunction with dropout. Both ReLU and leaky ReLU are special cases of Maxout. The main drawback of Maxout is that it is computationally expensive as it doubles the number of parameters for each neuron.
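
A sketch of the unit (plain Python; the weight values are illustrative):

```python
def maxout(x, weight_sets, bias_sets):
    """Maxout unit: the max over k affine maps w_i . x + b_i. With k = 2
    and the second map fixed to zero, it recovers ReLU(w_1 . x + b_1)."""
    return max(sum(w * xi for w, xi in zip(ws, x)) + b
               for ws, b in zip(weight_sets, bias_sets))

x = [2.0, -1.0]
# special case: second affine map is the zero function -> ReLU behavior
relu_like = maxout(x, [[1.0, 1.0], [0.0, 0.0]], [0.0, 0.0])
# general case: k = 3 learned pieces
general = maxout(x, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])
print(relu_like, general)
```

The cost noted above is visible here: each unit carries k weight vectors and biases instead of one.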

General · Introduced 2000 · 50 papers

OASIS

OASIS is a GAN-based model to translate semantic label maps into realistic-looking images. The model builds on preceding work such as Pix2Pix and SPADE. OASIS introduces the following innovations: 1. The method is not dependent on the perceptual loss, which is commonly used for the semantic image synthesis task. A VGG network trained on ImageNet is routinely employed as the perceptual loss to strongly improve the synthesis quality. The authors show that this perceptual loss also has negative effects: First, it reduces the diversity of the generated images. Second, it negatively influences the color distribution to be more biased towards ImageNet. OASIS eliminates the dependence on the perceptual loss by changing the common discriminator design: The OASIS discriminator segments an image into one of the real classes or an additional fake class. In doing so, it makes more efficient use of the label maps that the discriminator normally receives. This distinguishes the discriminator from the commonly used encoder-shaped discriminators, which concatenate the label maps to the input image and predict a single score per image. With the more fine-grained supervision through the loss of the OASIS discriminator, the perceptual loss is shown to become unnecessary. 2. A user can generate a diverse set of images per label map by simply resampling noise. This is achieved by conditioning the spatially-adaptive denormalization module in each layer of the GAN generator directly on spatially replicated input noise. A side effect of this conditioning is that at inference time an image can be resampled either globally or locally (either the complete image changes or a restricted region in the image).

Computer Vision · Introduced 2000 · 50 papers

UNETR

UNet Transformer

UNETR, or UNet Transformer, is a Transformer-based architecture for medical image segmentation that utilizes a pure transformer as the encoder to learn sequence representations of the input volume -- effectively capturing the global multi-scale information. The transformer encoder is directly connected to a decoder via skip connections at different resolutions like a U-Net to compute the final semantic segmentation output.

Computer Vision · Introduced 2000 · 49 papers

StyleGAN2

StyleGAN2 is a generative adversarial network that builds on StyleGAN with several improvements. First, adaptive instance normalization is redesigned and replaced with a normalization technique called weight demodulation. Secondly, an improved training scheme upon progressively growing is introduced, which achieves the same goal - training starts by focusing on low-resolution images and then progressively shifts focus to higher and higher resolutions - without changing the network topology during training. Additionally, new types of regularization like lazy regularization and path length regularization are proposed.

Computer Vision · Introduced 2000 · 49 papers

HypE

Hyperboloid Embeddings

Hyperboloid Embeddings (HypE) is a novel self-supervised dynamic reasoning framework that utilizes positive first-order existential queries on a KG to learn representations of its entities and relations as hyperboloids in a Poincaré ball. HypE models the positive first-order queries as geometrical translation (t), intersection (∧), and union (∨). For the problem of KG reasoning in real-world datasets, the proposed HypE model significantly outperforms the state-of-the-art results. HypE is also applied to an anomaly detection task on a popular e-commerce website product taxonomy as well as hierarchically organized web articles and demonstrates significant performance improvements compared to existing baseline methods. Finally, HypE embeddings can also be visualized in a Poincaré ball to clearly interpret and comprehend the representation space.

Graphs · Introduced 2000 · 49 papers

AMSGrad

AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared gradients rather than the exponential average to update the parameters:

m_t = β₁ m_{t−1} + (1 − β₁) g_t
v_t = β₂ v_{t−1} + (1 − β₂) g_t²
v̂_t = max(v̂_{t−1}, v_t)
θ_{t+1} = θ_t − (η / √v̂_t) m_t
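
A scalar sketch of the update loop (plain Python; hyperparameter values are illustrative, and bias correction is omitted for brevity):

```python
def amsgrad(grad_fn, theta, steps, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Scalar AMSGrad: like Adam, but divides by the running MAXIMUM of
    the second-moment estimate, which can only grow over time."""
    m = v = v_hat = 0.0
    for _ in range(steps):
        g = grad_fn(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        v_hat = max(v_hat, v)            # the key change vs. Adam
        theta -= lr * m / (v_hat ** 0.5 + eps)
    return theta

# minimize f(x) = x^2 (gradient 2x) starting from x = 5
x = amsgrad(lambda t: 2 * t, 5.0, steps=200)
print(x)
```

Because v̂_t is non-decreasing, the effective per-parameter step size can only shrink, which is what restores the convergence guarantee Adam can lose.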

General · Introduced 2000 · 49 papers

OpenPose

Computer Vision · Introduced 2000 · 48 papers

BiFPN

A BiFPN, or Weighted Bi-directional Feature Pyramid Network, is a type of feature pyramid network which allows easy and fast multi-scale feature fusion. It incorporates the multi-level feature fusion idea from FPN, PANet and NAS-FPN that enables information to flow in both the top-down and bottom-up directions, while using regular and efficient connections. It also utilizes a fast normalized fusion technique. Traditional approaches usually treat all features input to the FPN equally, even those with different resolutions. However, input features at different resolutions often have unequal contributions to the output features. Thus, the BiFPN adds an additional weight for each input feature, allowing the network to learn the importance of each. All regular convolutions are also replaced with less expensive depthwise separable convolutions. Compared with PANet, which added an extra bottom-up path for information flow at the expense of more computational cost, BiFPN optimizes these cross-scale connections by removing nodes with a single input edge, adding an extra edge from the original input to the output node if they are on the same level, and treating each bidirectional path as one feature network layer (repeating it several times for more high-level feature fusion).
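
The fast normalized fusion step can be sketched as follows (plain Python on flattened features; names and values are illustrative):

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Fast normalized fusion: O = sum_i (w_i / (eps + sum_j w_j)) * I_i,
    with learned weights kept non-negative (e.g. via a ReLU)."""
    w = [max(0.0, wi) for wi in weights]        # clamp keeps weights >= 0
    denom = eps + sum(w)
    return [sum(wi * feat[k] for wi, feat in zip(w, inputs)) / denom
            for k in range(len(inputs[0]))]

p_td = [1.0, 2.0]        # top-down feature at this level
p_in = [3.0, 4.0]        # same-level input feature
fused = fast_normalized_fusion([p_td, p_in], [1.0, 3.0])
print(fused)
```

Compared with a softmax over the weights, this normalization avoids the exponentials, which is why the paper calls it "fast".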

Computer Vision · Introduced 2000 · 48 papers

REM

Random Ensemble Mixture

Random Ensemble Mixture (REM) is an easy-to-implement extension of DQN inspired by Dropout. The key intuition behind REM is that if one has access to multiple estimates of Q-values, then a weighted combination of the Q-value estimates is also an estimate for Q-values. Accordingly, in each training step, REM randomly combines multiple Q-value estimates and uses this random combination for robust training.
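
The random convex combination can be sketched as follows (plain Python; head values and names are illustrative):

```python
import random

def rem_q(q_estimates, rng):
    """Random Ensemble Mixture: draw a random convex combination of K
    Q-value heads and use it as the Q-estimate for this training step."""
    raw = [rng.random() for _ in q_estimates]
    alphas = [r / sum(raw) for r in raw]        # normalize onto the simplex
    n_actions = len(q_estimates[0])
    return [sum(a * q[i] for a, q in zip(alphas, q_estimates))
            for i in range(n_actions)]

rng = random.Random(0)
heads = [[1.0, 2.0], [3.0, 0.0], [2.0, 2.0]]    # 3 heads, 2 actions
q = rem_q(heads, rng)
print(q)
```

Since the mixture weights are a convex combination, each combined Q-value necessarily lies between the minimum and maximum of the head estimates for that action.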

Reinforcement Learning · Introduced 2000 · 48 papers

SPS

Semi-Pseudo-Label

General · Introduced 2000 · 47 papers

SENet

A SENet is a convolutional neural network architecture that employs squeeze-and-excitation blocks to enable the network to perform dynamic channel-wise feature recalibration.

Computer Vision · Introduced 2000 · 47 papers

Copy-Paste

simple Copy-Paste

Computer Vision · Introduced 2000 · 47 papers

SegFormer

SegFormer is a Transformer-based framework for semantic segmentation that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes which leads to decreased performance when the testing resolution differs from training. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, combining both local attention and global attention to render powerful representations.

Computer Vision · Introduced 2000 · 47 papers
Page 7 of 175