Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Methods

8,725 machine learning methods and techniques


PP-YOLO

PP-YOLO is an object detector based on YOLOv3. It combines various existing tricks that add almost no model parameters or FLOPs, aiming to improve the detector's accuracy as much as possible while keeping its speed almost unchanged. The changes include:

- Replacing the DarkNet-53 backbone with ResNet50-vd, in which some convolutional layers are also replaced with deformable convolutional layers.
- A larger batch size: 192 instead of 64.
- An exponential moving average of the model parameters.
- DropBlock applied to the FPN.
- An IoU loss.
- An IoU prediction branch to measure localization accuracy.
- Grid Sensitive, similar to YOLOv4.
- Matrix NMS.
- CoordConv in the FPN, replacing the 1x1 convolution layer, and in the first convolution layer of the detection head.
- Spatial Pyramid Pooling on the top feature map.
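
One of the tricks above, the exponential moving average of parameters, can be sketched in a few lines (the decay value here is illustrative, not PP-YOLO's):

```python
def ema_update(ema_params, params, decay=0.9998):
    """One exponential-moving-average step over model parameters
    (the decay value is illustrative, not PP-YOLO's)."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# Two identical optimizer steps; the EMA lags behind the raw parameters.
ema = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_params, decay=0.5)
```

At evaluation time the EMA weights, rather than the raw weights, are used.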

Computer Vision · Introduced 2000 · 4 papers

m-arcsinh

modified arcsinh

General · Introduced 2000 · 3 papers

FreeAnchor

FreeAnchor is an anchor supervision method for object detection. Many CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In contrast, FreeAnchor is a learning-to-match approach that breaks the IoU restriction, allowing objects to match anchors in a flexible manner. It replaces hand-crafted anchor assignment with free anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor aims to learn features that best explain a class of objects in terms of both classification and localization.

Computer Vision · Introduced 2000 · 3 papers

Vokenization

Vokenization is an approach for extrapolating multimodal alignments to language-only data by contextually mapping language tokens to related images ("vokens") via retrieval. Instead of directly supervising the language model with visually grounded language datasets (e.g., MS COCO), these relatively small datasets are used to train the vokenization processor (the "vokenizer"). Vokens are then generated for large language corpora (e.g., English Wikipedia), and the visually-supervised language model takes its supervision from these large datasets, bridging the gap between the different data sources.

Computer Vision · Introduced 2000 · 3 papers

TAPEX

Table Pre-training via Execution

TAPEX is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesising executable SQL queries.
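
A minimal sketch of the pre-training signal, using a hypothetical toy table and Python's sqlite3 as the SQL executor (the actual TAPEX corpus, query synthesis, and linearization format differ):

```python
import sqlite3

# A hypothetical toy table standing in for TAPEX's synthetic corpus.
rows = [("Paris", 2161000), ("Lyon", 513000), ("Marseille", 861000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)", rows)

# Executing a synthesized SQL query yields the answer the neural
# SQL executor is trained to reproduce.
query = "SELECT name FROM city ORDER BY population DESC LIMIT 1"
answer = conn.execute(query).fetchone()[0]

# One pre-training example: (SQL + flattened table) -> answer.
flattened = " | ".join(f"{name}: {pop}" for name, pop in rows)
example = {"input": f"{query} [TABLE] {flattened}", "target": answer}
```

The model never sees the real executor at inference time; it learns to imitate its input-output behavior.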

Natural Language Processing · Introduced 2000 · 3 papers

QPT

Quantum Process Tomography

Reinforcement Learning · Introduced 2000 · 3 papers

Composite Fields

Represent and associate with a composite of primitive fields.

Computer Vision · Introduced 2000 · 3 papers

L-GCN

Learnable adjacency matrix GCN

Graph structure is learnable

Graphs · Introduced 2000 · 3 papers

MNMF

Modularity preserving NMF

Graphs · Introduced 2000 · 3 papers

DeCLUTR

DeCLUTR is an approach for learning universal sentence embeddings that utilizes a self-supervised objective that does not require labelled training data. The objective learns universal sentence embeddings by training an encoder to minimize the distance between the embeddings of textual segments randomly sampled from nearby in the same document.
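
The span-sampling step can be sketched as follows; `sample_contrastive_pair`, the fixed span length, and the toy document are illustrative, not DeCLUTR's actual procedure (which samples variable-length spans that may overlap, adjoin, or subsume the anchor):

```python
import random

def sample_contrastive_pair(tokens, span_len=4, rng=random.Random(0)):
    """Sample an anchor span and a nearby positive span from one document,
    mirroring the self-supervised objective (illustrative only)."""
    start = rng.randrange(0, len(tokens) - 2 * span_len)
    anchor = tokens[start:start + span_len]
    # Positive: a segment sampled from near the anchor in the same document.
    pos_start = rng.randrange(start, start + span_len)
    positive = tokens[pos_start:pos_start + span_len]
    return anchor, positive

doc = "self supervised learning of universal sentence embeddings from raw text only".split()
anchor, positive = sample_contrastive_pair(doc)
```

The encoder is then trained so that embeddings of `anchor` and `positive` are close, relative to spans from other documents in the batch.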

Natural Language Processing · Introduced 2000 · 3 papers

PFGM

Poisson Flow Generative Models

Computer Vision · Introduced 2000 · 3 papers

Blended Diffusion

Blended Diffusion enables zero-shot, local, text-guided editing of natural images. Given an input image, an input mask, and a target guiding text, the method changes the masked area of the image to match the guiding text while leaving the unmasked area unchanged.
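
The final composition reduces, in spirit, to a mask-weighted blend (a sketch with NumPy arrays standing in for images; the diffusion-based editing itself is not shown):

```python
import numpy as np

def blend(original, edited, mask):
    """Take the masked area from the (text-guided) edit and keep the
    unmasked area from the original -- the core blending step, in spirit."""
    return mask * edited + (1.0 - mask) * original

orig = np.zeros((4, 4))          # stand-in for the input image
edit = np.ones((4, 4))           # stand-in for the text-guided edit
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0             # edit only the central region
out = blend(orig, edit, mask)
```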

Computer Vision · Introduced 2000 · 3 papers

S-GCN

Spherical Graph Convolutional Network

Graphs · Introduced 2000 · 3 papers

Rational Activation function

General · Introduced 2000 · 3 papers

Subformer

Subformer is a Transformer that combines sandwich-style parameter sharing, which overcomes naive cross-layer parameter sharing in generative models, and self-attentive embedding factorization (SAFE). In SAFE, a small self-attention layer is used to reduce embedding parameter count.

Natural Language Processing · Introduced 2000 · 3 papers

Ape-X DQN

Ape-X DQN is a variant of a DQN with some components of Rainbow-DQN that utilizes distributed prioritized experience replay through the Ape-X architecture.

Reinforcement Learning · Introduced 2000 · 3 papers

Tree-structured Parzen Estimator Approach (TPE)

General · Introduced 2000 · 3 papers

TridentNet

TridentNet is an object detection architecture that aims to generate scale-specific feature maps with a uniform representational power. A parallel multi-branch architecture is constructed in which each branch shares the same transformation parameters but with different receptive fields. A scale-aware training scheme is used to specialize each branch by sampling object instances of proper scales for training.

Computer Vision · Introduced 2000 · 3 papers

Meena

Meena is a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. A seq2seq model is used with the Evolved Transformer as the main architecture. The model is trained on multi-turn conversations where the input sequence is all turns of the context and the output sequence is the response.

Natural Language Processing · Introduced 2000 · 3 papers

Guided Anchoring

Guided Anchoring is an anchoring scheme for object detection that leverages semantic features to guide the anchoring. The method is motivated by the observation that objects are not distributed evenly over the image. The scale of an object is also closely related to the imagery content, its location, and the geometry of the scene. Following this intuition, the method generates sparse anchors in two steps: first identifying sub-regions that may contain objects, and then determining the shapes at different locations.

Computer Vision · Introduced 2000 · 3 papers


InfoGraph

Graphs · Introduced 2000 · 3 papers

GPFL

Graph Path Feature Learning

Graph Path Feature Learning is a probabilistic rule learner optimized to mine instantiated first-order logic rules from knowledge graphs. Instantiated rules contain constants extracted from KGs. Compared to abstract rules that contain no constants, instantiated rules are capable of explaining and expressing concepts in more detail. GPFL utilizes a novel two-stage rule generation mechanism that first generalizes extracted paths into templates that are acyclic abstract rules until a certain degree of template saturation is achieved, then specializes the generated templates into instantiated rules.

General · Introduced 2000 · 3 papers


CrossTransformers

CrossTransformers is a Transformer-based neural network architecture which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled images, and then infer class membership by computing distances between spatially-corresponding features.

Computer Vision · Introduced 2000 · 3 papers

MixText

MixText is a semi-supervised learning method for text classification, which uses a new data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. The technique leverages advances in data augmentation to guess low-entropy labels for unlabeled data, making them as easy to use as labeled data.
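
The hidden-space interpolation at the heart of TMix can be sketched as follows (a simplified NumPy version; in MixText the mixing happens at an intermediate encoder layer and the coefficient is drawn from a Beta distribution):

```python
import numpy as np

def tmix(h_a, h_b, y_a, y_b, lam):
    """TMix-style interpolation: mix two examples in hidden space and mix
    their labels with the same coefficient (illustrative sketch)."""
    h = lam * h_a + (1 - lam) * h_b
    y = lam * y_a + (1 - lam) * y_b
    return h, y

h_a, h_b = np.ones(8), np.zeros(8)                    # stand-in hidden states
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels
h, y = tmix(h_a, h_b, y_a, y_b, lam=0.75)
```

The mixed pair `(h, y)` is then treated as an ordinary training example, which lets unlabeled data (with guessed labels) be mixed in the same way.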

Natural Language Processing · Introduced 2000 · 3 papers

Two-Way Dense Layer

Two-Way Dense Layer is an image model block used in the PeleeNet architecture. Motivated by GoogLeNet, the 2-way dense layer is used to get different scales of receptive fields. One way of the layer uses a 3x3 kernel size. The other way uses two stacked 3x3 convolutions to learn visual patterns for large objects.

Computer Vision · Introduced 2000 · 3 papers

EESP

Extremely Efficient Spatial Pyramid of Depth-wise Dilated Separable Convolutions

An EESP Unit, or Extremely Efficient Spatial Pyramid of Depth-wise Dilated Separable Convolutions, is an image model block designed for edge devices. It was proposed as part of the ESPNetv2 CNN architecture. This building block is based on a reduce-split-transform-merge strategy. The EESP unit first projects the high-dimensional input feature maps into low-dimensional space using groupwise pointwise convolutions and then learns the representations in parallel using depthwise dilated separable convolutions with different dilation rates. Different dilation rates in each branch allow the EESP unit to learn the representations from a large effective receptive field. To remove the gridding artifacts caused by dilated convolutions, the EESP fuses the feature maps using hierarchical feature fusion (HFF).

General · Introduced 2000 · 3 papers

PeleeNet

PeleeNet is a convolutional neural network and object detection backbone that is a variation of DenseNet with optimizations to meet a memory and computational budget. Unlike competing networks, it does not use depthwise convolutions and instead relies on regular convolutions.

Computer Vision · Introduced 2000 · 3 papers

VoiceFilter-Lite

VoiceFilter-Lite is a single-channel source separation model that runs on-device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. In this architecture, the voice filtering model operates as a frame-by-frame frontend signal processor that enhances the features consumed by the speech recognizer, without reconstructing audio signals from the features. The key contributions are: (1) a system that performs speech separation directly on ASR input features; (2) an asymmetric loss function that penalizes over-suppression during training, making the model harmless in various acoustic environments; and (3) an adaptive suppression strength mechanism that adapts to different noise conditions.

Audio · Introduced 2000 · 3 papers

U2-Net

U2-Net is a two-level nested U-structure architecture designed for salient object detection (SOD). The architecture allows the network to go deeper and attain high resolution without significantly increasing memory and computation cost. This is achieved by a nested U-structure: on the bottom level, a novel ReSidual U-block (RSU) module extracts intra-stage multi-scale features without degrading the feature map resolution; on the top level, there is a U-Net-like structure in which each stage is filled by an RSU block.

Computer Vision · Introduced 2000 · 3 papers

Conditional Instance Normalization

Conditional Instance Normalization is a normalization technique where all convolutional weights of a style transfer network are shared across many styles. The goal of the procedure is to transform a layer's activations x into a normalized activation z specific to a painting style s. Building on instance normalization, the γ and β parameters are augmented so that they are N × C matrices, where N is the number of styles being modeled and C is the number of output feature maps. Conditioning on a style s is achieved as follows:

z = γ_s ((x − μ) / σ) + β_s

where μ and σ are x's mean and standard deviation taken across spatial axes, and γ_s and β_s are obtained by selecting the row corresponding to s in the γ and β matrices. One added benefit of this approach is that a single image can be stylized into N painting styles with a single feed-forward pass of the network with a batch size of N.
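
A minimal NumPy sketch of the conditioning mechanism, with hypothetical shapes: (C, H, W) for activations and (N, C) for the per-style γ and β matrices:

```python
import numpy as np

def conditional_instance_norm(x, gamma, beta, s, eps=1e-5):
    """x: feature map of shape (C, H, W); gamma, beta: (N_styles, C) matrices.
    Normalize each channel over spatial axes, then scale/shift with row s."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    z = (x - mu) / (sigma + eps)
    return gamma[s][:, None, None] * z + beta[s][:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 5))       # 3 channels, 5x5 spatial
gamma = np.ones((4, 3)) * 2.0        # 4 styles, 3 channels
beta = np.zeros((4, 3))
out = conditional_instance_norm(x, gamma, beta, s=1)
```

Switching styles is just selecting a different row `s`; all convolutional weights stay shared.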

General · Introduced 2000 · 3 papers

Random Grayscale

Random Grayscale is an image data augmentation that converts an image to grayscale with probability p.
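
A sketch of the augmentation; the simple channel average used here is an assumption (implementations typically use luminance weights):

```python
import random
import numpy as np

def random_grayscale(img, p=0.1, rng=random.Random(0)):
    """With probability p, replace an (H, W, 3) RGB image by its grayscale
    version broadcast back to 3 channels; otherwise return it unchanged."""
    if rng.random() < p:
        gray = img.mean(axis=2, keepdims=True)  # simple channel average
        return np.repeat(gray, 3, axis=2)
    return img

# A pure-red 2x2 image; with p=1.0 it is always converted.
img = np.dstack([np.full((2, 2), v, float) for v in (1.0, 0.0, 0.0)])
out = random_grayscale(img, p=1.0)
```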

Computer Vision · Introduced 2000 · 3 papers

Spatial & Temporal Attention

Spatial & temporal attention combines the advantages of spatial attention and temporal attention: it adaptively selects both important regions and key frames. Some works compute temporal attention and spatial attention separately, while others produce joint spatio-temporal attention maps. Further works focus on capturing pairwise relations.

General · Introduced 2000 · 3 papers

RDNet

Computer Vision · Introduced 2000 · 3 papers

PMLM

Probabilistically Masked Language Model

Probabilistically Masked Language Model, or PMLM, is a type of language model that utilizes a probabilistic masking scheme, aiming to bridge the gap between masked and autoregressive language models. The basic idea behind connecting the two categories of models is similar to MADE by Germain et al. (2015). PMLM is a masked language model with a probabilistic masking scheme, which defines the way sequences are masked by following a probabilistic distribution. The authors employ a simple uniform distribution of the masking ratio and name the model u-PMLM.

Natural Language Processing · Introduced 2000 · 3 papers

RReLU

Randomized Leaky Rectified Linear Units

Randomized Leaky Rectified Linear Units, or RReLU, are an activation function that randomly samples the negative slope for negative activation values. It was first proposed and used in the Kaggle NDSB Competition. During training, a_ji is a random number sampled from a uniform distribution U(l, u). Formally:

y_ji = x_ji if x_ji ≥ 0, and y_ji = x_ji / a_ji otherwise, where a_ji ~ U(l, u) with l < u.

In the test phase, we take the average of all the a_ji seen in training, similar to dropout, and thus set a_ji to (l + u)/2 to get a deterministic result. As suggested by the NDSB competition winner, a_ji is sampled from U(3, 8). At test time, we use:

y_ji = x_ji / ((l + u)/2).
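
A minimal sketch following the NDSB winner's division convention, where negative inputs are divided by a sampled a:

```python
import random

def rrelu(x, lower=3.0, upper=8.0, training=True, rng=random.Random(0)):
    """Randomized Leaky ReLU (division convention from the NDSB winner):
    negative inputs are divided by a, with a ~ U(lower, upper) in training
    and a = (lower + upper) / 2 at test time."""
    if x >= 0:
        return x
    a = rng.uniform(lower, upper) if training else (lower + upper) / 2.0
    return x / a

train_out = rrelu(-2.0, training=True)   # randomized slope
test_out = rrelu(-2.0, training=False)   # deterministic: -2.0 / 5.5
```

Note that PyTorch's `nn.RReLU` instead uses the multiplicative convention with small slopes; the division form here matches the U(3, 8) sampling described above.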

General · Introduced 2000 · 3 papers

TaBERT

TaBERT is a pretrained language model (LM) that jointly learns representations for natural language sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts. In summary, TaBERT's process for learning representations for NL sentences is as follows: given an utterance u and a table T, TaBERT first creates a content snapshot of T. This snapshot consists of sampled rows that summarize the information in T most relevant to the input utterance. The model then linearizes each row in the snapshot, concatenates each linearized row with the utterance, and uses the concatenated string as input to a Transformer model, which outputs row-wise encoding vectors of utterance tokens and cells. The encodings for all the rows in the snapshot are fed into a series of vertical self-attention layers, where a cell representation (or an utterance token representation) is computed by attending to vertically-aligned vectors of the same column (or the same NL token). Finally, representations for each utterance token and column are generated from a pooling layer.

General · Introduced 2000 · 3 papers

SqueezeNeXt Block

A SqueezeNeXt Block is a two-stage bottleneck module used in the SqueezeNeXt architecture to reduce the number of input channels to the 3 × 3 convolution. The 3 × 3 convolution is further decomposed into separable convolutions to reduce the number of parameters, followed by a 1 × 1 expansion module.

Computer Vision · Introduced 2000 · 3 papers

TURL

TURL: Table Understanding through Representation Learning

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning. Specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. We systematically evaluate TURL with a benchmark consisting of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.

General · Introduced 2000 · 3 papers

CrossViT

CrossViT is a type of vision transformer that uses a dual-branch architecture to extract multi-scale feature representations for image classification. The architecture combines image patches (i.e. tokens in a transformer) of different sizes to produce stronger visual features for image classification. It processes small and large patch tokens with two separate branches of different computational complexities and these tokens are fused together multiple times to complement each other. Fusion is achieved by an efficient cross-attention module, in which each transformer branch creates a non-patch token as an agent to exchange information with the other branch by attention. This allows for linear-time generation of the attention map in fusion instead of quadratic time otherwise.

Computer Vision · Introduced 2000 · 3 papers

ISPL

Implicit Subspace Prior Learning

Implicit Subspace Prior Learning, or ISPL, is a framework for dual-blind face restoration, with two major distinctions from previous restoration methods: 1) instead of assuming an explicit degradation function between the LQ and HQ domains, it establishes an implicit correspondence between the two domains via a mutual embedding space, thus avoiding solving the pathological inverse problem directly; 2) it uses a subspace prior decomposition and fusion mechanism to dynamically handle inputs at varying degradation levels with consistently high-quality restoration results.

Computer Vision · Introduced 2000 · 3 papers

Polyak Averaging

Polyak Averaging is an optimization technique that sets the final parameters to an average of the (recent) parameters visited in the optimization trajectory. Specifically, if in t iterations we have parameters θ_1, θ_2, …, θ_t, then Polyak averaging suggests setting

θ̄_t = (1/t) Σ_{i=1}^{t} θ_i.

Image Credit: Shubhendu Trivedi & Risi Kondor
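
A minimal sketch of the average over a parameter trajectory:

```python
def polyak_average(params_history):
    """Average the parameter vectors visited along the optimization
    trajectory (here, over the whole history for simplicity)."""
    t = len(params_history)
    dim = len(params_history[0])
    return [sum(p[i] for p in params_history) / t for i in range(dim)]

# A toy 2-D trajectory oscillating around the optimum (2, 2).
traj = [[4.0, 0.0], [2.0, 2.0], [0.0, 4.0]]
theta_hat = polyak_average(traj)
```

Averaging damps the oscillation: the averaged iterate lands at the point the raw trajectory is circling.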

General · Introduced 1991 · 3 papers

PolarMask

PolarMask is an anchor-box free and single-shot instance segmentation method. Specifically, PolarMask takes an image as input and predicts the distance from a sampled positive location (i.e., a candidate object's center) to the object's contour at each angle, and then assembles the predicted points to produce the final mask. The system has several benefits: (1) the polar representation unifies instance segmentation (masks) and object detection (bounding boxes) into a single framework; (2) two modules (soft polar centerness and polar IoU loss) are designed to sample high-quality center examples and optimize polar contour regression, so that PolarMask's performance does not depend on bounding-box prediction results and training is more efficient; (3) PolarMask is fully convolutional and can be embedded into most off-the-shelf detection methods.
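
The mask-assembly step can be sketched by converting per-angle distances back into contour vertices (`polar_to_contour` is an illustrative helper, not PolarMask's actual post-processing):

```python
import math

def polar_to_contour(center, distances):
    """Convert distances predicted at evenly spaced angles around a
    center point into contour vertices (the mask-assembly step, sketched)."""
    cx, cy = center
    n = len(distances)
    pts = []
    for k, d in enumerate(distances):
        theta = 2.0 * math.pi * k / n
        pts.append((cx + d * math.cos(theta), cy + d * math.sin(theta)))
    return pts

# Constant distances at 36 angles -> a circle of radius 5 around (10, 10).
contour = polar_to_contour((10.0, 10.0), [5.0] * 36)
```

The resulting polygon is then rasterized into the final instance mask.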

Computer Vision · Introduced 2000 · 3 papers

DiffAugment

Differentiable Augmentation (DiffAugment) is a set of differentiable image transformations used to augment data during GAN training. The transformations are applied to the real and generated images. It enables the gradients to be propagated through the augmentation back to the generator, regularizes the discriminator without manipulating the target distribution, and maintains the balance of training dynamics. Three choices of transformation are preferred by the authors in their experiments: Translation, CutOut, and Color.

Computer Vision · Introduced 2000 · 3 papers

ControlVAE

ControlVAE is a variational autoencoder (VAE) framework that combines the automatic control theory with the basic VAE to stabilize the KL-divergence of VAE models to a specified value. It leverages a non-linear PI controller, a variant of the proportional-integral-derivative (PID) control, to dynamically tune the weight of the KL-divergence term in the evidence lower bound (ELBO) using the output KL-divergence as feedback. This allows for control of the KL-divergence to a desired value (set point), which is effective in avoiding posterior collapse and learning disentangled representations.
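
A sketch of the PI-controller idea, using the paper's nonlinear P-term shape but illustrative settings (`kp`, `ki`, and the clamping range are assumptions, not the paper's values):

```python
import math

class PIController:
    """PI controller that tunes the KL weight beta from the observed
    KL divergence, in the spirit of ControlVAE (gains illustrative)."""

    def __init__(self, kp=0.01, ki=0.001, beta_min=0.0, beta_max=1.0):
        self.kp, self.ki = kp, ki
        self.beta_min, self.beta_max = beta_min, beta_max
        self.integral = 0.0

    def step(self, kl_observed, kl_target):
        error = kl_target - kl_observed  # positive when KL is below the set point
        self.integral += error
        # Below-target KL -> smaller beta (weaker KL penalty), and vice versa.
        beta = self.kp / (1.0 + math.exp(error)) - self.ki * self.integral
        return min(max(beta, self.beta_min), self.beta_max)

ctrl = PIController()
beta = ctrl.step(kl_observed=2.0, kl_target=5.0)  # KL below the set point
```

The returned `beta` would multiply the KL term in the ELBO at each training step, closing the feedback loop.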

Computer Vision · Introduced 2000 · 3 papers

I-BERT

I-BERT is a quantized version of BERT that performs the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, it performs end-to-end integer-only BERT inference without any floating-point calculation. In particular, GELU and Softmax are approximated with lightweight second-order polynomials, which can be evaluated with integer-only arithmetic. For LayerNorm, integer-only computation is performed by leveraging a known algorithm for integer calculation of the square root.

Natural Language Processing · Introduced 2000 · 3 papers

ZoomNet

ZoomNet is a 2D human whole-body pose estimation technique. It aims to localize dense landmarks on the entire human body including face, hands, body, and feet. ZoomNet follows the top-down paradigm. Given a human bounding box of each person, ZoomNet first localizes the easy-to-detect body keypoints and estimates the rough position of hands and face. Then it zooms in to focus on the hand/face areas and predicts keypoints using features with higher resolution for accurate localization. Unlike previous approaches which usually assemble multiple networks, ZoomNet has a single network that is end-to-end trainable. It unifies five network heads including the human body pose estimator, hand and face detectors, and hand and face pose estimators into a single network with shared low-level features.

Computer Vision · Introduced 2000 · 3 papers

BCA-Segmentation

Segmentation of patchy areas in biomedical images based on local edge density estimation

An effective approach to the quantification of patchiness in biomedical images according to their local edge densities.

Computer Vision · Introduced 2000 · 3 papers

DetNAS

DetNAS is a neural architecture search algorithm for the design of better backbones for object detection. It is based on the technique of one-shot supernet, which contains all possible networks in the search space. The supernet is trained under the typical detector training schedule: ImageNet pre-training and detection fine-tuning. Then, the architecture search is performed on the trained supernet, using the detection task as the guidance. DetNAS uses evolutionary search as opposed to RL-based methods or gradient-based methods.

General · Introduced 2000 · 3 papers
Page 21 of 175