The goal of Triplet loss, in the context of Siamese Networks, is to maximize the joint probability among all score-pairs, i.e. the product of all probabilities. Taking its negative logarithm gives the loss formulation:

$$L_t\left(V_p, V_n\right) = -\frac{1}{MN}\sum_{i}^{M}\sum_{j}^{N}\log\text{prob}\left(vp_i, vn_j\right)$$

where the balance weight $1/MN$ is used to keep the loss at the same scale for different numbers of instance sets.
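As a rough illustration, here is a small numpy sketch of this formulation. It assumes that $\text{prob}(vp_i, vn_j)$ is a binary softmax over a positive/negative score pair (one common choice); the function name and toy scores are illustrative, not from the original source:

```python
import numpy as np

def triplet_loss(vp: np.ndarray, vn: np.ndarray) -> float:
    """Probabilistic triplet loss over all positive/negative score pairs.

    vp: matching scores of the M positive instances, shape (M,)
    vn: matching scores of the N negative instances, shape (N,)
    Assumes prob(vp_i, vn_j) = exp(vp_i) / (exp(vp_i) + exp(vn_j)).
    """
    M, N = len(vp), len(vn)
    # All pairwise matching probabilities, shape (M, N).
    prob = np.exp(vp)[:, None] / (np.exp(vp)[:, None] + np.exp(vn)[None, :])
    # Negative log of the joint probability, balanced by 1/(M*N).
    return -np.log(prob).sum() / (M * N)

scores_pos = np.array([2.0, 1.5, 1.8])   # toy positive scores
scores_neg = np.array([0.3, -0.2])       # toy negative scores
print(triplet_loss(scores_pos, scores_neg))
```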
CARLA: An Open Urban Driving Simulator
CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. Source: Dosovitskiy et al.
GAN Least Squares Loss is a least squares loss function for generative adversarial networks. Minimizing this objective function is equivalent to minimizing the Pearson $\chi^2$ divergence. The objective functions (here for LSGAN) can be defined as:

$$\min_{D}V(D) = \frac{1}{2}\mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})}\left[\left(D(\mathbf{x}) - b\right)^2\right] + \frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p_{z}(\mathbf{z})}\left[\left(D(G(\mathbf{z})) - a\right)^2\right]$$

$$\min_{G}V(G) = \frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p_{z}(\mathbf{z})}\left[\left(D(G(\mathbf{z})) - c\right)^2\right]$$

where $a$ and $b$ are the labels for fake data and real data, and $c$ denotes the value that $G$ wants $D$ to believe for fake data.
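A minimal PyTorch sketch of these two objectives, using the common label choice $a = 0$, $b = c = 1$; the helper names and toy discriminator outputs are illustrative:

```python
import torch

def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    # Discriminator: push outputs on real data to label b, on fakes to label a.
    return 0.5 * ((d_real - b) ** 2).mean() + 0.5 * ((d_fake - a) ** 2).mean()

def lsgan_g_loss(d_fake, c=1.0):
    # Generator: push D's output on fakes towards c (what G wants D to believe).
    return 0.5 * ((d_fake - c) ** 2).mean()

d_real = torch.randn(8)   # toy discriminator outputs on real samples
d_fake = torch.randn(8)   # toy discriminator outputs on generated samples
print(lsgan_d_loss(d_real, d_fake).item(), lsgan_g_loss(d_fake).item())
```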
Local Response Normalization is a normalization layer that implements the idea of lateral inhibition. Lateral inhibition is a concept in neurobiology that refers to the phenomenon of an excited neuron inhibiting its neighbours: this leads to a peak in the form of a local maximum, creating contrast in that area and increasing sensory perception. In practice, we can either normalize within the same channel or normalize across channels when we apply LRN to convolutional neural networks:

$$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha\sum\limits^{\min(N-1, i+n/2)}_{j=\max(0, i-n/2)}\left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}$$

where $n$ is the number of neighbouring channels used for normalization, $\alpha$ is a multiplicative factor, $\beta$ an exponent and $k$ an additive factor.
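PyTorch ships a cross-channel version of this layer as `torch.nn.LocalResponseNorm`; a short usage sketch, with the hyperparameter values popularized by AlexNet ($n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$, $k = 2$):

```python
import torch
import torch.nn as nn

# Cross-channel LRN: size is the number of neighbouring channels summed over;
# alpha, beta, k match the formula above.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)
print(lrn(x).shape)             # shape is unchanged: torch.Size([1, 64, 32, 32])
```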
Mask R-CNN extends Faster R-CNN to solve instance segmentation tasks. It achieves this by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. In principle, Mask R-CNN is an intuitive extension of Faster R-CNN, but constructing the mask branch properly is critical for good results. Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs. This is evident in how RoIPool, the de facto core operation for attending to instances, performs coarse spatial quantization for feature extraction. To fix the misalignment, Mask R-CNN utilises a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations. Secondly, Mask R-CNN decouples mask and class prediction: it predicts a binary mask for each class independently, without competition among classes, and relies on the network's RoI classification branch to predict the category. In contrast, FCNs usually perform per-pixel multi-class categorization, which couples segmentation and classification.
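The key RoIAlign operation is available in torchvision as `torchvision.ops.roi_align`; a short usage sketch with an illustrative feature map and box (coordinates are made up):

```python
import torch
from torchvision.ops import roi_align

# RoIAlign: extract a fixed 7x7 feature for a region of interest using
# bilinear sampling instead of RoIPool's coarse spatial quantization.
features = torch.randn(1, 256, 50, 50)               # (B, C, H, W) feature map
boxes = torch.tensor([[0.0, 4.3, 4.7, 20.9, 30.1]])  # (batch_idx, x1, y1, x2, y2)
out = roi_align(features, boxes, output_size=(7, 7),
                spatial_scale=1.0, sampling_ratio=2)
print(out.shape)                                     # torch.Size([1, 256, 7, 7])
```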
Shrink and Fine-Tune
Shrink and Fine-Tune, or SFT, is a type of distillation that avoids explicit distillation by copying parameters to a student model and then fine-tuning. Specifically, it extracts a student model from the maximally spaced layers of a fine-tuned teacher; each selected layer is copied fully from the teacher. For example, when creating a BART student with 3 decoder layers from the 12 encoder layer, 12 decoder layer teacher, we copy the teacher's full encoder and decoder layers 0, 6, and 11 to the student. When deciding which layers to copy, ties are broken arbitrarily; copying layers 0, 5, and 11 might work just as well. When copying only 1 decoder layer, we copy layer 0, which was found to work better than copying layer 11. The impact of initialization on performance is measured experimentally in Section 6.1 of the source paper. After initialization, the student model continues to fine-tune on the summarization dataset, with the objective of minimizing $\mathcal{L}_{Data}$, the standard fine-tuning loss.
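A minimal sketch of the "maximally spaced" layer selection described above; the `spaced_layer_indices` helper is my reconstruction of that heuristic, not the authors' code:

```python
# Pick n_student maximally spaced layer indices out of n_teacher layers,
# always including the first layer (layer 0 was found to work best alone).
def spaced_layer_indices(n_teacher: int, n_student: int) -> list[int]:
    if n_student == 1:
        return [0]
    step = (n_teacher - 1) / (n_student - 1)
    return [round(i * step) for i in range(n_student)]

print(spaced_layer_indices(12, 3))  # [0, 6, 11], matching the BART example
```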
Direct Preference Optimization
Counterfactual Explanations
Alternating Direction Method of Multipliers
The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle. It takes the form of a decomposition-coordination procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization. It turns out to be equivalent or closely related to many other algorithms as well, such as Douglas-Rachford splitting from numerical analysis, Spingarn's method of partial inverses, Dykstra's alternating projections method, Bregman iterative algorithms for $\ell_1$ problems in signal processing, proximal methods, and many others. Text Source: https://stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
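To make the decomposition concrete, here is a minimal numpy sketch of ADMM applied to the lasso problem, minimizing $\frac{1}{2}\lVert Ax - b\rVert_2^2 + \lambda\lVert x\rVert_1$ via the split $x = z$ (a standard textbook example; variable names and constants are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, b, lam=0.5, rho=1.0, n_iter=200):
    """ADMM in scaled dual form: alternate two easy subproblems plus a dual update."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA_rhoI = A.T @ A + rho * np.eye(n)   # factor for the x-subproblem
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))  # x-minimization (ridge-like)
        z = soft_threshold(x + u, lam / rho)                # z-minimization (prox of l1)
        u = u + x - z                                       # dual ascent on x = z
    return z

rng = np.random.default_rng(0)
A, x_true = rng.normal(size=(50, 10)), np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]
print(lasso_admm(A, A @ x_true).round(2))  # recovers the sparse coefficients
```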
Auxiliary Classifiers are a type of architectural component that seeks to improve the convergence of very deep networks. They are classifier heads attached to layers before the end of the network. The motivation is to push useful gradients to the lower layers to make them immediately useful and to improve convergence during training by combatting the vanishing gradient problem. They are notably used in the Inception family of convolutional neural networks.
Non Maximum Suppression is a computer vision method that selects a single entity out of many overlapping entities (for example bounding boxes in object detection). The criterion is usually to discard entities that are below a given probability bound. From the remaining entities, we repeatedly pick the entity with the highest probability, output it as the prediction, and discard any remaining box whose overlap (IoU) with the box output in the previous step exceeds a chosen threshold. Image Credit: Martin Kersner
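A minimal numpy sketch of this greedy procedure for axis-aligned boxes; the threshold value and toy boxes are illustrative:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    order = np.argsort(scores)[::-1]   # highest-probability box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the chosen box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Keep only boxes that do not overlap the chosen box too much.
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]; box 1 is suppressed by box 0
```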
Local Interpretable Model-Agnostic Explanations
LIME, or Local Interpretable Model-Agnostic Explanations, is an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model. It modifies a single data sample by tweaking the feature values and observes the resulting impact on the output. It performs the role of an "explainer" to explain predictions from each data sample. The output of LIME is a set of explanations representing the contribution of each feature to a prediction for a single sample, which is a form of local interpretability. Interpretable models in LIME can be, for instance, linear regression or decision trees, which are trained on small perturbations (e.g. adding noise, removing words, hiding parts of the image) of the original input to provide a good local approximation.
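A minimal LIME-style sketch for tabular data (this is the idea, not the `lime` library itself): perturb one sample, query the black-box model, weight perturbations by proximity, and fit a local linear surrogate. The `explain` helper, kernel, and stand-in model are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain(predict_fn, x, n_samples=500, scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X_pert = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict_fn(X_pert)                                      # black-box predictions
    weights = np.exp(-np.linalg.norm(X_pert - x, axis=1) ** 2)  # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(X_pert, y, sample_weight=weights)
    return surrogate.coef_                                      # per-feature contributions

black_box = lambda X: X[:, 0] ** 2 + 3 * X[:, 1]                # stand-in "model"
print(explain(black_box, np.array([1.0, 2.0])))                 # roughly [2, 3] locally
```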
Diffusion models applied to latent spaces, which are normally built with (Variational) Autoencoders.
Mixture of Experts
Spiking Neural Networks
Spiking Neural Networks (SNNs) are a class of artificial neural networks inspired by the structure and functioning of the brain's neural networks. Unlike traditional artificial neural networks that operate based on continuous firing rates, SNNs simulate the behavior of individual neurons through discrete spikes or action potentials. These spikes are triggered when the neuron's membrane potential reaches a certain threshold, and they propagate through the network, communicating information and triggering subsequent neuron activations. This spike-based communication allows SNNs to capture the temporal dynamics of information processing and exhibit asynchronous, event-driven behavior, making them well-suited for tasks such as temporal pattern recognition, event detection, and real-time processing. SNNs have gained attention due to their potential in efficiently processing and encoding information, offering advantages in energy efficiency, robustness, and compatibility with neuromorphic hardware architectures.
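The threshold-and-reset dynamics described above can be sketched with a toy leaky integrate-and-fire (LIF) neuron; the constants here are illustrative, not from any specific SNN model:

```python
import numpy as np

def lif_simulate(current, v_thresh=1.0, v_reset=0.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential integrates input
    current, leaks toward rest, and emits a discrete spike on threshold crossing."""
    v, spikes = 0.0, []
    for i in current:
        v = leak * v + i          # leaky integration of the input current
        if v >= v_thresh:         # threshold crossing triggers an action potential
            spikes.append(1)
            v = v_reset           # reset after the spike
        else:
            spikes.append(0)
    return spikes

rng = np.random.default_rng(0)
print(lif_simulate(rng.uniform(0, 0.4, size=20)))  # sparse, event-driven output
```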
GloVe Embeddings
GloVe Embeddings are a type of word embedding that encode the co-occurrence probability ratio between two words as vector differences. GloVe uses a weighted least squares objective $J$ that minimizes the difference between the dot product of the vectors of two words and the logarithm of their number of co-occurrences:

$$J = \sum_{i, j=1}^{V}f\left(X_{ij}\right)\left(w_i^{T}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^{2}$$

where $w_i$ and $b_i$ are the word vector and bias respectively of word $i$, $\tilde{w}_j$ and $\tilde{b}_j$ are the context word vector and bias respectively of word $j$, $X_{ij}$ is the number of times word $i$ occurs in the context of word $j$, and $f$ is a weighting function that assigns lower weights to rare and frequent co-occurrences.
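A minimal numpy sketch of this objective for a dense co-occurrence matrix, using the weighting function $f(x) = \min(x / x_{\max}, 1)^{\alpha}$ from the GloVe paper; the matrices here are random stand-ins:

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100, alpha=0.75):
    """GloVe weighted least squares objective over a co-occurrence matrix X."""
    f = np.minimum(X / x_max, 1.0) ** alpha                 # down-weights rare pairs
    # Model's estimate of log co-occurrence: w_i . w~_j + b_i + b~_j
    pred = W @ W_ctx.T + b[:, None] + b_ctx[None, :]
    err = np.where(X > 0, pred - np.log(np.maximum(X, 1e-12)), 0.0)
    return np.sum(f * err ** 2)

V, d = 50, 16
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(V, V)).astype(float)             # toy co-occurrence counts
print(glove_loss(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                 np.zeros(V), np.zeros(V), X))
```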
Laplacian EigenMap
Laplacian Positional Encodings
Laplacian eigenvectors represent a natural generalization of the Transformer positional encodings (PE) for graphs, as the eigenvectors of a discrete line (NLP graph) are the cosine and sinusoidal functions. They help encode distance-aware information (i.e., nearby nodes have similar positional features and farther nodes have dissimilar positional features). Hence, Laplacian Positional Encoding (PE) is a general method to encode node positions in a graph. For each node, its Laplacian PE is given by its components in the k smallest non-trivial eigenvectors of the graph Laplacian.
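A minimal numpy sketch, assuming the symmetric normalized Laplacian (a common choice in the graph transformer literature) and ignoring the sign ambiguity of eigenvectors; the cycle graph is a toy example:

```python
import numpy as np

def laplacian_pe(A, k=4):
    """Per-node entries of the k smallest non-trivial eigenvectors of the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # skip the trivial (constant) eigenvector

# Toy 5-node cycle graph: adjacent nodes get similar positional features.
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
print(laplacian_pe(A, k=2))
```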
Autoencoders
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstructing side is learnt, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Extracted from: Wikipedia
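A minimal dense autoencoder sketch in PyTorch; the layer sizes and the flattened-image input are illustrative:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder compresses the input to a low-dimensional code; the decoder
    reconstructs the input from that code."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                       # toy batch of flattened images
loss = nn.functional.mse_loss(model(x), x)    # reconstruction objective
loss.backward()
print(loss.item())
```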
OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The model uses an AdamW optimizer and weight decay of 0.1. It follows a linear learning rate schedule, warming up from 0 to the maximum learning rate over the first 2000 steps in OPT-175B, or over 375M tokens in the smaller models, and decaying down to 10% of the maximum LR over 300B tokens. The batch sizes range from 0.5M to 4M tokens depending on the model size and are kept constant throughout the course of training.
Spatial Pyramid Pooling (SPP) is a pooling layer that removes the fixed-size constraint of the network, i.e. a CNN does not require a fixed-size input image. Specifically, we add an SPP layer on top of the last convolutional layer. The SPP layer pools the features and generates fixed-length outputs, which are then fed into the fully-connected layers (or other classifiers). In other words, we perform some information aggregation at a deeper stage of the network hierarchy (between convolutional layers and fully-connected layers) to avoid the need for cropping or warping at the beginning.
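A minimal PyTorch sketch of the fixed-length pooling step, using `adaptive_max_pool2d` to pool over 1x1, 2x2 and 4x4 grids (the pyramid levels here are illustrative); note the output length is the same for different input sizes:

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool a (B, C, H, W) feature map into a fixed-length vector regardless of
    H and W, by max-pooling over each grid level and concatenating."""
    B, C = x.shape[:2]
    pooled = [F.adaptive_max_pool2d(x, n).reshape(B, C * n * n) for n in levels]
    return torch.cat(pooled, dim=1)   # length C * (1 + 4 + 16), input-size independent

print(spatial_pyramid_pool(torch.randn(2, 8, 13, 9)).shape)   # torch.Size([2, 168])
print(spatial_pyramid_pool(torch.randn(2, 8, 32, 32)).shape)  # same: torch.Size([2, 168])
```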
Fully Convolutional Network
Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. They employ solely locally connected layers, such as convolution, pooling and upsampling. Avoiding the use of dense layers means fewer parameters (making the networks faster to train). It also means an FCN can work for variable image sizes, given all connections are local. The network consists of a downsampling path, used to extract and interpret the context, and an upsampling path, which allows for localization. FCNs also employ skip connections to recover the fine-grained spatial information lost in the downsampling path.
SSD is a single-stage object detection method that discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. The fundamental improvement in speed comes from eliminating bounding box proposals and the subsequent pixel or feature resampling stage. Improvements over competing single-stage methods include using a small convolutional filter to predict object categories and offsets in bounding box locations, using separate predictors (filters) for different aspect ratio detections, and applying these filters to multiple feature maps from the later stages of a network in order to perform detection at multiple scales.
Diffusion-Convolutional Neural Networks
Diffusion-convolutional neural networks (DCNNs) are a model for graph-structured data. Through the introduction of a diffusion-convolution operation, diffusion-based representations can be learned from graph-structured data and used as an effective basis for node classification. Description and image from: Diffusion-Convolutional Neural Networks
Attention Model
Class-activation map
Class activation maps (CAMs) can be used to interpret the prediction decision made by a convolutional neural network (CNN) by highlighting the image regions most relevant to a given class. Image source: Learning Deep Features for Discriminative Localization
Independent Component Analysis
Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed nongaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA. ICA is superficially related to principal component analysis and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely. Extracted from: https://www.cs.helsinki.fi/u/ahyvarin/whatisica.shtml Source papers: Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture; Independent component analysis, a new concept?; Independent component analysis: algorithms and applications
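A classic blind source separation sketch using scikit-learn's `FastICA`: mix two independent, nongaussian sources with an unknown matrix, then recover them (up to order, sign and scale). The sources and mixing matrix are toy choices:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)),             # square-wave source
          rng.laplace(size=t.size)]           # heavy-tailed (nongaussian) source
A = np.array([[1.0, 0.5], [0.4, 1.0]])        # unknown mixing matrix
X = S @ A.T                                   # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                  # recovered independent components
print(S_est.shape)                            # (2000, 2)
```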
Random Gaussian Blur is an image data augmentation technique where we blur the image with a Gaussian kernel whose strength (e.g. its standard deviation) is randomly sampled. Image Source: Wikipedia
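A minimal sketch with scipy's `gaussian_filter`, sampling a random blur strength per image and blurring only the spatial axes; the sigma range is illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))               # toy HWC image
sigma = rng.uniform(0.1, 2.0)                 # randomly sampled blur strength
blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))  # no blur across channels
print(sigma, blurred.shape)
```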
Genetic Algorithms
Genetic Algorithms are search algorithms that mimic Darwinian biological evolution in order to select and propagate better solutions.
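A toy sketch of the select-and-propagate loop on the classic "OneMax" problem (maximize the number of ones in a bit-string); population size, rates and the fitness function are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
POP, GENES, GENERATIONS = 30, 20, 40

pop = rng.integers(0, 2, size=(POP, GENES))
for _ in range(GENERATIONS):
    fitness = pop.sum(axis=1)                          # better solutions score higher
    probs = fitness / fitness.sum()
    parents = pop[rng.choice(POP, size=POP, p=probs)]  # selection: fitter -> more offspring
    cuts = rng.integers(1, GENES, size=POP // 2)
    for i, c in enumerate(cuts):                       # single-point crossover
        a, b = parents[2 * i], parents[2 * i + 1]
        a[c:], b[c:] = b[c:].copy(), a[c:].copy()
    mutate = rng.random(parents.shape) < 0.01          # random bit-flip mutation
    pop = np.where(mutate, 1 - parents, parents)

print(pop.sum(axis=1).max())  # best fitness found, ideally close to GENES
```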
YOLOv3 is a real-time, single-stage object detection model that builds on YOLOv2 with several improvements. Improvements include the use of a new backbone network, Darknet-53 that utilises residual connections, or in the words of the author, "those newfangled residual network stuff", as well as some improvements to the bounding box prediction step, and use of three different scales from which to extract features (similar to an FPN).
Random Search replaces the exhaustive enumeration of all combinations used by Grid Search by selecting them randomly. This can be simply applied to the discrete setting, but also generalizes to continuous and mixed spaces. It can outperform Grid Search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm; in this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Extracted from: Wikipedia. Source paper and image: Bergstra and Bengio.
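A minimal sketch over a mixed continuous/discrete space, assuming a black-box `evaluate` function that returns a validation score to maximize (the stand-in objective and distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(lr, depth):                      # stand-in for a real training run
    return -(np.log10(lr) + 2) ** 2 - 0.1 * (depth - 5) ** 2

best_score, best_cfg = -np.inf, None
for _ in range(50):
    cfg = {
        "lr": 10 ** rng.uniform(-5, -1),      # continuous, sampled on a log scale
        "depth": int(rng.integers(1, 10)),    # discrete
    }
    score = evaluate(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, best_score)
```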
You Only Look Once
Normalized Temperature-scaled Cross Entropy Loss
NT-Xent, or Normalized Temperature-scaled Cross Entropy Loss, is a loss function. Let $\text{sim}(u, v) = u^{T}v / \lVert u\rVert\lVert v\rVert$ denote the cosine similarity between two vectors $u$ and $v$. Then the loss function for a positive pair of examples $(i, j)$ is:

$$\ell_{i,j} = -\log\frac{\exp\left(\text{sim}\left(z_i, z_j\right)/\tau\right)}{\sum_{k=1}^{2N}\mathbb{1}_{[k \neq i]}\exp\left(\text{sim}\left(z_i, z_k\right)/\tau\right)}$$

where $\mathbb{1}_{[k \neq i]} \in \{0, 1\}$ is an indicator function evaluating to $1$ iff $k \neq i$ and $\tau$ denotes a temperature parameter. The final loss is computed across all positive pairs, both $(i, j)$ and $(j, i)$, in a mini-batch. Source: SimCLR
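A compact PyTorch sketch of this loss over a batch of paired views, implementing the indicator by masking out self-similarities; the temperature value and toy inputs are illustrative:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent over a batch: z1[i] and z2[i] are the two views of example i."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # 2N unit vectors
    sim = z @ z.T / tau                               # cosine similarities / temperature
    sim.fill_diagonal_(float("-inf"))                 # the indicator 1[k != i]
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)              # -log softmax at the positive

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)     # toy projection outputs
print(nt_xent(z1, z2).item())
```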
Spatio-temporal stability analysis
Spatio-temporal feature extraction that measures stability. The proposed method is based on a compression algorithm named Run-Length Encoding. The workflow of the method is presented below.
fastText embeddings exploit subword information to construct word embeddings. Representations are learnt for character $n$-grams, and words are represented as the sum of their $n$-gram vectors. This extends word2vec-type models with subword information, which helps the embeddings understand suffixes and prefixes. Once a word is represented using character $n$-grams, a skipgram model is trained to learn the embeddings.
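A minimal sketch of fastText-style subword extraction; in the full model each of these n-grams has its own vector, and the word vector is their sum. The n-gram range (3 to 6) follows the fastText paper:

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of the word wrapped in boundary symbols '<' and '>',
    plus the full word itself as a special unit."""
    w = f"<{word}>"
    grams = [w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    return grams + [w]

print(char_ngrams("where", 3, 4))
# ['<wh', 'whe', 'her', 'ere', 're>', '<whe', 'wher', 'here', 'ere>', '<where>']
```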
A HyperNetwork is a network that generates weights for a main network. The behavior of the main network is the same as any usual neural network: it learns to map some raw inputs to their desired targets; whereas the hypernetwork takes a set of inputs that contain information about the structure of the weights and generates the weights for that layer.
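A minimal PyTorch sketch of the idea: a small hypernetwork maps a layer embedding `z` to the weights of a linear layer, which are then applied functionally in the main network's forward pass. Class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    def __init__(self, z_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # The hypernetwork: maps the layer embedding to weights and biases.
        self.hyper = nn.Linear(z_dim, in_dim * out_dim + out_dim)

    def forward(self, x, z):
        params = self.hyper(z)                                # generated from z
        W = params[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        b = params[self.in_dim * self.out_dim:]
        return x @ W.T + b                                    # main-network computation

layer = HyperLinear(z_dim=8, in_dim=16, out_dim=4)
x, z = torch.randn(2, 16), torch.randn(8)   # z encodes the layer's identity
print(layer(x, z).shape)                    # torch.Size([2, 4])
```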
FLIP: A Difference Evaluator for Alternating Images. Source: https://developer.nvidia.com/blog/flip-a-difference-evaluator-for-alternating-images/
Embeddings from Language Models, or ELMo, is a type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. A biLM combines both a forward and backward LM. ELMo jointly maximizes the log likelihood of the forward and backward directions. To add ELMo to a supervised model, we freeze the weights of the biLM, concatenate the ELMo vector $\text{ELMo}_{k}^{task}$ with $x_k$, and pass the ELMo-enhanced representation $\left[x_k; \text{ELMo}_{k}^{task}\right]$ into the task RNN. Here $x_k$ is a context-independent token representation for each token position.
Neural Tangent Kernel
Spectral Normalization is a normalization technique used for generative adversarial networks, used to stabilize training of the discriminator. Spectral normalization has the convenient property that the Lipschitz constant is the only hyper-parameter to be tuned. It controls the Lipschitz constant of the discriminator $f$ by constraining the spectral norm of each layer $g : h_{in} \rightarrow h_{out}$. The Lipschitz norm $\lVert g \rVert_{\text{Lip}}$ is equal to $\sup_{h}\sigma\left(\nabla g\left(h\right)\right)$, where $\sigma(A)$ is the spectral norm of the matrix $A$ ($L_2$ matrix norm of $A$):

$$\sigma(A) = \max_{h : h \neq 0}\frac{\lVert Ah\rVert_2}{\lVert h\rVert_2} = \max_{\lVert h\rVert_2 \leq 1}\lVert Ah\rVert_2$$

which is equivalent to the largest singular value of $A$. Therefore, for a linear layer $g(h) = Wh$ the norm is given by $\lVert g \rVert_{\text{Lip}} = \sigma(W)$. Spectral normalization normalizes the spectral norm of the weight matrix $W$ so it satisfies the Lipschitz constraint $\sigma\left(\bar{W}_{\text{SN}}\right) = 1$:

$$\bar{W}_{\text{SN}}\left(W\right) = \frac{W}{\sigma\left(W\right)}$$
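A minimal numpy sketch, estimating $\sigma(W)$ with power iteration (as done in practice instead of a full SVD) and dividing the weights by it; iteration count and seeds are illustrative:

```python
import numpy as np

def spectral_normalize(W, n_iters=20, seed=0):
    """Normalize W by its largest singular value, estimated by power iteration."""
    u = np.random.default_rng(seed).normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                 # estimate of the spectral norm
    return W / sigma

W = np.random.default_rng(1).normal(size=(64, 32))
W_sn = spectral_normalize(W)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ~1.0: Lipschitz constraint holds
```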
Deep Deterministic Policy Gradient
DDPG, or Deep Deterministic Policy Gradient, is an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. It combines the actor-critic approach with insights from DQNs: in particular, the insights that 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) the network is trained with a target Q network to give consistent targets during temporal difference backups. DDPG makes use of the same ideas along with batch normalization.
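One of the ingredients above, the target network that provides consistent TD targets, is maintained in DDPG with a soft update $\theta' \leftarrow \tau\theta + (1 - \tau)\theta'$; a short PyTorch sketch (the mixing rate and toy networks are illustrative):

```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    """Let target parameters slowly track the learned parameters."""
    with torch.no_grad():
        for t, o in zip(target_net.parameters(), online_net.parameters()):
            t.mul_(1.0 - tau).add_(tau * o)

online = torch.nn.Linear(4, 2)
target = torch.nn.Linear(4, 2)
target.load_state_dict(online.state_dict())  # start identical
soft_update(target, online)                  # called after each learning step
```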
Colorization is a self-supervision approach that relies on colorization as the pretext task in order to learn image representations.
Discrete Cosine Transform (DCT) is an orthogonal transformation method that decomposes an image into its spatial frequency spectrum. It expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. It is widely used in compression tasks, e.g. image compression, where high-frequency components can be discarded. It is a type of Fourier-related transform, similar to the discrete Fourier transform (DFT), but using only real numbers. Image Credit: Wikipedia
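A minimal scipy sketch of the compression use case on an 8x8 block (JPEG-style): transform, discard high-frequency coefficients, invert, and check the reconstruction error. The cutoff is illustrative:

```python
import numpy as np
from scipy.fft import dct, idct

block = np.random.default_rng(0).random((8, 8))
# Separable 2D DCT-II: apply the 1D transform along each axis.
coeffs = dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")
coeffs[4:, :] = 0                          # drop high vertical frequencies
coeffs[:, 4:] = 0                          # drop high horizontal frequencies
approx = idct(idct(coeffs, axis=1, norm="ortho"), axis=0, norm="ortho")
print(np.abs(block - approx).max())        # small reconstruction error
```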
RetinaNet is a one-stage object detection model that utilizes a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss in order to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is responsible for computing a convolutional feature map over an entire input image and is an off-the-shelf convolutional network. The first subnet performs convolutional object classification on the backbone's output; the second subnet performs convolutional bounding box regression. The two subnetworks feature a simple design that the authors propose specifically for one-stage, dense detection. We can see the motivation for focal loss by comparing with two-stage object detectors. Here class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., Selective Search, EdgeBoxes, DeepMask, RPN) rapidly narrows down the number of candidate object locations to a small number (e.g., 1-2k), filtering out most background samples. In the second classification stage, sampling heuristics, such as a fixed foreground-to-background ratio, or online hard example mining (OHEM), are performed to maintain a manageable balance between foreground and background. In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples. Formally, the Focal Loss adds a factor $\left(1 - p_t\right)^{\gamma}$ to the standard cross entropy criterion:

$$\text{FL}\left(p_t\right) = -\left(1 - p_t\right)^{\gamma}\log\left(p_t\right)$$

Setting $\gamma > 0$ reduces the relative loss for well-classified examples ($p_t > 0.5$), putting more focus on hard, misclassified examples. Here $\gamma \geq 0$ is a tunable focusing parameter.
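A minimal PyTorch sketch of the binary focal loss, including the class-balancing weight $\alpha$, with the defaults reported in the paper ($\gamma = 2$, $\alpha = 0.25$); the `focal_loss` helper is illustrative:

```python
import torch

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss. logits/targets: (N,); targets in {0, 1}."""
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)          # prob of the true class
    alpha_t = torch.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma decays to zero as confidence in the correct class grows.
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t + 1e-8)).mean()

logits = torch.tensor([3.0, -2.0, 0.1])   # two easy examples, one hard
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```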
self-DIstillation with NO labels
DINO (self-distillation with no labels) is a self-supervised learning method that directly predicts the output of a teacher network - built with a momentum encoder - using a standard cross-entropy loss. In the example to the right, DINO is illustrated in the case of one single pair of views for simplicity. The model passes two different random transformations of an input image to the student and teacher networks. Both networks have the same architecture but different parameters. The output of the teacher network is centered with a mean computed over the batch. Each network outputs a $K$-dimensional feature normalized with a temperature softmax over the feature dimension. Their similarity is then measured with a cross-entropy loss. A stop-gradient (sg) operator is applied to the teacher to propagate gradients only through the student. The teacher parameters are updated with an exponential moving average (ema) of the student parameters.
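A minimal PyTorch sketch of the loss computation: batch centering and sharpening on the teacher, a stop-gradient via `detach()`, and cross-entropy against the student's temperature softmax. Temperatures follow common DINO settings; the toy outputs are illustrative:

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    p_t = F.softmax((teacher_out - center) / t_t, dim=1).detach()  # sg(teacher)
    log_p_s = F.log_softmax(student_out / t_s, dim=1)
    return -(p_t * log_p_s).sum(dim=1).mean()                      # cross-entropy

student_out = torch.randn(16, 256, requires_grad=True)
teacher_out = torch.randn(16, 256)
center = teacher_out.mean(dim=0, keepdim=True)   # maintained as an EMA in practice
loss = dino_loss(student_out, teacher_out, center)
loss.backward()                                  # gradients flow only to the student
print(loss.item())
```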
CutMix is an image data augmentation strategy. Instead of simply removing pixels as in Cutout, we replace the removed regions with a patch from another image. The ground truth labels are also mixed proportionally to the number of pixels of combined images. The added patches further enhance localization ability by requiring the model to identify the object from a partial view.
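A minimal numpy sketch following the CutMix recipe (patch area ratio drawn from a Beta distribution, labels mixed by the actual pixel proportion); array shapes and the Beta parameters are illustrative:

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, rng):
    """Paste a random patch of img_b into img_a and mix the labels in
    proportion to the patch area. Images: (H, W, C); labels: one-hot."""
    H, W = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)                           # target area ratio
    cut_h, cut_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = rng.integers(H), rng.integers(W)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, H)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, W)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]          # replace region with a patch
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (H * W)      # actual pixel proportion
    return mixed, lam_adj * label_a + (1 - lam_adj) * label_b

rng = np.random.default_rng(0)
img, lbl = cutmix(np.zeros((32, 32, 3)), np.ones((32, 32, 3)),
                  np.array([1.0, 0.0]), np.array([0.0, 1.0]), rng)
print(lbl)  # mixed ground truth, e.g. [0.8, 0.2]
```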
A 3D Convolution is a type of convolution where the kernel slides in 3 dimensions, as opposed to 2 dimensions with 2D convolutions. One example use case is medical imaging, where a model is constructed using 3D image slices. Additionally, video data has a temporal dimension on top of the spatial ones, making it suitable for this module. Image: Lung nodule detection based on 3D convolutional neural networks, Fan et al
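A short PyTorch usage sketch with `nn.Conv3d`; the channel counts and volume size are illustrative:

```python
import torch
import torch.nn as nn

# The kernel slides over depth, height and width; input layout is
# (batch, channels, depth/time, height, width), e.g. CT slices or video frames.
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
volume = torch.randn(2, 1, 16, 64, 64)   # 2 volumes of 16 slices each
print(conv3d(volume).shape)              # torch.Size([2, 8, 16, 64, 64])
```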