VirTex

Computer Vision | Introduced 2020 | 6 papers

Description

VirTex (Visual representations from Textual annotations) is a pretraining approach that uses semantically dense captions to learn visual representations. First, a ConvNet and a Transformer are jointly trained from scratch to generate natural language captions for images. Then, the learned features are transferred to downstream visual recognition tasks.
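
The two stages can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name TinyCaptioner, the helper caption_loss, the layer sizes, the vocabulary size, the random stand-in data, and the 10-class downstream head are all hypothetical. It shows a small ConvNet and a Transformer decoder trained jointly on a next-token captioning loss, after which the ConvNet alone is reused under a classification head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCaptioner(nn.Module):
    """Toy ConvNet encoder plus Transformer decoder (all sizes are illustrative)."""

    def __init__(self, vocab_size=100, dim=32, max_len=16):
        super().__init__()
        # Visual backbone: a tiny ConvNet standing in for a real one.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=4, dim_feedforward=64, dropout=0.0, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.to_vocab = nn.Linear(dim, vocab_size)

    def forward(self, images, tokens):
        feats = self.backbone(images)              # (B, dim, H', W')
        memory = feats.flatten(2).transpose(1, 2)  # (B, H'*W', dim)
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(positions)
        # Causal mask: each position attends only to itself and earlier tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        out = self.decoder(x, memory, tgt_mask=mask)
        return self.to_vocab(out)                  # (B, seq_len, vocab_size)


def caption_loss(model, images, captions):
    """Cross-entropy for predicting token t+1 from the image and tokens up to t."""
    logits = model(images, captions[:, :-1])
    targets = captions[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))


torch.manual_seed(0)
model = TinyCaptioner()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-ins for a batch of images and tokenized captions.
images = torch.randn(4, 3, 32, 32)
captions = torch.randint(0, 100, (4, 8))

# Stage 1: one joint training step; gradients reach both backbone and decoder.
loss = caption_loss(model, images, captions)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Stage 2: reuse the pretrained backbone under a new head for a
# hypothetical 10-class recognition task.
classifier = nn.Sequential(
    model.backbone, nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)
)
class_logits = classifier(images)                  # (4, 10)
```

Because the captioning loss is backpropagated through the decoder into the ConvNet, the visual features are shaped by the caption text; the decoder itself is discarded at transfer time in this sketch.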

Papers Using This Method

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis (2024-04-29)
Chaos-Based Bitwise Dynamical Pseudorandom Number Generator on FPGA (2024-01-26)
Real-time FPGA Design for OMP Targeting 8K Image Reconstruction (2021-10-10)
Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks (2021-08-21)
FPGA Implementation of Simplified Spiking Neural Network (2020-10-02)
VirTex: Learning Visual Representations from Textual Annotations (2020-06-11)