Description
VirTex (VIsual representations from TEXtual annotations) is a pretraining approach that uses semantically dense captions to learn visual representations. First, a ConvNet and a Transformer are jointly trained from scratch to generate natural language captions for images. The learned visual features are then transferred to downstream visual recognition tasks.
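The caption-pretraining setup can be sketched as a ConvNet backbone feeding image features into a Transformer decoder that predicts caption tokens. This is a minimal illustrative sketch, not the actual VirTex implementation: the class name, tiny ConvNet, and all hyperparameters here are hypothetical stand-ins (VirTex itself uses a ResNet-50 backbone and bidirectional Transformer decoders).

```python
import torch
import torch.nn as nn

class CaptioningPretrainer(nn.Module):
    # Hypothetical minimal sketch of a VirTex-style setup: a ConvNet
    # produces image features, and a Transformer decoder predicts
    # caption tokens conditioned on those features.
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        # Tiny ConvNet stand-in for the visual backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # Flatten spatial features into a sequence of "memory" vectors
        # for the decoder to cross-attend over.
        feats = self.backbone(images)                  # (B, d, 7, 7)
        memory = feats.flatten(2).transpose(1, 2)      # (B, 49, d)
        tgt = self.embed(tokens)                       # (B, T, d)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask) # causal self-attention
        return self.head(out)                          # (B, T, vocab) logits

model = CaptioningPretrainer()
images = torch.randn(2, 3, 224, 224)
tokens = torch.randint(0, 1000, (2, 12))
logits = model(images, tokens)   # per-position caption-token logits
```

After pretraining on image-caption pairs with a cross-entropy loss over these logits, the `backbone` weights would be kept and transferred to downstream recognition tasks while the decoder is discarded.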
Papers Using This Method
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis (2024-04-29)
Chaos-Based Bitwise Dynamical Pseudorandom Number Generator on FPGA (2024-01-26)
Real-time FPGA Design for OMP Targeting 8K Image Reconstruction (2021-10-10)
Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks (2021-08-21)
FPGA Implementation of Simplified Spiking Neural Network (2020-10-02)
VirTex: Learning Visual Representations from Textual Annotations (2020-06-11)