SimVLM

Simple Visual Language Model

Computer Vision
Introduced 2021
3 papers

Description

SimVLM is a minimalist pretraining framework that reduces training complexity by exploiting large-scale weak supervision. It is trained end-to-end with a single prefix language modeling (PrefixLM) objective. PrefixLM applies bidirectional attention within the prefix sequence and predicts the remaining tokens autoregressively, so it is applicable to both decoder-only and encoder-decoder sequence-to-sequence language models.
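
The attention pattern behind PrefixLM can be illustrated with a small mask-building sketch. It is not taken from the SimVLM code; the function name `prefix_lm_mask` and its parameters are hypothetical, and it assumes the usual PrefixLM convention that positions after the prefix attend causally (to the whole prefix plus earlier and current positions).

```python
def prefix_lm_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """Illustrative PrefixLM attention mask (hypothetical helper).

    mask[i][j] is True when position i may attend to position j.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            in_prefix = j < prefix_len  # every position sees the full prefix
            causal = j <= i             # otherwise only current and earlier positions
            row.append(in_prefix or causal)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # 5 tokens, the first 3 of which form the prefix.
    for row in prefix_lm_mask(seq_len=5, prefix_len=3):
        print(" ".join("1" if allowed else "0" for allowed in row))
```

With `prefix_len=0` the mask reduces to an ordinary causal (lower-triangular) mask, and with `prefix_len=seq_len` it is fully bidirectional, so the same single mask covers both extremes.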

Papers Using This Method

CoCa: Contrastive Captioners are Image-Text Foundation Models (2022-05-04)
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning (2021-12-09)
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (2021-08-24)