OSCAR

Computer Vision | Introduced 2000 | 36 papers

Description

OSCAR is a vision-language pre-training method that uses object tags detected in images as anchor points to ease the learning of image-text alignment. The model takes a word-tag-region triple as input and is pre-trained with two losses: a masked token loss over words and tags, and a contrastive loss between tags and others. OSCAR maps an image-text pair into a semantic space via dictionary lookup. The object tags serve as anchor points that align image regions with the word embeddings of pre-trained language models. The model is then fine-tuned for understanding and generation tasks.
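To make the input triple and the two pre-training signals more concrete, here is a minimal Python sketch of how training examples might be prepared. It is illustrative only and not the authors' implementation: the names `Triple`, `mask_tokens`, and `pollute_tags` are hypothetical, the 15% masking rate and 50% tag-replacement rate are assumptions, and the network and the loss computation themselves are omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

MASK = "[MASK]"  # placeholder token (assumed; follows the usual BERT convention)


@dataclass
class Triple:
    """Hypothetical container for one word-tag-region input."""
    words: List[str]            # caption tokens
    tags: List[str]             # object tags detected in the image
    regions: List[List[float]]  # one feature vector per detected region


def mask_tokens(triple: Triple, rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[Triple, Dict[int, str]]:
    """Masked token objective: hide some words and tags.

    Returns the masked triple and a map from position (in words + tags)
    to the original token the model would be asked to recover.
    """
    tokens = triple.words + triple.tags
    masked: List[str] = []
    targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    n = len(triple.words)
    return Triple(masked[:n], masked[n:], triple.regions), targets


def pollute_tags(triple: Triple, other_tags: Sequence[str], rng: random.Random,
                 replace_prob: float = 0.5) -> Tuple[Triple, int]:
    """Contrastive objective: sometimes swap in tags from another image.

    Returns the (possibly polluted) triple and a binary label:
    1 if the original tags were kept, 0 if they were replaced.
    """
    if rng.random() < replace_prob:
        return Triple(triple.words, list(other_tags), triple.regions), 0
    return triple, 1


if __name__ == "__main__":
    rng = random.Random(0)
    example = Triple(
        words=["a", "dog", "sits", "on", "a", "couch"],
        tags=["dog", "couch"],
        regions=[[0.1, 0.2], [0.3, 0.4]],  # toy region features
    )
    masked, targets = mask_tokens(example, rng)
    polluted, label = pollute_tags(example, ["car", "road"], rng)
    print(masked.words, masked.tags, targets)
    print(polluted.tags, label)
```

In this sketch the region features pass through both functions unchanged: only the discrete tokens (words and tags) are masked, and only the tag sequence is swapped for the contrastive label.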

Papers Using This Method

OSCAR: One-Step Diffusion Codec for Image Compression Across Multiple Bit-rates (2025-05-22)
OSCAR: Online Soft Compression And Reranking (2025-03-17)
OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking (2025-03-07)
One-Shot Federated Learning with Classifier-Free Diffusion Models (2025-02-12)
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs (2025-01-20)
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training (2024-10-28)
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning (2024-10-24)
Towards Fine-Grained Webpage Fingerprinting at Scale (2024-09-06)
IKUN for WMT24 General MT Task: LLMs Are here for Multilingual Machine Translation (2024-08-21)
Tropical Expressivity of Neural Networks (2024-05-30)
Strong Screening Rules for Group-based SLOPE Models (2024-05-24)
Building a Large Japanese Web Corpus for Large Language Models (2024-04-27)
Automated Model Selection for Generalized Linear Models (2024-04-25)
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages (2024-03-13)
OSCaR: Object State Captioning and State Change Representation (2024-02-27)
GlotScript: A Resource and Tool for Low Resource Writing System Identification (2023-09-23)
A Unified Framework for Pattern Recovery in Penalized and Thresholded Estimation and its Geometry (2023-07-19)
One-Stage Cascade Refinement Networks for Infrared Small Target Detection (2022-12-16)
RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use (2022-11-15)
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations (2022-07-01)