OSCAR

Computer Vision · Introduced 2000 · 36 papers

Description

OSCAR is a learning method that uses object tags detected in images as anchor points to ease the learning of image-text alignment. The model takes a triple as input (word-tag-region) and is pre-trained with two losses: a masked token loss over words and tags, and a contrastive loss between tags and the other inputs. OSCAR maps an image-text pair into a semantic space via dictionary lookup. Object tags serve as anchor points that align image regions with the word embeddings of pre-trained language models. The model is then fine-tuned for understanding and generation tasks.
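The input triple and the two pre-training objectives can be illustrated with a small sketch. This is not the authors' implementation: the names `Triple`, `build_text_input`, `mask_tokens`, and `make_contrastive_example` are hypothetical, the 15% masking rate and the 50% tag-replacement probability are assumed defaults, and region features are stood in for by plain lists of floats. The sketch only prepares training examples; the encoder and the loss computation are left out.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # placeholder token, as in BERT-style masking


@dataclass
class Triple:
    """Hypothetical container for one word-tag-region input."""
    words: List[str]            # caption tokens
    tags: List[str]             # object tags detected in the image
    regions: List[List[float]]  # one feature vector per image region


def build_text_input(triple: Triple) -> List[str]:
    """Concatenate words and tags into one token sequence (assumed layout)."""
    return ["[CLS]"] + triple.words + ["[SEP]"] + triple.tags + ["[SEP]"]


def mask_tokens(
    tokens: Sequence[str], rate: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Masked token objective: hide some words/tags and keep them as labels."""
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked[i] = MASK
            labels[i] = tok  # the model would be trained to recover this token
    return masked, labels


def make_contrastive_example(
    triple: Triple,
    other_tags: Sequence[str],
    rng: random.Random,
    p_replace: float = 0.5,  # assumed replacement probability
) -> Tuple[Triple, int]:
    """Contrastive objective: sometimes swap in tags from a different image.

    Returns the (possibly polluted) triple and a label:
    1 if the tags are the original ones, 0 if they were replaced.
    """
    if rng.random() < p_replace:
        return Triple(triple.words, list(other_tags), triple.regions), 0
    return triple, 1


if __name__ == "__main__":
    rng = random.Random(0)
    example = Triple(
        words=["a", "dog", "sits", "on", "a", "couch"],
        tags=["dog", "couch"],
        regions=[[0.1, 0.2], [0.3, 0.4]],
    )
    masked, labels = mask_tokens(example.words + example.tags, 0.15, rng)
    polluted, is_original = make_contrastive_example(example, ["car", "tree"], rng)
    print(build_text_input(polluted), masked, labels, is_original)
```

In this sketch the tags appear in both objectives: they are candidates for masking alongside the words, and they are the part of the triple that is swapped out to create negatives for the contrastive objective.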

Papers Using This Method

OSCAR: One-Step Diffusion Codec for Image Compression Across Multiple Bit-rates (2025-05-22)
OSCAR: Online Soft Compression And Reranking (2025-03-17)
OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking (2025-03-07)
One-Shot Federated Learning with Classifier-Free Diffusion Models (2025-02-12)
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs (2025-01-20)
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training (2024-10-28)
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning (2024-10-24)
Towards Fine-Grained Webpage Fingerprinting at Scale (2024-09-06)
IKUN for WMT24 General MT Task: LLMs Are here for Multilingual Machine Translation (2024-08-21)
Tropical Expressivity of Neural Networks (2024-05-30)
Strong Screening Rules for Group-based SLOPE Models (2024-05-24)
Building a Large Japanese Web Corpus for Large Language Models (2024-04-27)
Automated Model Selection for Generalized Linear Models (2024-04-25)
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages (2024-03-13)
OSCaR: Object State Captioning and State Change Representation (2024-02-27)
GlotScript: A Resource and Tool for Low Resource Writing System Identification (2023-09-23)
A Unified Framework for Pattern Recovery in Penalized and Thresholded Estimation and its Geometry (2023-07-19)
One-Stage Cascade Refinement Networks for Infrared Small Target Detection (2022-12-16)
RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use (2022-11-15)
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations (2022-07-01)