Object HalBench
Object HalBench is a benchmark for evaluating multimodal large language models, i.e., models that can process both images and text. It is designed to measure how well a model avoids "hallucinations": generating descriptions that mention objects or details not actually grounded in the image being processed¹.
For instance, OmniLMM-12B, a state-of-the-art open-source multimodal model, has been reported to outperform GPT-4V on Object HalBench¹. The model is aligned for trustworthy behavior via multimodal RLHF (Reinforcement Learning from Human Feedback)¹, meaning it is trained to produce outputs that are more reliable and factually grounded in its multimodal inputs¹.
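The intuition behind object-hallucination benchmarks can be sketched in code. The snippet below is an illustrative assumption, not Object HalBench's actual implementation: it computes a response-level rate (fraction of responses containing any hallucinated object) and an object-level rate (fraction of mentioned objects that are hallucinated), given sets of objects mentioned by the model and sets of objects actually present in each image.

```python
def hallucination_rates(responses, ground_truth_objects):
    """Compute response-level and object-level hallucination rates.

    responses: list of sets of objects mentioned in each model response
    ground_truth_objects: list of sets of objects actually in each image

    NOTE: a simplified sketch; real benchmarks extract object mentions
    from free-form text and handle synonyms, which is omitted here.
    """
    hallucinated_responses = 0
    total_mentions = 0
    hallucinated_mentions = 0
    for mentioned, present in zip(responses, ground_truth_objects):
        fake = mentioned - present          # objects not in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(fake)
        if fake:                            # any hallucination flags the response
            hallucinated_responses += 1
    resp_rate = hallucinated_responses / len(responses)
    obj_rate = hallucinated_mentions / total_mentions
    return resp_rate, obj_rate

# Example: the second response mentions a "dog" absent from the image.
resp = [{"cat", "table"}, {"dog", "car"}]
gt = [{"cat", "table", "chair"}, {"car", "tree"}]
print(hallucination_rates(resp, gt))  # (0.5, 0.25)
```

Lower values on both rates indicate a model that stays better grounded in the image content.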
(1) openbmb/OmniLMM-12B · Hugging Face. https://huggingface.co/openbmb/OmniLMM-12B.
(2) OmniLMM: An accurate and efficient open-source multimodal large model - Zhihu. https://zhuanlan.zhihu.com/p/681251797.
(3) RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment - arXiv.org. https://arxiv.org/html/2312.00849v2.
(4) README.md · openbmb/MiniCPM-V-2 at main - Hugging Face. https://huggingface.co/openbmb/MiniCPM-V-2/blob/main/README.md.
(5) OpenBMB/OmniLMM - GitHub. https://github.com/OpenBMB/OmniLMM.git.