Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LILE: Look In-Depth before Looking Elsewhere -- A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives

Danial Maleki, H. R. Tizhoosh

2022-03-02 · Cross-Modal Retrieval · Information Retrieval · Cross-Modal Information Retrieval · Retrieval

Paper · PDF

Abstract

The volume of available data has grown dramatically in recent years across many applications, and the era of networks that process each modality in isolation has practically ended. Bidirectional cross-modal data retrieval has therefore become a requirement in many domains and research disciplines. This is especially true in the medical field, where data comes in a multitude of types, including various kinds of images and reports as well as molecular data. Most contemporary works apply cross attention to highlight the elements of an image or text that are essential with respect to the other modality and try to match them together. However, these approaches usually weight the features of each modality equally, regardless of their importance within their own modality. In this study, self-attention with an additional loss term is proposed to enrich the internal representation provided to the cross-attention module. This work suggests a novel architecture with a new loss term to help represent images and texts in the joint latent space. Experimental results on two benchmark datasets, MS-COCO and ARCH, show the effectiveness of the proposed method.
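
The abstract's core idea, refining each modality with self-attention before cross-attending to the other, and supervising the self-attended representation with its own loss term, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the names DualAttentionBlock and embedding_loss are hypothetical, the symmetric InfoNCE loss is a generic stand-in for the paper's loss formulation, and a real model would typically use separate parameters per modality.

```python
# Minimal sketch (NOT the paper's implementation) of the dual-attention idea:
# each modality first "looks in-depth" via self-attention, and that refined
# representation both feeds the cross-attention module and receives its own
# auxiliary loss term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionBlock(nn.Module):  # hypothetical name
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, other):
        # "Look in-depth": refine x by attending over its own tokens.
        x_self, _ = self.self_attn(x, x, x)
        # "Look elsewhere": attend from the refined x to the other modality.
        x_cross, _ = self.cross_attn(x_self, other, other)
        return x_self, x_cross

def embedding_loss(img_vec, txt_vec, temperature: float = 0.07):
    """Generic symmetric InfoNCE over pooled embeddings (stand-in loss)."""
    img = F.normalize(img_vec, dim=-1)
    txt = F.normalize(txt_vec, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Usage: image patch tokens and text tokens projected to a shared width.
img_tokens = torch.randn(4, 49, 512)   # batch of 4 images, 49 patches each
txt_tokens = torch.randn(4, 32, 512)   # batch of 4 captions, 32 tokens each
block = DualAttentionBlock()           # shared block here only for brevity
img_self, img_cross = block(img_tokens, txt_tokens)
txt_self, txt_cross = block(txt_tokens, img_tokens)
# Total objective = cross-attention matching loss + self-attention loss term.
loss = (embedding_loss(img_cross.mean(1), txt_cross.mean(1))
        + embedding_loss(img_self.mean(1), txt_self.mean(1)))
```

The second loss term is what distinguishes this setup from plain cross-attention matching: it pushes each modality's self-attended representation to be discriminative on its own, before any cross-modal interaction.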

Results

Task | Dataset | Metric | Value | Model
Image Retrieval with Multi-Modal Query | COCO 2014 | Image-to-text R@1 | 55.6 | LILE
Image Retrieval with Multi-Modal Query | COCO 2014 | Image-to-text R@5 | 82.4 | LILE
Image Retrieval with Multi-Modal Query | COCO 2014 | Image-to-text R@10 | 91 | LILE
Image Retrieval with Multi-Modal Query | COCO 2014 | Text-to-image R@1 | 41.5 | LILE
Image Retrieval with Multi-Modal Query | COCO 2014 | Text-to-image R@5 | 72.1 | LILE
Image Retrieval with Multi-Modal Query | COCO 2014 | Text-to-image R@10 | 82.2 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Image-to-text R@1 | 55.6 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Image-to-text R@5 | 82.4 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Image-to-text R@10 | 91 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Text-to-image R@1 | 41.5 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Text-to-image R@5 | 72.1 | LILE
Cross-Modal Information Retrieval | COCO 2014 | Text-to-image R@10 | 82.2 | LILE
Cross-Modal Retrieval | COCO 2014 | Image-to-text R@1 | 55.6 | LILE
Cross-Modal Retrieval | COCO 2014 | Image-to-text R@5 | 82.4 | LILE
Cross-Modal Retrieval | COCO 2014 | Image-to-text R@10 | 91 | LILE
Cross-Modal Retrieval | COCO 2014 | Text-to-image R@1 | 41.5 | LILE
Cross-Modal Retrieval | COCO 2014 | Text-to-image R@5 | 72.1 | LILE
Cross-Modal Retrieval | COCO 2014 | Text-to-image R@10 | 82.2 | LILE
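
The R@K values above follow the standard recall-at-K protocol for cross-modal retrieval: rank all candidates by similarity to each query and count the fraction of queries whose ground-truth match appears in the top K. A short NumPy sketch of that computation is below; the function name and the random similarity matrix are illustrative only.

```python
# Sketch of how R@K numbers like those in the table are typically computed.
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """similarity[i, j] = score of query i against candidate j; the
    ground-truth match is assumed to sit on the diagonal (i matches i)."""
    # Top-k candidate indices per query, highest score first.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = (top_k == np.arange(similarity.shape[0])[:, None]).any(axis=1)
    return float(hits.mean()) * 100  # reported as a percentage

sim = np.random.rand(1000, 1000)  # e.g. scores for 1k image-text pairs
for k in (1, 5, 10):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```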

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)