Description
Composed video retrieval (CoVR) is a new task in which the goal is to find a video that matches both a query image and a query text. The query image represents a visual concept that the user is interested in, and the query text specifies how that concept should be modified or refined. For example, given an image of a fountain and the text "during show at night", the goal is to retrieve a video that shows the fountain during a show at night.
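The retrieval step can be illustrated with a minimal sketch. It assumes that the query image, the query text, and each candidate video have already been mapped to embedding vectors in a shared space. The additive fusion in `compose_query`, the function names, and the toy vectors are all hypothetical choices made for illustration; they are not the method of any paper listed on this page, where the fusion is typically learned.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_emb, text_emb):
    """Hypothetical fusion: sum the unit-length image and text embeddings.

    This is an assumption for illustration only; a real system would
    usually learn how to combine the two modalities.
    """
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return normalize([a + b for a, b in zip(img, txt)])


def rank_videos(query_emb, video_embs):
    """Return video names sorted by cosine similarity to the composed query."""
    q = normalize(query_emb)
    scores = {
        name: sum(a * b for a, b in zip(q, normalize(emb)))
        for name, emb in video_embs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional embeddings (made up for this example).
image_emb = [1.0, 0.0, 0.0]  # query image: a fountain
text_emb = [0.0, 1.0, 0.0]   # query text: "during show at night"
videos = {
    "fountain_day": [1.0, 0.0, 0.2],
    "fountain_night_show": [1.0, 1.0, 0.0],
    "street_night": [0.0, 1.0, 0.3],
}

ranking = rank_videos(compose_query(image_emb, text_emb), videos)
print(ranking[0])
```

In this toy setup the top-ranked video is the one whose embedding agrees with both the image and the text, which is the behavior the task definition asks for.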
Papers Using This Method
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos (2025-06-05)
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs (2024-06-19)
Composed Video Retrieval via Enriched Context and Discriminative Embeddings (2024-03-25)
CoVR-2: Automatic Data Construction for Composed Video Retrieval (2023-08-28)