Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Hierarchically Merging and Agent Refinement

Reported on 8 benchmarks across 1 task · 1 paper · 7 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Knowledge Base: 8 results

  • Text Summarization on MENSA
    ROUGE-1 · 2025-01-17
    31.31
    best: 44.91 (NexusSum (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MENSA
    ROUGE-2 · 2025-01-17
    8.81
    best: 11.43 (NexusSum (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MENSA
    ROUGE-L · 2025-01-17
    18.62
    best: 21.52 (Zero-Shot (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MovieSum
    BERTScore (F1) · 2025-01-17
    59.32
    best: 63.53 (NexusSum (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MovieSum
    ROUGE-1 · 2025-01-17
    31.31
    best: 44.91 (NexusSum (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MovieSum
    ROUGE-2 · 2025-01-17
    8.81
    best: 11.43 (NexusSum (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MovieSum
    ROUGE-L · 2025-01-17
    18.62
    best: 22.55 (Zero-Shot (Mistral Large))
    SOTA
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
  • Text Summarization on MENSA
    BERTScore (F1) · 2025-01-17
    60.22
    best: 65.73 (NexusSum (Mistral Large))
    Agent-as-Judge for Factual Summarization of Long Narratives (arXiv:2501.09993)
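How far this model sits behind the best reported score on each benchmark can be tabulated directly from the entries in the list. The following is a minimal Python sketch. It assumes that each entry's "best" value is the top score recorded for that dataset and metric. The `RESULTS` dict and the `gap_to_best` helper are hypothetical names used only for illustration, and the scores are copied verbatim from the listing.

```python
# Scores copied from the results listing on this page:
# (dataset, metric) -> (this model's score, best reported score).
# RESULTS and gap_to_best are hypothetical names for illustration only.
RESULTS = {
    ("MENSA", "ROUGE-1"): (31.31, 44.91),
    ("MENSA", "ROUGE-2"): (8.81, 11.43),
    ("MENSA", "ROUGE-L"): (18.62, 21.52),
    ("MENSA", "BERTScore (F1)"): (60.22, 65.73),
    ("MovieSum", "ROUGE-1"): (31.31, 44.91),
    ("MovieSum", "ROUGE-2"): (8.81, 11.43),
    ("MovieSum", "ROUGE-L"): (18.62, 22.55),
    ("MovieSum", "BERTScore (F1)"): (59.32, 63.53),
}


def gap_to_best(results):
    """Return (dataset, metric) -> points behind the best score, rounded to 2 decimals."""
    return {key: round(best - ours, 2) for key, (ours, best) in results.items()}


if __name__ == "__main__":
    # Print one row per benchmark, sorted by dataset and then by metric name.
    for (dataset, metric), gap in sorted(gap_to_best(RESULTS).items()):
        print(f"{dataset:<9} {metric:<15} {gap:5.2f}")
```

When run as a script, it prints one row per dataset and metric pair. With the numbers as listed, the largest gap is on ROUGE-1, at 13.60 points on both MENSA and MovieSum.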