A Benchmark for General Tool Agents
A benchmark for evaluating the tool-use capabilities of LLM-based agents in real-world scenarios.