SwissNYF: Tool Grounded LLM Agents for Black Box Setting

Somnath Sendhil Kumar, Dhruv Jain, Eshaan Agarwal, Raunak Pandey

2024-02-15

Trajectory Planning, Program Synthesis

Abstract

Large Language Models (LLMs) have become markedly better at function calling, but these advances rely primarily on access to the functions' responses. That approach is practical for simple APIs but scales poorly to irreversible APIs that significantly affect the system, such as a database deletion API. Processes in which each API call takes a long time, and those that require forward planning, such as automated action pipelines, pose similar challenges. Moreover, a generalized approach is often needed because algorithms lack direct access to the specific implementations of these functions or to the secrets required to use them. Traditional tool planning methods are inadequate in these cases, which makes it necessary to operate in black-box settings. In contrast to their performance in tool manipulation, LLMs excel at black-box tasks such as program synthesis. We therefore harness the program synthesis capabilities of LLMs to plan tool usage in black-box settings, ensuring that solutions are verified before implementation. We introduce TOPGUN, an approach that leverages program synthesis for black-box tool planning, together with SwissNYF, a comprehensive suite that integrates black-box algorithms for planning and verification tasks, addressing the aforementioned challenges and enhancing the versatility and effectiveness of LLMs in complex API interactions. The public code for SwissNYF is available at https://github.com/iclr-dummy-user/SwissNYF.
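The idea of verifying a synthesized tool-use program without ever calling the real tools can be illustrated with a minimal sketch. This is not the TOPGUN or SwissNYF implementation; the tool names (`search_tickets`, `delete_ticket`), the `Placeholder` class, and the `verify_plan` helper are all hypothetical, and the candidate plan is written by hand where an LLM would normally generate it. The sketch assumes that only tool signatures are known, replaces each tool with a mock that checks the call against its signature, and records the call order.

```python
import inspect

# Hypothetical black-box tools: only the signatures are known, never the bodies.
def search_tickets(query: str) -> list: ...
def delete_ticket(ticket_id: str) -> bool: ...

TOOLS = [search_tickets, delete_ticket]


class Placeholder:
    """Stands in for an unknown tool response during verification."""

    def __init__(self, source: str):
        self.source = source

    def __repr__(self) -> str:
        return f"<result of {self.source}>"


def make_mock(tool, trace):
    """Build a stand-in that validates arguments but never runs the real tool."""
    sig = inspect.signature(tool)

    def mock(*args, **kwargs):
        sig.bind(*args, **kwargs)  # raises TypeError if the call does not match
        trace.append(tool.__name__)
        return Placeholder(tool.__name__)

    return mock


def verify_plan(plan_source: str) -> list:
    """Run a synthesized plan against mocks and return the ordered tool calls."""
    trace = []
    namespace = {t.__name__: make_mock(t, trace) for t in TOOLS}
    exec(plan_source, namespace)  # defines plan() with the mocks as its globals
    namespace["plan"]()
    return trace


# Hand-written stand-ins for LLM-synthesized programs (assumption).
CANDIDATE_PLAN = '''
def plan():
    tickets = search_tickets(query="stale")
    return delete_ticket(ticket_id=tickets)
'''

BAD_PLAN = '''
def plan():
    return delete_ticket()  # missing required argument
'''

if __name__ == "__main__":
    print(verify_plan(CANDIDATE_PLAN))  # ['search_tickets', 'delete_ticket']
```

In this sketch, `verify_plan(BAD_PLAN)` raises a `TypeError` before any irreversible call could happen, which is the kind of failure a black-box planner would want to catch prior to execution.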

Results

Task	Dataset	Metric	Value	Model
Industrial Robots	ToolBench	Win rate	86.54	GPT4-TOPGUN
Trajectory Planning	ToolBench	Win rate	86.54	GPT4-TOPGUN
