AgentStack

arXiv:2604.02155 — Brief Is Better: Non-Monotonic CoT Budget Effects in Function-Calling Agents

by arXiv

research

Description

The paper delivers an unexpected but well-supported finding: **function-calling agents should think briefly, not deeply.** The optimal CoT budget for tool selection is 8–16 tokens, roughly one sentence identifying the function and its key arguments. Beyond that, accuracy degrades through a "dual failure" mechanism documented in the paper: extended thinking causes both function hallucination (the model generates function names outside the candidate set) and wrong-function selection (the model talks itself out of the correct choice).
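The two failure modes described above can be told apart mechanically once the candidate set is known. The following is a minimal sketch, not the paper's evaluation code: the function names `classify_call` and `failure_rates`, the label strings, and the example tool names are all hypothetical, and it assumes the gold function is always in the candidate set.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical labels for this sketch; the paper may name them differently.
LABELS = ("correct", "hallucinated_function", "wrong_function")


def classify_call(predicted: str, gold: str, candidates: Iterable[str]) -> str:
    """Label one predicted function name against the gold answer."""
    if predicted == gold:
        return "correct"
    # A name outside the candidate set counts as function hallucination.
    if predicted not in set(candidates):
        return "hallucinated_function"
    # A valid candidate that is not the gold answer is wrong-function selection.
    return "wrong_function"


def failure_rates(
    records: List[Tuple[str, str]], candidates: Iterable[str]
) -> Dict[str, float]:
    """Return the fraction of (predicted, gold) records under each label."""
    candidate_set = set(candidates)
    counts = Counter(classify_call(p, g, candidate_set) for p, g in records)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    tools = ["get_weather", "send_email", "search_web"]  # illustrative names
    runs = [
        ("get_weather", "get_weather"),   # correct
        ("get_forecast", "get_weather"),  # not in the candidate set
        ("send_email", "get_weather"),    # in the set, but the wrong one
        ("get_weather", "get_weather"),   # correct
    ]
    print(failure_rates(runs, tools))
```

Computing these rates separately at each CoT budget would show which of the two failure modes accounts for a drop in accuracy at that budget.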

Summary

A CoT budget of d=8–16 tokens is optimal for tool routing; at d=256+ accuracy collapses below the no-CoT baseline (44%→25%, Qwen2.5-1.5B, BFCL-v3). Mechanism: brief CoT nearly eliminates wrong-function selection (30.5%→1.5%); extended CoT triggers ...

Tags

research, typescript, python, open-source, paper, llm

License: CC BY 4.0
Added: 2026-04-03

Related Tools