arXiv:2604.02155 — Brief Is Better: Non-Monotonic CoT Budget Effects in Function-Calling Agents
by arXiv
Description
The paper delivers an unexpected but well-supported finding: **function-calling agents should think briefly, not deeply.** The optimal CoT budget for tool selection is 8–16 tokens, roughly one sentence identifying the function and its key arguments. Beyond that, accuracy degrades through a documented "dual failure" mechanism: extended thinking causes both function hallucination (the model generates names outside the candidate set) and wrong-function selection (the model talks itself out of the correct choice).
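The two failure modes described above can be told apart mechanically once a predicted function name is compared against the candidate set and the expected answer. The following is a minimal sketch of that bookkeeping, not the paper's evaluation code: the function names `classify_call` and `failure_breakdown`, the label strings, and the sample tool names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical labels for this sketch; the paper may use different terminology.
LABELS = ("correct", "wrong_function", "hallucinated_function")


def classify_call(predicted: str, gold: str, candidates: Set[str]) -> str:
    """Label one predicted function name against the gold answer."""
    if predicted not in candidates:
        # The model produced a name outside the candidate set.
        return "hallucinated_function"
    if predicted != gold:
        # A valid candidate was chosen, but not the expected one.
        return "wrong_function"
    return "correct"


def failure_breakdown(
    records: Iterable[Tuple[str, str]], candidates: Set[str]
) -> Dict[str, float]:
    """Return the fraction of (predicted, gold) pairs falling under each label."""
    counts = Counter(classify_call(p, g, candidates) for p, g in records)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Illustrative data only (not taken from the paper or from BFCL-v3).
tools = {"get_weather", "search_flights", "book_hotel"}
runs = [
    ("get_weather", "get_weather"),      # correct
    ("search_flights", "get_weather"),   # wrong-function selection
    ("fetch_weather", "get_weather"),    # hallucinated name
    ("book_hotel", "book_hotel"),        # correct
]
print(failure_breakdown(runs, tools))
```

Running the same breakdown once per CoT budget would show whether a drop in accuracy comes from hallucinated names, from wrong selections among valid candidates, or from both.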
Summary
d=8–16 tokens optimal for tool routing; d=256+ collapses below no-CoT baseline (44%→25%, Qwen2.5-1.5B, BFCL-v3). Mechanism: brief CoT eliminates wrong-fn-selection (30.5%→1.5%); extended CoT triggers ...
Related Tools
ARIS: Auto-Claude Code Research in Sleep — Deep Analysis
ARIS:
**ARIS** is a methodology-first, Markdown-driven skill system for autonomous ML research workflows. It orchestrates **cross-model collaboration** — Claude Code executes research while an external LLM (Codex, Gemini, or other) reviews work as an adversarial critic. The entire system is files + plain Markdown skills (no database, no framework), making it portable across Claude Code, Cursor, Trae, Codex CLI, and other agents.
arXiv:2603.03329 — AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
arXiv
AutoHarness tackles a critical LLM agent failure mode: **agents making illegal/invalid actions**.
HN Multi-Agent Framework Link Triage
HN
**47 unique URLs extracted** across 6 categories from 6 HN threads (1,100+ combined points, 418 comments). The HN multi-agent community is skeptical of framework proliferation but hungry for: