AgentStack

arXiv:2603.03329 — AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness

by arXiv

research

Description

AutoHarness tackles a critical failure mode of LLM agents: **taking illegal or invalid actions**.

Summary

Automatically synthesizes code wrappers (harnesses) that prevent LLM agents from taking invalid actions. Key finding: 78% of Gemini-2.5-Flash chess losses were caused by illegal moves, not strategy failures. Three mode...
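
To make the idea of a harness concrete, here is a minimal hand-written sketch in Python. It is not the harness AutoHarness synthesizes and does not reproduce the paper's method; it only illustrates the general pattern of wrapping an agent so that an invalid action never reaches the environment. The toy tic-tac-toe setting and the names `legal_moves`, `harnessed_action`, `propose`, and `always_center` are all assumptions made for illustration.

```python
from typing import Callable, List

Board = List[str]  # 9 cells, each "X", "O", or " " (hypothetical toy game)


def legal_moves(board: Board) -> List[int]:
    """Return the indices of empty cells, i.e. the legal actions."""
    return [i for i, cell in enumerate(board) if cell == " "]


def harnessed_action(
    propose: Callable[[Board, str], int],
    board: Board,
    max_retries: int = 3,
) -> int:
    """Ask the agent for a move, but only ever return a legal one.

    `propose` stands in for an LLM call (an assumption of this sketch): it
    receives the board plus feedback about its previous illegal attempt.
    """
    legal = legal_moves(board)
    if not legal:
        raise ValueError("no legal moves available")

    feedback = ""
    for _ in range(max_retries):
        move = propose(board, feedback)
        if move in legal:
            return move
        # Reject the illegal proposal and tell the agent why.
        feedback = f"Move {move} is illegal; legal moves are {legal}."

    # Fallback: the agent never produced a legal move, so pick one for it.
    return legal[0]


def always_center(board: Board, feedback: str) -> int:
    """A deliberately stubborn stand-in agent that always plays cell 4."""
    return 4


board = [" "] * 9
first = harnessed_action(always_center, board)   # center is free, so accepted
board[first] = "X"
second = harnessed_action(always_center, board)  # center is taken, so fallback
```

In this sketch the second call exhausts its retries and falls back to the first legal cell, so the game continues instead of being lost to an illegal move. A real harness would replace `legal_moves` with the rules of the actual environment.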

Tags

research, typescript, python, multi-agent, open-source, paper, llm

License: CC BY 4.0
Added: 2026-03-09
