Agentic Critical Training (ACT)
by Agentic Critical Training
Description
ACT is a **two-stage RL training paradigm** that fixes the fundamental weakness of imitation learning (IL): IL teaches agents *what* expert actions look like, but never forces models to understand *why* those actions are better than alternatives.
Steal Patterns
**GRPO training infrastructure** — Requires GPU cluster; inapplicable to API-only Forge stack
**DeepSpeed ZeRO-3 configuration** — Distributed training framework, not applicable
**Phase 1/Phase 2 training pipeline** — Fully inaccessible without local model weights and training hardware
**K-sample alternative collection** — Only relevant if training; sampling from API models for training data violates most provider ToS
Tags
Related Tools
ARIS: Auto-Claude Code Research in Sleep — Deep Analysis
ARIS:
**ARIS** is a methodology-first, Markdown-driven skill system for autonomous ML research workflows. It orchestrates **cross-model collaboration** — Claude Code executes research while an external LLM (Codex, Gemini, or other) reviews work as an adversarial critic. The entire system is files + plain Markdown skills (no database, no framework), making it portable across Claude Code, Cursor, Trae, Codex CLI, and other agents.
arXiv:2603.03329 — AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
arXiv
AutoHarness tackles a critical LLM agent failure mode: **agents making illegal/invalid actions**.
HN Multi-Agent Framework Link Triage
HN
**47 unique URLs extracted** across 6 categories from 6 HN threads (1,100+ combined points, 418 comments). The HN multi-agent community is skeptical of framework proliferation but hungry for: