ARES

ARES (Agentic Research and Evaluation Suite) is an RL-first framework for training and evaluating LLM agents, especially coding agents. It treats LLM requests as observations and LLM responses as actions within the environment, so you can focus on training the LLM inside the agent rather than optimizing the entire agent as a black box.

Overview

Unlike traditional frameworks that treat the entire code agent as the optimization target, ARES enables reinforcement learning on the LLM within the agent. This provides fine-grained control over long-horizon tasks and opens up new possibilities for mechanistic interpretability research.

The interface is entirely async and supports scaling to hundreds or thousands of parallel environments.
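The observation/action framing above can be sketched as a minimal async loop. Everything here (the `TimeStep` dataclass, `reset`/`step` signatures, the toy environment) is an illustrative stand-in for the pattern ARES describes, not the actual ARES API:

```python
import asyncio
from dataclasses import dataclass

# Sketch of the RL framing ARES describes: the environment emits an LLM
# request as the observation, and the LLM's response is the action.
# All names here are hypothetical, not the real martian-ares API.

@dataclass
class TimeStep:
    observation: dict          # an LLM request (e.g. chat messages)
    reward: float = 0.0
    done: bool = False

class EchoEnvironment:
    """Toy environment that rewards the policy for echoing the task string."""

    def __init__(self, task: str):
        self.task = task

    async def reset(self) -> TimeStep:
        # The observation is the LLM request we want answered.
        return TimeStep(
            observation={"messages": [{"role": "user", "content": self.task}]}
        )

    async def step(self, action: str) -> TimeStep:
        # The action is the LLM's response; score it against the task.
        reward = 1.0 if self.task in action else 0.0
        return TimeStep(observation={}, reward=reward, done=True)

async def policy(observation: dict) -> str:
    # Stand-in for an LLM call: echo the last user message.
    return observation["messages"][-1]["content"]

async def rollout(env: EchoEnvironment) -> float:
    ts = await env.reset()
    total = 0.0
    while not ts.done:
        action = await policy(ts.observation)
        ts = await env.step(action)
        total += ts.reward
    return total

reward = asyncio.run(rollout(EchoEnvironment("print('hello')")))
print(reward)  # 1.0
```

Because the LLM call sits at the boundary between observation and action, the trainer can log, modify, or replace each request/response pair without touching the rest of the agent.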

Key Features

  • RL-First Design: Built around the reinforcement learning loop with observations (LLM requests) and actions (LLM responses)
  • LLM-Level Optimization: Train the LLM within code agents, not just the agent as a whole
  • Distributed Workloads: Support for high-volume, distributed training and evaluation
  • Mechanistic Interpretability: Raw access to LLM requests and responses for deep analysis
  • Async Gym/dm_env-like Spec: Closely follows the Gym/dm_env interface, with async methods for performance
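Because the spec is async, many environments can share one event loop. The sketch below drives a hundred toy environments concurrently with asyncio.gather; the `CountdownEnv` class and its method signatures are assumptions for illustration, not the ARES spec itself:

```python
import asyncio

# Sketch of scaling rollouts across many parallel environments in a single
# asyncio event loop. The environment class here is a hypothetical stand-in.

class CountdownEnv:
    """Toy async environment that terminates after a fixed number of steps."""

    def __init__(self, steps: int):
        self.remaining = steps

    async def reset(self) -> dict:
        return {"remaining": self.remaining}

    async def step(self, action: str) -> tuple[dict, float, bool]:
        await asyncio.sleep(0)  # yield control, as real I/O (an LLM call) would
        self.remaining -= 1
        done = self.remaining == 0
        return {"remaining": self.remaining}, 1.0, done

async def run_env(env: CountdownEnv) -> float:
    obs = await env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = await env.step("act")
        total += reward
    return total

async def main(n_envs: int) -> list[float]:
    # Hundreds or thousands of rollouts can be awaited concurrently.
    envs = [CountdownEnv(steps=3) for _ in range(n_envs)]
    return await asyncio.gather(*(run_env(e) for e in envs))

returns = asyncio.run(main(100))
print(sum(returns))  # 300.0
```

Since each environment spends most of its time awaiting LLM responses, cooperative scheduling like this is what makes high environment counts practical without one thread per environment.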

Full Documentation

The complete ARES documentation (Core Concepts, How It Works, API reference, and examples) is hosted on Read the Docs:

  • Core Concepts — System architecture, Environment, CodeAgent, Container, LLMClient
  • How It Works — Queue-mediated communication, multiple environments, limitations
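The queue-mediated communication mentioned above can be illustrated with a pair of asyncio queues: the agent pushes LLM requests onto one queue and awaits responses on another, while the RL loop consumes requests as observations and supplies responses as actions. This structure is a guess at the pattern, not the documented implementation:

```python
import asyncio

# Hypothetical illustration of queue-mediated communication: the agent's
# "LLM call" is really a round trip through two queues, which is what lets
# the trainer intercept requests and inject responses.

async def agent(requests: asyncio.Queue, responses: asyncio.Queue) -> str:
    # From the agent's point of view this looks like an ordinary LLM call.
    await requests.put({"prompt": "fix the failing test"})
    reply = await responses.get()
    return reply

async def rl_loop(requests: asyncio.Queue, responses: asyncio.Queue) -> dict:
    # The trainer sees the request as an observation and supplies the
    # action (an LLM response) back to the agent.
    observation = await requests.get()
    await responses.put(f"patch for: {observation['prompt']}")
    return observation

async def main() -> tuple[str, dict]:
    requests, responses = asyncio.Queue(), asyncio.Queue()
    reply, observation = await asyncio.gather(
        agent(requests, responses),
        rl_loop(requests, responses),
    )
    return reply, observation

reply, observation = asyncio.run(main())
print(reply)  # patch for: fix the failing test
```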

Install with: uv add martian-ares