Engineering
CodeAct Executor: Let Your AutoAgents Agent Write Code, Not Just Call Tools
Apr 10, 2026 · 6 min
If you have built LLM features on top of tool calling, you have probably seen the same pattern: the model spends half the run deciding which function to call next, and the other half dragging intermediate results back through the context window.
That works. It is also an expensive way to solve problems that are mostly orchestration, data shaping, and arithmetic.
There are three things worth keeping in mind:
- Tool-call loops get slow and expensive fast.
- Models are weak at bookkeeping and arithmetic when they do it in plain language.
- Modern models are often very good at writing short, usable programs.
AutoAgents leans into that third point with CodeActAgent, its executor for code-driven reasoning.
What CodeAct changes
With ReActAgent, the model reasons across multiple turns and calls tools directly.
With CodeActAgent, the model gets one high-leverage move: write a short TypeScript program and execute it in a sandbox.
Internally, that means the model is not juggling your full tool list one call at a time. It is given a single code-execution surface, and your registered tools are bridged into the sandbox as typed external_* functions.
That changes the execution model in a few important ways:
- The model can use variables, loops, conditionals, arrays, and async control flow instead of treating every tool step as a separate reasoning turn.
- Your registered Rust tools become callable inside the sandbox without losing their schemas.
- Calculations happen inside a real runtime, not inside the model’s token prediction.
- Each execution runs in a fresh sandbox with explicit resource limits.
- AutoAgents keeps a structured execution trace, so you can inspect generated source, console output, tool calls, results, and failures after the run.
In practice, that means the model can stop acting like a brittle workflow engine and start acting like a small program synthesizer.
Why code execution beats chained tool calls
Direct tool calling is still useful, but it has an obvious cost profile.
Every extra tool invocation adds:
- another round-trip through the model
- more tokens for parameters and responses
- more chances for the model to lose structure between steps
- more room for arithmetic and aggregation errors
Code execution compresses that loop.
Instead of saying, “Call tool A, then wait, then call tool B, then summarize,” the model can write one TypeScript program that does the full job: fetch data, batch work, transform results, compute totals, log checkpoints, and return the final answer.
That is especially useful for tasks like:
- multi-step tool composition
- batched or parallel tool usage
- data transformation and filtering
- anything involving sums, averages, percentages, date math, or string processing
- coding assistants that need a real execution surface instead of a chatty function loop
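To make the compression concrete, here is a minimal sketch of the batched style. `external_AddNumbers` is stubbed locally so the snippet runs standalone; in a real CodeAct run it would be the bridged tool binding, and the batch sizes and values here are purely illustrative:

```typescript
// Local stub standing in for the sandbox's bridged tool binding.
// In a real CodeAct run, external_AddNumbers would invoke the Rust tool.
async function external_AddNumbers(args: { left: number; right: number }): Promise<number> {
  return args.left + args.right;
}

async function main(): Promise<number> {
  const pairs = [
    { left: 1, right: 2 },
    { left: 3, right: 4 },
    { left: 5, right: 6 },
  ];
  // One parallel batch instead of three separate model turns.
  const sums = await Promise.all(pairs.map((p) => external_AddNumbers(p)));
  // Aggregation happens in the runtime, not in token prediction.
  return sums.reduce((acc, n) => acc + n, 0);
}

main().then((total) => console.log(`total=${total}`)); // prints "total=21"
```

Three tool invocations and the final sum cost one model turn instead of four.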
A small example
The upstream examples/code_mode sample keeps the tool surface intentionally small: AddNumbers and MultiplyNumbers.
That might sound trivial, but it makes the point clearly. For a prompt like:
What is the current UTC time, and what is (7 + 5) * 9 using the math tools? Answer in one short sentence.
a CodeAct-style solution is not “tool call, think, tool call, think.” It is closer to this:
const now = new Date().toISOString()
const sum = await external_AddNumbers({ left: 7, right: 5 })
console.log(`sum=${sum}`)
const total = await external_MultiplyNumbers({ left: sum, right: 9 })
return `Current UTC time: ${now}. (7 + 5) * 9 = ${total}.`
That is the real advantage. The model can mix built-in JavaScript capabilities such as Date, Math, arrays, JSON, and strings with your typed tools inside a single execution.
Because the sandbox is isolated, generated code should stick to built-in JavaScript globals and the registered tool bindings. No arbitrary imports, no host globals, and no hand-waving over intermediate work.
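As a small illustration of what "built-in globals only" still buys you, this sketch (with made-up input data) uses nothing beyond Date, Math, JSON, and array methods:

```typescript
// Everything here is a built-in JavaScript global; no imports, no host access.
const timestamps = [0, 86_400_000, 172_800_000]; // ms offsets from the epoch
const days = timestamps.map((t) => new Date(t).toISOString().slice(0, 10));

const values = [12, 7, 29];
const total = values.reduce((acc, n) => acc + n, 0);
const summary = {
  days,          // ["1970-01-01", "1970-01-02", "1970-01-03"]
  total,         // 48
  max: Math.max(...values),
  average: total / values.length,
};

console.log(JSON.stringify(summary));
```

Date math, aggregation, and serialization all happen deterministically in the runtime, which is exactly the class of work that goes wrong when a model does it token by token.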
Setting it up in AutoAgents
CodeActAgent is available on native targets and requires the codeact feature.
A minimal setup looks like this:
cargo add autoagents --features openai,codeact
cargo add autoagents-derive serde serde_json tokio
From there, define the tools you want the sandbox to reach.
use autoagents::async_trait;
use autoagents::core::tool::{ToolCallError, ToolRuntime};
use autoagents::prelude::ToolT;
use autoagents_derive::{ToolInput, tool};
use serde::{Deserialize, Serialize};
use serde_json::Value;
#[derive(Debug, Serialize, Deserialize, ToolInput)]
pub struct BinaryMathArgs {
#[input(description = "The left-hand integer operand")]
pub left: i64,
#[input(description = "The right-hand integer operand")]
pub right: i64,
}
#[tool(
name = "AddNumbers",
description = "Add two integers and return the result",
input = BinaryMathArgs,
output = i64,
)]
#[derive(Default)]
pub struct AddNumbers;
#[async_trait]
impl ToolRuntime for AddNumbers {
async fn execute(&self, args: Value) -> Result<Value, ToolCallError> {
let args: BinaryMathArgs = serde_json::from_value(args)?;
Ok(Value::from(args.left + args.right))
}
}
MultiplyNumbers is the same pattern. The important part is not the math. The important part is that the tool schema is explicit, typed, and available to the sandboxed program.
Building the executor
Once the tools are in place, the executor wiring is compact.
use autoagents::core::agent::memory::SlidingWindowMemory;
use autoagents::core::agent::prebuilt::executor::{
CodeActAgent, CodeActAgentOutput, CodeActExecutionRecord,
};
use autoagents::core::agent::task::Task;
use autoagents::core::agent::{AgentBuilder, AgentOutputT, DirectAgent};
use autoagents::llm::backends::openai::OpenAI;
use autoagents::llm::builder::LLMBuilder;
use autoagents_derive::{AgentHooks, agent};
use serde::{Deserialize, Serialize};
use serde_json::Value;
use std::sync::Arc;
#[derive(Debug, Clone, Serialize, Deserialize)]
struct CodeModeOutput {
result: String,
executions: Vec<CodeActExecutionRecord>,
}
impl AgentOutputT for CodeModeOutput {
fn output_schema() -> &'static str {
String::output_schema()
}
fn structured_output_format() -> Value {
String::structured_output_format()
}
}
impl From<CodeActAgentOutput> for CodeModeOutput {
fn from(value: CodeActAgentOutput) -> Self {
Self {
result: value.response,
executions: value.executions,
}
}
}
#[agent(
name = "code_mode_agent",
description = "Write one concise TypeScript script to solve the task. Use built-in JavaScript globals and the provided tools. Return a plain string.",
tools = [AddNumbers, MultiplyNumbers],
output = CodeModeOutput,
)]
#[derive(Clone, Default, AgentHooks)]
struct CodeModeAgent;
let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY must be set");
let llm: Arc<OpenAI> = LLMBuilder::<OpenAI>::new()
.api_key(api_key)
.model("gpt-4o")
.build()
.expect("failed to build OpenAI provider");
let handle = AgentBuilder::<_, DirectAgent>::new(CodeActAgent::new(CodeModeAgent))
.llm(llm)
.memory(Box::new(SlidingWindowMemory::new(8)))
.build()
.await?;
let result: CodeModeOutput = handle.agent.run(Task::new(prompt)).await?;
There are a few details worth noticing here:
- The example uses SlidingWindowMemory, so recent context and tool interactions stay available across turns.
- The executor output includes both the final response and a list of executions.
- The prompt pushes the model toward one concise TypeScript program instead of a wandering multi-turn chain.
That last point matters. Code mode works best when the model is clearly told what kind of program to write, what built-in APIs are fair game, and what shape the final answer should take.
The execution trace is the feature
A lot of agent systems stop at “the model answered.”
CodeAct is more useful because it lets you inspect how the answer was produced.
The sample CLI prints every CodeActExecutionRecord, including:
- execution id and duration
- console output emitted by the sandbox
- nested tool calls with arguments and results
- the final sandbox result
- any execution error
- the generated TypeScript source
That is not just nice for demos. It is how you debug prompt quality, validate tool schemas, and understand why a code-driven agent succeeded or failed.
If a generated script is inefficient, you can tighten the system prompt. If the model is misusing a tool, you can fix the tool description or input schema. If a run fails, you have something concrete to inspect instead of a vague “the model got confused.”
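As a rough sketch of how you might consume such a trace, assume each record arrives as JSON with fields mirroring the list above. The field names here are illustrative only; the real CodeActExecutionRecord is a Rust struct in AutoAgents and may name things differently:

```typescript
// Illustrative shape only, not the exact Rust struct.
interface ExecutionRecord {
  id: string;
  durationMs: number;
  console: string[];
  toolCalls: { name: string; args: unknown; result: unknown }[];
  error?: string;
  source: string;
}

// Summarize a trace: flag failures and count tool usage per execution.
function summarize(records: ExecutionRecord[]): string[] {
  return records.map((r) => {
    const status = r.error ? `FAILED (${r.error})` : "ok";
    return `${r.id}: ${status}, ${r.toolCalls.length} tool call(s), ${r.durationMs}ms`;
  });
}

const sample: ExecutionRecord[] = [
  {
    id: "exec-1",
    durationMs: 42,
    console: ["sum=12"],
    toolCalls: [
      { name: "AddNumbers", args: { left: 7, right: 5 }, result: 12 },
      { name: "MultiplyNumbers", args: { left: 12, right: 9 }, result: 108 },
    ],
    source: "const sum = await external_AddNumbers({ left: 7, right: 5 }) /* ... */",
  },
];

console.log(summarize(sample)[0]); // "exec-1: ok, 2 tool call(s), 42ms"
```

Even a one-line summary per execution is enough to spot the failing run in a batch before you start reading generated source.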
When to choose CodeAct instead of ReAct
AutoAgents gives you both, and they solve different problems.
Use ReActAgent when:
- each tool call is part of the reasoning process
- you want explicit turn-by-turn tool orchestration
- the workflow is better expressed as “think, act, observe”
Use CodeActAgent when:
- the task is mostly tool composition and transformation
- batching, loops, or local computation will reduce model turns
- correctness depends on the runtime doing actual math or data processing
- you want visibility into generated code and execution traces
Use BasicAgent when you just need a single prompt/response pass and no orchestration at all.
The bigger point
The interesting shift here is not TypeScript by itself.
It is that AutoAgents lets you move orchestration out of the model’s conversational loop and into executable code, while still keeping the agent interface clean. You define tools in Rust. You pick the executor. AutoAgents handles the run loop, memory, provider wiring, and execution records. The model contributes the shortest possible program for the task.
That is a better fit for real developer workflows.
If your agent keeps getting slower, chattier, and less reliable as you add tools, the answer may not be more prompt engineering. The answer may be to stop asking the model to imitate a runtime and let it write code for one instead.