Agents
Agents are instruction-driven, model-powered reasoning components that follow a plan → act → observe → repeat loop. They use an LLM to decide the next action, call tools when needed, and return structured output in TEXT, MARKDOWN, or JSON format.
Setup
pip install aixplain
from aixplain import Aixplain
aix = Aixplain(api_key="YOUR_API_KEY")
Quick start
Create and run a minimal agent to validate your setup:
agent = aix.Agent(
    name="Hello Agent",
    description="Answers general questions clearly and concisely.",
    instructions="You are a helpful assistant.",
)
agent.save()
response = agent.run(query="What is machine learning?")
print(response.data.output)
print(agent.path)
agent.save() transitions the agent from DRAFT to ONBOARDED state and makes it callable.
How it works
Each run executes a reasoning loop:
1. INIT → Load config, validate input, process variables
2. REASONING LOOP ⟲
├─> LLM plans next action
├─> Execute tools (if needed)
├─> Evaluate results
└─> Repeat until complete or max_iterations reached
3. RETURN → AgentResponse with output + metadata
The LLM autonomously decides which tools to call and when. Runs continue until the task is complete or max_iterations is hit.
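The loop above can be sketched in plain Python. This is an illustrative model only, not the aiXplain runtime: `run_loop`, `LoopResult`, and `toy_plan` are hypothetical names, and the planner is a stub standing in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration; the real runtime is not shown in this guide.
Action = Tuple[str, str]  # (tool_name, tool_input); the name "finish" ends the run


@dataclass
class LoopResult:
    output: Optional[str]
    steps: List[dict] = field(default_factory=list)
    hit_iteration_limit: bool = False


def run_loop(
    query: str,
    plan: Callable[[str, List[dict]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Plan, act, observe, and repeat until the planner finishes or the cap is hit."""
    steps: List[dict] = []
    for _ in range(max_iterations):
        tool_name, tool_input = plan(query, steps)   # planner picks the next action
        if tool_name == "finish":
            return LoopResult(output=tool_input, steps=steps)
        observation = tools[tool_name](tool_input)   # execute the chosen tool
        steps.append({"tool": tool_name, "input": tool_input, "output": observation})
    # The cap was reached before the planner declared the task complete.
    return LoopResult(output=None, steps=steps, hit_iteration_limit=True)


def toy_plan(query: str, steps: List[dict]) -> Action:
    """Stub planner: search once, then answer from the observation."""
    if not steps:
        return ("search", query)
    return ("finish", f"Answer based on: {steps[-1]['output']}")


result = run_loop("What is aiXplain?", toy_plan, {"search": lambda q: f"results for {q!r}"})
print(result.output)
```

A planner that never returns "finish" ends with `hit_iteration_limit` set, which mirrors the max_iterations stop condition described above.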
Agent states
| State | Description |
|---|---|
| DRAFT | Created but not persisted. Call agent.save() to promote. |
| ONBOARDED | Persisted and production-ready. |
Tools
Tools extend an agent beyond text generation. The agent decides autonomously when to invoke them.
# Marketplace tool
web_search_tool = aix.Tool.get("tavily/tavily-search-api")
# Model used as a tool
translation_model = aix.Model.get("google/translate-multi-lingual")
Pass tools at agent creation:
# replaces: LangChain AgentExecutor + ReAct prompt engineering
# one Agent() call, runtime loop handled by AgenticOS
INSTRUCTIONS = """
You are a technical documentation assistant.
Think step-by-step when solving problems. Explain non-obvious choices.
Use tools only when internal knowledge is insufficient.
Prefer official sources when citing.
"""
agent = aix.Agent(
    name="Research Agent",
    description="Researches topics using web search.",
    instructions=INSTRUCTIONS,
    tools=[web_search_tool],
)
agent.save()
response = agent.run(query="What are the latest developments in AI safety?")
print(response.data.output)
You can test a tool in isolation before attaching it:
print(web_search_tool.list_actions())
response = web_search_tool.run(data="What is aiXplain?")
print(response.data)
If the agent ignores a tool, check response.data.steps for what it attempted, then tighten the tool's name and description. If total parameters across all tools exceed 100, optional ones may be silently dropped — mark only the ones you need as required.
LLM configuration
The default model is GPT-5.4. Override it at agent creation:
# Option A: Model ID
SONNET_MODEL_ID = "67be216bd8f6a65d6f74d5e9"
agent = aix.Agent(
    name="Sonnet Agent",
    description="...",
    llm=aix.Model.get(SONNET_MODEL_ID),
)
# Option B: Fine-grained parameters via the inputs proxy
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.temperature = 1
llm.inputs.max_tokens = 100_000
agent = aix.Agent(
    name="Custom LLM Agent",
    description="...",
    llm=llm,
)
# Option C: Reasoning effort (GPT-5.4 and other reasoning models)
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.reasoning_effort = "high" # "low" | "medium" | "high"
agent = aix.Agent(
    name="Deep Reasoning Agent",
    description="Handles complex, multi-step analysis.",
    llm=llm,
)
Choose an LLM based on: context window size, reasoning depth, latency requirements, cost per 1M tokens, tool-calling reliability, and multilingual quality.
Output format
Available formats: text (default) | markdown | json.
When using json, pass an expected_output schema in one of three forms:
from pydantic import BaseModel
from typing import List, Dict
from aixplain.v2.agent import OutputFormat
# Option 1: Text description of the shape
expected_output = """{"name": "string", "calories": "string"}"""
# Option 2: Dict
expected_output = {"name": "string", "calories": "string"}
# Option 3: Pydantic model (recommended — adds type validation)
class RecipeOutput(BaseModel):
    name: str
    description: str
    ingredients: List[str]
    instructions: str
    nutrition: Dict[str, str]
expected_output = RecipeOutput
recipe_agent = aix.Agent(
    name="Recipe Structurer",
    description="Culinary assistant that returns structured recipes.",
    instructions="Extract and organise recipe data into the required JSON shape. Use web search to fill gaps.",
    tools=[web_search_tool],
    output_format=OutputFormat.JSON,
    expected_output=expected_output,
)
recipe_agent.save()
response = recipe_agent.run("Chocolate cake recipe")
print(response.data.output)
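Before passing structured output downstream, it can help to check its shape. The sketch below uses only the standard library and mirrors the RecipeOutput fields; `check_recipe` and `REQUIRED_FIELDS` are hypothetical helpers, and it assumes the output arrives as a JSON string (skip `json.loads` if your SDK version already returns a dict). The sample payload is made up for illustration.

```python
import json
from typing import Any, Dict, List

# Field names mirror the RecipeOutput model above; the types are top-level checks only.
REQUIRED_FIELDS: Dict[str, type] = {
    "name": str,
    "description": str,
    "ingredients": list,
    "instructions": str,
    "nutrition": dict,
}


def check_recipe(raw_output: str) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    try:
        data: Any = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems: List[str] = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return problems


# Made-up sample payload, not real agent output.
sample = json.dumps({
    "name": "Chocolate cake",
    "description": "A simple sponge.",
    "ingredients": ["flour", "cocoa", "eggs"],
    "instructions": "Mix and bake.",
    "nutrition": {"calories": "350"},
})
print(check_recipe(sample))
print(check_recipe('{"name": "Chocolate cake"}'))
```

With a Pydantic model as expected_output, the model class itself can do this validation; the standard-library version is shown only to keep the sketch dependency-free.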
Runtime parameters
agent.run() accepts the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (no default) | Main task or question. |
| content | dict \| list | None | Text, file paths, or URLs. Use dict for {{template}} substitution. |
| data | any | None | Alternative to query + content; use one or the other. |
| variables | dict | None | Values substituted into {{placeholders}} in instructions / description. |
| session_id | str | None | Resume a stateful session (14-day retention). |
| history | list | None | Inject prior turns without a session. |
| max_tokens | int | 2048 | Output token cap per run. |
| max_iterations | int | 5 | Max reasoning loop iterations. |
| runResponseGeneration | bool | True | Whether to generate a final response after tool steps. |
| name | str | "model_process" | Execution label in logs. |
| trace_request | bool | False | Return a request ID for backend tracing. |
| progress_format | str \| None | None | "status" (single line) or "logs" (timeline). None disables output. |
| progress_verbosity | int | 1 | Detail level: 1 minimal, 2 includes thoughts, 3 full I/O. |
| progress_truncate | bool | True | Truncate long text in progress output. |
| service_version | str | "V2" | Agentification engine version: "V2" (legacy) or "V3" (new). |
| timeout | int | 300 | Seconds to poll before the SDK stops waiting. The agent may continue server-side. |
| wait_time | float | 0.5 | Seconds between polling checks. |
Agentification engine (service_version)
Pass service_version to choose which agentification engine processes the run.
from aixplain.v2 import ServiceVersion
agent.run("Hello") # default: V2
agent.run("Hello", service_version="V3") # string form
agent.run("Hello", service_version=ServiceVersion.V3) # enum form
| Version | Description |
|---|---|
| "V2" | Legacy engine (default). Stable and widely tested. |
| "V3" | New engine. Supports tasks, richer step traces, and improved tool-call fidelity. |
Understanding max_tokens
Both llm.inputs.max_tokens and the per-run max_tokens cap output tokens only (not input/context). They are independent levers:
# Persistent default — applies to every invocation by this agent
llm.inputs.max_tokens = 100_000
# Per-run override — applies to this execution only
response = agent.run(query="...", max_tokens=4000)
Keep caps conservative. Raise max_tokens on a single run first; if truncation is frequent, raise llm.inputs.max_tokens permanently.
Variable substitution
Use {{variable}} placeholders in instructions or description, then supply values at runtime via variables:
agent = aix.Agent(
    name="Multilingual Researcher",
    description="Research assistant for {{topic}}.",
    instructions="""
    You are a research assistant specialising in {{topic}}.
    Always respond in {{language}}.
    Focus on peer-reviewed sources when available.
    """,
    tools=[web_search_tool],
)
agent.save()
response = agent.run(
    query="What are the key challenges?",
    variables={"topic": "quantum computing", "language": "Spanish"},
)
print(response.data.output)
variables substitution applies to instructions and description only — not to query or content.
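To preview what the instructions look like after substitution, a local stand-in can be useful. The sketch below is not the SDK's implementation: `render` and the `PLACEHOLDER` pattern are hypothetical, and it assumes placeholder names are plain identifiers with no whitespace inside the braces.

```python
import re
from typing import Dict

# Assumed placeholder syntax: {{name}} where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, variables: Dict[str, str]) -> str:
    """Fill {{name}} placeholders, failing loudly if any value is missing."""
    wanted = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = sorted(wanted - set(variables))
    if missing:
        raise ValueError(f"missing variables: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), template)


instructions = (
    "You are a research assistant specialising in {{topic}}. "
    "Always respond in {{language}}."
)
print(render(instructions, {"topic": "quantum computing", "language": "Spanish"}))
```

Running the same check before agent.run() catches a forgotten variable locally instead of sending a prompt that still contains a raw placeholder.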
Progress streaming
# Disabled (default)
response = agent.run(query="What is machine learning?")
# Compact single-line status
response = agent.run(
    query="What is machine learning?",
    progress_format="status",
    progress_verbosity=1,
)
# Full timestamped log with agent reasoning
response = agent.run(
    query="What is machine learning?",
    progress_format="logs",
    progress_verbosity=2,
    progress_truncate=True,
)
| Level | What's shown |
|---|---|
| 1 | Step names and tool invocations |
| 2 | Steps + agent reasoning / thoughts |
| 3 | Full inputs and outputs at every step |
Session management
Pass session_id to persist multi-turn context (stored 14 days, not used for training). Pass history to inject prior turns from an external source.
# Start a session
session_id = agent.generate_session_id()
response = agent.run(query="What is machine learning?", session_id=session_id)
print(response.data.output)
# Follow-up — agent retains context from the first turn
followup = agent.run(query="Give me a practical example.", session_id=session_id)
print(followup.data.output)
# Inject history manually (no server-side memory)
history = [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
]
response = agent.run("Tell me a fun fact about it.", history=history)
print(response.data.output)
# Seed a new session with existing history
session_id = agent.generate_session_id(history=history)
response = agent.run("Tell me more about that.", session_id=session_id)
print(response.data.output)
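When history comes from an external store, it may need validating and trimming before it is passed in. The helper below is a hypothetical sketch: `last_turns` is not part of the SDK, and it assumes the only roles are "user" and "assistant", as in the example above.

```python
from typing import Dict, List

# Assumption: only these two roles appear, matching the history example above.
VALID_ROLES = {"user", "assistant"}


def last_turns(history: List[Dict[str, str]], max_messages: int = 6) -> List[Dict[str, str]]:
    """Validate messages and keep at most the last max_messages, starting on a user turn."""
    for index, message in enumerate(history):
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"message {index}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index}: content must be a string")
    trimmed = history[-max_messages:] if max_messages > 0 else []
    # Drop a leading assistant reply so the window does not open mid-exchange.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


stored = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And of Italy?"},
    {"role": "assistant", "content": "The capital of Italy is Rome."},
]
print(last_turns(stored, max_messages=3))
```

The returned list has the same shape as the history argument shown above, so it can be passed to agent.run() or agent.generate_session_id() unchanged.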
Async calling
Use run_async() to start a run without blocking. The method returns immediately with a polling URL; call agent.poll(url) until result.completed is True.
import time
response = agent.run_async(query="Summarise the history of computing.")
while True:
    if not response.url:  # completed immediately (no polling needed)
        print(response.data.output)
        break
    result = agent.poll(response.url)
    if result.completed:
        print(result.data.output)
        break
    time.sleep(5)  # wait before polling again
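The loop above polls indefinitely. A bounded variant with a deadline and growing delays might look like the sketch below. `wait_for_result` is a hypothetical helper, not an SDK function; it takes any callable that behaves like agent.poll, and the demo uses a stub in place of a real agent.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_result(
    poll: Callable[[str], Any],
    url: str,
    timeout: float = 300.0,
    wait_time: float = 0.5,
    max_wait: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll until result.completed is true, doubling the delay up to max_wait."""
    deadline = clock() + timeout
    delay = wait_time
    while True:
        result = poll(url)
        if getattr(result, "completed", False):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"run did not complete within {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_wait)


# Stub standing in for agent.poll: two pending responses, then a completed one.
_responses = iter([
    SimpleNamespace(completed=False),
    SimpleNamespace(completed=False),
    SimpleNamespace(completed=True, data=SimpleNamespace(output="done")),
])
delays = []  # records requested sleeps instead of actually waiting
final = wait_for_result(lambda url: next(_responses), "https://example.invalid/poll", sleep=delays.append)
print(final.data.output, delays)
```

With a real agent, the call would be `wait_for_result(agent.poll, response.url)`. As with the timeout parameter of agent.run(), giving up locally does not cancel the run server-side.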
Batch async
Start multiple runs in parallel, then collect results as they finish:
import time
queries = [
    "What are the benefits of cloud computing?",
    "Explain blockchain in plain English.",
    "What is reinforcement learning?",
]
# Kick off all runs
pending = []
for query in queries:
    r = agent.run_async(query=query)
    if r.url:
        pending.append((query, r.url))
    else:
        print(f"[immediate] {r.data.output}\n")
# Poll until all finish
results = []
while pending:
    for query, url in pending[:]:  # iterate over a copy so removal is safe
        result = agent.poll(url)
        if result.completed:
            results.append((query, result.data.output))
            pending.remove((query, url))
    if pending:
        time.sleep(3)  # wait before the next polling pass
for query, output in results:
    print(f"Q: {query}\nA: {output}\n")
Tracing and monitoring
Every run returns structured traces. Use them for debugging and cost tracking:
# replaces: LangSmith tracing + Helicone logging + custom middleware
# step-level traces are on by default; no instrumentation needed
response = agent.run(
    query="What are the top programming languages in 2025?",
    progress_format="logs",
    progress_verbosity=1,
    service_version="V3",
)
# Run outcome
print("Status: ", response.status) # SUCCESS / FAILED / IN_PROGRESS
print("Output: ", response.data.output)
print("Completed: ", response.completed)
print("Error: ", response.error_message)
# Top-level run metrics
print("Steps: ", len(response.data.steps or []))
print("Session ID: ", response.data.session_id)
print("Credits: ", response.used_credits)
print("Run time: ", response.run_time, "s")
Inspecting steps
response.data.steps contains every reasoning step. Each step has an agent (which sub-agent ran) and a unit (which model or tool it invoked).
import json
for i, step in enumerate(response.data.steps or []):
    agent_info = step.get("agent", {})
    unit_info = step.get("unit", {})
    agent_name = agent_info.get("name", "Unknown") if isinstance(agent_info, dict) else str(agent_info)
    unit_name = unit_info.get("name", "Unknown") if isinstance(unit_info, dict) else str(unit_info)
    # "or" guards against a type that is present but None
    unit_type = (unit_info.get("type") or "") if isinstance(unit_info, dict) else ""
    is_tool = unit_type.lower() == "tool"
    print(f"\n--- Step {i+1}: {agent_name} → {unit_name} ({'Tool' if is_tool else 'LLM'}) ---")
    print(f" API calls: {step.get('api_calls', 0)}")
    # "or 0" keeps the float format from failing when used_credits is None
    print(f" Credits: {(step.get('used_credits') or 0):.6f}")
    if step.get("thought"):
        print(f" Thought: {str(step['thought'])[:200]}")
    if step.get("task"):
        print(f" Task: {step['task']}")
    if step.get("input"):
        inp = step["input"]
        print(f" Input: {json.dumps(inp)[:300] if isinstance(inp, dict) else str(inp)[:300]}")
    if step.get("output"):
        out = step["output"]
        print(f" Output: {json.dumps(out)[:300] if isinstance(out, dict) else str(out)[:300]}")
    if step.get("error") or step.get("error_message"):
        print(f" Error: {step.get('error') or step.get('error_message')}")
Step fields reference:
| Field | Description |
|---|---|
| agent | Sub-agent that executed this step (dict with name and id). |
| unit | Model or tool invoked (dict with name, id, and type). type is "tool" for tool calls, otherwise an LLM step. |
| api_calls | Number of API calls made in this step. |
| used_credits | Credits consumed by this step. |
| thought | Agent's internal reasoning before acting (visible at progress_verbosity=2+). |
| task | Task name assigned to this step (V3 only, when tasks are configured). |
| input | Input passed to the unit. |
| output | Output returned by the unit. |
| error / error_message | Error if this step failed. |
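For cost tracking, the step fields above can be rolled up per unit. The sketch below assumes steps are plain dicts with the fields listed in the table; `summarize_steps` is a hypothetical helper and the sample steps are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def summarize_steps(steps: Optional[Iterable[dict]]) -> Dict[Tuple[str, str], dict]:
    """Total step count, API calls, and credits per (kind, unit name)."""
    totals: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"steps": 0, "api_calls": 0, "used_credits": 0.0}
    )
    for step in steps or []:
        unit = step.get("unit")
        if not isinstance(unit, dict):
            unit = {}
        name = unit.get("name") or "Unknown"
        # Per the table above, type "tool" marks a tool call; anything else is an LLM step.
        kind = "tool" if (unit.get("type") or "").lower() == "tool" else "llm"
        bucket = totals[(kind, name)]
        bucket["steps"] += 1
        bucket["api_calls"] += step.get("api_calls") or 0
        bucket["used_credits"] += step.get("used_credits") or 0.0
    return dict(totals)


# Made-up steps for illustration; real ones come from response.data.steps.
sample_steps = [
    {"unit": {"name": "planner-llm", "type": "llm"}, "api_calls": 1, "used_credits": 0.25},
    {"unit": {"name": "web-search", "type": "tool"}, "api_calls": 2, "used_credits": 0.5},
    {"unit": {"name": "planner-llm", "type": "llm"}, "api_calls": 1, "used_credits": 0.25},
]
for (kind, name), total in summarize_steps(sample_steps).items():
    print(kind, name, total)
```

Calling `summarize_steps(response.data.steps)` on a real run would show which tool or model accounts for most of the credits.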
Execution metrics
stats = response.data.execution_stats or {}
print("API calls:", stats.get("api_calls"))
print("Credits:", stats.get("credits"))
print("Runtime:", stats.get("runtime"), "s")
print("Assets used:", stats.get("assets_used"))
print("Session ID:", stats.get("session_id"))
print("Run ID:", stats.get("params", {}).get("id"))
print("Request ID:", stats.get("request_id"))
Evaluation
Evaluate one or more agents against a dataset of queries using quantitative metrics rather than inspecting outputs manually.
Setup
from aixplain.v2.agent_evaluator import AgentEvaluationRun, Dataset
AgentEvaluationRun.configure_insights(aix)
Datasets
# From a list of queries
dataset = Dataset.from_queries(
    ["What is the capital of France?", "Who invented the telephone?"],
    name="Quick Test",
)
# From a CSV file with references and metadata
dataset = Dataset.from_csv(
    "eval_data.csv",
    query_column="query",
    reference_column="reference",
    metadata_columns=["topic"],
    name="My Eval Dataset",
)
reference is used by correctness-style metrics to score the agent's answer against ground truth. metadata_columns are carried through to results for filtering.
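A CSV matching the column names used above can be produced with the standard library. This is a minimal sketch: `write_eval_csv` is a hypothetical helper, the rows are sample data, and the demo writes to an in-memory buffer rather than eval_data.csv.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column names match query_column, reference_column, and metadata_columns above.
FIELDNAMES = ["query", "reference", "topic"]


def write_eval_csv(rows: Iterable[Dict[str, str]], fileobj: TextIO) -> None:
    """Write evaluation rows with a header line in the expected column order."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"query": "What is the capital of France?", "reference": "Paris", "topic": "geography"},
    {"query": "Who invented the telephone?", "reference": "Alexander Graham Bell", "topic": "history"},
]
buffer = io.StringIO()
write_eval_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("eval_data.csv", "w", newline="", encoding="utf-8")` and pass the handle instead of the buffer.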
Metrics
metric_correctness = aix.Metric.get("aixplain-benchmarking/correctness-score/aixplain")
metric_correctness.agent_response_data_fields = aix.Metric.AgentResponseDataFields(
    query=True, trace=True, output=True
)
metric_correctness.score_type = "numeric"
metric_harmfulness = aix.Metric.get("aixplain-benchmarking/harmfulness/aixplain")
metric_harmfulness.agent_response_data_fields = aix.Metric.AgentResponseDataFields(
    query=True, trace=True, output=True
)
metric_harmfulness.score_type = "categorical"
AgentResponseDataFields controls what context the evaluator receives. score_type is "numeric" for continuous scores or "categorical" for discrete labels.
Custom metrics
metric = aix.Metric.create(
    name="my-conciseness-metric",
    metric_description="Checks whether the response is concise.",
    llm_path="openai/gpt-5.4-nano/openai",
    score_type="categorical",
    instruction="Judge whether the response answers the question without unnecessary detail.",
    categories=["Concise", "Verbose"],
    detailed_rubric={
        "Concise": "The response is direct and contains no superfluous content.",
        "Verbose": "The response includes unnecessary repetition or filler.",
    },
)
metric.agent_response_data_fields = aix.Metric.AgentResponseDataFields(
    query=True, trace=True, output=True
)
Run an evaluation
evaluator = aix.Eval()
results = evaluator.evaluate(
    agents=[agent_a, agent_b],
    dataset=dataset,
    metrics=[metric_correctness, metric_harmfulness],
)
Save and update
The typical lifecycle is create → save → run → update → save:
agent.output_format = "markdown"
agent.max_iterations = 15
agent.save()
Call agent.save() after changing any persisted attribute, such as name, description, instructions, tools, llm, output_format, max_iterations, or max_tokens.
Troubleshooting
Agent ignores tools
Inspect response.data.steps to see what the agent attempted. Check that the tool's name and description are unambiguous. If total tool parameters exceed 100, optional ones may be silently dropped.
agent reached the maximum number of iterations
The agent hit max_iterations (default 5 for single agents, 30 for team agents). Raise it for complex tasks:
agent.max_iterations = 20
agent.save()
model response was cut off because the maximum token limit was reached
Increase the LLM's persistent token cap:
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.max_tokens = 100_000
agent.llm = llm
agent.save()
Agent response is cropped
The agent's own max_tokens (default 2048) caps the final output, independent of the LLM cap. Raise it:
agent.max_tokens = 20_000
agent.save()