Version: v2.0

Agents

Agents are instruction-driven, model-powered reasoning components that follow a plan → act → observe → repeat loop. They use an LLM to decide the next action, call tools when needed, and return structured output in TEXT, MARKDOWN, or JSON format.

Setup

pip install aixplain
from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

Quick start

Create and run a minimal agent to validate your setup:

agent = aix.Agent(
    name="Hello Agent",
    description="Answers general questions clearly and concisely.",
    instructions="You are a helpful assistant.",
)
agent.save()

response = agent.run(query="What is machine learning?")
print(response.data.output)
print(agent.path)

agent.save() transitions the agent from DRAFT to ONBOARDED state and makes it callable.

How it works

Each run executes a reasoning loop:

1. INIT → Load config, validate input, process variables
2. REASONING LOOP ⟲
   ├─> LLM plans next action
   ├─> Execute tools (if needed)
   ├─> Evaluate results
   └─> Repeat until complete or max_iterations reached
3. RETURN → AgentResponse with output + metadata

The LLM autonomously decides which tools to call and when. Runs continue until the task is complete or max_iterations is hit.
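The loop above can be sketched in plain Python. This is an illustrative sketch only, not the SDK's internal code: `plan`, `act`, and `is_complete` are hypothetical stand-ins for the LLM's decision, the tool call, and the completion check.

```python
# Illustrative sketch of the plan -> act -> observe loop (not the SDK's code).
# plan(), act(), and is_complete() are hypothetical stand-ins.

def reasoning_loop(task, plan, act, is_complete, max_iterations=5):
    """Plan -> act -> observe until complete or the iteration cap is hit."""
    observations = []
    for iteration in range(max_iterations):
        action = plan(task, observations)   # LLM decides the next action
        result = act(action)                # execute a tool if needed
        observations.append(result)         # observe the outcome
        if is_complete(observations):
            return {"output": result, "iterations": iteration + 1}
    # Cap reached without completing the task
    return {"output": None, "iterations": max_iterations}
```

The key design point is the hard iteration cap: a run that never converges still terminates, which is why raising `max_iterations` is the fix when complex tasks stop early.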

Agent states

State      Description
DRAFT      Created but not persisted. Call agent.save() to promote.
ONBOARDED  Persisted and production-ready.

Tools

Tools extend an agent beyond text generation. The agent decides autonomously when to invoke them.

# Marketplace tool
web_search_tool = aix.Tool.get("tavily/tavily-search-api")

# Model used as a tool
translation_model = aix.Model.get("google/translate-multi-lingual")

Pass tools at agent creation:

INSTRUCTIONS = """
You are a technical documentation assistant.
Think step-by-step when solving problems. Explain non-obvious choices.
Use tools only when internal knowledge is insufficient.
Prefer official sources when citing.
"""

agent = aix.Agent(
    name="Research Agent",
    description="Researches topics using web search.",
    instructions=INSTRUCTIONS,
    tools=[web_search_tool],
)
agent.save()

response = agent.run(query="What are the latest developments in AI safety?")
print(response.data.output)

You can test a tool in isolation before attaching it:

print(web_search_tool.list_actions())
response = web_search_tool.run(data="What is aiXplain?")
print(response.data)
tip

If the agent ignores a tool, check response.data.steps for what it attempted, then tighten the tool's name and description. If total parameters across all tools exceed 100, optional ones may be silently dropped — mark only the ones you need as required.

Learn more about tools →

LLM configuration

The default model is GPT-4o Mini. Override it at agent creation:

# Option A: Model ID
SONNET_MODEL_ID = "67be216bd8f6a65d6f74d5e9"
agent = aix.Agent(
    name="Sonnet Agent",
    description="...",
    llm=aix.Model.get(SONNET_MODEL_ID),
)

# Option B: Fine-grained parameters via the inputs proxy
llm = aix.Model.get("openai/gpt-4o")
llm.inputs.temperature = 1
llm.inputs.max_tokens = 100_000

agent = aix.Agent(
    name="Custom LLM Agent",
    description="...",
    llm=llm,
)

Choose an LLM based on: context window size, reasoning depth, latency requirements, cost per 1M tokens, tool-calling reliability, and multilingual quality.

Output format

Available formats: text (default) | markdown | json.

When using json, pass an expected_output schema. Three formats are accepted:

from pydantic import BaseModel
from typing import List, Dict

# Option 1: Text description of the shape
expected_output = """{"name": "string", "calories": "string"}"""

# Option 2: Dict
expected_output = {"name": "string", "calories": "string"}

# Option 3: Pydantic model (recommended — adds type validation)
class RecipeOutput(BaseModel):
    name: str
    description: str
    ingredients: List[str]
    instructions: str
    nutrition: Dict[str, str]

expected_output = RecipeOutput

recipe_agent = aix.Agent(
    name="Recipe Structurer",
    description="Culinary assistant that returns structured recipes.",
    instructions="Extract and organise recipe data into the required JSON shape. Use web search to fill gaps.",
    tools=[web_search_tool],
    output_format="json",
    expected_output=expected_output,
)
recipe_agent.save()

response = recipe_agent.run("Chocolate cake recipe")
print(response.data.output)

Runtime parameters

agent.run() accepts the following parameters:

Parameter              Type         Default          Description
query                  str          required         Main task or question.
content                dict | list  None             Text, file paths, or URLs. Use dict for {{template}} substitution.
data                   any          None             Alternative to query + content; use one or the other.
variables              dict         None             Values substituted into {{placeholders}} in instructions / description.
session_id             str          None             Resume a stateful session (14-day retention).
history                list         None             Inject prior turns without a session.
max_tokens             int          2048             Output token cap per run.
max_iterations         int          5                Max reasoning-loop iterations.
runResponseGeneration  bool         True             Whether to generate a final response after tool steps.
name                   str          "model_process"  Execution label in logs.
trace_request          bool         False            Return a request ID for backend tracing.
progress_format        str | None   None             "status" (single line) or "logs" (timeline). None disables output.
progress_verbosity     int          1                Detail level: 1 minimal, 2 includes thoughts, 3 full I/O.
progress_truncate      bool         True             Truncate long text in progress output.
timeout                int          300              Seconds to poll before the SDK stops waiting; the agent may continue server-side.
wait_time              float        0.5              Seconds between polling checks.
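The interaction between timeout and wait_time can be sketched as a simple polling loop. This is an illustrative sketch, not the SDK's client code; `poll` is a hypothetical stand-in for the status check against the backend.

```python
import time

# Illustrative sketch of timeout / wait_time polling (not the SDK's code).
# poll() is a hypothetical stand-in returning (status, result).

def wait_for_result(poll, timeout=300, wait_time=0.5):
    """Poll until the run reaches a terminal state or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, result = poll()
        if status in ("SUCCEEDED", "FAILED"):  # terminal states
            return status, result
        time.sleep(wait_time)                  # wait_time between checks
    # The SDK stops waiting here; the run may continue server-side
    return "IN_PROGRESS", None
```

Note that hitting the timeout does not cancel the run: the client simply stops polling, which matches the table's caveat that the agent may continue server-side.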

Understanding max_tokens

Both llm.inputs.max_tokens and the per-run max_tokens cap output tokens only (not input/context). They are independent levers:

# Persistent default — applies to every invocation by this agent
llm.inputs.max_tokens = 100_000

# Per-run override — applies to this execution only
response = agent.run(query="...", max_tokens=4000)

Keep caps conservative. Raise max_tokens on a single run first; if truncation is frequent, raise llm.inputs.max_tokens permanently.

Variable substitution

Use {{variable}} placeholders in instructions or description, then supply values at runtime via variables:

agent = aix.Agent(
    name="Multilingual Researcher",
    description="Research assistant for {{topic}}.",
    instructions="""
You are a research assistant specialising in {{topic}}.
Always respond in {{language}}.
Focus on peer-reviewed sources when available.
""",
    tools=[web_search_tool],
)
agent.save()

response = agent.run(
    query="What are the key challenges?",
    variables={"topic": "quantum computing", "language": "Spanish"},
)
print(response.data.output)
note

variables substitution applies to instructions and description only — not to query or content.
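The substitution mechanism can be approximated with a few lines of Python. This is an illustrative sketch of the behaviour, not the SDK's implementation; `substitute` is a hypothetical helper, and the "unknown placeholders stay intact" choice is an assumption, not documented behaviour.

```python
import re

# Illustrative sketch of {{placeholder}} substitution (not the SDK's code).
# Applied to instructions and description only, never to query or content.

def substitute(template, variables):
    """Replace {{name}} with variables["name"]; leave unknown placeholders intact."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

instructions = "You are a research assistant specialising in {{topic}}. Respond in {{language}}."
print(substitute(instructions, {"topic": "quantum computing", "language": "Spanish"}))
```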

Progress streaming

# Disabled (default)
response = agent.run(query="What is machine learning?")

# Compact single-line status
response = agent.run(
    query="What is machine learning?",
    progress_format="status",
    progress_verbosity=1,
)

# Full timestamped log with agent reasoning
response = agent.run(
    query="What is machine learning?",
    progress_format="logs",
    progress_verbosity=2,
    progress_truncate=True,
)
Level  What's shown
1      Step names and tool invocations
2      Steps + agent reasoning / thoughts
3      Full inputs and outputs at every step

Session management

Pass session_id to persist multi-turn context (stored 14 days, not used for training). Pass history to inject prior turns from an external source.

# Start a session
session_id = agent.generate_session_id()

response = agent.run(query="What is machine learning?", session_id=session_id)
print(response.data.output)

# Follow-up — agent retains context from the first turn
followup = agent.run(query="Give me a practical example.", session_id=session_id)
print(followup.data.output)
# Inject history manually (no server-side memory)
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
response = agent.run("Tell me a fun fact about it.", history=history)
print(response.data.output)

# Seed a new session with existing history
session_id = agent.generate_session_id(history=history)
response = agent.run("Tell me more about that.", session_id=session_id)
print(response.data.output)

Tracing and monitoring

Every run returns structured traces. Use them for debugging and cost tracking:

response = agent.run(query="Summarise this page.", content=["https://en.wikipedia.org/wiki/Freedom"])

# Run outcome
print("Status:", response.status) # SUCCEEDED / FAILED / IN_PROGRESS
print("Output:", response.data.output) # Final answer
print("Completed:", response.completed) # True when run reached a terminal state
print("Error:", response.error_message)

# Reasoning steps
for i, step in enumerate(response.data.steps or []):
    print(f"\n--- Step {i+1}: {step.get('agent')} ---")
    print("Thought:", step.get("thought"))
    print("Action:", step.get("action"))
    print("Reason:", step.get("reason"))

    for ts in step.get("tool_steps") or []:
        print("  Tool:", ts.get("tool"))
        print("  Input:", ts.get("input"))
        print("  Output:", str(ts.get("output"))[:200], "…")
        print("  Error:", ts.get("error"))

# Execution metrics
stats = response.data.executionStats or {}
print("\nAPI calls:", stats.get("api_calls"))
print("Credits:", stats.get("credits"))
print("Runtime:", stats.get("runtime"), "s")
print("Assets used:", stats.get("assets_used"))
print("Session ID:", stats.get("session_id"))
print("Run ID:", stats.get("params", {}).get("id"))
print("Request ID:", stats.get("request_id"))

API key rate limiting

Token-based rate limits support granular control via TokenType:

from aixplain.enums import TokenType
from aixplain.modules import APIKeyLimits

# Limit on combined input + output tokens
limits = APIKeyLimits(
    token_limit=100_000,
    token_type=TokenType.TOTAL,
)

# Limit on output tokens only (useful for cost control)
limits = APIKeyLimits(
    token_limit=50_000,
    token_type=TokenType.OUTPUT,
)
TokenType  Tracks
INPUT      Prompt tokens only
OUTPUT     Completion tokens only
TOTAL      Combined input + output
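The difference between the three token types can be sketched as a small accounting function. This is an illustrative sketch of the counting semantics only, not the platform's enforcement code; `tokens_counted` and `within_limit` are hypothetical helpers.

```python
# Illustrative sketch of token-type accounting (not the platform's code).
# Shows how INPUT, OUTPUT, and TOTAL limits count the same request differently.

def tokens_counted(token_type, input_tokens, output_tokens):
    if token_type == "INPUT":
        return input_tokens                  # prompt tokens only
    if token_type == "OUTPUT":
        return output_tokens                 # completion tokens only
    return input_tokens + output_tokens      # TOTAL: combined

def within_limit(token_type, token_limit, used, input_tokens, output_tokens):
    """Would this request fit under the remaining allowance?"""
    return used + tokens_counted(token_type, input_tokens, output_tokens) <= token_limit
```

An OUTPUT limit is the tightest lever for cost control because completion tokens are typically the expensive side; a TOTAL limit caps overall throughput regardless of the split.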

Save and update

The typical lifecycle is create → save → run → update → save:

agent.output_format = "markdown"
agent.max_iterations = 15
agent.save()

Call agent.save() after any change to name, description, instructions, tools, llm, or output_format.

Troubleshooting

Agent ignores tools Inspect response.data.steps to see what the agent attempted. Check that the tool's name and description are unambiguous. If total tool parameters exceed 100, optional ones may be silently dropped.

agent reached the maximum number of iterations The agent hit max_iterations (default 5 for single agents, 30 for team agents). Raise it for complex tasks:

agent.max_iterations = 20
agent.save()

model response was cut off because the maximum token limit was reached Increase the LLM's persistent token cap:

llm = aix.Model.get("openai/gpt-4o")
llm.inputs.max_tokens = 100_000
agent.llm = llm
agent.save()

Agent response is cropped The agent's own max_tokens (default 2048) caps the final output, independent of the LLM cap. Raise it:

agent.max_tokens = 20_000
agent.save()

Next steps