Version: v2.0

Knowledge Base

Index tools provide vector database capabilities for semantic search and retrieval-augmented generation (RAG). Documents are embedded and stored so agents can retrieve relevant context by meaning, not just keyword match.

Setup

pip install aixplain

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

Quick start

import time

index_tool = aix.Tool(
    name=f"Quick Index {int(time.time())}",
    description="Vector database for semantic search.",
    integration="6904bcf672a6e36b68bb72fb",  # aiR vector database
)
index_tool.save()

# Add documents
index_tool.run(action="upsert", data={"records": [
    {"id": "doc1", "text": "Python is a programming language.", "metadata": {"category": "tech"}},
    {"id": "doc2", "text": "Machine learning uses algorithms.", "metadata": {"category": "ai"}},
]})

# Search
response = index_tool.run(action="search", data={"query": "programming"})
print(response.data)

Create the tool

index_tool = aix.Tool(
    name=f"Index Tool {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()

index_tool.list_actions()

The integration ID 6904bcf672a6e36b68bb72fb points to aiXplain's aiR vector database service.

Available actions

Action     Description
upsert     Add or update documents
search     Query by semantic similarity
get        Retrieve a document by ID
delete     Remove a document by ID
count      Return total document count
metadata   Inspect index configuration

Prepare documents

Each document requires an id and text field. metadata is optional but enables filtering.

# List of dicts (most common)
documents = [
    {"id": "doc1", "text": "apple", "metadata": {"type": "simple", "color": "green"}},
    {"id": "doc2", "text": "banana", "metadata": {"type": "simple", "color": "yellow"}},
    {"id": "doc3", "text": "strawberry", "metadata": {"type": "aggregate", "color": "red"}},
]

Document text is capped at 100,000 characters. For longer documents, split them first or use the chunking option in upsert.
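For texts above the cap, one option is to split client-side before upserting. The fixed-size splitter below is an illustrative sketch, not part of the SDK; it breaks one record into pieces under the limit and suffixes the id so each piece stays addressable:

```python
MAX_CHARS = 100_000  # per-document text cap

def split_document(doc, max_chars=MAX_CHARS):
    """Split one record into pieces of at most max_chars characters.

    Pieces inherit the original metadata and get ids like 'doc1-part2'.
    """
    text = doc["text"]
    if len(text) <= max_chars:
        return [doc]
    parts = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [
        {**doc, "id": f"{doc['id']}-part{n}", "text": part}
        for n, part in enumerate(parts, start=1)
    ]

# A 250,000-character document becomes three pieces: 100k + 100k + 50k.
pieces = split_document({"id": "long1", "text": "x" * 250_000, "metadata": {}})
```

Splitting on character counts alone can cut mid-sentence; the chunking option in upsert is the better fit when sentence boundaries matter.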

When loading from CSV, metadata columns are stored as strings and must be parsed back:

import ast, pandas as pd

df = pd.read_csv("documents.csv")
df["metadata"] = df["metadata"].apply(ast.literal_eval)
documents = df.to_dict(orient="records")
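Before upserting, a small validator can catch malformed records early. This is a hypothetical helper, not an SDK function; it only checks the field requirements stated above:

```python
def validate_records(records):
    """Check that every record has a non-empty 'id' and 'text'.

    Raises ValueError on the first invalid record. 'metadata', when
    present, must be a dict so filters behave predictably.
    """
    for i, rec in enumerate(records):
        if not rec.get("id"):
            raise ValueError(f"record {i} is missing 'id'")
        if not rec.get("text"):
            raise ValueError(f"record {i} is missing 'text'")
        if "metadata" in rec and not isinstance(rec["metadata"], dict):
            raise ValueError(f"record {i}: 'metadata' must be a dict")
    return records

validate_records([{"id": "doc1", "text": "apple", "metadata": {"color": "green"}}])
```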

Upsert documents

# Basic
index_tool.run(action="upsert", data={"records": documents})

# With chunking for long texts
index_tool.run(action="upsert", data={
    "records": documents,
    "chunking": {
        "split_by": "sentence",  # "word", "sentence", or "character"
        "split_length": 3,       # units per chunk
        "split_overlap": 1,      # overlap between consecutive chunks
    },
})

split_overlap prevents related content from being split across chunks with no shared context.
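To see what split_length=3 with split_overlap=1 means concretely, here is a client-side sketch of the windowing (illustrative only; the service performs its own chunking):

```python
def chunk(units, split_length, split_overlap):
    """Group units (e.g. sentences) into windows of split_length,
    stepping by split_length - split_overlap so consecutive chunks
    share split_overlap units."""
    step = split_length - split_overlap
    return [units[i:i + split_length]
            for i in range(0, len(units) - split_overlap, step)]

sentences = ["S1.", "S2.", "S3.", "S4.", "S5."]
print(chunk(sentences, split_length=3, split_overlap=1))
# [['S1.', 'S2.', 'S3.'], ['S3.', 'S4.', 'S5.']]
```

Note how "S3." appears in both chunks: that shared sentence is the overlap that preserves context across the chunk boundary.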

Search documents

# Basic semantic search
response = index_tool.run(action="search", data={"query": "yellow fruit"})
for record in response.data:
    print(record["id"], record["text"], record.get("score"))
# With metadata filters
response = index_tool.run(
    action="search",
    data={
        "query": "yellow",
        "top_k": 5,
        "filters": [
            {"field": "color", "operator": "==", "value": "yellow"},
            {"field": "type", "operator": "==", "value": "simple"},
        ],
    },
)

Filter operators: ==, !=, >, <, >=, <=, in, not in.
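The operator semantics can be mirrored client-side. A sketch of how each operator compares a metadata field against a filter value, assuming filters combine with AND (which matches the example above, where both conditions must hold):

```python
import operator

# Map each documented operator to a two-argument predicate.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "in": lambda field, value: field in value,
    "not in": lambda field, value: field not in value,
}

def matches(metadata, filters):
    """True if the metadata dict satisfies every filter (AND semantics)."""
    return all(
        OPS[f["operator"]](metadata.get(f["field"]), f["value"])
        for f in filters
    )

matches({"color": "yellow", "type": "simple"},
        [{"field": "color", "operator": "==", "value": "yellow"}])  # True
```

Note that `in` and `not in` expect the filter value to be a collection, e.g. `{"field": "category", "operator": "in", "value": ["tech", "ai"]}`.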

Get, delete, count, metadata

# Retrieve by ID
response = index_tool.run(action="get", data={"id": "doc1"})
# or
response = index_tool.run(action="get", data="doc1")

# Delete by ID
index_tool.run(action="delete", data={"id": "doc1"})

# Count all documents
response = index_tool.run(action="count")
print(response.data)

# Index configuration
response = index_tool.run(action="metadata")
print(response.data)

Use with agents

index_tool.allowed_actions = ["search", "get"]  # prevent agent from writing to the index

agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and brand in your answers.",
    tools=[index_tool],
)
agent.save()

response = agent.run("Find affordable electronics under $200.")
print(response.data.output)

Restrict allowed_actions to ["search"] or ["search", "get"] for agents that should only read from the index.

Full example

import time
from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

# 1. Create tool
index_tool = aix.Tool(
    name=f"Product Index {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()

# 2. Upsert documents
index_tool.run(action="upsert", data={"records": [
    {"id": "prod1", "text": "Wireless headphones with noise cancellation.", "metadata": {"category": "electronics", "price": 199}},
    {"id": "prod2", "text": "Ergonomic office chair with lumbar support.", "metadata": {"category": "furniture", "price": 299}},
    {"id": "prod3", "text": "Stainless steel water bottle, cold for 24 hours.", "metadata": {"category": "accessories", "price": 29}},
]})

# 3. Agent (read-only)
index_tool.allowed_actions = ["search", "get"]

agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and category.",
    tools=[index_tool],
)
agent.save()

# 4. Run
response = agent.run("Find affordable electronics under $200.")
print(response.data.output)

Troubleshooting

Documents not appearing in search results
Confirm the upsert succeeded with index_tool.run(action="count"). Try broadening the query or removing filters.

Metadata filters return no results
Field names are case-sensitive and must match exactly. Verify value types match (string vs. number).

Agent does not use the tool
Ensure instructions explicitly direct the agent to search the index. Confirm the tool is in tools and that allowed_actions includes "search".

CSV metadata parsing errors
Apply ast.literal_eval after reading the CSV; pandas reads dict-like columns as plain strings.
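When some rows carry malformed metadata, a bare ast.literal_eval raises on the first bad cell. A tolerant parser (sketch, not an SDK helper) falls back to an empty dict instead of aborting the whole load:

```python
import ast

def parse_metadata(cell):
    """Parse a dict-like string from a CSV cell.

    Returns {} when the cell is malformed or does not contain a dict,
    so one bad row does not abort the whole load.
    """
    try:
        value = ast.literal_eval(cell)
        return value if isinstance(value, dict) else {}
    except (ValueError, SyntaxError):
        return {}

print(parse_metadata("{'category': 'tech'}"))  # {'category': 'tech'}
print(parse_metadata("not a dict"))            # {}
```

Use it in place of the bare apply: `df["metadata"] = df["metadata"].apply(parse_metadata)`.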

Next steps