Knowledge Base
Index tools provide vector database capabilities for semantic search and retrieval-augmented generation (RAG). Documents are embedded and stored so agents can retrieve relevant context by meaning, not just keyword match.
Setup
pip install aixplain
from aixplain import Aixplain
aix = Aixplain(api_key="YOUR_API_KEY")
Quick start
import time
index_tool = aix.Tool(
    name=f"Quick Index {int(time.time())}",
    description="Vector database for semantic search.",
    integration="6904bcf672a6e36b68bb72fb",  # aiR vector database
)
index_tool.save()
# Add documents
index_tool.run(action="upsert", data={"records": [
    {"id": "doc1", "text": "Python is a programming language.", "metadata": {"category": "tech"}},
    {"id": "doc2", "text": "Machine learning uses algorithms.", "metadata": {"category": "ai"}},
]})
# Search
response = index_tool.run(action="search", data={"query": "programming"})
print(response.data)
Create the tool
index_tool = aix.Tool(
    name=f"Index Tool {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()
index_tool.list_actions()  # list the actions this tool exposes
The integration ID 6904bcf672a6e36b68bb72fb points to aiXplain's aiR vector database service.
Available actions
| Action | Description |
|---|---|
| upsert | Add or update documents |
| search | Query by semantic similarity |
| get | Retrieve a document by ID |
| delete | Remove a document by ID |
| count | Return total document count |
| metadata | Inspect index configuration |
Prepare documents
Each document requires an id and a text field. The metadata field is optional but enables filtering at search time.
# List of dicts (most common)
documents = [
    {"id": "doc1", "text": "apple", "metadata": {"type": "simple", "color": "green"}},
    {"id": "doc2", "text": "banana", "metadata": {"type": "simple", "color": "yellow"}},
    {"id": "doc3", "text": "strawberry", "metadata": {"type": "aggregate", "color": "red"}},
]
Document text is capped at 100,000 characters. For longer documents, split them first or use the chunking option in upsert.
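As a rough sketch of the "split them first" option, the helper below breaks an over-long text into pieces that each fit under the 100,000-character cap and gives every piece its own id. The name split_record, the "-partN" id suffix, and copying the metadata onto each piece are illustrative assumptions, not part of the aiXplain SDK.

```python
MAX_TEXT_CHARS = 100_000  # cap stated in the text above


def split_record(record, max_chars=MAX_TEXT_CHARS):
    """Split one record into records whose text fits under max_chars.

    Hypothetical helper: the "-partN" id scheme is for illustration only.
    """
    text = record["text"]
    if len(text) <= max_chars:
        return [record]

    pieces = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to break at the last space inside the window.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        piece = text[start:end].strip()
        if piece:  # skip windows that contain only whitespace
            pieces.append(piece)
        start = end

    return [
        {
            "id": f"{record['id']}-part{i}",
            "text": piece,
            "metadata": dict(record.get("metadata", {})),
        }
        for i, piece in enumerate(pieces, start=1)
    ]
```

Each returned record can then be passed to upsert like any other document.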
When loading from CSV, a metadata column that holds dict literals is read back as plain strings and must be parsed into dicts:
import ast
import pandas as pd
df = pd.read_csv("documents.csv")
df["metadata"] = df["metadata"].apply(ast.literal_eval)
documents = df.to_dict(orient="records")
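For a pandas-free variant of the same parsing step, here is a minimal sketch using only the standard library. It assumes the CSV has id, text, and metadata columns, with metadata written as a Python dict literal. Treating an empty metadata cell as an empty dict is a choice made for this example, and load_records is a hypothetical helper name.

```python
import ast
import csv
import io


def load_records(csv_file):
    """Read id/text/metadata rows and parse metadata back into dicts."""
    records = []
    for row in csv.DictReader(csv_file):
        raw = (row.get("metadata") or "").strip()
        # literal_eval only accepts Python literals, so it does not run code.
        metadata = ast.literal_eval(raw) if raw else {}
        if not isinstance(metadata, dict):
            raise ValueError(f"metadata for {row['id']!r} is not a dict: {raw!r}")
        records.append({"id": row["id"], "text": row["text"], "metadata": metadata})
    return records


# In-memory CSV standing in for documents.csv
sample = io.StringIO(
    "id,text,metadata\n"
    "doc1,apple,\"{'type': 'simple', 'color': 'green'}\"\n"
    "doc2,banana,\n"
)
documents = load_records(sample)
```

With a real file, pass an open file handle instead of the StringIO object.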
Upsert documents
# Basic
index_tool.run(action="upsert", data={"records": documents})
# With chunking for long texts
index_tool.run(action="upsert", data={
    "records": documents,
    "chunking": {
        "split_by": "sentence",  # "word", "sentence", or "character"
        "split_length": 3,       # units per chunk
        "split_overlap": 1,      # overlap between consecutive chunks
    },
})
split_overlap repeats the last units of one chunk at the start of the next, so related content is not split across chunks with no shared context.
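To see what split_length and split_overlap mean in practice, the sketch below reproduces the sliding-window idea locally on a list of words. It assumes each chunk starts split_length - split_overlap units after the previous one; the service's own splitter may differ in details such as sentence detection. The function chunk_units is a hypothetical name, not an SDK call.

```python
def chunk_units(units, split_length, split_overlap):
    """Group units into chunks of split_length that share split_overlap units."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + split_length])
        if start + split_length >= len(units):
            break  # this chunk already reaches the end of the input
    return chunks


words = "one two three four five six seven".split()
chunks = chunk_units(words, split_length=3, split_overlap=1)
```

With these settings, each chunk ends on the word that the next chunk starts with, which is the shared context the overlap provides.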
Search
# Basic semantic search
response = index_tool.run(action="search", data={"query": "yellow fruit"})
for record in response.data:
    print(record["id"], record["text"], record.get("score"))
# With metadata filters
response = index_tool.run(
    action="search",
    data={
        "query": "yellow",
        "top_k": 5,
        "filters": [
            {"field": "color", "operator": "==", "value": "yellow"},
            {"field": "type", "operator": "==", "value": "simple"},
        ],
    },
)
Filter operators: ==, !=, >, <, >=, <=, in, not in.
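A small helper can catch a mistyped operator before the request is sent. The sketch below builds filter dicts in the shape used in the search example and checks the operator against the listed set. make_filter is a hypothetical helper, not an SDK function.

```python
FILTER_OPERATORS = {"==", "!=", ">", "<", ">=", "<=", "in", "not in"}


def make_filter(field, operator, value):
    """Return one filter dict, rejecting operators outside the listed set."""
    if operator not in FILTER_OPERATORS:
        raise ValueError(f"unsupported operator {operator!r}")
    return {"field": field, "operator": operator, "value": value}


filters = [
    make_filter("color", "==", "yellow"),
    make_filter("type", "==", "simple"),
]
search_payload = {"query": "yellow", "top_k": 5, "filters": filters}
```

The resulting search_payload can be passed as the data argument of a search call.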
Get, delete, count, metadata
# Retrieve by ID
response = index_tool.run(action="get", data={"id": "doc1"})
# or
response = index_tool.run(action="get", data="doc1")
# Delete by ID
index_tool.run(action="delete", data={"id": "doc1"})
# Count all documents
response = index_tool.run(action="count")
print(response.data)
# Index configuration
response = index_tool.run(action="metadata")
print(response.data)
Use with agents
index_tool.allowed_actions = ["search", "get"] # prevent agent from writing to the index
agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and brand in your answers.",
    tools=[index_tool],
)
agent.save()
response = agent.run("Find affordable electronics under $200.")
print(response.data.output)
Restrict allowed_actions to ["search"] or ["search", "get"] for agents that should only read from the index.
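One way to guard against accidentally exposing write access is to check the list before assigning it. The sketch below treats upsert and delete as the write actions, based on the action table earlier on this page. assert_read_only is a hypothetical helper, not part of the SDK.

```python
WRITE_ACTIONS = {"upsert", "delete"}  # actions from the table that modify the index


def assert_read_only(allowed_actions):
    """Raise if the list would let an agent modify the index."""
    exposed = WRITE_ACTIONS.intersection(allowed_actions)
    if exposed:
        raise PermissionError(f"write actions exposed to agent: {sorted(exposed)}")
    return list(allowed_actions)


safe_actions = assert_read_only(["search", "get"])
```

The checked list can then be assigned as before, for example index_tool.allowed_actions = safe_actions.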
Full example
import time
from aixplain import Aixplain
aix = Aixplain(api_key="YOUR_API_KEY")
# 1. Create tool
index_tool = aix.Tool(
    name=f"Product Index {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()
# 2. Upsert documents
index_tool.run(action="upsert", data={"records": [
    {"id": "prod1", "text": "Wireless headphones with noise cancellation.", "metadata": {"category": "electronics", "price": 199}},
    {"id": "prod2", "text": "Ergonomic office chair with lumbar support.", "metadata": {"category": "furniture", "price": 299}},
    {"id": "prod3", "text": "Stainless steel water bottle, cold for 24 hours.", "metadata": {"category": "accessories", "price": 29}},
]})
# 3. Agent (read-only)
index_tool.allowed_actions = ["search", "get"]
agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and category.",
    tools=[index_tool],
)
agent.save()
# 4. Run
response = agent.run("Find affordable electronics under $200.")
print(response.data.output)
Troubleshooting
Documents not appearing in search results
Confirm the upsert succeeded with index_tool.run(action="count"). Try broadening the query or removing filters.
Metadata filters return no results
Field names are case-sensitive and must match exactly. Verify value types match (string vs. number).
Agent does not use the tool
Ensure instructions explicitly direct the agent to search the index. Confirm the tool is in tools and that allowed_actions includes "search".
CSV metadata parsing errors
Apply ast.literal_eval to the metadata column after reading the CSV, because pandas reads dict-like columns as plain strings.