Version: v2.0

Knowledge Base

Index tools provide vector database capabilities for semantic search and retrieval-augmented generation (RAG). Documents are embedded and stored so agents can retrieve relevant context by meaning, not just keyword match.

Setup

pip install aixplain

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

Quick start

import time

index_tool = aix.Tool(
    name=f"Quick Index {int(time.time())}",
    description="Vector database for semantic search.",
    integration="6904bcf672a6e36b68bb72fb",  # aiR vector database
)
index_tool.save()

# Add documents
index_tool.run(action="upsert", data={"records": [
    {"id": "doc1", "text": "Python is a programming language.", "metadata": {"category": "tech"}},
    {"id": "doc2", "text": "Machine learning uses algorithms.",  "metadata": {"category": "ai"}},
]})

# Search
response = index_tool.run(action="search", data={"query": "programming"})
print(response.data)

Create the tool

index_tool = aix.Tool(
    name=f"Index Tool {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()

index_tool.list_actions()

The integration ID 6904bcf672a6e36b68bb72fb points to aiXplain's aiR vector database service.

Configure the embedding model

The index uses aiXplain's default embedding model unless you specify another. Pass config={"model": "<model_id>"} when creating the tool to use a different one.

import time
from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

index_tool = aix.Tool(
    name=f"My Index {int(time.time())}",
    description="Vector database with BGE embedding model.",
    integration="6904bcf672a6e36b68bb72fb",
    config={"model": "67efd4f92a0a850afa045af7"},  # embedding model ID
)
index_tool.save()

Parameter	Type	Required	Default	Description
`integration`	`str`	✅	—	Always `"6904bcf672a6e36b68bb72fb"` for the aiR vector database.
`config.model`	`str`	—	aiXplain default	Embedding model ID used to vectorise documents at upsert time.

Available actions

Action	Description
`upsert`	Add or update documents
`search`	Query by semantic similarity
`get`	Retrieve a document by ID
`delete`	Remove a document by ID
`count`	Return total document count
`metadata`	Inspect index configuration

Prepare documents

Each document requires an id field and a text field. The metadata field is optional but enables filtering at search time.

# List of dicts (most common)
documents = [
    {"id": "doc1", "text": "apple",      "metadata": {"type": "simple",    "color": "green"}},
    {"id": "doc2", "text": "banana",     "metadata": {"type": "simple",    "color": "yellow"}},
    {"id": "doc3", "text": "strawberry", "metadata": {"type": "aggregate", "color": "red"}},
]

Document text is capped at 100,000 characters. For longer documents, split them first or use the chunking option in upsert.
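
The requirements above can be checked locally before an upsert. The sketch below is a minimal example, not part of the aiXplain SDK: validate_records is a hypothetical helper. Only the id/text requirement and the 100,000-character cap come from this page. The duplicate-id check, the non-empty text check, and the dict check on metadata are assumptions added for illustration.

```python
MAX_TEXT_CHARS = 100_000  # cap stated in the text above


def validate_records(records):
    """Return a list of problems; an empty list means the records look valid.

    Hypothetical helper: only the id/text requirement and the length cap come
    from the documentation. The other checks are assumptions.
    """
    problems = []
    seen_ids = set()
    for position, record in enumerate(records):
        record_id = record.get("id")
        text = record.get("text")

        if not record_id:
            problems.append(f"record {position}: missing 'id'")
        elif record_id in seen_ids:
            # Assumption: upsert is add-or-update, so a repeated id would
            # presumably overwrite the earlier record in the same batch.
            problems.append(f"record {position}: duplicate id {record_id!r}")
        else:
            seen_ids.add(record_id)

        if not isinstance(text, str) or not text:
            problems.append(f"record {position}: missing or empty 'text'")
        elif len(text) > MAX_TEXT_CHARS:
            problems.append(
                f"record {position}: text has {len(text)} characters "
                f"(limit {MAX_TEXT_CHARS})"
            )

        if "metadata" in record and not isinstance(record["metadata"], dict):
            problems.append(f"record {position}: 'metadata' should be a dict")
    return problems


documents = [
    {"id": "doc1", "text": "apple", "metadata": {"color": "green"}},
    {"id": "doc1", "text": "banana"},       # duplicate id
    {"id": "doc3", "text": "x" * 100_001},  # one character over the cap
    {"text": "no id"},                      # missing id
]
for problem in validate_records(documents):
    print(problem)
```

Running a check like this before upserting surfaces malformed records on the client side, where they are easier to trace to a source row.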

When loading from CSV, metadata columns are stored as strings and must be parsed back:

import ast

import pandas as pd

df = pd.read_csv("documents.csv")
df["metadata"] = df["metadata"].apply(ast.literal_eval)
documents = df.to_dict(orient="records")
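
Where pandas is not available, the same parsing step might be done with the standard library alone. The following is a sketch under stated assumptions: load_documents is a hypothetical helper, the inline CSV stands in for documents.csv, and the id, text, and metadata column names are assumed to match the record format shown earlier. It also reports which row failed to parse, which a bare ast.literal_eval does not.

```python
import ast
import csv
import io

# Stand-in for documents.csv. The metadata column holds a Python dict literal.
csv_text = '''id,text,metadata
doc1,apple,"{'type': 'simple', 'color': 'green'}"
doc2,banana,"{'type': 'simple', 'color': 'yellow'}"
'''


def load_documents(handle):
    """Read records from an open CSV file and parse the metadata column."""
    documents = []
    for row in csv.DictReader(handle):
        raw = row.get("metadata") or "{}"  # treat an empty cell as no metadata
        try:
            metadata = ast.literal_eval(raw)
        except (ValueError, SyntaxError) as exc:
            raise ValueError(
                f"bad metadata for id {row.get('id')!r}: {raw!r}"
            ) from exc
        if not isinstance(metadata, dict):
            raise ValueError(f"metadata for id {row.get('id')!r} is not a dict")
        documents.append({"id": row["id"], "text": row["text"], "metadata": metadata})
    return documents


documents = load_documents(io.StringIO(csv_text))
print(documents[1]["metadata"]["color"])  # yellow
```

With a real file, open("documents.csv", newline="") would replace the io.StringIO wrapper.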

Upsert documents

# Basic
index_tool.run(action="upsert", data={"records": documents})

# With chunking for long texts
index_tool.run(action="upsert", data={
    "records": documents,
    "chunking": {
        "split_by": "sentence",  # "word", "sentence", or "character"
        "split_length": 3,       # units per chunk
        "split_overlap": 1,      # overlap between consecutive chunks
    },
})

split_overlap sets how many units the end of one chunk shares with the start of the next, so related content that straddles a chunk boundary keeps some shared context.
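
To see how split_length and split_overlap interact, the windowing can be reproduced locally. This is an illustration of the general sliding-window idea only: window_units is a hypothetical function, and the service's own splitter may detect sentence boundaries and handle the final chunk differently.

```python
def window_units(units, split_length, split_overlap):
    """Group units into chunks of split_length that share split_overlap units.

    Local illustration only; not the aiR implementation.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")

    step = split_length - split_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + split_length])
        if start + split_length >= len(units):
            break  # this chunk already reaches the end of the input
    return chunks


sentences = ["S1.", "S2.", "S3.", "S4.", "S5."]
for chunk in window_units(sentences, split_length=3, split_overlap=1):
    print(chunk)
# ['S1.', 'S2.', 'S3.']
# ['S3.', 'S4.', 'S5.']
```

In this sketch, "S3." appears in both chunks, which is the shared context that the overlap setting buys.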

Search

# Basic semantic search
response = index_tool.run(action="search", data={"query": "yellow fruit"})
for record in response.data:
    print(record["id"], record["text"], record.get("score"))
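
Search results are often flattened into a single context string before being handed to a prompt. The sketch below assumes each record is a dict with id, text, and an optional score, as the loop above suggests. build_context, min_score, and max_chars are hypothetical names, and the sample scores are made up for illustration.

```python
def build_context(records, min_score=None, max_chars=2000):
    """Join retrieved records into one context string.

    Hypothetical helper. Assumes each record is a dict with "id", "text",
    and an optional "score".
    """
    lines = []
    used = 0
    for record in records:
        score = record.get("score")
        if min_score is not None and score is not None and score < min_score:
            continue  # skip weak matches
        line = f"[{record['id']}] {record['text']}"
        if used + len(line) > max_chars:
            break  # stay within the character budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline separator
    return "\n".join(lines)


# Illustrative records with made-up scores, not real service output.
sample = [
    {"id": "doc2", "text": "banana", "score": 0.82},
    {"id": "doc1", "text": "apple", "score": 0.41},
]
print(build_context(sample, min_score=0.5))  # [doc2] banana
```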

# With metadata filters
response = index_tool.run(
    action="search",
    data={
        "query": "yellow",
        "top_k": 5,
        "filters": [
            {"field": "color", "operator": "==", "value": "yellow"},
            {"field": "type",  "operator": "==", "value": "simple"},
        ],
    },
)

Filter operators: ==, !=, >, <, >=, <=, in, not in.
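
When a filter unexpectedly returns nothing, it can help to evaluate the same conditions against local copies of the records. The sketch below is a local approximation, not the service's logic: matches is a hypothetical function, and it assumes that multiple filters are combined with AND and that a record lacking the field does not match.

```python
import operator

# Maps each documented operator string to a Python comparison.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda field_value, allowed: field_value in allowed,
    "not in": lambda field_value, allowed: field_value not in allowed,
}


def matches(metadata, filters):
    """Return True if metadata satisfies every filter (AND semantics assumed)."""
    for condition in filters:
        field = condition["field"]
        if field not in metadata:
            return False  # assumption: a missing field never matches
        compare = OPERATORS[condition["operator"]]
        try:
            if not compare(metadata[field], condition["value"]):
                return False
        except TypeError:
            # Mixed types, e.g. the string "199" compared with the number 200.
            return False
    return True


records = [
    {"id": "prod1", "metadata": {"category": "electronics", "price": 199}},
    {"id": "prod2", "metadata": {"category": "furniture", "price": 299}},
    {"id": "prod3", "metadata": {"category": "accessories", "price": 29}},
]
filters = [
    {"field": "price", "operator": "<", "value": 200},
    {"field": "category", "operator": "in", "value": ["electronics", "accessories"]},
]
print([r["id"] for r in records if matches(r["metadata"], filters)])
# ['prod1', 'prod3']
```

A price stored as the string "199" fails the numeric comparison in this sketch, which mirrors the type-mismatch pitfall that commonly causes empty filter results.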

Get, delete, count, metadata

# Retrieve by ID
response = index_tool.run(action="get", data={"id": "doc1"})
# or
response = index_tool.run(action="get", data="doc1")

# Delete by ID
index_tool.run(action="delete", data={"id": "doc1"})

# Count all documents
response = index_tool.run(action="count")
print(response.data)

# Index configuration
response = index_tool.run(action="metadata")
print(response.data)

Use with agents

Note: an index tool attached to an agent replaces what would otherwise be a LangChain RAG chain, a VectorStoreRetriever, and a separate embedding pipeline. The index tool handles retrieval, and the agent reasons over the results natively.

index_tool.allowed_actions = ["search", "get"]  # prevent agent from writing to the index

agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and brand in your answers.",
    tools=[index_tool],
)
agent.save()

response = agent.run("Find affordable electronics under $200.")
print(response.data.output)

Restrict allowed_actions to ["search"] or ["search", "get"] for agents that should only read from the index.

Full example

import time
from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

# 1. Create tool
index_tool = aix.Tool(
    name=f"Product Index {int(time.time())}",
    description="Vector database for product information.",
    integration="6904bcf672a6e36b68bb72fb",
)
index_tool.save()

# 2. Upsert documents
index_tool.run(action="upsert", data={"records": [
    {"id": "prod1", "text": "Wireless headphones with noise cancellation.",        "metadata": {"category": "electronics", "price": 199}},
    {"id": "prod2", "text": "Ergonomic office chair with lumbar support.",          "metadata": {"category": "furniture",    "price": 299}},
    {"id": "prod3", "text": "Stainless steel water bottle, cold for 24 hours.",    "metadata": {"category": "accessories",  "price": 29}},
]})

# 3. Agent (read-only)
index_tool.allowed_actions = ["search", "get"]

agent = aix.Agent(
    name="Product Assistant",
    description="Helps users find products.",
    instructions="Search the product index to answer questions. Include price and category.",
    tools=[index_tool],
)
agent.save()

# 4. Run
response = agent.run("Find affordable electronics under $200.")
print(response.data.output)

Troubleshooting

Documents not appearing in search results: confirm the upsert succeeded with index_tool.run(action="count"). Try broadening the query or removing filters.

Metadata filters return no results: field names are case-sensitive and must match exactly. Verify that value types match (string vs. number).

Agent does not use the tool: ensure the instructions explicitly direct the agent to search the index. Confirm the tool is in tools and that allowed_actions includes "search".

CSV metadata parsing errors: apply ast.literal_eval after reading the CSV, because pandas reads dict-like columns as plain strings.
