The rag_search tool is the cornerstone of information retrieval within the UBIK platform. It allows agents to perform Retrieval-Augmented Generation (RAG) searches across your uploaded documents.
Unlike a standard keyword search, this tool uses semantic understanding to find the most relevant “chunks” of text from your knowledge base and uses a Large Language Model (LLM) to synthesize a precise answer grounded in those facts.
Use rag_search when you need to:
- Answer specific questions based on your private data (e.g., “What is the vacation policy?”).
- Find specific facts buried in large documents.
- Verify information against a trusted source.
- Retrieve context to support a conversation.
This tool is optimized for retrieval accuracy and grounded generation. It is not intended for processing entire documents or generating long-form summaries (use information_analysis for that).
The tool accepts the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The natural language question or search query. Be as specific as possible for best results. |
| document_ids | array<uuid> | No | A list of specific Document UUIDs to search within. If omitted, the search runs across all documents accessible to the user/session. |
Scoping & Permissions
The rag_search tool automatically respects the security context of the execution:
- User Access: Searches documents owned by the user or shared with them via workspaces.
- Session Context: If running within a chat session, it includes documents attached to that specific session.
- External ID: For multi-tenant applications, it strictly enforces external_user_id boundaries, ensuring users never see data from other tenants.
Output Structure
The tool returns a structured object containing the answer, the evidence used to generate it, and metadata about the execution.
```json
{
  "response": "**Reflection:**\n*The user is asking about the remote work policy. I need to check the employee handbook for eligibility and approval processes...*\n\n# Remote Work Guidelines\n\nAccording to the company handbook, remote work is allowed under specific conditions:\n\n- Employees must have completed their probation period <citation id=\"9bdef571-ed43-4cb7-a4a1-1011edce8a62\">[1]</citation>\n- Approval is required from the direct manager at least 48 hours in advance <citation id=\"af572b1c-cb3a-49dc-a062-17860219b8ef\">[2]</citation>\n\nExceptions can be made for medical reasons.",
  "contexts": [
    {
      "rank": 1,
      "chunk_id": "9bdef571-ed43-4cb7-a4a1-1011edce8a62",
      "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
      "text_preview": "Eligibility for remote work: Full-time employees who have successfully completed their 3-month probation period are eligible...",
      "used_in_response": true
    },
    {
      "rank": 2,
      "chunk_id": "af572b1c-cb3a-49dc-a062-17860219b8ef",
      "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
      "text_preview": "Request process: Submit a request via the HR portal. Manager approval is required 48 hours prior to the requested date...",
      "used_in_response": true
    }
  ],
  "sources_used": [1, 2],
  "model": "claude-3-7-sonnet-20250219-thinking",
  "execution_id": "call_HB55iUMZE3dZ3QKCHGKE6qYF"
}
```
| Field | Description |
|---|---|
| response | The natural language answer. Can include a “Reflection” block (thinking process), Markdown formatting, and inline citations pointing to specific chunks. |
| contexts | A list of the retrieved text chunks passed to the LLM. Includes chunk_id, document_id, and text_preview. |
| sources_used | A list of indices (ranks) corresponding to the contexts that were explicitly used to form the answer. These indices are derived from citations (e.g., `<source_1>`) generated by the model. |
| model | The specific LLM used for generation. |
| execution_id | The unique identifier for this tool execution. |
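As an illustration, a client can recover which retrieved chunks the model actually cited by scanning the citation tags embedded in response. Below is a minimal Python sketch; the helper names (cited_contexts, strip_citations) are hypothetical, not part of the API.

```python
import re

# Matches inline citation tags such as:
#   <citation id="9bdef571-...">[1]</citation>
CITATION_RE = re.compile(r'<citation id="([0-9a-f-]+)">\[(\d+)\]</citation>')

def cited_contexts(result: dict) -> list[dict]:
    """Return the context entries whose chunk_id appears in a citation tag."""
    cited_ids = {m.group(1) for m in CITATION_RE.finditer(result["response"])}
    return [c for c in result["contexts"] if c["chunk_id"] in cited_ids]

def strip_citations(text: str) -> str:
    """Replace citation tags with plain [n] markers for display."""
    return CITATION_RE.sub(r"[\2]", text)
```

This gives you both a display-ready answer and the subset of contexts to render as sources, without relying solely on the sources_used indices.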
Retrieving Chunk Details
The rag_search response provides chunk_ids in the contexts array. You can use these IDs to fetch precise location data for highlighting or deep-linking within the original document using the GET /chunks/{chunk_id} endpoint.
The response structure adapts to the content modality (Text/PDF vs. Audio/Video):
```json
{
  "id": "9bdef571-ed43-4cb7-a4a1-1011edce8a62",
  "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
  "text": "Full text content of the chunk...",

  // For PDFs and Images
  "page_number": 3,
  "bbox": [
    {
      "bbox": [100.5, 200.0, 300.5, 250.0], // [x1, y1, x2, y2]
      "page_number": 3
    },
    {
      "bbox": [50.0, 100.0, 200.0, 150.0], // Continuation on next page
      "page_number": 4
    }
  ],

  // For Audio and Video
  "start_time": 120.5, // Seconds
  "end_time": 135.0,   // Seconds

  "metadata": {
    "filename": "handbook.pdf",
    "languages": ["eng"],
    "modality": "text"
  }
}
```
| Field | Description |
|---|---|
| bbox | A list of bounding boxes for visual highlighting. Each entry contains coordinates [x1, y1, x2, y2] and the specific page_number. Note: some document types (e.g., plain text files, Markdown) may not provide coordinates. |
| page_number | The primary page number for the chunk (1-indexed). Null for time-based media. |
| start_time / end_time | Timestamps in seconds, used for seeking in audio or video players. |
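Because the payload varies by modality, a client typically branches on which location fields are present: bounding boxes mean "highlight", timestamps mean "seek", and anything else falls back to showing the text. A minimal Python sketch (chunk_locator is a hypothetical helper, not part of the API):

```python
def chunk_locator(chunk: dict) -> dict:
    """Derive a UI action from a GET /chunks/{chunk_id} payload.

    Visual documents carry bounding boxes; time-based media carry
    timestamps; plain-text sources may carry neither.
    """
    if chunk.get("bbox"):
        # One highlight region per (page, box) pair, possibly spanning pages.
        return {
            "action": "highlight",
            "regions": [(b["page_number"], tuple(b["bbox"])) for b in chunk["bbox"]],
        }
    if chunk.get("start_time") is not None:
        # Audio/video: seek the player to the chunk's time window.
        return {"action": "seek", "start": chunk["start_time"], "end": chunk.get("end_time")}
    # Fallback for formats without coordinates (plain text, Markdown).
    return {"action": "show_text", "text": chunk.get("text", "")}
```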
Streaming Events
When used in streaming mode, the rag_search tool emits real-time events via SSE (Server-Sent Events). This allows you to track the progress of the RAG pipeline and display the answer as it is generated.
Event Types
| Event | Description |
|---|---|
| tool_update | Indicates a progress update (phase change). |
| tool_partial_update | Contains a new text fragment of the generated response (streaming). |
| error | Signals that a critical error occurred during execution. |
| tool_end | Signals the end of the tool execution and provides the full final result. |
The tool_update event contains a data field with a phase and a status. The possible phases are:

- SEARCH_PREPARATION (status: started): the pipeline has started and is preparing the search.
- RETRIEVAL (status: completed, data: { "retrieved_count": <int> }): the initial vector search is complete; retrieved_count is the number of documents found.
- RERANKING (status: completed, data: { "initial_count": <int>, "reranked_count": <int>, "kept_count": <int> }): results have been re-ranked by relevance; kept_count is the number of documents kept for generation.
- COMPILING_RESULTS (status: started): the LLM generation of the answer is starting.
Content Streaming (tool_partial_update)
During the generation phase, a tool_partial_update event is emitted for each generated text fragment. Each event contains:

- content: <string>, the text fragment.
- output_key: "response"

These fragments must be concatenated to form the complete answer.
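Concatenation can be as simple as collecting the content of every tool_partial_update whose output_key is "response". A minimal Python sketch over already-parsed event dicts (accumulate_response is a hypothetical helper):

```python
def accumulate_response(events: list[dict]) -> str:
    """Join streamed response fragments in arrival order."""
    parts = []
    for event in events:
        if event.get("event") != "tool_partial_update":
            continue  # ignore tool_update, tool_end, etc.
        data = event["data"]
        if data.get("output_key") == "response":
            parts.append(data["content"])
    return "".join(parts)
```

In a live UI you would append each fragment as it arrives rather than buffering the whole list, but the joining logic is the same.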
Handling Large Events (Chunking)
If an event payload exceeds the SSE size limit, it is split into multiple _delta_sse events. For detailed instructions and code examples on how to buffer and reconstruct these chunked events, refer to the Streaming Results Guide or the Agent Session Events Guide.
Example Event Flow
```json
// Start
{ "event": "tool_update", "data": { "phase": "SEARCH_PREPARATION", "status": "started" } }

// Retrieval completed
{ "event": "tool_update", "data": { "phase": "RETRIEVAL", "status": "completed", "data": { "retrieved_count": 15 } } }

// Reranking completed
{ "event": "tool_update", "data": { "phase": "RERANKING", "status": "completed", "data": { "initial_count": 15, "reranked_count": 15, "kept_count": 5 } } }

// Generation started
{ "event": "tool_update", "data": { "phase": "COMPILING_RESULTS", "status": "started" } }

// Response streaming (tool_partial_update)
{ "event": "tool_partial_update", "data": { "content": "According", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " to", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " the", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " document...", "output_key": "response" } }

// End
{ "event": "tool_end", "data": { ...full final result... } }
```
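A flow like the one above can be folded into a progress log for the UI by switching on phase and status. A minimal Python sketch (summarize_progress and the log wording are illustrative, not part of the API):

```python
def summarize_progress(events: list[dict]) -> list[str]:
    """Turn tool_update events into human-readable progress messages."""
    log = []
    for event in events:
        if event.get("event") != "tool_update":
            continue
        data = event["data"]
        phase, status = data["phase"], data["status"]
        extra = data.get("data") or {}  # nested counts, when present
        if phase == "SEARCH_PREPARATION" and status == "started":
            log.append("preparing search")
        elif phase == "RETRIEVAL" and status == "completed":
            log.append(f"retrieved {extra['retrieved_count']} results")
        elif phase == "RERANKING" and status == "completed":
            log.append(f"kept {extra['kept_count']} of {extra['initial_count']} results")
        elif phase == "COMPILING_RESULTS" and status == "started":
            log.append("generating answer")
    return log
```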
Usage Examples

1. Broad Search
Searching across all available knowledge.
Input:
```json
{
  "query": "How do I reset my 2FA token?"
}
```
2. Scoped Search
Searching only within a specific technical manual.
Input:
```json
{
  "query": "What is the error code E-505?",
  "document_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}
```
Multimodal Capabilities
The rag_search pipeline is fully multimodal. If you have indexed documents containing images (like PDFs with charts or slides), the search can retrieve relevant visual context.
- Text-to-Image Retrieval: Your text query can match descriptions of images.
- Image Understanding: The generation model can “see” the retrieved images to answer questions about charts, diagrams, or photos.
Activation Required
Multimodal RAG is not enabled by default. To activate this feature for your workspace, please contact the UBIK team at contact@ubik-agent.com.
For a deeper dive into how the pipeline handles embeddings, re-ranking, and hybrid search, see the RAG Pipeline Deep Dive.