The rag_search tool is the cornerstone of information retrieval within the UBIK platform. It allows agents to perform Retrieval-Augmented Generation (RAG) searches across your uploaded documents. Unlike a standard keyword search, this tool uses semantic understanding to find the most relevant “chunks” of text from your knowledge base and uses a Large Language Model (LLM) to synthesize a precise answer grounded in those facts.

When to Use This Tool

Use rag_search when you need to:
  • Answer specific questions based on your private data (e.g., “What is the vacation policy?”).
  • Find specific facts buried in large documents.
  • Verify information against a trusted source.
  • Retrieve context to support a conversation.
This tool is optimized for retrieval accuracy and grounded generation. It is not intended for processing entire documents or generating long-form summaries (use information_analysis for that).

Input Parameters

The tool accepts the following parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The natural language question or search query. Be as specific as possible for best results. |
| document_ids | array<uuid> | No | A list of specific Document UUIDs to search within. If omitted, the search runs across all documents accessible to the user/session. |
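
As a rough illustration of the two parameters above, the following sketch builds an input object and checks it before it is handed to the tool. The helper name build_rag_search_input is hypothetical and not part of the UBIK API; only the query and document_ids fields come from the table.

```python
import json
import uuid
from typing import Optional


def build_rag_search_input(query: str, document_ids: Optional[list[str]] = None) -> dict:
    """Build a rag_search input object (hypothetical helper, not a UBIK API)."""
    if not query or not query.strip():
        raise ValueError("query is required and must not be empty")

    payload: dict = {"query": query.strip()}

    if document_ids:
        # Round-trip each ID through uuid.UUID so malformed values fail early.
        payload["document_ids"] = [str(uuid.UUID(doc_id)) for doc_id in document_ids]

    # When document_ids is omitted, the search covers all accessible documents.
    return payload


scoped = build_rag_search_input(
    "What is the error code E-505?",
    ["550e8400-e29b-41d4-a716-446655440000"],
)
print(json.dumps(scoped, indent=2))
```

Validating the UUIDs on the client side is a design choice of this sketch; it simply surfaces typos before a request is made.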

Scoping & Permissions

The rag_search tool automatically respects the security context of the execution:
  • User Access: Searches documents owned by the user or shared with them via workspaces.
  • Session Context: If running within a chat session, it includes documents attached to that specific session.
  • External ID: For multi-tenant applications, it strictly enforces external_user_id boundaries, ensuring users never see data from other tenants.

Output Structure

The tool returns a structured object containing the answer, the evidence used to generate it, and metadata about the execution.
{
  "response": "**Reflection:**\n*The user is asking about the remote work policy. I need to check the employee handbook for eligibility and approval processes...*\n\n# Remote Work Guidelines\n\nAccording to the company handbook, remote work is allowed under specific conditions:\n\n- Employees must have completed their probation period <citation id=\"9bdef571-ed43-4cb7-a4a1-1011edce8a62\">[1]</citation>\n- Approval is required from the direct manager at least 48 hours in advance <citation id=\"af572b1c-cb3a-49dc-a062-17860219b8ef\">[2]</citation>\n\nExceptions can be made for medical reasons.",
  "contexts": [
    {
      "rank": 1,
      "chunk_id": "9bdef571-ed43-4cb7-a4a1-1011edce8a62",
      "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
      "text_preview": "Eligibility for remote work: Full-time employees who have successfully completed their 3-month probation period are eligible...",
      "used_in_response": true
    },
    {
      "rank": 2,
      "chunk_id": "af572b1c-cb3a-49dc-a062-17860219b8ef",
      "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
      "text_preview": "Request process: Submit a request via the HR portal. Manager approval is required 48 hours prior to the requested date...",
      "used_in_response": true
    }
  ],
  "sources_used": [1, 2],
  "model": "claude-3-7-sonnet-20250219-thinking",
  "execution_id": "call_HB55iUMZE3dZ3QKCHGKE6qYF"
}
| Field | Description |
| --- | --- |
| response | The natural language answer. Can include a “Reflection” block (thinking process), Markdown formatting, and inline citations pointing to specific chunks. |
| contexts | A list of the retrieved text chunks passed to the LLM. Each entry includes rank, chunk_id, document_id, text_preview, and used_in_response. |
| sources_used | A list of indices (ranks) corresponding to the contexts that were explicitly used to form the answer. These indices are derived from citations (e.g., <source_1>) generated by the model. |
| model | The specific LLM used for generation. |
| execution_id | The unique identifier for this tool execution. |
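
The sketch below shows one way to connect the inline citations in response to entries in contexts. It assumes citations always use the `<citation id="...">[n]</citation>` form seen in the example output; the function names and the trimmed sample result are illustrative only.

```python
import re

# Assumption: citations follow the form shown in the example output above.
CITATION_RE = re.compile(r'<citation id="([^"]+)">\[(\d+)\]</citation>')


def cited_contexts(result: dict) -> list[dict]:
    """Return the context entries cited in the response, in order of first use."""
    by_chunk = {ctx["chunk_id"]: ctx for ctx in result.get("contexts", [])}
    cited, seen = [], set()
    for chunk_id, _label in CITATION_RE.findall(result.get("response", "")):
        if chunk_id in by_chunk and chunk_id not in seen:
            seen.add(chunk_id)
            cited.append(by_chunk[chunk_id])
    return cited


def strip_citation_tags(response: str) -> str:
    """Replace each citation tag with its bare [n] marker for plain display."""
    return CITATION_RE.sub(r"[\2]", response)


# Trimmed sample based on the example output (not real data).
result = {
    "response": (
        "Employees must have completed their probation period "
        '<citation id="9bdef571-ed43-4cb7-a4a1-1011edce8a62">[1]</citation>'
    ),
    "contexts": [
        {
            "rank": 1,
            "chunk_id": "9bdef571-ed43-4cb7-a4a1-1011edce8a62",
            "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
            "text_preview": "Eligibility for remote work: ...",
            "used_in_response": True,
        }
    ],
    "sources_used": [1],
}

for ctx in cited_contexts(result):
    print(ctx["rank"], ctx["document_id"])
print(strip_citation_tags(result["response"]))
```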

Retrieving Chunk Details

The rag_search response provides chunk_ids in the contexts array. You can use these IDs to fetch precise location data for highlighting or deep-linking within the original document using the GET /chunks/{chunk_id} endpoint. The response structure adapts to the content modality (Text/PDF vs. Audio/Video):
{
  "id": "9bdef571-ed43-4cb7-a4a1-1011edce8a62",
  "document_id": "7f15f1ff-d15e-4894-8fb3-155392ab8972",
  "text": "Full text content of the chunk...",
  
  // For PDFs and Images
  "page_number": 3,
  "bbox": [
    {
      "bbox": [100.5, 200.0, 300.5, 250.0], // [x1, y1, x2, y2]
      "page_number": 3
    },
    {
      "bbox": [50.0, 100.0, 200.0, 150.0], // Continuation on next page
      "page_number": 4
    }
  ],

  // For Audio and Video
  "start_time": 120.5, // Seconds
  "end_time": 135.0,   // Seconds

  "metadata": {
    "filename": "handbook.pdf",
    "languages": ["eng"],
    "modality": "text"
  }
}
| Field | Description |
| --- | --- |
| bbox | A list of bounding boxes for visual highlighting. Each entry contains coordinates [x1, y1, x2, y2] and the specific page_number. Note: Some document types (e.g., plain text files, markdown) may not provide coordinates. |
| page_number | The primary page number for the chunk (1-indexed). Null for time-based media. |
| start_time / end_time | Timestamps in seconds, used for seeking in audio or video players. |
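
Because the chunk payload differs by modality, client code usually has to branch on which location fields are present. The following sketch does that for an already-fetched chunk object. It assumes that fields which do not apply to a modality are absent or null; the chunk_locations name and the returned dictionary shape are made up for illustration.

```python
def chunk_locations(chunk: dict) -> list[dict]:
    """Turn a chunk payload into highlight or seek targets (illustrative shape)."""
    # Time-based media: a single seek range, in seconds.
    if chunk.get("start_time") is not None:
        return [{"kind": "time", "start": chunk["start_time"], "end": chunk.get("end_time")}]

    # Visual documents: one highlight rectangle per bbox entry.
    boxes = chunk.get("bbox") or []
    if boxes:
        return [
            {"kind": "box", "page": entry["page_number"], "rect": tuple(entry["bbox"])}
            for entry in boxes
        ]

    # No coordinates (e.g. plain text): fall back to the page, if there is one.
    if chunk.get("page_number") is not None:
        return [{"kind": "page", "page": chunk["page_number"]}]
    return []


pdf_chunk = {
    "page_number": 3,
    "bbox": [
        {"bbox": [100.5, 200.0, 300.5, 250.0], "page_number": 3},
        {"bbox": [50.0, 100.0, 200.0, 150.0], "page_number": 4},
    ],
}
audio_chunk = {"page_number": None, "start_time": 120.5, "end_time": 135.0}

print(chunk_locations(pdf_chunk))
print(chunk_locations(audio_chunk))
```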

Streaming Events

When used in streaming mode, the rag_search tool emits real-time events via SSE (Server-Sent Events). This allows you to track the progress of the RAG pipeline and display the answer as it is generated.

Event Types

| Event | Description |
| --- | --- |
| tool_update | Indicates a progress update (phase change). |
| tool_partial_update | Contains a new text fragment of the generated response (streaming). |
| error | Signals that a critical error occurred during execution. |
| tool_end | Signals the end of the tool execution and provides the full final result. |

Pipeline Phases (tool_update)

The tool_update event contains a data field with a phase and a status. Here are the possible phases:
  1. SEARCH_PREPARATION
    • status: started
    • Indicates that the pipeline has started and is preparing the search.
  2. RETRIEVAL
    • status: completed
    • data: { "retrieved_count": <int> }
    • Indicates that the initial vector search is complete and reports how many documents were found.
  3. RERANKING
    • status: completed
    • data: { "initial_count": <int>, "reranked_count": <int>, "kept_count": <int> }
    • Indicates that results have been re-ranked by relevance. kept_count is the number of documents kept for generation.
  4. COMPILING_RESULTS (Generation)
    • status: started
    • Indicates that the LLM generation of the answer is starting.
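
The phases listed above map naturally onto user-facing progress messages. A minimal sketch, assuming the function receives the data field of a tool_update event; the describe_phase name and the message wording are hypothetical.

```python
def describe_phase(update: dict) -> str:
    """Map the data field of a tool_update event to a short progress label."""
    phase = update.get("phase")
    details = update.get("data") or {}

    if phase == "SEARCH_PREPARATION":
        return "Preparing search"
    if phase == "RETRIEVAL":
        return f"Retrieved {details.get('retrieved_count', 0)} candidates"
    if phase == "RERANKING":
        kept = details.get("kept_count", 0)
        initial = details.get("initial_count", 0)
        return f"Kept {kept} of {initial} after re-ranking"
    if phase == "COMPILING_RESULTS":
        return "Generating answer"
    # Unknown phases are reported rather than raising, so new phases do not break clients.
    return f"Unknown phase: {phase}"


print(describe_phase({"phase": "RETRIEVAL", "status": "completed", "data": {"retrieved_count": 15}}))
```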

Content Streaming (tool_partial_update)

During the generation phase, tool_partial_update events are emitted for each generated text fragment.
  • content: <string> (The text fragment)
  • output_key: "response"
These fragments must be concatenated to form the complete answer.
Handling Large Events (Chunking): If an event payload exceeds the SSE size limit, it will be split into multiple _delta_sse events. For detailed instructions and code examples on how to buffer and reconstruct these chunked events, please refer to the Streaming Results Guide or the Agent Session Events Guide.

Example Event Flow

// Start
{ "event": "tool_update", "data": { "phase": "SEARCH_PREPARATION", "status": "started" } }

// Retrieval completed
{ "event": "tool_update", "data": { "phase": "RETRIEVAL", "status": "completed", "data": { "retrieved_count": 15 } } }

// Reranking completed
{ "event": "tool_update", "data": { "phase": "RERANKING", "status": "completed", "data": { "initial_count": 15, "reranked_count": 15, "kept_count": 5 } } }

// Generation started
{ "event": "tool_update", "data": { "phase": "COMPILING_RESULTS", "status": "started" } }

// Response streaming (tool_partial_update)
{ "event": "tool_partial_update", "data": { "content": "According", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " to", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " the", "output_key": "response" } }
{ "event": "tool_partial_update", "data": { "content": " document...", "output_key": "response" } }

// End
{ "event": "tool_end", "data": { ...full final result... } }
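
The event flow above can be consumed with a small accumulator. This sketch assumes the events have already been parsed from the SSE stream into dictionaries of the shape shown, and that any _delta_sse chunks were reassembled beforehand. The consume_events name and the return shape are illustrative, and the tool_end payload in the sample is a stand-in for the full result.

```python
def consume_events(events) -> dict:
    """Collect phases and concatenate response fragments from parsed events."""
    fragments = []
    phases = []
    final = None

    for event in events:
        name = event.get("event")
        data = event.get("data") or {}

        if name == "tool_update":
            phases.append((data.get("phase"), data.get("status")))
        elif name == "tool_partial_update":
            # Only fragments for the "response" output belong to the answer text.
            if data.get("output_key") == "response":
                fragments.append(data.get("content", ""))
        elif name == "error":
            raise RuntimeError(f"rag_search failed: {data}")
        elif name == "tool_end":
            final = data
            break

    return {"phases": phases, "streamed_text": "".join(fragments), "final": final}


sample_events = [
    {"event": "tool_update", "data": {"phase": "SEARCH_PREPARATION", "status": "started"}},
    {"event": "tool_update", "data": {"phase": "RETRIEVAL", "status": "completed", "data": {"retrieved_count": 15}}},
    {"event": "tool_partial_update", "data": {"content": "According", "output_key": "response"}},
    {"event": "tool_partial_update", "data": {"content": " to", "output_key": "response"}},
    {"event": "tool_partial_update", "data": {"content": " the", "output_key": "response"}},
    {"event": "tool_partial_update", "data": {"content": " document...", "output_key": "response"}},
    {"event": "tool_end", "data": {"response": "According to the document..."}},  # stand-in payload
]

outcome = consume_events(sample_events)
print(outcome["streamed_text"])
```

Keeping the streamed text separate from the tool_end payload lets a client render fragments as they arrive and then swap in the final result once it is available.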

Usage Examples

Searching across all available knowledge. Input:
{
  "query": "How do I reset my 2FA token?"
}
Searching only within a specific technical manual. Input:
{
  "query": "What is the error code E-505?",
  "document_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}

Multimodal Capabilities

The rag_search pipeline is fully multimodal. If you have indexed documents containing images (like PDFs with charts or slides), the search can retrieve relevant visual context.
  • Text-to-Image Retrieval: Your text query can match descriptions of images.
  • Image Understanding: The generation model can “see” the retrieved images to answer questions about charts, diagrams, or photos.
Activation Required: Multimodal RAG is not enabled by default. To activate this feature for your workspace, please contact the UBIK team at contact@ubik-agent.com.
For a deeper dive into how the pipeline handles embeddings, re-ranking, and hybrid search, see the RAG Pipeline Deep Dive.