## Core technologies
| Component | Default | Purpose |
|---|---|---|
| Document parsing | Marker | Extracts text, tables, and layout from 22+ file formats |
| OCR | Chandra | Recognizes text in scanned documents and images |
| Chunking | Chonkie | Splits documents into semantically coherent chunks |
| Embeddings | text-embedding-3-large | Generates vector embeddings for search |
| Vector database | Turbopuffer | Indexes and queries embeddings at scale |
| Reranking | Zerank-2 | Improves the relevance of retrieved chunks |
| Generation | GPT-5.1 | Powers the retrieval agent and LLM parsing |
| Object storage | Cloudflare R2 | Stores original files and processed artifacts |
| Job queue | Trigger.dev | Orchestrates async document processing |
| Caching | Upstash | Redis caching and queue management |
## Overview

### Ingestion
When you upload a file or text, it enters the ingestion pipeline:

- Parsing — Documents are parsed to extract text, tables, and layout. Scanned content goes through OCR. Multimodal content is either described with an LLM or embedded natively.
- Chunking — Extracted text is split into chunks. Chunk boundaries respect sentence and paragraph structure. Specialized chunkers are used when processing tables, images, and code blocks.
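The chunking step above can be sketched in plain Python. This is an illustrative, simplified chunker (not Chonkie's actual API): it splits text into sentences, then packs sentences into chunks up to a size budget so boundaries never fall mid-sentence.

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    # Naive sentence splitter; real chunkers also handle abbreviations,
    # tables, images, and code blocks with specialized logic.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = (
    "Agentset parses documents into text. "
    "The text is split into chunks. "
    "Each chunk is embedded and indexed for search."
)
for chunk in chunk_text(doc, max_chars=60):
    print(chunk)
```

Because boundaries respect sentence structure, no chunk ends mid-thought, which keeps each chunk semantically coherent for embedding.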
### Storage
Each chunk is embedded and stored in two places:

- Object storage (R2) — The original file, extracted text, and metadata are persisted for retrieval and future reprocessing. This store also backs the chunk viewer UI.
- Vector database (Turbopuffer) — Embeddings are indexed for semantic search. Chunks’ plain text is used for lexical search. Turbopuffer caches hot namespaces on NVMe SSD, so subsequent queries to the same namespace are fast.
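The dual-write described above can be sketched as follows. The `ObjectStore` and `VectorIndex` classes are hypothetical in-memory stand-ins, not the real Cloudflare R2 or Turbopuffer SDKs:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectStore:  # stand-in for Cloudflare R2
    blobs: dict = field(default_factory=dict)
    def put(self, key: str, data: str) -> None:
        self.blobs[key] = data

@dataclass
class VectorIndex:  # stand-in for Turbopuffer
    rows: list = field(default_factory=list)
    def upsert(self, chunk_id: str, vector: list, text: str) -> None:
        # Storing the plain text alongside the vector enables lexical search.
        self.rows.append((chunk_id, vector, text))

def store_chunk(chunk_id, text, vector, r2: ObjectStore, tpuf: VectorIndex):
    # 1. Persist raw chunk text for reprocessing and the chunk viewer UI.
    r2.put(f"chunks/{chunk_id}.txt", text)
    # 2. Index the embedding for semantic search, plus the text for lexical search.
    tpuf.upsert(chunk_id, vector, text)

r2, tpuf = ObjectStore(), VectorIndex()
store_chunk("c1", "Example chunk text.", [0.1, 0.2, 0.3], r2, tpuf)
```

Writing to both stores means the vector index can stay lean (embeddings and text only) while the object store keeps everything needed to re-chunk or re-embed later without re-uploading files.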
### Retrieval
When you query a namespace, Agentset runs agentic retrieval instead of a single-shot search. Standard RAG pipelines embed the query, find the nearest vectors, and return the results. That approach covers only a narrow slice of the search space, can't handle multi-hop questions, and is bound by chunk boundaries (it fails when the answer is split across two or more chunks).

Instead, Agentset gives a retrieval agent access to tools, an approach heavily inspired by agentic coding tools such as Claude Code and Cursor. The agent first generates a set of queries to answer the user's question, then uses retrieval tools:

- Semantic search — Query the vector database plus reranker to find semantically similar chunks.
- Keyword search — Use lexical search to find chunks that contain exact or partial keyword matches.
- Go to page — Navigate to a specific page (or group of pages) to read the entire content.
- Metadata traversal — Traverse chunk metadata to filter relevant chunks.
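The agent loop described above can be sketched as follows. The tool functions and the query-generation step are illustrative placeholders, not Agentset's actual implementation:

```python
def generate_queries(question: str) -> list[str]:
    # In practice an LLM proposes several sub-queries; here we fake two.
    return [question, question.lower()]

def semantic_search(query: str) -> list[str]:
    # Placeholder for the vector database + reranker call.
    return [f"semantic hit for: {query}"]

def keyword_search(query: str) -> list[str]:
    # Placeholder for the lexical (exact/partial keyword) search call.
    return [f"keyword hit for: {query}"]

def retrieve(question: str) -> list[str]:
    results: list[str] = []
    for query in generate_queries(question):
        # A real agent mixes tools per query and may loop again
        # (go-to-page, metadata filters) until it has enough context.
        results += semantic_search(query)
        results += keyword_search(query)
    # Deduplicate while preserving order before handing chunks to generation.
    return list(dict.fromkeys(results))

hits = retrieve("What formats does Marker support?")
for hit in hits:
    print(hit)
```

Generating multiple queries up front and combining tools is what lets the agent answer multi-hop questions that a single embed-and-search pass would miss.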
## Next steps
- Quickstart — Build your first RAG pipeline
- Benchmarks — Compare retrieval accuracy
- Chunking settings — Configure how documents are split
- Search — Query your documents
- Self-hosting — Deploy on your own infrastructure