Agentset combines best-in-class open source tools with a serverless, modular architecture to deliver performant RAG for millions of documents.

Core technologies

| Component | Default | Purpose |
| --- | --- | --- |
| Document parsing | Marker | Extracts text, tables, and layout from 22+ file formats |
| OCR | Chandra | Recognizes text in scanned documents and images |
| Chunking | Chonkie | Splits documents into semantically coherent chunks |
| Embeddings | text-embedding-3-large | Generates vector embeddings for search |
| Vector database | Turbopuffer | Indexes and queries embeddings at scale |
| Reranking | Zerank-2 | Improves the relevance of retrieved chunks |
| Generation | GPT-5.1 | Powers the retrieval agent and LLM parsing |
| Object storage | Cloudflare R2 | Stores original files and processed artifacts |
| Job queue | Trigger.dev | Orchestrates async document processing |
| Caching | Upstash | Redis caching and queue management |

Overview

Agentset architecture diagram
Agentset has three components: ingestion, storage, and retrieval.

Ingestion

When you upload a file or text, it enters the ingestion pipeline:
  1. Parsing — Documents are parsed to extract text, tables, and layout. Scanned content goes through OCR. Multimodal content is either described by an LLM or embedded natively.
  2. Chunking — Extracted text is split into chunks whose boundaries respect sentence and paragraph structure. Specialized chunkers handle tables, images, and code blocks (see the sketch after this list).
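To make the chunking step concrete, here is a minimal boundary-aware chunker in TypeScript. It is an illustrative sketch only, not the Chonkie implementation; the `chunkText` helper and the character budget are assumptions.

```ts
// Illustrative only: a simplified boundary-aware chunker, not the Chonkie implementation.
interface Chunk {
  text: string;
  index: number;
}

// Split on paragraph breaks first, then pack paragraphs into chunks of roughly
// `maxChars` characters without cutting a paragraph in half. Real chunkers also
// fall back to sentence boundaries when a single paragraph is too long.
function chunkText(text: string, maxChars = 2000): Chunk[] {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks: Chunk[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    if (current && current.length + paragraph.length + 2 > maxChars) {
      chunks.push({ text: current, index: chunks.length });
      current = "";
    }
    current = current ? `${current}\n\n${paragraph}` : paragraph;
  }
  if (current) chunks.push({ text: current, index: chunks.length });
  return chunks;
}
```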
The ingestion pipeline runs asynchronously through a queue system. A 100-page PDF typically processes in under 60 seconds.
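Conceptually, the upload handler only enqueues a job; the heavy lifting happens in a background task. The sketch below uses a Trigger.dev-style task definition, but the task id, payload shape, and the `parseDocument`/`embedAndStore` helpers are illustrative stand-ins, not Agentset's actual code.

```ts
import { task } from "@trigger.dev/sdk/v3";

// Illustrative stubs standing in for the real parsing, chunking, and storage steps.
declare function parseDocument(fileUrl: string): Promise<{ text: string }>;
declare function chunkText(text: string): { text: string; index: number }[];
declare function embedAndStore(
  namespaceId: string,
  chunks: { text: string; index: number }[]
): Promise<void>;

// Hypothetical ingestion task; Agentset's actual task ids and payloads may differ.
export const ingestDocument = task({
  id: "ingest-document",
  run: async (payload: { namespaceId: string; fileUrl: string }) => {
    const parsed = await parseDocument(payload.fileUrl); // parsing + OCR
    const chunks = chunkText(parsed.text);               // boundary-aware chunking
    await embedAndStore(payload.namespaceId, chunks);    // embed and index
    return { chunkCount: chunks.length };
  },
});
```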

Storage

Each chunk is embedded and stored in two places:
  • Object storage (R2) — The original file, extracted text, and metadata are persisted for retrieval and future reprocessing. This is also used for the chunk viewer UI.
  • Vector database (Turbopuffer) — Embeddings are indexed for semantic search. Chunks’ plain text is used for lexical search. Turbopuffer caches hot namespaces on NVMe SSD, so subsequent queries to the same namespace are fast.
This setup supports both semantic and lexical search, lets content be reprocessed as the pipeline improves, and makes source content easy to debug.
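A rough sketch of the dual write, assuming the OpenAI Node SDK for embeddings and hypothetical `r2` and `vectorStore` client wrappers; the actual R2 and Turbopuffer calls in Agentset may differ.

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative client stubs; the real R2 and Turbopuffer APIs may differ.
declare const r2: { put(key: string, body: string): Promise<void> };
declare const vectorStore: {
  upsert(rows: { id: string; vector: number[]; text: string }[]): Promise<void>;
};

async function storeChunks(namespaceId: string, chunks: { id: string; text: string }[]) {
  // 1. Persist the extracted text to object storage for reprocessing and the chunk viewer.
  await Promise.all(
    chunks.map((c) => r2.put(`${namespaceId}/chunks/${c.id}.txt`, c.text))
  );

  // 2. Embed the chunks and index them for semantic (vector) and lexical (full-text) search.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: chunks.map((c) => c.text),
  });
  await vectorStore.upsert(
    chunks.map((c, i) => ({ id: c.id, vector: data[i].embedding, text: c.text }))
  );
}
```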

Retrieval

When you query a namespace, Agentset runs agentic retrieval instead of a single-shot search. Standard RAG pipelines embed the query, find the nearest vectors, and return results. That approach covers only a limited portion of the search space, can't handle multi-hop questions, and is bound by chunk boundaries (i.e., it fails when information is split across two or more chunks). Agentset instead uses a retrieval agent with access to tools, heavily inspired by agentic coding tools such as Claude Code and Cursor. The agent first generates a set of queries to answer the user's question, then uses retrieval tools:
  • Semantic search — Query the vector database + reranker to find semantically similar chunks.
  • Keyword search — Use lexical search to find chunks that contain exact or partial keyword matches.
  • Go to page — Navigate to a specific page (or group of pages) to read the entire content.
  • Metadata traversal — Traverse chunk metadata to filter relevant chunks.
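The tool surface can be thought of as a small set of typed functions the agent can call. The sketch below is illustrative; the names, parameters, and return shapes are assumptions, not Agentset's actual API.

```ts
// Illustrative tool signatures; Agentset's actual tool names and parameters may differ.
interface RetrievedChunk {
  id: string;
  text: string;
  page?: number;
  metadata?: Record<string, unknown>;
  score: number;
}

interface RetrievalTools {
  // Vector search followed by reranking.
  semanticSearch(query: string, topK?: number): Promise<RetrievedChunk[]>;
  // Lexical search for exact or partial keyword matches.
  keywordSearch(query: string, topK?: number): Promise<RetrievedChunk[]>;
  // Read the full content of a page or a contiguous range of pages.
  goToPage(documentId: string, startPage: number, endPage?: number): Promise<string>;
  // Filter chunks by structured metadata (e.g. document type or date).
  filterByMetadata(filter: Record<string, unknown>): Promise<RetrievedChunk[]>;
}
```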
Tools run in parallel unless a follow-up query is needed. The system is optimized to return results in one round-trip when possible. This approach results in higher recall and accuracy. See benchmarks for accuracy comparisons against standard RAG.
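Putting it together, the agent fans its generated queries out across the tools in parallel and then synthesizes an answer. This is a conceptual sketch assuming the `RetrievalTools` interface above and hypothetical `generateQueries`/`synthesizeAnswer` helpers backed by the LLM; the real agent uses LLM tool-calling rather than a fixed plan.

```ts
// Conceptual agent loop, not Agentset's actual retrieval agent.
declare function generateQueries(question: string): Promise<string[]>;
declare function synthesizeAnswer(
  question: string,
  chunks: RetrievedChunk[]
): Promise<string>;

async function agenticRetrieve(tools: RetrievalTools, question: string): Promise<string> {
  // 1. Generate several targeted queries from the user's question.
  const queries = await generateQueries(question);

  // 2. Run semantic and keyword search for every query in parallel (one round-trip).
  const results = await Promise.all(
    queries.flatMap((q) => [tools.semanticSearch(q), tools.keywordSearch(q)])
  );

  // 3. De-duplicate chunks and let the LLM compose the final answer.
  const seen = new Map<string, RetrievedChunk>();
  for (const chunk of results.flat()) seen.set(chunk.id, chunk);
  return synthesizeAnswer(question, [...seen.values()]);
}
```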

Next steps