Agentset combines best-in-class open source tools with a serverless, modular architecture to deliver performant RAG for millions of documents.

Core technologies

| Component | Default | Purpose |
| --- | --- | --- |
| Document parsing | Marker | Extracts text, tables, and layout from 22+ file formats |
| OCR | Chandra | Recognizes text in scanned documents and images |
| Chunking | Chonkie | Splits documents into semantically coherent chunks |
| Embeddings | text-embedding-3-large | Generates vector embeddings for search |
| Vector database | Turbopuffer | Indexes and queries embeddings at scale |
| Reranking | Zerank-2 | Improves the relevance of retrieved chunks |
| Generation | GPT-5.1 | Powers the retrieval agent and LLM parsing |
| Object storage | Cloudflare R2 | Stores original files and processed artifacts |
| Job queue | Trigger.dev | Orchestrates async document processing |
| Caching | Upstash | Redis caching and queue management |

Overview

Agentset architecture diagram
Agentset has three components: ingestion, storage, and retrieval.

Ingestion

When you upload a file or text, it enters the ingestion pipeline:
  1. Parsing — Documents are parsed to extract text, tables, and layout. Scanned content goes through OCR. Multimodal content is either described by an LLM or embedded natively.
  2. Chunking — Extracted text is split into chunks whose boundaries respect sentence and paragraph structure. Specialized chunkers handle tables, images, and code blocks (see the sketch after this list).
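To make the chunking step concrete, here is a minimal boundary-aware chunker in TypeScript. It is an illustrative sketch only, not the Chonkie implementation; the `chunkText` helper and the character budget are assumptions.

```ts
// Illustrative only: a simplified boundary-aware chunker, not the Chonkie implementation.
interface Chunk {
  text: string;
  index: number;
}

// Split on paragraph breaks first, then pack paragraphs into chunks of roughly
// `maxChars` characters without cutting a paragraph in half. Real chunkers also
// fall back to sentence boundaries when a single paragraph is too long.
function chunkText(text: string, maxChars = 2000): Chunk[] {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks: Chunk[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    if (current && current.length + paragraph.length + 2 > maxChars) {
      chunks.push({ text: current, index: chunks.length });
      current = "";
    }
    current = current ? `${current}\n\n${paragraph}` : paragraph;
  }
  if (current) chunks.push({ text: current, index: chunks.length });
  return chunks;
}
```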
The ingestion pipeline runs asynchronously through a queue system. A 100-page PDF typically processes in under 60 seconds.
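Conceptually, the upload handler only enqueues a job; the heavy lifting happens in a background task. The sketch below uses a Trigger.dev-style task definition, but the task id, payload shape, and the `parseDocument`/`embedAndStore` helpers are illustrative stand-ins, not Agentset's actual code.

```ts
import { task } from "@trigger.dev/sdk/v3";

// Illustrative stubs standing in for the real parsing, chunking, and storage steps.
declare function parseDocument(fileUrl: string): Promise<{ text: string }>;
declare function chunkText(text: string): { text: string; index: number }[];
declare function embedAndStore(
  namespaceId: string,
  chunks: { text: string; index: number }[]
): Promise<void>;

// Hypothetical ingestion task; Agentset's actual task ids and payloads may differ.
export const ingestDocument = task({
  id: "ingest-document",
  run: async (payload: { namespaceId: string; fileUrl: string }) => {
    const parsed = await parseDocument(payload.fileUrl); // parsing + OCR
    const chunks = chunkText(parsed.text);               // boundary-aware chunking
    await embedAndStore(payload.namespaceId, chunks);    // embed and index
    return { chunkCount: chunks.length };
  },
});
```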

Storage

Each chunk is embedded and stored in two places:
  • Object storage (R2) — The original file, extracted text, and metadata are persisted for retrieval and future reprocessing. This is also used for the chunk viewer UI.
  • Vector database (Turbopuffer) — Embeddings are indexed for semantic search. Chunks’ plain text is used for lexical search. Turbopuffer caches hot namespaces on NVMe SSD, so subsequent queries to the same namespace are fast.
This setup supports both semantic and lexical search, lets content be reprocessed as the pipeline improves, and makes source content easy to debug.
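A rough sketch of the dual write, assuming the OpenAI Node SDK for embeddings and hypothetical `r2` and `vectorStore` client wrappers; the actual R2 and Turbopuffer calls in Agentset may differ.

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative client stubs; the real R2 and Turbopuffer APIs may differ.
declare const r2: { put(key: string, body: string): Promise<void> };
declare const vectorStore: {
  upsert(rows: { id: string; vector: number[]; text: string }[]): Promise<void>;
};

async function storeChunks(namespaceId: string, chunks: { id: string; text: string }[]) {
  // 1. Persist the extracted text to object storage for reprocessing and the chunk viewer.
  await Promise.all(
    chunks.map((c) => r2.put(`${namespaceId}/chunks/${c.id}.txt`, c.text))
  );

  // 2. Embed the chunks and index them for semantic (vector) and lexical (full-text) search.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: chunks.map((c) => c.text),
  });
  await vectorStore.upsert(
    chunks.map((c, i) => ({ id: c.id, vector: data[i].embedding, text: c.text }))
  );
}
```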

Retrieval

When you query a namespace, Agentset runs agentic retrieval instead of a single-shot search. Standard RAG pipelines embed the query, find the nearest vectors, and return results. That approach covers only a limited portion of the search space, can't handle multi-hop questions, and is bound by chunk boundaries (i.e., it fails when information is split across two or more chunks). Agentset instead uses a retrieval agent with access to tools, heavily inspired by agentic coding tools such as Claude Code and Cursor. The agent first generates a set of queries to answer the user's question, then uses retrieval tools:
  • Semantic search — Query the vector database + reranker to find semantically similar chunks.
  • Keyword search — Use lexical search to find chunks that contain exact or partial keyword matches.
  • Go to page — Navigate to a specific page (or group of pages) to read the entire content.
  • Metadata traversal — Traverse chunk metadata to filter relevant chunks.
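The tool surface can be thought of as a small set of typed functions the agent can call. The sketch below is illustrative; the names, parameters, and return shapes are assumptions, not Agentset's actual API.

```ts
// Illustrative tool signatures; Agentset's actual tool names and parameters may differ.
interface RetrievedChunk {
  id: string;
  text: string;
  page?: number;
  metadata?: Record<string, unknown>;
  score: number;
}

interface RetrievalTools {
  // Vector search followed by reranking.
  semanticSearch(query: string, topK?: number): Promise<RetrievedChunk[]>;
  // Lexical search for exact or partial keyword matches.
  keywordSearch(query: string, topK?: number): Promise<RetrievedChunk[]>;
  // Read the full content of a page or a contiguous range of pages.
  goToPage(documentId: string, startPage: number, endPage?: number): Promise<string>;
  // Filter chunks by structured metadata (e.g. document type or date).
  filterByMetadata(filter: Record<string, unknown>): Promise<RetrievedChunk[]>;
}
```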
Tools run in parallel unless a follow-up query is needed. The system is optimized to return results in one round-trip when possible. This approach results in higher recall and accuracy. See benchmarks for accuracy comparisons against standard RAG.
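Putting it together, the agent fans its generated queries out across the tools in parallel and then synthesizes an answer. This is a conceptual sketch assuming the `RetrievalTools` interface above and hypothetical `generateQueries`/`synthesizeAnswer` helpers backed by the LLM; the real agent uses LLM tool-calling rather than a fixed plan.

```ts
// Conceptual agent loop, not Agentset's actual retrieval agent.
declare function generateQueries(question: string): Promise<string[]>;
declare function synthesizeAnswer(
  question: string,
  chunks: RetrievedChunk[]
): Promise<string>;

async function agenticRetrieve(tools: RetrievalTools, question: string): Promise<string> {
  // 1. Generate several targeted queries from the user's question.
  const queries = await generateQueries(question);

  // 2. Run semantic and keyword search for every query in parallel (one round-trip).
  const results = await Promise.all(
    queries.flatMap((q) => [tools.semanticSearch(q), tools.keywordSearch(q)])
  );

  // 3. De-duplicate chunks and let the LLM compose the final answer.
  const seen = new Map<string, RetrievedChunk>();
  for (const chunk of results.flat()) seen.set(chunk.id, chunk);
  return synthesizeAnswer(question, [...seen.values()]);
}
```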

Next steps