Chunking Settings

Control how Agentset splits your documents into chunks. Splits are placed to keep sentences and paragraphs intact. Agentset also detects images, tables, and code blocks in the content and processes each with its own dedicated chunker.

Chunk size

Set chunkSize to control the target number of characters in each chunk. Smaller chunks match queries more precisely, while larger chunks preserve more surrounding context. The default is 2048 characters.

// `ns` is assumed to be an already-initialized Agentset namespace client.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    chunkSize: 2048,
  },
});

Chunk boundaries are adjusted to preserve semantic coherence, so chunkSize is a target rather than a hard limit: actual chunks stay close to it but can be somewhat shorter or longer so that splits fall at natural boundaries.
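To illustrate why chunk lengths vary around the target, here is a minimal sketch of sentence-aware splitting. It is not Agentset's actual algorithm; the function name chunkBySentence and its splitting rule are assumptions made for illustration only.

```typescript
// Illustrative only: a naive sentence-aware chunker, NOT Agentset's implementation.
function chunkBySentence(text: string, chunkSize: number): string[] {
  // Treat a run of text ending in ., !, or ? as one sentence (naive rule).
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === "" || candidate.length <= chunkSize) {
      // Keep growing the chunk; a single oversized sentence is kept whole.
      current = candidate;
    } else {
      // Adding this sentence would exceed the target, so close the chunk here.
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sample =
  "First sentence here. Second one follows. A third, slightly longer sentence ends it.";
const chunks = chunkBySentence(sample, 45);
console.log(chunks.map((c) => c.length));
```

Because the sketch never cuts a sentence in half, each chunk ends at a sentence boundary and its length lands near the target instead of exactly on it.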

Processing mode

Set mode to control the tradeoff between speed and accuracy when processing documents.

const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    mode: "accurate",
  },
});

| Mode | Description |
| --- | --- |
| `fast` | Fastest processing, suitable for simple documents |
| `balanced` | Default. Good balance of speed and quality |
| `accurate` | Best layout detection, ideal for complex documents with tables or figures |
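If you ingest many kinds of documents, you may want to choose the mode per document. The sketch below encodes the guidance from the table; the DocumentTraits shape and the pickMode helper are hypothetical and not part of the Agentset SDK.

```typescript
type Mode = "fast" | "balanced" | "accurate";

// Hypothetical description of a document; not an Agentset type.
interface DocumentTraits {
  hasTablesOrFigures: boolean;
  isSimpleText: boolean;
}

// Map document traits to a mode, following the table's recommendations.
function pickMode(traits: DocumentTraits): Mode {
  if (traits.hasTablesOrFigures) return "accurate"; // complex layouts
  if (traits.isSimpleText) return "fast"; // simple documents
  return "balanced"; // the documented default
}

console.log(pickMode({ hasTablesOrFigures: true, isSimpleText: false }));
```

The returned value can be passed as config.mode when creating the ingestion job.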

LLM-assisted parsing

useLlm improves extraction of tables, forms, inline math, and complex layouts. It is enabled by default; set it to false to turn it off, as in the example below.

const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    useLlm: false,
  },
});

Force OCR

Set forceOcr to run OCR even when selectable text exists. This is useful for scanned documents where the embedded text layer is unreliable.

const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/scanned-document.pdf",
  },
  config: {
    forceOcr: true,
  },
});

Language

Specify languageCode to optimize text processing for a specific language. If omitted, the language is detected automatically.

const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    languageCode: "fr",
  },
});

Combining options

Pass multiple config options together to customize processing.

const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/report.pdf",
  },
  config: {
    chunkSize: 512,
    mode: "accurate",
    useLlm: true,
    languageCode: "en",
    metadata: {
      category: "reports",
    },
  },
});
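When several options are combined, it can help to build the config in one place. The following is a sketch under stated assumptions: the ChunkingConfig type and buildConfig helper are hypothetical, cover only the options described on this page, and fill in only the defaults documented here (chunkSize 2048, mode balanced, useLlm enabled). The positive-integer check on chunkSize is also an assumption, not a documented rule.

```typescript
type ProcessingMode = "fast" | "balanced" | "accurate";

// Hypothetical type listing only the options described on this page.
interface ChunkingConfig {
  chunkSize?: number;
  mode?: ProcessingMode;
  useLlm?: boolean;
  forceOcr?: boolean;
  languageCode?: string;
  metadata?: Record<string, unknown>;
}

function buildConfig(overrides: ChunkingConfig = {}): ChunkingConfig {
  const { chunkSize } = overrides;
  // Assumption: chunkSize should be a positive whole number of characters.
  if (chunkSize !== undefined && (!Number.isInteger(chunkSize) || chunkSize <= 0)) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return {
    ...overrides,
    // Documented defaults; options without a documented default stay unset.
    chunkSize: chunkSize ?? 2048,
    mode: overrides.mode ?? "balanced",
    useLlm: overrides.useLlm ?? true,
  };
}

const config = buildConfig({ chunkSize: 512, mode: "accurate", languageCode: "en" });
console.log(config);
```

The resulting object can be passed as the config field of ns.ingestion.create.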

Next steps

API Reference — Chunking parameters and options
Document Metadata — Attach metadata for filtering and citations
Search — Query your uploaded content
Ranking — Configure result ranking
