Control how Agentset splits your documents into chunks. Chunks are split at natural boundaries to preserve sentences and paragraphs. Agentset also detects images, tables, and code blocks in the content and processes each of them with a dedicated chunker.

Chunk size

Set chunkSize to control the target number of characters each chunk contains. Smaller chunks are more precise, while larger chunks preserve more context. The default is 2048 characters.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    chunkSize: 2048,
  },
});
Chunk boundaries are adjusted to preserve semantic coherence, so actual chunk sizes stay close to the target value but vary from chunk to chunk.
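For a rough sense of how chunkSize affects the number of chunks, the sketch below divides a document's character count by the target size. The helper estimateChunkCount is hypothetical and not part of the Agentset SDK. Because boundaries are adjusted during splitting, treat the result as an approximation only.

```typescript
// Hypothetical helper, not part of the Agentset SDK.
// Rough estimate only: real chunk boundaries shift to keep sentences and paragraphs intact.
function estimateChunkCount(totalChars: number, chunkSize: number = 2048): number {
  if (!Number.isFinite(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive number");
  }
  if (totalChars <= 0) return 0;
  return Math.ceil(totalChars / chunkSize);
}

// A 100,000-character document at the default size and at a smaller size.
console.log(estimateChunkCount(100_000));      // 49
console.log(estimateChunkCount(100_000, 512)); // 196
```

A smaller chunkSize produces roughly proportionally more chunks, which is worth keeping in mind when estimating storage and retrieval volume.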

Processing mode

Set mode to control the tradeoff between speed and accuracy when processing documents.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    mode: "accurate",
  },
});
Mode      Description
fast      Fastest processing, suitable for simple documents
balanced  Default. Good balance of speed and quality
accurate  Best layout detection, ideal for complex documents with tables or figures
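One way to apply the mode descriptions in code is a small selector that maps document traits to a mode. This is a sketch under assumptions: the three mode values come from this page, while the DocumentTraits shape and the pickMode helper are hypothetical and not part of the Agentset SDK.

```typescript
// The three mode values are the ones listed on this page.
type ProcessingMode = "fast" | "balanced" | "accurate";

// Hypothetical description of a document, used only for this sketch.
interface DocumentTraits {
  hasTablesOrFigures: boolean;
  isSimpleText: boolean;
}

// Hypothetical helper: prefers accuracy for complex layouts, speed for simple text,
// and falls back to the default mode otherwise.
function pickMode(traits: DocumentTraits): ProcessingMode {
  if (traits.hasTablesOrFigures) return "accurate";
  if (traits.isSimpleText) return "fast";
  return "balanced";
}

console.log(pickMode({ hasTablesOrFigures: true, isSimpleText: false })); // "accurate"
```

The returned value can be passed as config.mode when creating an ingestion job.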

LLM-assisted parsing

The useLlm option improves extraction of tables, forms, inline math, and complex layouts. It is enabled by default; set it to false to turn it off.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    useLlm: false,
  },
});

Force OCR

Set forceOcr to run OCR even when selectable text exists. This is useful for scanned documents where the embedded text layer is unreliable.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/scanned-document.pdf",
  },
  config: {
    forceOcr: true,
  },
});

Language

Specify languageCode to optimize text processing for a specific language. If omitted, the language is detected automatically.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    languageCode: "fr",
  },
});

Combining options

Pass multiple config options together to customize processing.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/report.pdf",
  },
  config: {
    chunkSize: 512,
    mode: "accurate",
    useLlm: true,
    languageCode: "en",
    metadata: {
      category: "reports",
    },
  },
});
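When many documents share most settings, it can help to keep the shared options in one place and override them per document. The sketch below is one possible approach: the field names mirror the options on this page, but the IngestionConfig types are assumptions inferred from the examples (not the SDK's published type definitions), and buildConfig is a hypothetical helper.

```typescript
// Field names mirror the options described on this page; the types are assumptions.
interface IngestionConfig {
  chunkSize?: number;
  mode?: "fast" | "balanced" | "accurate";
  useLlm?: boolean;
  forceOcr?: boolean;
  languageCode?: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: merges shared defaults with per-document overrides.
function buildConfig(
  defaults: IngestionConfig,
  overrides: IngestionConfig = {},
): IngestionConfig {
  const merged: IngestionConfig = { ...defaults, ...overrides };
  // Merge metadata key by key so an override does not drop shared keys.
  if (defaults.metadata || overrides.metadata) {
    merged.metadata = { ...defaults.metadata, ...overrides.metadata };
  }
  return merged;
}

const shared: IngestionConfig = {
  mode: "accurate",
  useLlm: true,
  metadata: { category: "reports" },
};

// A scanned document reuses the shared settings and adds forced OCR.
const scanned = buildConfig(shared, {
  forceOcr: true,
  metadata: { source: "scanner" },
});
console.log(scanned);
```

The resulting object can be passed as the config field of ns.ingestion.create, in the same position as the inline objects shown in the examples.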
