Control how Agentset splits your documents into chunks. Chunks are split intelligently to preserve sentences and paragraphs. Agentset also detects images, tables, and code blocks in the content and processes each of them with its own dedicated chunker.

Chunk size

Set chunkSize to control the target number of characters each chunk contains. Smaller chunks are more precise, while larger chunks preserve more context. The default is 2048 characters.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    chunkSize: 2048,
  },
});
Chunk boundaries are adjusted to preserve semantic coherence, so actual chunk sizes stay close to the target value but vary from chunk to chunk to produce clean splits.
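To see why chunk sizes drift around the target, here is a minimal sketch of sentence-preserving chunking. It is an illustration only, not Agentset's chunker: it assumes sentences end in ".", "!", or "?" followed by whitespace, and `splitSentences` and `chunkBySentence` are hypothetical names.

```typescript
// Illustrative sketch only: NOT Agentset's chunking implementation.
// Greedily packs whole sentences into chunks of roughly `targetSize` characters.

/** Split text into sentences on ".", "!" or "?" followed by whitespace (simplifying assumption). */
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

/**
 * Append sentences to the current chunk until the next one would push it past
 * `targetSize`. A single sentence longer than the target becomes its own
 * oversized chunk instead of being cut mid-sentence.
 */
function chunkBySentence(text: string, targetSize: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= targetSize || current === "") {
      current = candidate; // still fits (or chunk is empty): keep growing
    } else {
      chunks.push(current); // close the chunk at a sentence boundary
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// With a 40-character target the first chunk holds 35 characters and the
// second only 6, because neither sentence boundary lands exactly on 40.
const chunks = chunkBySentence("First sentence. Second one is here. Third!", 40);
// chunks: ["First sentence. Second one is here.", "Third!"]
```

Because the sketch never splits inside a sentence, every chunk ends at a natural boundary, which is the same tradeoff the chunkSize option describes: the target guides size, while boundaries decide where the cut actually falls.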

Processing mode

Set mode to control the tradeoff between speed and accuracy when processing documents.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    mode: "accurate",
  },
});
| Mode | Description |
| --- | --- |
| fast | Fastest processing, suitable for simple documents |
| balanced | Default. Good balance of speed and quality |
| accurate | Best layout detection, ideal for complex documents with tables or figures |
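If you ingest mixed content, you might pick the mode per document. The sketch below is a hypothetical client-side helper that maps rough document traits onto the three modes described above; `DocumentTraits` and `chooseMode` are illustrative names, not part of the Agentset SDK, and how you detect tables or figures is left to you.

```typescript
// Hypothetical helper: these names are NOT Agentset SDK exports.
type ProcessingMode = "fast" | "balanced" | "accurate";

interface DocumentTraits {
  hasTables: boolean;   // e.g. known from the document source
  hasFigures: boolean;
  isPlainText: boolean; // simple prose with no complex layout
}

/** Map rough document traits to a processing mode, following the mode descriptions. */
function chooseMode(traits: DocumentTraits): ProcessingMode {
  if (traits.hasTables || traits.hasFigures) return "accurate"; // complex layout
  if (traits.isPlainText) return "fast"; // simple documents
  return "balanced"; // the documented default
}
```

The returned value can be passed as the mode field of config.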

Image extraction

Control image extraction with disableImageExtraction. When set to true, images are not extracted from the document.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    disableImageExtraction: true,
  },
});

Image captions

Disable synthetic image captions with disableImageCaptions. When set to true, images are rendered as plain img tags without generated alt text descriptions.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    disableImageCaptions: true,
  },
});
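Because both image options are phrased as "disable" flags, it can help to name the combinations you actually use. The sketch below is a hypothetical convenience mapping; `ImagePolicy`, `imageFlags`, and the policy names are illustrative and not part of the Agentset SDK. It only sets the two documented flags and makes no assumption about how Agentset treats the caption flag when extraction is off.

```typescript
// Hypothetical convenience mapping: NOT an Agentset SDK API.
type ImagePolicy = "captioned" | "uncaptioned" | "skip";

interface ImageFlags {
  disableImageExtraction: boolean;
  disableImageCaptions: boolean;
}

const IMAGE_FLAGS: Record<ImagePolicy, ImageFlags> = {
  // Extract images and keep synthetic alt text captions.
  captioned: { disableImageExtraction: false, disableImageCaptions: false },
  // Extract images but render them as plain img tags.
  uncaptioned: { disableImageExtraction: false, disableImageCaptions: true },
  // Do not extract images; captions are also switched off.
  skip: { disableImageExtraction: true, disableImageCaptions: true },
};

/** Return a fresh copy of the flags for a policy, ready to spread into config. */
function imageFlags(policy: ImagePolicy): ImageFlags {
  return { ...IMAGE_FLAGS[policy] };
}
```

For example, config: { ...imageFlags("uncaptioned") } keeps images but drops their generated descriptions.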

Chart understanding

Enable chartUnderstanding to extract data from charts in documents. This feature analyzes chart content and converts it to structured data.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/report.pdf",
  },
  config: {
    chartUnderstanding: true,
  },
});

Page headers and footers

Control whether page headers and footers are included in the output with keepPageheaderInOutput and keepPagefooterInOutput. Set either option to true to keep that element.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    keepPageheaderInOutput: true,
    keepPagefooterInOutput: true,
  },
});

Language

Specify languageCode to optimize text processing for a specific language. If omitted, the language is detected automatically.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/document.pdf",
  },
  config: {
    languageCode: "fr",
  },
});
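If the language comes from user input, a small pre-flight check keeps bad values out of the request. This sketch assumes languageCode takes two-letter ISO 639-1 codes, as the "fr" and "en" examples on this page suggest; `normalizeLanguageCode` is a hypothetical helper, not an Agentset API.

```typescript
// Hypothetical pre-flight check. Assumption: languageCode expects
// two-letter ISO 639-1 codes such as "en" or "fr".
function normalizeLanguageCode(input?: string): string | undefined {
  if (input === undefined || input.trim() === "") {
    // Return undefined so the field can be omitted and the language auto-detected.
    return undefined;
  }
  const code = input.trim().toLowerCase();
  if (!/^[a-z]{2}$/.test(code)) {
    throw new Error(`Expected a two-letter ISO 639-1 code, got "${input}"`);
  }
  return code;
}
```

Spreading conditionally, as in config: { ...(lang ? { languageCode: lang } : {}) }, leaves the field out entirely when no language is known, so automatic detection still applies.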

Combining options

Pass multiple config options together to customize processing.
const job = await ns.ingestion.create({
  payload: {
    type: "FILE",
    fileUrl: "https://example.com/report.pdf",
  },
  config: {
    chunkSize: 512,
    mode: "accurate",
    chartUnderstanding: true,
    disableImageCaptions: false,
    languageCode: "en",
    metadata: {
      category: "reports",
    },
  },
});
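When several call sites build configs, a shared type and defaults helper can keep them consistent. The sketch below is a hand-written type covering only the options described on this page; it is not the type exported by the Agentset SDK. It fills in just the two defaults stated here (chunkSize 2048 and mode "balanced"), and the positive-integer check on chunkSize is a client-side assumption.

```typescript
// Hand-written illustration covering only the options on this page;
// NOT the Agentset SDK's own type.
interface IngestConfig {
  chunkSize?: number;
  mode?: "fast" | "balanced" | "accurate";
  disableImageExtraction?: boolean;
  disableImageCaptions?: boolean;
  chartUnderstanding?: boolean;
  keepPageheaderInOutput?: boolean;
  keepPagefooterInOutput?: boolean;
  languageCode?: string;
  metadata?: Record<string, unknown>;
}

// Only the defaults this page states; every other option is left unset.
const DOCUMENTED_DEFAULTS = { chunkSize: 2048, mode: "balanced" } as const;

/** Merge caller overrides on top of the documented defaults. */
function resolveConfig(overrides: IngestConfig = {}): IngestConfig {
  const merged: IngestConfig = { ...DOCUMENTED_DEFAULTS, ...overrides };
  // Assumption: chunkSize must be a positive whole number of characters.
  if (merged.chunkSize !== undefined &&
      (!Number.isInteger(merged.chunkSize) || merged.chunkSize <= 0)) {
    throw new Error(`chunkSize must be a positive integer, got ${merged.chunkSize}`);
  }
  return merged;
}
```

The result can be passed as the config field of ns.ingestion.create. Making the defaults explicit on the client is optional, since the server applies them anyway; it mainly helps when you log or compare the settings used for each ingestion job.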
