Chunk size
Set `chunkSize` to control the target number of characters each chunk contains. Smaller chunks are more precise, while larger chunks preserve more context. The default is 2048 characters.
Chunk boundaries are adjusted to preserve semantic coherence, so actual chunk sizes stay close to the target value but vary to allow cleaner splits.
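To make the size tradeoff concrete, here is a minimal TypeScript sketch that estimates how many chunks a document might produce for a given target. Only the `chunkSize` name and the 2048 default come from the text above. The `ChunkingConfig` interface, the `estimateChunkCount` helper, and the positive-integer check are hypothetical and are not part of any SDK.

```typescript
// Hypothetical config shape: only `chunkSize` and its default are documented above.
interface ChunkingConfig {
  chunkSize?: number; // target number of characters per chunk
}

const DEFAULT_CHUNK_SIZE = 2048;

// Rough estimate only. Real chunk boundaries shift to preserve coherence,
// so the actual number of chunks may differ from this figure.
function estimateChunkCount(documentLength: number, config: ChunkingConfig = {}): number {
  const chunkSize = config.chunkSize ?? DEFAULT_CHUNK_SIZE;
  // Assumption: the target must be a positive integer.
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return Math.ceil(documentLength / chunkSize);
}

console.log(estimateChunkCount(10000));                     // 5 with the default target
console.log(estimateChunkCount(10000, { chunkSize: 512 })); // 20 with a smaller target
```

A 10,000-character document yields roughly four times as many chunks at 512 characters as at the default, which is the precision versus context tradeoff described above.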
Processing mode
Set `mode` to control the tradeoff between speed and accuracy when processing documents.
| Mode | Description |
|---|---|
| `fast` | Fastest processing, suitable for simple documents |
| `balanced` | Default. Good balance of speed and quality |
| `accurate` | Best layout detection, ideal for complex documents with tables or figures |
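One way to apply the table is to pick a mode from what you know about a document. The sketch below is illustrative only: the three mode values come from the table, while the `DocumentTraits` shape and the `chooseMode` heuristic are assumptions made for this example.

```typescript
// The three documented mode values.
type ProcessingMode = "fast" | "balanced" | "accurate";

// Hypothetical description of a document, used only for this example.
interface DocumentTraits {
  hasTables: boolean;
  hasFigures: boolean;
  latencySensitive: boolean;
}

// Assumed heuristic: complex layouts get "accurate", simple documents that
// must be processed quickly get "fast", and everything else uses the default.
function chooseMode(traits: DocumentTraits): ProcessingMode {
  if (traits.hasTables || traits.hasFigures) {
    return "accurate";
  }
  if (traits.latencySensitive) {
    return "fast";
  }
  return "balanced";
}

const mode = chooseMode({ hasTables: true, hasFigures: false, latencySensitive: true });
console.log(mode); // "accurate"
```

Typing the option as a string union lets the compiler reject a misspelled mode before any request is sent.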
Image extraction
Control image extraction from documents with `disableImageExtraction`. When extraction is disabled, images are not extracted from the document.
Image captions
Disable synthetic image captions with `disableImageCaptions`. When this option is enabled, images are rendered as plain `img` tags without alt text descriptions.
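The two image options interact, so it can help to spell out the possible outcomes. The following TypeScript sketch assumes both options are booleans that default to `false`, and that disabling extraction takes precedence over the caption setting. Neither assumption is stated in the text, and the `ImageConfig` and `describeImageOutput` names are hypothetical.

```typescript
// Hypothetical config shape using the two documented option names.
interface ImageConfig {
  disableImageExtraction?: boolean; // assumed default: false
  disableImageCaptions?: boolean;   // assumed default: false
}

type ImageOutput = "no-images" | "img-without-alt-text" | "img-with-caption";

// Summarises what the output should contain for a given config.
// Assumption: with no images extracted, the caption setting has no effect.
function describeImageOutput(config: ImageConfig = {}): ImageOutput {
  if (config.disableImageExtraction) {
    return "no-images";
  }
  if (config.disableImageCaptions) {
    return "img-without-alt-text";
  }
  return "img-with-caption";
}

console.log(describeImageOutput({ disableImageCaptions: true })); // "img-without-alt-text"
```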
Chart understanding
Enable `chartUnderstanding` to extract data from charts in documents. This feature analyzes chart content and converts it to structured data.
Page headers and footers
Control whether page headers and footers are included in the output using `keepPageheaderInOutput` and `keepPagefooterInOutput`.
Language
Specify `languageCode` to optimize text processing for a specific language. If omitted, the language is detected automatically.
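Because omitting the option triggers automatic detection, a config builder should leave the key out entirely rather than send an empty value. The sketch below shows one way to do that. The `LanguageConfig` and `buildLanguageConfig` names are hypothetical, and the `"fr"` value is only an example: the text does not say which code format is accepted.

```typescript
// Hypothetical config shape using the documented option name.
interface LanguageConfig {
  languageCode?: string;
}

// Returns a config with `languageCode` set only when a non-empty code is given.
// An absent key means the language is detected automatically.
function buildLanguageConfig(languageCode?: string): LanguageConfig {
  const trimmed = languageCode?.trim();
  return trimmed ? { languageCode: trimmed } : {};
}

console.log(buildLanguageConfig(" fr ")); // { languageCode: "fr" }
console.log(buildLanguageConfig());       // {} (automatic detection)
```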
Combining options
Pass multiple config options together to customize processing.
Next steps
- API Reference — Chunking parameters and options
- Document Metadata — Attach metadata for filtering and citations
- Search — Query your uploaded content
- Ranking — Configure result ranking