How it works
When you send a message, the model runs a tool-calling loop of up to 20 steps with two tools:| Tool | Description |
|---|---|
search | Searches the namespace with a query the model writes, in semantic or keyword mode. Keyword search is available on Turbopuffer-backed namespaces (the default for managed namespaces). |
expand | Fetches roughly 10 surrounding chunks (5 before, 5 after by position in the document) when a retrieved chunk is cut off or needs nearby context. Available on Turbopuffer-backed namespaces; not yet supported on Pinecone-backed namespaces. |
Modes
The chat playground has two modes:| Mode | Behavior |
|---|---|
| Accurate (default) | Each semantic search fetches Top K chunks (default 30) and reranks them down to the Rerank Limit (default 10) with the configured re-ranker. |
| Fast | Skips reranking. Each search returns the Rerank Limit (default 10) chunks directly. |
Citations
Answers cite sources with inline pills resolved from the retrieved chunks. Click a pill to view the source text and its metadata.Parameters
Open the parameters dialog to tune retrieval and generation:| Parameter | Description |
|---|---|
| Top K | Number of chunks fetched per semantic search, before reranking (default 30) |
| Rerank Limit | Number of chunks the model sees per search (default 10) |
| System Prompt | Instructions for the assistant. Defaults to a prompt tuned for agentic search |
| Re-ranker | Model used to rerank results in Accurate mode |
| Temperature | Sampling temperature. Has no effect on reasoning models like GPT-5.5 |
Next steps
- Agentic Search — Build the same tool-calling loop in your own app
- Search — Query your namespace through the API
- Hosting UI — Share a prebuilt chat interface with your users