Reranker API

Improve retrieval quality by scoring and reordering candidate documents based on their relevance to a query.

Use reranking as a final step after initial search to surface the best context for RAG and question answering.

POST /v1/rerank

Rerank documents

Reranks a list of candidate documents by relevance to a query. In a RAG pipeline, this sorts retrieved results by semantic relevance so the most useful context comes first.

Required attributes

Name: model
Type: string
Description: The reranker model (or deployment ID/alias) to use for this request.

Name: query
Type: string
Description: The search query used to score relevance.

Name: documents
Type: array
Description: The candidate documents to rerank. Each item can be:
- a string (document text), or
- an object with a text field and optional fields such as id, title, or metadata.
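
Because documents accepts both plain strings and objects, a client may find it convenient to convert a mixed list into one shape before sending it. The following is a minimal client-side sketch, not part of the API: the helper name normalize_documents is hypothetical, and it assumes the object form always carries its content in a text field, as described above.

```python
from typing import Any, Dict, List, Union

# A document is either raw text or an object that carries a "text" field.
Document = Union[str, Dict[str, Any]]


def normalize_documents(documents: List[Document]) -> List[Dict[str, Any]]:
    """Convert a mixed list of strings and objects into objects with a "text" field."""
    normalized: List[Dict[str, Any]] = []
    for position, doc in enumerate(documents):
        if isinstance(doc, str):
            # Wrap plain strings so every item has the same shape.
            normalized.append({"text": doc})
        elif isinstance(doc, dict) and isinstance(doc.get("text"), str):
            # Shallow copy so the caller's object is left untouched.
            normalized.append(dict(doc))
        else:
            raise ValueError(
                f"documents[{position}] must be a string or an object with a 'text' string"
            )
    return normalized
```

Normalizing is optional, since the endpoint is documented to accept either form; it mainly keeps later client code from having to branch on the item type.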

Optional attributes

Name: top_n
Type: integer
Description: Number of top results to return. If omitted, all documents are returned with scores.

Name: return_documents
Type: boolean
Description: If true, include the original document payload in each result.

Name: truncate
Type: string
Description: How to handle long documents. Common values: "none", "start", "end". If omitted, the server default is used.

Name: max_tokens_per_document
Type: integer
Description: Maximum number of tokens (or an approximate token budget) to consider per document when scoring. If omitted, the server default is used.

Name: metadata
Type: object
Description: Developer-defined metadata to attach to the request (key/value pairs).

Name: user
Type: string
Description: A unique identifier representing your end user (can help with abuse monitoring and analytics). Backends that do not support it may ignore it.

Name: extra_body
Type: object
Description: (Optional pass-through) Additional provider-specific parameters to forward without changing the request shape. If present, the platform merges this object into the request body sent to the reranker backend.
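
Several optional attributes fall back to a server default when omitted, so a client usually wants to leave unset fields out of the body rather than send null. The sketch below shows one way to assemble the request body from the attributes listed above. The function name build_rerank_payload is hypothetical, and treating None as "omit this field" is a client-side convention assumed here, not something the API specifies.

```python
from typing import Any, Dict, List, Optional


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[Any],
    top_n: Optional[int] = None,
    return_documents: Optional[bool] = None,
    truncate: Optional[str] = None,
    max_tokens_per_document: Optional[int] = None,
    metadata: Optional[Dict[str, Any]] = None,
    user: Optional[str] = None,
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a /v1/rerank request body, leaving out optional attributes that are unset."""
    # Required attributes are always present.
    payload: Dict[str, Any] = {"model": model, "query": query, "documents": documents}

    optional = {
        "top_n": top_n,
        "return_documents": return_documents,
        "truncate": truncate,
        "max_tokens_per_document": max_tokens_per_document,
        "metadata": metadata,
        "user": user,
        "extra_body": extra_body,
    }
    # Only send attributes that were explicitly set, so omitted ones
    # fall back to the server defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, build_rerank_payload("your-reranker-model-or-deployment-id", "How do I deploy a model?", docs, top_n=2, return_documents=True) yields a body with only the three required attributes plus top_n and return_documents.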

Request

POST /v1/rerank

curl "$BASE_URL/v1/rerank" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-reranker-model-or-deployment-id",
    "query": "How do I deploy a model?",
    "documents": [
      { "id": "doc_1", "text": "Deployments let you run models behind an API endpoint..." },
      { "id": "doc_2", "text": "Billing lets you add balance and view usage..." },
      { "id": "doc_3", "text": "A GPU runtime is used to run training jobs..." }
    ],
    "top_n": 2,
    "return_documents": true
  }'
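
The same request can be sketched in Python using only the standard library. The function names build_rerank_request and rerank are hypothetical, the 30-second timeout is an arbitrary choice, and the base URL and API key are expected to come from the same BASE_URL and API_KEY values used in the curl command.

```python
import json
import urllib.request
from typing import Any, Dict


def build_rerank_request(
    base_url: str, api_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/rerank."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(
    base_url: str, api_key: str, payload: Dict[str, Any], timeout: float = 30.0
) -> Dict[str, Any]:
    """Send the rerank request and return the decoded JSON response."""
    request = build_rerank_request(base_url, api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like rerank(os.environ["BASE_URL"], os.environ["API_KEY"], payload), where payload is the same JSON object passed to curl with -d. Building the request separately from sending it makes the URL, headers, and body easy to inspect without a network call.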

Response

{
  "id": "rerank_01ABCDEF234567890",
  "object": "list",
  "model": "your-reranker-model-or-deployment-id",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.92,
      "document": { "id": "doc_1", "text": "Deployments let you run models behind an API endpoint..." }
    },
    {
      "index": 2,
      "relevance_score": 0.41,
      "document": { "id": "doc_3", "text": "A GPU runtime is used to run training jobs..." }
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "total_tokens": 34
  }
}
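
In the example response, index 0 corresponds to doc_1 and index 2 to doc_3, which suggests that index is the zero-based position of the document in the request's documents array. Under that assumption, the sketch below pairs each score with its document, which is useful when return_documents is false or omitted and the response carries no document field. The function name ranked_documents is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def ranked_documents(
    documents: List[Any], response: Dict[str, Any]
) -> List[Tuple[float, Any]]:
    """Pair each relevance score with its document, highest score first."""
    ranked: List[Tuple[float, Any]] = []
    for result in response["results"]:
        if "document" in result:
            # Echoed back by the server when return_documents is true.
            document = result["document"]
        else:
            # Assumption: "index" is the zero-based position in the request list.
            document = documents[result["index"]]
        ranked.append((result["relevance_score"], document))
    # Sort defensively rather than relying on the order of the response.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

Applied to the three documents from the request and the response above, this returns the doc_1 object with score 0.92 followed by the doc_3 object with score 0.41.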