Reranker API
Improve retrieval quality by scoring and reordering candidate documents based on their relevance to a query.
Use reranking as a final step after initial search to surface the best context for RAG and question answering.
Rerank documents
Rerank a list of candidate documents by relevance to a query. Use reranking to improve search quality in RAG pipelines by sorting results based on semantic relevance.
Required attributes
- Name: model
  Type: string
  Description: The reranker model (or deployment ID/alias) to use for this request.
- Name: query
  Type: string
  Description: The search query used to score relevance.
- Name: documents
  Type: array
  Description: The candidate documents to rerank. Each item can be:
    - a string (document text), or
    - an object with text and optional metadata (e.g. id, title, metadata).
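Because each item in documents may be either a plain string or an object, a client may find it convenient to normalize everything to one shape before sending. The following is a minimal sketch of that idea, not part of the API itself; the helper names normalize_documents and build_rerank_request are hypothetical.

```python
from typing import Any


def normalize_documents(documents: list) -> list[dict[str, Any]]:
    """Turn a mixed list of strings and objects into objects with a text field."""
    normalized = []
    for position, doc in enumerate(documents):
        if isinstance(doc, str):
            # A bare string is treated as the document text.
            normalized.append({"text": doc})
        elif isinstance(doc, dict) and isinstance(doc.get("text"), str):
            # Shallow copy so the caller's object is not shared; id, title,
            # and metadata (if present) are kept as-is.
            normalized.append(dict(doc))
        else:
            raise ValueError(
                f"documents[{position}] must be a string or an object with a text field"
            )
    return normalized


def build_rerank_request(model: str, query: str, documents: list) -> dict[str, Any]:
    """Assemble the three required attributes into a request body."""
    return {
        "model": model,
        "query": query,
        "documents": normalize_documents(documents),
    }
```

Normalizing on the client is a design choice rather than a requirement: the endpoint accepts both shapes, but a uniform shape makes it easier to carry an id through to the results.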
Optional attributes
- Name: top_n
  Type: integer
  Description: Number of top results to return. If omitted, all documents are returned with scores.
- Name: return_documents
  Type: boolean
  Description: If true, include the original document payload in each result.
- Name: truncate
  Type: string
  Description: How to handle long documents. Common values: "none", "start", "end". If omitted, the server default is used.
- Name: max_tokens_per_document
  Type: integer
  Description: Maximum tokens (or approximate token budget) to consider per document when scoring. If omitted, the server default is used.
- Name: metadata
  Type: object
  Description: Developer-defined metadata to attach to the request (key/value pairs).
- Name: user
  Type: string
  Description: A unique identifier representing your end-user (can help with abuse monitoring and analytics). If unsupported, it may be ignored.
- Name: extra_body
  Type: object
  Description: (Optional pass-through) Additional provider-specific parameters to forward without changing the request shape. If present, the platform merges this object into the request body sent to the reranker backend.
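Several optional attributes fall back to a server default when omitted, so a client generally wants to leave unset values out of the body rather than send null. A minimal sketch of that pattern follows; the function name with_optional_attributes is hypothetical, and using None to mean "not set" is an assumption of this example.

```python
from typing import Any, Optional


def with_optional_attributes(
    body: dict[str, Any],
    top_n: Optional[int] = None,
    return_documents: Optional[bool] = None,
    truncate: Optional[str] = None,
    max_tokens_per_document: Optional[int] = None,
    metadata: Optional[dict[str, Any]] = None,
    user: Optional[str] = None,
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return a copy of body plus only the optional attributes that were set."""
    optional = {
        "top_n": top_n,
        "return_documents": return_documents,
        "truncate": truncate,
        "max_tokens_per_document": max_tokens_per_document,
        "metadata": metadata,
        "user": user,
        "extra_body": extra_body,
    }
    result = dict(body)  # do not mutate the caller's dict
    # None means "not set" here, so the key is left out and the server
    # default applies. False and 0 are real values and are kept.
    result.update({key: value for key, value in optional.items() if value is not None})
    return result
```

For example, with_optional_attributes(body, top_n=2, return_documents=True) adds exactly those two keys and nothing else.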
Request
curl "$BASE_URL/v1/rerank" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-reranker-model-or-deployment-id",
    "query": "How do I deploy a model?",
    "documents": [
      { "id": "doc_1", "text": "Deployments let you run models behind an API endpoint..." },
      { "id": "doc_2", "text": "Billing lets you add balance and view usage..." },
      { "id": "doc_3", "text": "A GPU runtime is used to run training jobs..." }
    ],
    "top_n": 2,
    "return_documents": true
  }'
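The same request can be sketched in Python using only the standard library. This assumes BASE_URL and API_KEY are set as environment variables, mirroring the curl example; the function names and the 30-second timeout are illustrative choices, not part of the API.

```python
import json
import os
import urllib.request


def make_rerank_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response.

    The timeout value is an arbitrary client-side choice for this sketch.
    """
    request = make_rerank_request(os.environ["BASE_URL"], os.environ["API_KEY"], body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the first function free of network access, which makes it easy to inspect or unit test.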
Response
{
  "id": "rerank_01ABCDEF234567890",
  "object": "list",
  "model": "your-reranker-model-or-deployment-id",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.92,
      "document": { "id": "doc_1", "text": "Deployments let you run models behind an API endpoint..." }
    },
    {
      "index": 2,
      "relevance_score": 0.41,
      "document": { "id": "doc_3", "text": "A GPU runtime is used to run training jobs..." }
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "total_tokens": 34
  }
}
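In the sample above, each result's index matches the position of that document in the request's documents array (0 for doc_1, 2 for doc_3). Assuming that holds in general, results can be joined back to the original list even when return_documents is not set. The sketch below shows one way to do this; ranked_documents is a hypothetical helper name.

```python
from typing import Any


def ranked_documents(documents: list, response: dict[str, Any]) -> list[tuple[float, Any]]:
    """Pair each result's relevance_score with its document, best first."""
    pairs = []
    for result in response["results"]:
        if "document" in result:
            # Echoed payload, present when return_documents was true.
            doc = result["document"]
        else:
            # Assumption drawn from the sample response: index is the
            # zero-based position in the request's documents array.
            doc = documents[result["index"]]
        pairs.append((result["relevance_score"], doc))
    # Sort explicitly rather than relying on the order the server returned.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)
```

In a RAG pipeline, the documents from the first few pairs would typically be passed on as context for the generation step.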