
Chat & LLM


Schift exposes two chat surfaces:

  • POST /v1/chat/completions — OpenAI-compatible LLM proxy for direct model calls.
  • POST /v1/chat — Bucket-backed RAG chat that retrieves context from a bucket before generating an answer.

Use GET /v1/models to list the models available to your organization through the configured provider keys.

All chat routes require a Schift API key passed as a Bearer token.

Note: Response generation is fail-closed. Your organization must have an explicit provider key configured in provider_configs. Schift does not fall back to a platform-managed key for response generation; if no key is configured, the request returns 403 PROVIDER_KEY_REQUIRED.

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Schift routes the request to the configured provider (OpenAI, Google, Anthropic, and others) and returns the response in OpenAI format.

Request body:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model ID, for example gpt-4o or claude-3-sonnet. |
| messages | object[] | Yes | | Chat messages in OpenAI format. Each object has role and content. |
| temperature | float | No | | Sampling temperature, typically 0.0 to 2.0. |
| max_tokens | integer | No | | Maximum number of tokens to generate. |
| top_p | float | No | | Nucleus sampling parameter. |
| stream | boolean | No | false | Return a Server-Sent Events stream. |
| stop | string[] | No | | Stop sequences that terminate generation. |
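The request body above maps directly onto a plain JSON POST. A minimal Python sketch using only the standard library, assuming the same API_BASE_URL and SCHIFT_API_KEY environment variables as the curl examples; build_chat_completion_request and send are hypothetical helper names, not part of a Schift SDK:

```python
import json
import os
import urllib.request

# Mirrors ${API_BASE_URL:-https://api.schift.io} from the curl examples.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.schift.io")


def build_chat_completion_request(api_key, model, messages, **options):
    """Build (but do not send) a POST /v1/chat/completions request.

    `options` carries the optional body fields (temperature, max_tokens,
    top_p, stream, stop); fields passed as None are left out of the body.
    """
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        url=f"{API_BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # Schift API key as Bearer token
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_chat_completion_request(os.environ["SCHIFT_API_KEY"], "gpt-4o", [{"role": "user", "content": "Hello"}])) performs the same kind of request as the curl examples on this page; urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.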
Example request:

curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain embedding model migration in one paragraph."}
    ]
  }'
Example response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Embedding model migration is the process of moving document representations..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 42,
    "total_tokens": 60
  }
}
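Because the response follows the OpenAI format, reading the answer is a matter of indexing into choices. A sketch with a hypothetical Completion type; treating finish_reason "length" as truncation is the OpenAI convention and is assumed, not confirmed on this page, to pass through unchanged:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Completion:
    text: str
    finish_reason: Optional[str]

    @property
    def truncated(self) -> bool:
        # In the OpenAI format, "length" means max_tokens was reached (assumed here).
        return self.finish_reason == "length"


def parse_completion(payload: dict) -> Completion:
    """Read the first choice from an OpenAI-format chat completion response."""
    choice = payload["choices"][0]
    return Completion(
        text=choice["message"]["content"],
        finish_reason=choice.get("finish_reason"),
    )
```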

Set "stream": true to receive Server-Sent Events. Each event contains a chunk of the completion in OpenAI-compatible delta format.

Streaming request (curl's -N flag disables output buffering so chunks print as they arrive):

curl -N -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
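A streaming client concatenates the delta content of each chunk. This sketch assumes each event arrives as a single data: line holding a chat.completion.chunk object and that the stream may end with data: [DONE], as in OpenAI's format; neither detail is specified on this page. collect_stream is a hypothetical name:

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate delta content from OpenAI-style SSE lines (format assumed)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators, ":" comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With urllib, iterate the open response and decode each line, for example collect_stream(line.decode("utf-8") for line in response).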
Error responses:

// 402 Payment Required
{
  "allowed": false,
  "reason": "quota_exceeded"
}
// 402 Insufficient credits
{
  "error": "insufficient_credits",
  "balance": 0,
  "estimated_cost": 120,
  "estimated_cost_usd": 0.0012
}
// 403 Provider key required
{
  "detail": {
    "error": "PROVIDER_KEY_REQUIRED",
    "provider_access": "missing",
    "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
  }
}
// 403 Plan or credit limit
{
  "detail": "Upgrade your plan to continue"
}
// 502 Provider unavailable
{
  "detail": "LLM provider temporarily unavailable"
}
// 503 Service not configured
{
  "detail": "LLM service not configured"
}
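These errors call for different client behavior: the 502 is described as temporary and can reasonably be retried, while the 402, 403, and 503 responses point to billing or configuration problems that a retry is unlikely to fix. A sketch of that triage; classify_error and its returned action labels are hypothetical. With urllib, the status is HTTPError.code and the body is json.loads(err.read()).

```python
def classify_error(status: int, body: dict) -> str:
    """Suggest a client action for a chat error response (labels are hypothetical)."""
    detail = body.get("detail")
    if status == 402:
        # Two documented 402 shapes: a credit shortfall and a quota denial.
        if body.get("error") == "insufficient_credits":
            return "add_credits"
        return "quota_exceeded"
    if status == 403:
        # The provider-key error nests a structured object under "detail".
        if isinstance(detail, dict) and detail.get("error") == "PROVIDER_KEY_REQUIRED":
            return "configure_provider_key"
        return "upgrade_plan"
    if status == 502:
        return "retry_later"  # provider temporarily unavailable
    if status == 503:
        return "service_not_configured"  # configuration issue; retrying is unlikely to help
    return "unknown"
```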

GET /v1/models

List the LLM models available through your organization’s configured provider keys.

Example request:

curl -G ${API_BASE_URL:-https://api.schift.io}/v1/models \
  -H "Authorization: Bearer $SCHIFT_API_KEY"

Example response:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    },
    {
      "id": "claude-3-sonnet",
      "object": "model",
      "owned_by": "anthropic"
    }
  ]
}
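Because /v1/models only lists models reachable through your configured provider keys, checking a model ID against it before sending a request is one way to catch a missing key early. A sketch; models_by_provider and is_available are hypothetical helpers operating on the JSON shown above:

```python
from collections import defaultdict


def models_by_provider(payload: dict) -> dict:
    """Group model IDs from a GET /v1/models response by their owned_by field."""
    grouped = defaultdict(list)
    for entry in payload.get("data", []):
        grouped[entry.get("owned_by", "unknown")].append(entry["id"])
    return dict(grouped)


def is_available(payload: dict, model_id: str) -> bool:
    """True if model_id appears in the organization's model list."""
    return any(entry.get("id") == model_id for entry in payload.get("data", []))
```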

POST /v1/chat

Bucket-backed RAG chat. Schift searches the requested bucket, assembles retrieval context, and generates an answer grounded in the results.

Note: This endpoint does not accept caller-controlled system prompts. Non-empty system_prompt values return 400. The server assembles RAG instructions and treats retrieved text as untrusted evidence.

Request body:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| bucket_id | string | Yes | | Bucket to search for context. |
| message | string | Yes | | User question or prompt. Must be non-empty. |
| history | object[] | No | [] | Previous conversation turns. Each object has role and content. |
| model | string | No | gemini-2.5-flash-lite | Model used for generation. |
| top_k | integer | No | 7 | Number of retrieval results to include (1 to 50). |
| access_mode | string | No | auto | Retrieval access policy: auto, internal, or external. raw is reserved for platform-admin diagnostics and is rejected for normal callers. |
| stream | boolean | No | true | Stream chunks via SSE. |
| system_prompt | string | No | null | Deprecated compatibility field. Non-empty values are rejected. |
| temperature | float | No | | Sampling temperature. |
| max_tokens | integer | No | | Maximum output tokens. |
| debug | boolean | No | false | Include pipeline debug events in SSE. Only platform-admin callers receive debug output. |
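The constraints in the table (non-empty message, top_k between 1 and 50, no raw access mode, no non-empty system_prompt) can be checked on the client before a request is sent. A sketch; build_rag_chat_body is a hypothetical helper, and unlike the server it defaults stream to false so the response is a single JSON object:

```python
ALLOWED_ACCESS_MODES = {"auto", "internal", "external"}  # "raw" is platform-admin only


def build_rag_chat_body(bucket_id, message, *, history=None, top_k=7,
                        access_mode="auto", stream=False, **options):
    """Build a POST /v1/chat body, rejecting inputs the server documents as invalid."""
    if not message:
        raise ValueError("message must be non-empty")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if access_mode not in ALLOWED_ACCESS_MODES:
        raise ValueError(f"access_mode {access_mode!r} is not allowed")
    # Empty system_prompt values are dropped; non-empty ones are rejected.
    if options.pop("system_prompt", None):
        raise ValueError("system_prompt is not accepted by /v1/chat")
    body = {
        "bucket_id": bucket_id,
        "message": message,
        "top_k": top_k,
        "access_mode": access_mode,
        "stream": stream,
    }
    if history:
        body["history"] = history  # list of {"role": ..., "content": ...} turns
    # Forward optional fields such as model, temperature, max_tokens when set.
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```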
Example request:

curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "bucket_id": "bucket_123",
    "message": "What changed in Q4?",
    "top_k": 7,
    "access_mode": "auto",
    "stream": false
  }'

Example response:

{
  "reply": "Q4 revenue increased after the new product launch.",
  "sources": [
    {
      "id": "doc-42",
      "score": 0.92,
      "text": "Quarterly report excerpt ...",
      "bucket_id": "bucket_123"
    }
  ],
  "model": "gemini-2.5-flash-lite",
  "search_id": "search_abc123",
  "degraded": false,
  "warnings": []
}
Response fields:

| Name | Type | Description |
|---|---|---|
| reply | string | Generated answer grounded in retrieved bucket context. |
| sources | object[] | Retrieved context snippets used for grounding. |
| sources[].id | string | Source document or chunk identifier. |
| sources[].score | number | Retrieval score for the source. |
| sources[].text | string | Source text excerpt. |
| sources[].bucket_id | string or null | Bucket identifier, when available. |
| model | string | Model used for generation. |
| search_id | string or null | Retrieval trace ID for support, replay, or feedback. |
| degraded | boolean | true when retrieval or generation fell back to a degraded path. |
| warnings | object[] | Structured retrieval or quality warnings. Empty when none apply. |
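A non-streaming response carries everything needed to show citations next to the answer. A sketch of one possible rendering; render_answer and its min_score filter are hypothetical, and because the shape of warnings entries is not specified here, it only counts them:

```python
def render_answer(payload: dict, min_score: float = 0.0) -> str:
    """Format a non-streaming /v1/chat response as text with numbered sources."""
    lines = [payload["reply"]]
    # Keep only sources at or above the client-chosen score threshold.
    sources = [s for s in payload.get("sources", []) if s.get("score", 0.0) >= min_score]
    if sources:
        lines += ["", "Sources:"]
        for n, src in enumerate(sources, start=1):
            lines.append(f"[{n}] {src['id']} (score {src['score']:.2f})")
    if payload.get("degraded"):
        lines += ["", "Note: this answer was produced on a degraded retrieval or generation path."]
    warnings = payload.get("warnings") or []
    if warnings:
        lines.append(f"Warnings: {len(warnings)}")
    if payload.get("search_id"):
        # search_id is the trace ID to quote in support or feedback requests.
        lines.append(f"Trace ID: {payload['search_id']}")
    return "\n".join(lines)
```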

When stream is true (the default), the response is delivered as a stream of SSE events. Debug output is returned only to platform-admin callers: when debug is accepted for such a request, diagnostic events may include pipeline_debug. Regular callers should treat debug output as unavailable.

Error responses:

// 400 Rejected system prompt
{
  "detail": "client-supplied system_prompt is not accepted"
}
// 400 Raw access mode rejected
{
  "detail": "access_mode 'raw' is not allowed for this caller"
}
// 403 Provider key required
{
  "detail": {
    "error": "PROVIDER_KEY_REQUIRED",
    "provider_access": "missing",
    "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
  }
}
// 404 Bucket not found
{
  "detail": "Bucket 'bucket_123' not found"
}

For both chat surfaces, Schift records token usage and LLM cost logs. Each successful response generation also records a provider_source value:

  • provider_source = "byok" when an organization-configured provider key is used.

Chat completions are billed per token. Before each request, Schift runs a pre-flight cost estimate to prevent overspending; if the credit balance cannot cover the estimate, the request fails with 402 insufficient_credits. Credits are deducted only for non-BYOK platform usage. RAG chat usage is recorded through the same billing paths.
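The usage block in each chat completions response makes it possible to keep a client-side token count to compare against Schift's usage logs. A sketch with a hypothetical UsageTally class; the RAG chat example response above has no usage field, so responses without one still count as requests but add no tokens:

```python
from dataclasses import dataclass


@dataclass
class UsageTally:
    """Running client-side token totals across chat responses (hypothetical helper)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    requests: int = 0

    def add(self, response: dict) -> None:
        # Responses without a usage block still count as a request.
        usage = response.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.requests += 1
```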

Which endpoint to use:

| Goal | Endpoint |
|---|---|
| Generic OpenAI-compatible LLM call without retrieval | POST /v1/chat/completions |
| Answer generation grounded in a Schift bucket | POST /v1/chat |
| Retrieve bucket context and citations only | POST /v2/buckets/{bucket_id}/search |